fixes #199 - Githubissues

siferati commented 3 years ago

Implementation of the suggestion by @SteveMacenski to fix #199:

The fix I could see would be to touch the cells cleared as they were cleared to resize the bounds

Cells that were cleared are stored in an unordered_set and touched when updating the ROS costmap.

Bugs fixed in this PR:

[x] Cleared cells are not touched, so the corresponding area of the costmap isn't updated. The cleared_cells container solves this.
[x] Cleared cells are still marked in the cosmap as obstacles. The cleared_point flag solves this.

(11/05) Edit: Updated to better describe the work that was done.

siferati commented 3 years ago

I think there's something wrong with the way I add the new marked points to _grid_points. I'm seeing a lot more points in the point cloud in RViz.

If both the robot and the obstacle are static, the point cloud should be almost the same every cycle.

However, with this PR the point cloud is a lot denser than before, meaning the points I add when marking have different coordinates than the ones that are added on clearing.

SteveMacenski commented 3 years ago

Not sure, it looks OK to me, you should only see the new points from the measurement sensor that weren't in the grid already.

Are they overlapping each other? The only thing I can think of is that since you input the global coordinates directly from the sensor, that these points weren't already in the _cost_map because they're not binned into the voxels. You could try binning them before calling PopulateCostmapAndPointcloud. A simple way to prototype would be to convert to index and back to global. The other points are coming out of the voxel grid so they'll have been binned by the voxelization transform.

siferati commented 3 years ago

because they're not binned into the voxels

Yup, that was the problem. I compared the size of _grid_points and _cost_map when the only visible obstacle was the wall on the right (see video in OP). There's a very big difference between their sizes:

binned _grid_points->size(): ~4000
binned _cost_map->size(): ~100
not binned _grid_points->size(): ~8000
not binned _cost_map->size(): ~4000

Since it seems like we can't input the global coordinates directly, I removed my overload of the SpatioTemporalVoxelGrid::PopulateCostmapAndPointcloud function and edited the original one to handle both clearing and marking.

I think the only problem left is the one I mentioned in the OP about the sides of the costmap not being properly cleaned.

SteveMacenski commented 3 years ago

I think the only problem left is the one I mentioned in the OP about the sides of the costmap not being properly cleaned.

I don't recall that, what's that issue

siferati commented 3 years ago

This mostly fixes the issue, however there is still an edge case I wasn't able to find a solution for. When the cleared obstacles are on the side, then the local costmap remains uncleared. You can see this happening in this video: although the obstacle in the front was correctly cleared, the obstacle in the left side remains.

SteveMacenski commented 3 years ago

Were those points outside of the bounds of the local costmap? If so, they shouldn't be marking because they shouldn't be touched to be able to expand the bounds outside of that window range. Is this new to this branch? If so, that means you're touching things that shouldn't be touched expanding the bounds.

siferati commented 3 years ago

Were those points outside of the bounds of the local costmap?

Nop.

Is this new to this branch?

The upstream branch didn't clear the obstacles at all. This PR at least clears them under certain conditions, although it seems like the main problem is still not solved.

After some more testing, I think the problem is that in the video I posted in the OP there were no walls on the left side. Take a look at this video, where the robot is in the middle of the map with no walls near it. The obstacles stay permanently in the costmap until a different obstacle is detected in the same direction behind it.

SteveMacenski commented 3 years ago

How does this PR not solve that issue? If your caching the clears, then it should be part of the next update bounds update to get queried and cleared out

siferati commented 3 years ago

How does this PR not solve that issue?

I don't know...! From what I can tell, there are only two places where the voxels are cleared, and I'm caching the cells in both cases:

https://github.com/SteveMacenski/spatio_temporal_voxel_layer/blob/651fcd7762fa86f7defa6e20b18ac21933be37c3/src/spatio_temporal_voxel_grid.cpp#L206-L207 https://github.com/SteveMacenski/spatio_temporal_voxel_layer/blob/651fcd7762fa86f7defa6e20b18ac21933be37c3/src/spatio_temporal_voxel_grid.cpp#L231-L232

The current state of affairs is:

upstream: obstacles are never removed from the costmap, issue #199
this PR: obstacles are removed from the costmap only if there is another obstacle behind it (even if that other obstacle is outside the costmap bounds)
In both cases, putting a static layer under the STVL layer fixes the problem

SteveMacenski commented 3 years ago

Make any progress on this?

siferati commented 3 years ago

Hi, unfortunately I wasn't able to find the cause of the issue. Since touching the cleared cells doesn't seem to fix it, then maybe the underlying problem is something different from your initial guess?

If you have any ideas I'd be happy to try them out! Otherwise I can also provide a minimum working example if you'd like.

SteveMacenski commented 3 years ago

@siferati is it as simple as https://github.com/SteveMacenski/spatio_temporal_voxel_layer/blob/651fcd7762fa86f7defa6e20b18ac21933be37c3/src/spatio_temporal_voxel_layer.cpp#L690 this is not also setting the cell as cleared? https://github.com/SteveMacenski/spatio_temporal_voxel_layer/blob/651fcd7762fa86f7defa6e20b18ac21933be37c3/src/spatio_temporal_voxel_layer.cpp#L682

Keep in mind there could be multiple cells in that column that could still be marked so it shouldn't be always clearing costmap_, but if there's no more items marked in that column it should be, e.g.

  std::unordered_set<volume_grid::occupany_cell>::iterator cell;
  for (cell = cleared_cells.begin(); cell != cleared_cells.end(); ++cell)
  {
    if (column contains < _mark_threshold) {
      costmap_[getIndex(map_x, map_y)] = costmap_2d::FREE_SPACE; // You still need to retrieve map_x and map_y or the index
      touch(cell->x, cell->y, min_x, min_y, max_x, max_y);
    }

  }
}

siferati commented 3 years ago

Keep in mind there could be multiple cells in that column that could still be marked so it shouldn't be always clearing costmap_, but if there's no more items marked in that column it should be

The problem remains even if I do that. But while testing your idea I found something very interesting!

a) If I mark cleared cells as free space before lethal obstacles are marked, the problem remains:

void SpatioTemporalVoxelLayer::UpdateROSCostmap( \
                        double* min_x, double* min_y, \
                        double* max_x, double* max_y, \
                        std::unordered_set<volume_grid::occupany_cell>& cleared_cells)
{
  // grabs map of occupied cells from grid and adds to costmap_
  Costmap2D::resetMaps();

  std::unordered_set<volume_grid::occupany_cell>::iterator cell;
  for (cell = cleared_cells.begin(); cell != cleared_cells.end(); ++cell)
  {
    uint map_x, map_y;
    if (worldToMap(cell->x, cell->y, map_x, map_y))
    {
      costmap_[getIndex(map_x, map_y)] = costmap_2d::FREE_SPACE;
      touch(cell->x, cell->y, min_x, min_y, max_x, max_y);
    }
  }

  std::unordered_map<volume_grid::occupany_cell, uint>::iterator it;
  for (it = _voxel_grid->GetFlattenedCostmap()->begin();
       it != _voxel_grid->GetFlattenedCostmap()->end(); ++it)
  {
    uint map_x, map_y;
    if ( it->second >= _mark_threshold && \
         worldToMap(it->first.x, it->first.y, map_x, map_y))
    {
      costmap_[getIndex(map_x, map_y)] = costmap_2d::LETHAL_OBSTACLE;
      touch(it->first.x, it->first.y, min_x, min_y, max_x, max_y);
    }
  }
}

b) However, if I mark cleared cells as free space after lethal obstacles are marked, the problem is fixed:

void SpatioTemporalVoxelLayer::UpdateROSCostmap( \
                        double* min_x, double* min_y, \
                        double* max_x, double* max_y, \
                        std::unordered_set<volume_grid::occupany_cell>& cleared_cells)
{
  // grabs map of occupied cells from grid and adds to costmap_
  Costmap2D::resetMaps();

  std::unordered_map<volume_grid::occupany_cell, uint>::iterator it;
  for (it = _voxel_grid->GetFlattenedCostmap()->begin();
       it != _voxel_grid->GetFlattenedCostmap()->end(); ++it)
  {
    uint map_x, map_y;
    if ( it->second >= _mark_threshold && \
         worldToMap(it->first.x, it->first.y, map_x, map_y))
    {
      costmap_[getIndex(map_x, map_y)] = costmap_2d::LETHAL_OBSTACLE;
      touch(it->first.x, it->first.y, min_x, min_y, max_x, max_y);
    }
  }

  std::unordered_set<volume_grid::occupany_cell>::iterator cell;
  for (cell = cleared_cells.begin(); cell != cleared_cells.end(); ++cell)
  {
    uint map_x, map_y;
    if (worldToMap(cell->x, cell->y, map_x, map_y))
    {
      costmap_[getIndex(map_x, map_y)] = costmap_2d::FREE_SPACE;
      touch(cell->x, cell->y, min_x, min_y, max_x, max_y);
    }
  }
}

Since b) works but both a) and your suggestion don't, it seems to me that the problem is that the voxel flattened costmap contains cells that should have been cleared but weren't. Or rather, since https://github.com/SteveMacenski/spatio_temporal_voxel_layer/blob/e077795c89a280ffffa83db61af540464e34a892/src/spatio_temporal_voxel_grid.cpp#L114 is called every update, it seems more logical to say that there is obstacle information being added to the costmap about obstacles that no longer exist.

SteveMacenski commented 3 years ago

I'll need to think about that a little.

    if (worldToMap(cell->x, cell->y, map_x, map_y))
    {
      costmap_[getIndex(map_x, map_y)] = costmap_2d::FREE_SPACE;
      touch(cell->x, cell->y, min_x, min_y, max_x, max_y);
    }

This isn't quite right I don't think. You didn't check if the cleared voxel clears enough of the column of the voxels to make it as free now. If you had a column of 10 voxels and you cleared one, you shouldn't mark the costmap now as clear, its still occupied. You need to check for the number of cells still marked in the column and if this clearing operation now brings the number of voxels marked in this column of the costmap to 0 or below the mark threshold.

Though, I have to admit, I would need to sketch it out, but the logic of placing the clearing after marking might implicitly solve that for you since we only mark as LETHAL when the mark threshold is already met (and is cleared every iteration). In fact, since we're clearing the costmap every single iteration (resetMaps) and the GetFlattenedCostmap() is also repopulated every single iteration, I'm not wondering why any of this cleared_cells stuff is needed. If we clear everything to FREE and we only set costmap cells to LETHAL if GetFlattenedCostmap() points are higher than _mark_threshold, there should never be a situation where things are over marked.

Where is the state being tracked between iterations incorrectly needing any of this? I don't think that block of code I suggested above should actually be there, if that's working, that means something is marking points that shouldn't have been marked at all.

_cost_map is cleared at the start of clearFrustums thusly emptying the flattened costmap every iteration
_cost_map is updated in clearing frustums only when the point its iterating over remains in the VDB grid. If we ClearPoint, it never gets to the Populate... function to insert itself into _cost_map. We +1 count on the costmap cell.
_cost_map is then updated when a new point is marked with new data with a +1 count from an addition you made in this PR
Then in updateBounds, we iterate over _cost_map to fill in an empty costmap_ object with LETHAL only when the count in an individual cell is high enough

I don't see where we went wrong that there would be anything marked in the costmap_ that wasn't in the VDB grid.

The only slight hiccup I see is that when you added PopulateCostmapAndPointcloud() to the Marking functions, there's no check on if the 3D (x,y,z) voxelized point that is being viewed by a sensor in marking before inserting it into the _cost_map and incrementing the column count +1. What that could do is inflate the count, since you're +1-ing a point that was already +1-ed in the grid already from the clearing updates. Those counts are supposed to be for the number of viewed voxels, not the number of total column hits.

I'm trying to follow that train of logic and I don't see how that would result in over marking if the obstacle simply vanished (because there would be no marking at that cell to cause it), but maybe that's what's actually causing this?

siferati commented 3 years ago

You didn't check if the cleared voxel clears enough of the column of the voxels to make it as free now

The problem remains even if I do that both a) and your suggestion don't [work]

I did it according to your suggestion. I just didn't post the code snippet since you had already posted the pseudo-code above. And since marking every cleared cell as free space before lethal obstacles are marked doesn't fix the problem, it makes sense that your suggestion also wouldn't work. Marking every cleared cell as free space and then overwriting the ones that are above a threshold with lethal obstacles would result in the same costmap as first marking the cells above the threshold as lethal obstacles and then marking as free space the cells below that threshold.

wondering why any of this cleared_cells stuff is needed

Because we need to touch the cleared cells to update the bounds. Quoting your previous comment: the internal voxel grid is empty on clearing, then nothing new marks it because the obstacle instantly disappeared, and then when it comes time to update the bounds, there's nothing to update them with (and there's no static layer to set the reasonable update bounds).

The only slight hiccup I see is that when you added PopulateCostmapAndPointcloud() to the Marking functions, there's no check on if the 3D (x,y,z) voxelized point that is being viewed by a sensor in marking before inserting it into the _cost_map and incrementing the column count +1. What that could do is inflate the count, since you're +1-ing a point that was already +1-ed in the grid already from the clearing updates.

The only way I can think of to solve this is to keep track of the points that we marked in the clearing update in some container, and then when marking new points check for inclusion in that container: if it exists, then we skip the first +1. But this would likely slow down the process quite a bit, since for every marking observation we would have to check if it exists in that container. I'm not sure if it's worth doing this in order to avoid the column count sometimes being off by 1.

I finally found the problem!

If we ClearPoint, it never gets to the Populate... function to insert itself into _cost_map.

When the point is cleared here

https://github.com/SteveMacenski/spatio_temporal_voxel_layer/blob/e077795c89a280ffffa83db61af540464e34a892/src/spatio_temporal_voxel_grid.cpp#L227-L232

we continue to the next point. But when we clear the point here

https://github.com/SteveMacenski/spatio_temporal_voxel_layer/blob/e077795c89a280ffffa83db61af540464e34a892/src/spatio_temporal_voxel_grid.cpp#L203-L206

we do nothing. So despite clearing this point, we will still populate it in

https://github.com/SteveMacenski/spatio_temporal_voxel_layer/blob/e077795c89a280ffffa83db61af540464e34a892/src/spatio_temporal_voxel_grid.cpp#L234-L235

I added a flag to avoid calling PopulateCostmapAndPointcloud with points we cleared and that solved the problem.

SteveMacenski commented 3 years ago

Rather than cleared_point, could you not just break; on L212 after we clear it so that we never enter that part of the loop like we do with Marking?

That also makes me wonder if we also break on L238 after the other clearing operation, then we never hit the PopulateCostmapAndPointcloud call, and thusly cleared points wouldn't entire that pointcloud. I think that could be the bug requiring cleared_cells (we might be able to remove this now?)

I think the fact that I didn't add break;s after the clears and only the marking operation in ClearFustums is the cause of the issue. If we add that breaks, I think everything else would fall back in line, no? It looks like L238 had a continue before your PR changes, but I think maybe that continue should actually be a break (but either way, in that situation it didn't hit populate; but the other clear did)

siferati commented 3 years ago

Rather than cleared_point, could you not just break; on L212 after we clear it so that we never enter that part of the loop like we do with Marking?

If you break there, then you're breaking the inner loop (frustums), not the outer loop (grid points). That's what's happening with marking, you break out of the inner loop -- otherwise you would never reach the PopulateCostmapAndPointcloud call. That's why the cleared_point flag is necessary.

https://github.com/SteveMacenski/spatio_temporal_voxel_layer/blob/e077795c89a280ffffa83db61af540464e34a892/src/spatio_temporal_voxel_grid.cpp#L171-L185

That also makes me wonder if we also break on L238 after the other clearing operation, then we never hit the PopulateCostmapAndPointcloud call, and thusly cleared points wouldn't entire that pointcloud.

It looks like L238 had a continue before your PR changes, but I think maybe that continue should actually be a break

If you break on L238, then you're breaking out of the outer loop (grid points), meaning you will return from the function and not evaluate the remaining points. That's why there was a continue statement there previously -- to skip the call to PopulateCostmapAndPointcloud but keep iterating through the remaining points.

I simply replaced the continue statement with the cleared_point flag for clarity's sake, so that it would have the same syntax as the 1st clear above, and so that I would only have to write the cleared_cells.insert statement once -- thus avoiding duplicated code.

The only slight hiccup I see is that when you added PopulateCostmapAndPointcloud() to the Marking functions, there's no check on if the 3D (x,y,z) voxelized point that is being viewed by a sensor in marking before inserting it into the _cost_map and incrementing the column count +1. What that could do is inflate the count, since you're +1-ing a point that was already +1-ed in the grid already from the clearing updates. Those counts are supposed to be for the number of viewed voxels, not the number of total column hits.

The only way I can think of to solve this is to keep track of the points that we marked in the clearing update in some container, and then when marking new points check for inclusion in that container: if it exists, then we skip the first +1. But this would likely slow down the process quite a bit, since for every marking observation we would have to check if it exists in that container. I'm not sure if it's worth doing this in order to avoid the column count sometimes being off by 1.

I thought a bit more about this and realized that it is actually not just off by 1, but off by double. If we have some object that is made up of, for example, 10 voxels in a vertical line (so they all share the same (x, y) coords), then what's gonna happen is that in the clearing update we're gonna update the corresponding costmap cell 10 times, resulting in the cell having a value of 10, and then when marking we're gonna update it another 10 times -- resulting in that cell having a column value of 20!

Given the above, then I think it is better to either fix it or revert the marking logic so that it stays 1 update behind (i.e. the current upstream behaviour, before this PR).

What do you think?

SteveMacenski commented 3 years ago

If you break there, then you're breaking the inner loop (frustums), not the outer loop (grid points).

Yes, that would be the point, then you move onto the next point, which would then have the opportunity to be populated. But that would break out of the current point for all future frustums (since if you just cleared it, it doesn't matter what any other frustum does in those contexts).

I think, unless I'm missing something, that flag isn't necessary.

I agree with you on L238, that should be a continue not a break, good catch. The continue is what was there before that it looks like your PR removed.

I do really think though from the looking I did this morning is if we re-add the continue on L238 and add that break I suggested in the frustum clearing, then we no longer require clearing_cells at all. We needed that before because we weren't breaking out of the loop, so we were adding cells that were just cleared into the Populate... function. If we break, then the Populate... never gets cleared cells, and thusly there's no need to track the specifically cleared frustum cells anymore.

From the behavior you reported too, that logic tracks for me. The reason it didn't clear is because it was still in the frustum view, and when we cleared, we didnt break; so we ended up adding that cell to Populate... which then in turn was added to the costmap.

resulting in the cell having a value of 10, and then when marking we're gonna update it another 10 times -- resulting in that cell having a column value of 20!

Exactly, that's why I mentioned that since you added the Populate... function to marking, we need to check if that cell was already in the grid and only increment the count if its a new cell. That may be possible to do without adding too much overhead. Either way, even if we keep the updates to 1 cycle off, we should still fix some of these other concerns you've brought up with not having a static map underneath it. But I would tend to agree we should fix 1 issue at a time. Resolve the main issue first and then the 1 cycle off issue later.

siferati commented 3 years ago

But that would break out of the current point for all future frustums (since if you just cleared it, it doesn't matter what any other frustum does in those contexts).

Ok, I agree. Adding the break statement for this reason makes sense.

Yes, that would be the point, then you move onto the next point, which would then have the opportunity to be populated.

I think, unless I'm missing something, that flag isn't necessary.

By inserting the break statement there, you're exiting the inner frustum loop, but you're still inside the outer loop. So the next statement to be evaluated is this:

https://github.com/SteveMacenski/spatio_temporal_voxel_layer/blob/e077795c89a280ffffa83db61af540464e34a892/src/spatio_temporal_voxel_grid.cpp#L222-L233

Since frustum_cycle was set to true above, then you do not enter that if statement. The next statement evaluated is

https://github.com/SteveMacenski/spatio_temporal_voxel_layer/blob/e077795c89a280ffffa83db61af540464e34a892/src/spatio_temporal_voxel_grid.cpp#L234-L235

So you will still populate the costmap with the cleared point. In order to avoid this, you would need to break the inner loop and immediately continue the outer loop. This is not possible without using some sort of flag.

But I would tend to agree we should fix 1 issue at a time. Resolve the main issue first and then the 1 cycle off issue later.

Agreed.

SteveMacenski commented 3 years ago

Ah yes, we break out of the inner loop, it doesn't just reset to the outer, it still has that code left. OK. A bool point_cleared then could be added if we clear it there and then we can add a

if (cleared_point) {
  continue;
}

after the for loop but before the if(!frustum_cycle) {...} statement. I think now that might fix our issues, yes? :wink:

siferati commented 3 years ago

But that's exactly the same behavior of what I have already done... The only difference is that with the way you're suggesting I would have to write cleared_cells.insert twice. The way I did it, I only write it once and the code is easier to read.

Edit: only thing missing is to add the break statement we discussed above -- I'll do that tomorrow

SteveMacenski commented 3 years ago

Well I think we've found from this method, we don't require cleared_cells at all anymore, yeah? If we don't include them in Populate... then we're not artificially having more than we should to need to clear them out?

siferati commented 3 years ago

we don't require cleared_cells at all anymore

We've already discussed this a few comments above:

wondering why any of this cleared_cells stuff is needed

Because we need to touch the cleared cells to update the bounds. Quoting your previous comment: the internal voxel grid is empty on clearing, then nothing new marks it because the obstacle instantly disappeared, and then when it comes time to update the bounds, there's nothing to update them with (and there's no static layer to set the reasonable update bounds).

There are 3 different problems at play here:

Cleared cells are not touched, so the corresponding area of the costmap isn't updated. The cleared_cells container solves this.
Cleared cells are still marked in the cosmap as obstacles. This cleared_point flag solves this.
Marking is 1 update cycle behind. I reverted this PR to not address this, as it is more complicated than initially assumed and should be handled in its own PR.

cleared_cells and cleared_point are not interchangeable and both solve different problems.

siferati commented 3 years ago

Have you tested this that it resolves the issues and doesn't break anything?

This PR fixes #199. I didn't find any problems when testing.

SteveMacenski commented 3 years ago

OK here's what I recommend for next steps:

Submit this patch to Noetic, Foxy, and ROS2 branches so that the newest distributions can get these patches as well
I'm going to reach out to some users of STVL in production to test this patch to make sure they don't run into any issues. I'd test myself, but I don't use this anymore in a product (I've since left that company and I'm not working with product robots anymore) so I want to make sure that others that this more directly impacts have a say
And assuming all is well, we can merge!

@doisyg can you test this either in ROS1 or ROS2 (once a PR for ROS2 is made)? Just to sanity check this doesn't break anything for you?

siferati commented 3 years ago

Submit this patch to Noetic, Foxy, and ROS2 branches

Done. But I don't have access to a test environment for those distributions, so I wasn't able to test if things actually worked -- I'll leave that to someone else.

doisyg commented 3 years ago

I may have some bandwith to do a RO2 sanity check on our hw platform (which uses 2 3D cameras in STVL) at the end of the week. Please ping me back then if I don't

SteveMacenski commented 3 years ago

@doisyg friendly ping :-)

Patrick-Lascombe commented 3 years ago

I tested the PR on our ROS2 hardware, it solves the issue without impacting the behaviour of the STVL

SteveMacenski commented 3 years ago

Sweet, merging!

SteveMacenski / spatio_temporal_voxel_layer

fixes #199 #200

I finally found the problem!