PRBonn / vdbfusion

C++/Python Sparse Volumetric TSDF Fusion
MIT License
498 stars 54 forks source link

Any trial on ESDF and gradient computation #10

Closed gogojjh closed 2 years ago

gogojjh commented 2 years ago

Hi, thanks for your great work! From the SLAM perspective, it would be better if the residual and gradient can be computed from the map to support frame-to-mesh registration (as you did in the PUMA paper, this is another good work). Also, the generation of the ESDF can be used for robotic planning. Have you tried these two directions? Or could you please share some hints on how to achieve them? I will try to make some contributions ^v^. Thanks.

nachovizzo commented 2 years ago

Hello there, I'm glad you like this work.

From the SLAM perspective, it would be better if the residual and gradient can be computed from the map to support frame-to-mesh

I'm not sure if I totally understand this sentence but, I'm currently working in a SLAM system based on top of VDBFusion, meaning that poses will no longer be required for the system to work.

About ESDF, I never implemented it, and is very unlikely I do it, so if you would like to do it I will be more than happy to add the contribution with the proper acknowledgment on this repo. My vision of this project is that people can easily fork it and modify it, that's why I spent extra effort in trying to keep stuff as simple and short as possible. So getting started shouldn't be that problematic. In any case,, modifying stuff and hitting make should do all the magic, including the python bindings. The starting point to change the implementation is VDBVolume.h and VDBVolume.cpp. Which sum up roughly ~200 lines of source code.

Let me know how it goes!

gogojjh commented 2 years ago

Thanks for your reply.

The first question is asking about the residual and gradient calculation from the TSDF. Some existing works have explored the way that tracking cameras' motion directly from the SDF [1]. In Voxblox, such functions are supported (https://github.com/ethz-asl/voxblox/blob/c8066b04075d2fee509de295346b1c0b788c4f38/voxblox/src/alignment/icp.cc#L130). I am curious about how to implement them based on your framework.

And I agree that the VDBFusion is easy to use. VDBFusion also has superior performance to Voxblox (tested on my computer with the KITTI dataset, VDBFusion's integration costs 50ms, while Voxblox's integration costs 400ms). I will try to extend your work AvA.

I have two additional questions.

  1. I have read some papers and lots of works on RGB-D dense mapping and reconstruction turn on the spacing carve. In your paper, you have explained some mapping modules need it. Could please explain if spacing carve is necessarily needed for SLAM in middle-scale or large-scale scenarios (like the KITTI)?
  2. The second question is about the truncation distance. As you stated in the paper: ``Large truncation distances allow for better smoothing of noise effects of the sensor but make the reconstruction of thin surface challenging''. Could you please explain it?

I would appreciate if you could provide some hints. Thanks.

Reference

[1] Bylow, Erik, et al. "Real-time camera tracking and 3D reconstruction using signed distance functions." Robotics: Science and Systems. Vol. 2. 2013.

nachovizzo commented 2 years ago

Sorry for the late reply!

Space Carving

That's actually a good question, I was planning to add a full study on this topic on the VDBFusion paper, but we ran out of time. Basically, it depends on the task. The way VDBFusion handles this is quite naive. For middle-large scale SLAM systems, I guess the space-carving strategy is not scalable. Since the mapping module needs to allocate much more memory, just to store the information of "this voxel has been visible, but not occupied, therefore free". When you skip this space_carving flag, you basically store information only close to the surface (the sensor measurement) and you have literally no clue about what is in the middle. If you get the chance, you should install OpenVDB and inspect the resulting volumes with and without space carving. The overall runtime get's really slow, and the memory consumption gets tremendously affected. Just check the number of voxels occupied in the following example (first 2 seconds of the KITTI sequence 00)

Without space carving

image

With space carving

image

Although if you run marching cubes over those distance fields, and check the surface, there is not much difference:

Without space carving

ScreenCapture_2022-04-04-12-08-38

With space carving

Here you can only tell that this model was generated with space carving if you pay attention to the roof of the cars, that got "carved" over time

ScreenCapture_2022-04-04-12-08-34

Conclusion

If you can cope with dynamics (like using LiDAR-MOS for example) then I'd advise not to use space carving. Although this depends if you are doing some sort of path planning, etc, on the map representation.

2 Truncation distance

To explain this, I'd need a whiteboard :), but basically, the effect is the same as the space carving on the roof of the cars from the last picture. If the truncation distance is (let's say infinite), then is equivalent to using space carving and therefore updating each voxel at each iteration. Thin surfaces will provide low-density measurements, meaning that, on average, they will be smoothed out by the moving average of the TSDF mapping system.

I hope this replies to your comments, looking forward to seeing if the system can be adapted to ESDF!

swarmt commented 2 years ago

I would think that the gradient calculation is identical to Voxblox. See interpolator_inl.h

gogojjh commented 2 years ago

Thanks for your detailed explanation! I can share some recent progress with you and hope that they are useful:

  1. I found that openvdb supports gradient computation. I also implemented the gradient calculation originally implemented by Voxblox. OpenVDB's computation is more efficient. The sample code looks like this:
    
    #include <openvdb/math/Operators.h>
    #include <openvdb/openvdb.h>

bool Interpolator::isActiveVoxel(const openvdb::FloatGrid::Ptr &grid, const openvdb::Coord &voxel_index) const { auto grid_acc = grid->getAccessor(); return grid_acc.isValueOn(voxel_index); }

bool Interpolator::getGradientVDB(const openvdb::Vec3s &xyz, openvdb::Vec3s &grad) const { openvdb::Coord voxel_index = openvdb::Coord::round(gridsdf->worldToIndex(xyz)); if (!isActiveVoxel(gridsdf, voxel_index)) return false;

// Iterate over all 3D, and over negative and positive signs in central // difference. // [-1 0 0] [1 0 0] [0 -1 0] [0 1 0] [0 0 -1] [0 0 1] openvdb::Coord voxel_neigh; for (unsigned int i = 0u; i < 3u; ++i) { for (int sign = -1; sign <= 1; sign += 2) { openvdb::Coord offset; offset[i] = sign; voxel_neigh.x() = voxel_index.x() + offset.x(); voxel_neigh.y() = voxel_index.y() + offset.y(); voxel_neigh.z() = voxel_index.z() + offset.z(); if (!isActiveVoxel(gridsdf, voxel_neigh)) return false; } } auto grid_sdf_acc = gridsdf->getAccessor();

// center difference, 2nd order // you can modify openvdb::math::CD_2ND to select other types of gradient like finite difference openvdb::math::Vec3s isGrad = openvdb::math::ISGradient::result(grid_sdf_acc, voxelindex); openvdb::math::ScaleMap scalemap = openvdb::math::ScaleMap( openvdb::Vec3d(voxelsize, voxelsize, voxelsize)); grad = scalemap_.applyIJT(isGrad); return true; }


2. The space carving is very time-consuming since the original implementation needs to traverse each voxel one by one. OpenVDB provides a more efficient implementation using HDDA. An explanation can be checked: https://groups.google.com/g/openvdb-forum/c/EU9YJjwobC4?pli=1.
3. A work in ICRA2021 already implemented the EDT with the usage of OpenVDB: https://github.com/zhudelong/VDB-EDT. 

The integration of these works may need some time. I will put a request if I finished :)
nachovizzo commented 2 years ago

Nice!

  1. Yes, OpenVDB supports out-of-the-box gradient computation on the SDF and it's also quite efficient. You should also check out the OpenVDB::GradStencil
  2. HDDA is actually not useful in the SLAM context. I fall into this trap many times in the past :P. In general, one should be carefull with VDB literature. Their main applications is 3D modelling and 3D rendering, where the topology is defined a priori. In our context (SLAM/ 3D-Mapping) we have sensor data and we have no clue at all on how the topology of the SDF would look at the end of the integration step. This breaks most of the assumptions they typically do, and this also means that we can't use HDDA to integrate new measurements.... since we still have no topology created. In general keep in mind this: we "bake" the VDB's on the fly while they have VDB's pre-baked and they need to add some toppings :). This is the very same reason why I still haven't ported this library to CUDA, they recently released nanovdb but it's also used for fast rendering, where the topology of the VDB is given, meaning that we can't really bake our own on the fly. I expect they will loosen this assumption in the future.
  3. VDB-EDT was almost done in parallel with this, it's also interesting to look and also this other 2 works: vdb_mapping, basically an "Octomap" using VDB's, no SDF fields and Bonxai, basically a from-scratch re-implementation of VDB's but targetted for robotics applications.
gogojjh commented 2 years ago

Great to see your reply. You have much more experience on OpenVDB :+1:

And I have another question regarding mesh generation. OpenVDB also provides a class called VolumeToMesh to generate mesh elements. I have tried this function and found that this function is more efficient than the MarchingCubes implemented in your code. But results of the two methods are different --- in fact, it seems that the MarchingCubes can generate more accurate meshes. OpenVDB may generate some noisy meshes.

The code based on OpenVDB is like this:

#include <openvdb/tools/SignedFloodFill.h>

openvdb::tools::signedFloodFill(tsdf_volume.tsdf_->tree());
std::vector<openvdb::Vec3s> vertices;
std::vector<openvdb::Vec3I> triangles;
std::vector<openvdb::Vec4I> quads;
// isovalue: The crossing point of the VDB values that is considered the
// surface when converting to polygons.
// relaxDisorientedTriangles: toggle relaxing disoriented triangles during
// adaptive meshing
double isovalue = 0.0;
double adaptivity = 0.0;
bool relaxDisorientedTriangles = false;
openvdb::tools::volumeToMesh(*tsdf_volume.tsdf_, vertices, triangles, quads, isovalue,
                             adaptivity, relaxDisorientedTriangles);
std::vector<Eigen::Vector3d> ver_eigen;
std::vector<Eigen::Vector3i> tri_eigen;
ver_eigen.reserve(vertices.size());
tri_eigen.reserve(triangles.size());
for (size_t i = 0; i < vertices.size(); i++) {
  ver_eigen.emplace_back(vertices[i].x(), vertices[i].y(), vertices[i].z());
}
for (size_t i = 0; i < triangles.size(); i++) {
  tri_eigen.emplace_back(triangles[i].x(), triangles[i].y(),
                         triangles[i].z());
}

I consider that the issue may come from the signedFloodFill function (https://www.openvdb.org/documentation/doxygen/namespaceopenvdb_1_1v8__0_1_1tools.html#ab8c1921d688e3c65c71aac5405f155a9). As stated in this website: This method should only be used on closed, symmetric narrow-band level sets. This function cannot work for the SLAM context: the topology of the environment is changing (like your last answer).

Have you tried it? Would you mind sharing some lessons? Thanks.

nachovizzo commented 2 years ago

Hello again,

I actually did try to use the VolumeToMesh before using the plain marching cubes algorithm. Just because it was much more efficient. Indeed, this is something they've changed back in 2013. You don't need to write code to try this out though :) Just open the VDB visualizer vdb_view something.vdb and then press 2. This will run the triangulation algorithm and render it to the screen. There you see what it would look like.

Indeed the issues are strongly related to the signedFloodFill. This function works only for real "closed" surfaces, those we would never see in a SLAM context, as you mention, but not because of its dynamic nature, but because it will never be a closed surface (Unless you are using an RGB-D scanner to scan objects, which indeed, reassemble a closed surface)

On the other hand, I always wanted to parallelize to marching cubes implementation(ToDo) just to make it faster. About the "better" results of the marching cubes is probably what we acknowledge on the paper (see picture) and it's implemented here: https://github.com/PRBonn/vdbfusion/blob/e7eb198c9c800dc42029f68d3fd39a010b6d0e98/src/vdbfusion/vdbfusion/MarchingCubes.cpp#L63

image

I hope this helps :)

PS: At this point you might also could give a look at the vdb_to_numpy repo I released last week, although is a bit messy and not maintained.