PRBonn / vdbfusion

C++/Python Sparse Volumetric TSDF Fusion
MIT License
488 stars 54 forks source link

Comparison to Voxblox in paper #3

Closed Willyzw closed 2 years ago

Willyzw commented 2 years ago

Hallo, since I have been using voxblox for a while. So far I am pretty satisfied with the performance it can bring. Especially the speed of its fast integrator. So as for the runtime, the experiments in paper point out the VDB's runtime is even faster than voxblox. Since there are three integrator types in voxblox, i.e. simple, merged and fast and apparently it means a lot to runtime, I am wondering which integrator type specifically does the comparison use? Thanks in advance for your reply!

nachovizzo commented 2 years ago

Hello, Thanks for the question. This indeed is something that's is of my interest too.

The integrator we are using for comparison is the simple one. The idea was to compare 1 to 1 "standard" TSDF integrators. The outcome of this experiment is that, when applying similar ideas to the VDBFusion system, it's very likely to outperform Voxblox. The main difference is in the threading models, the fast and merged are multithreaded, and therefore faster than the single thread model used on VDBFusion.

Although one could also apply different threading models to VDBFusion(which is an open task), this doesn't have much meaning for real low-power robotics applications. Since mapping is not the only task that a mobile robot must carry on. So for me, it's still important to have decent performance on a single-core CPU, even when this means dropping some frames from the sensor.

On the other hand, you can try right away an incorrect multi-thread execution of the VDBFusion algorithm by just applying this patch:

diff --git a/src/vdbfusion/vdbfusion/VDBVolume.cpp b/src/vdbfusion/vdbfusion/VDBVolume.cpp
index e852a6c..1cef664 100644
--- a/src/vdbfusion/vdbfusion/VDBVolume.cpp
+++ b/src/vdbfusion/vdbfusion/VDBVolume.cpp
@@ -9,6 +9,7 @@
 #include <Eigen/Core>
 #include <algorithm>
 #include <cmath>
+#include <execution>
 #include <functional>
 #include <iostream>
 #include <memory>
@@ -85,7 +86,7 @@ void VDBVolume::Integrate(const std::vector<Eigen::Vector3d>& points,
     auto weights_acc = weights_->getUnsafeAccessor();

     // Launch an for_each execution, use std::execution::par to parallelize this region
-    std::for_each(points.cbegin(), points.cend(), [&](const auto& point) {
+    std::for_each(std::execution::par, points.cbegin(), points.cend(), [&](const auto& point) {
         // Get the direction from the sensor origin to the point and normalize it
         const Eigen::Vector3d direction = point - origin;
         openvdb::Vec3R dir(direction.x(), direction.y(), direction.z());

Keep in mind that this might not produce the correct results, since it will lead to race conditions when accessing different voxels.

PS: Have you tried the voxblox_pybind python bindings? I made those bindings to have ease of experimentation when trying out voxblox without having to convert all my datasets to rosbag files. The API is quite similar to the one I choose for VDBFusiuon, and maintains the original names for Voxblox:

from voxblox import SimpleTsdfIntegrator

self._tsdf_volume = SimpleTsdfIntegrator( self._config.voxel_size, self._config.sdf_trunc, self._config)
for scan, pose in dataset:
    self.tsdf_volume.integrate(scan, pose)
Willyzw commented 2 years ago

Thanks a lot for the quick but detailed reply! I can see now the improvement given the integrator type of voxblox and the constraint of only one thread. Indeed it is impressive how fast VDBFusion can run. Maybe as a followed up question, do you maybe plan for further optimization e.g. the similar ideas of merged and fast integrator. Although I am unsure if these ideas are only feasible for multi-thread execution.

Thank your for the sample code enabling the multi-thread execution of the VDBFusion, I will try it out when get time. As for the voxblox_pybind, I yet got attention to it. As a first glance, it already looks very promising. Thanks again for the generous release of these great libraries.

By the way, the artifacts produced by the voxblox look really strange. I didn't experience such artifacts so far even without using the ros wrapper. Is there maybe anything I can help with if you could add more details?

nachovizzo commented 2 years ago

If you find a way to solve the artifacts problems in the voxblox library I will be more than happy to merge it to the main branch and benefit the entire community. My guess is that is due to float-double conversions... For some reason, they are using float for its matrices instead of double... I know, memory, but a 4x4 matrix doesn't make much difference. As a matter of fact, this issue is also visible on the C++ API, so for a long time I thought I did something wrong with the bindings but it's not:

KittiDataloader loader;
Vector3dVector points;
Eigen::Matrix4f pose = Eigen::Matrix4f::Identity();

voxblox::TsdfIntegratorBase::Config config;
int voxels_per_side = 16;
config.default_truncation_distance = sdf_trunc;
config.min_ray_length_m = min_range;
config.max_ray_length_m = max_range;
config.voxel_carving_enabled = space_carving;
config.integrator_threads = 1;
voxblox::Layer<voxblox::TsdfVoxel> simple_layer(voxel_size, voxels_per_side);
voxblox::SimpleTsdfIntegrator tsdf_volume(config, &simple_layer);
for (auto _ : state) {
    for (int i = 0; i < loader.num_scans_; ++i) {
        loader.getScanAndPose(i, pose, points);
        auto [points_C, colors] = PointcloutToVoxblox(points);
        auto T_G_C = voxblox::Transformation(pose);
        tsdf_volume.integratePointCloud(T_G_C, points_C, colors);
    }
}

The open question is why is working on ROS.... I tried to backtrace the transformations but it was a long path and never found a solution. My intuition tells me that there is something wrong with the minkdir custom transformation library they use.

If you find a solution, please let me know!

Willyzw commented 2 years ago

Okay this is something out of my expectation.. But indeed I remember having also experiencing problem with minkdir. It was when converting quaternion to rotation matrix using minkdir and slight degraded reconstruction was noticed. But no such significant artifacts as you mentioned, and I thought it was because of the precision loss of floating points.

swarmt commented 2 years ago

Hi There! Great library!

Just wanted to add my thoughts on the speed of the fast integrator in Voxblox. You should find that even with a single thread, it is significantly faster than the simple integration method.

When a point cloud is integrated, rays are cast in reverse, point to origin and subsampled at N * VoxelSize. There is a map of voxel indices that tracks how many rays have passed through a voxel , when a voxel is cast through more than N (default=2) times the ray cast is abandoned.

I'll have a look at the code when I get a chance and see how hard this would be to implement.

Code snippet below is from Voxblox.

void FastTsdfIntegrator::integrateFunction(const Transformation& T_G_C,
                                           const Pointcloud& points_C,
                                           const Colors& colors,
                                           const bool freespace_points,
                                           ThreadSafeIndex* index_getter) {
  DCHECK(index_getter != nullptr);

  size_t point_idx;
  while (index_getter->getNextIndex(&point_idx) &&
         (std::chrono::duration_cast<std::chrono::microseconds>(
              std::chrono::steady_clock::now() - integration_start_time_)
              .count() < config_.max_integration_time_s * 1000000)) {
    const Point& point_C = points_C[point_idx];
    const Color& color = colors[point_idx];
    bool is_clearing;
    if (!isPointValid(point_C, freespace_points, &is_clearing)) {
      continue;
    }

    const Point origin = T_G_C.getPosition();
    const Point point_G = T_G_C * point_C;
    // Checks to see if another ray in this scan has already started 'close'
    // to this location. If it has then we skip ray casting this point. We
    // measure if a start location is 'close' to another points by inserting
    // the point into a set of voxels. This voxel set has a resolution
    // start_voxel_subsampling_factor times higher then the voxel size.
    GlobalIndex global_voxel_idx;
    global_voxel_idx = getGridIndexFromPoint<GlobalIndex>(
        point_G, config_.start_voxel_subsampling_factor * voxel_size_inv_);
    if (!start_voxel_approx_set_.replaceHash(global_voxel_idx)) {
      continue;
    }

    constexpr bool cast_from_origin = false;
    RayCaster ray_caster(origin, point_G, is_clearing,
                         config_.voxel_carving_enabled,
                         config_.max_ray_length_m, voxel_size_inv_,
                         config_.default_truncation_distance, cast_from_origin);

    int64_t consecutive_ray_collisions = 0;

    Block<TsdfVoxel>::Ptr block = nullptr;
    BlockIndex block_idx;
    while (ray_caster.nextRayIndex(&global_voxel_idx)) {
      // Check if the current voxel has been seen by any ray cast this scan.
      // If it has increment the consecutive_ray_collisions counter, otherwise
      // reset it. If the counter reaches a threshold we stop casting as the
      // ray is deemed to be contributing too little new information.
      if (!voxel_observed_approx_set_.replaceHash(global_voxel_idx)) {
        ++consecutive_ray_collisions;
      } else {
        consecutive_ray_collisions = 0;
      }
      if (consecutive_ray_collisions > config_.max_consecutive_ray_collisions) {
        break;
      }

      TsdfVoxel* voxel =
          allocateStorageAndGetVoxelPtr(global_voxel_idx, &block, &block_idx);

      const float weight = getVoxelWeight(point_C);

      updateTsdfVoxel(origin, point_G, global_voxel_idx, color, weight, voxel);
    }
  }