What parameters you used in Erasor module?

PRBonn / auto-mos

Automatic Labeling to Generate Training Data for Online LiDAR-based Moving Object Segmentation

MIT License

What parameters you used in Erasor module? #3

Open vacany opened 2 years ago

vacany commented 2 years ago

Thank you for your work on LiDAR motion segmentation! I have spent the past four weeks on an implementation of ERASOR and could not reproduce the Auto-MOS results on SemanticKITTI. What are the "default" ERASOR parameters you used for the sequences reported in your paper? The original repo reports different ones for each sequence. Is it large_scale.yaml?

For me, it produced results on the sequence parts from their paper, but on other sequences the output static map is extremely noisy and not usable for distinguishing the moving vehicles. Did you need to change something in it?

Thank you very much for answering any of these questions and helping a fellow researcher in his desperate time.

patripfr commented 2 years ago

I am facing the same issue: I also cannot reproduce the results with the given instructions. I suspect the issue is either in the criterion I use to get the coarse dynamic points from ERASOR or in how I use HDBSCAN, neither of which is explained, unfortunately. Attached are my results on sequence 5, using the parameters from the ERASOR repo. There is clearly a large number of static points detected as dynamic. Any insight would be appreciated.

[Image: seq5]

Chen-Xieyuanli commented 2 years ago

Hey @vacany @patripfr, to make ERASOR work, there are two things to note. First, it detects moving objects on top of its ground removal algorithm, so you should check whether the ground removal results are good. You can find more information here: https://github.com/url-kaist/Ground-Segmentation-Benchmark. You could also contact Hyungtae Lim directly, who is a very nice guy. Second, it checks the consistency between the current observation and the prebuilt submap. The larger the submap, the better you can detect the dynamics. In our case, we use 200 frames to build the submap with a voxel size of 0.2. It will of course detect more false positives if you use more aggressive consistency-checking parameters. We use the ERASOR results as proposals and then use clustering and tracking to eliminate the false positives.

For HDBSCAN, we apply it over varying density thresholds and integrate the results, yielding the clustering that gives the best score. To be more specific, we run DBSCAN multiple times with different density thresholds on the point cloud without ground and then calculate the score of each instance using the ERASOR proposals. The score of an instance is the ratio of points in that instance that ERASOR found to be dynamic.
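As a rough illustration of the instance score described above (a sketch only, not the authors' code), the ratio can be computed per cluster from a label array and a per-point ERASOR flag. The function name `instance_scores`, the use of `-1` for outliers, and the toy label arrays are assumptions made for this example.

```python
from collections import defaultdict


def instance_scores(labels, is_dynamic):
    """Return {instance_id: fraction of its points flagged dynamic}.

    labels[i] is the cluster ID of point i (-1 marks an outlier, following
    the usual DBSCAN convention); is_dynamic[i] is True if point i is an
    ERASOR proposal.
    """
    total = defaultdict(int)
    dynamic = defaultdict(int)
    for label, dyn in zip(labels, is_dynamic):
        if label == -1:  # outliers belong to no instance
            continue
        total[label] += 1
        dynamic[label] += int(dyn)
    return {label: dynamic[label] / total[label] for label in total}


# Hypothetical labels from two DBSCAN runs at different density thresholds.
is_dynamic = [True, True, True, False, False, False, False, False]
labels_loose = [0, 0, 0, 0, 0, 0, -1, -1]  # one large cluster
labels_tight = [0, 0, 0, 1, 1, 1, -1, -1]  # the same points split in two

scores_loose = instance_scores(labels_loose, is_dynamic)
scores_tight = instance_scores(labels_tight, is_dynamic)
```

In this toy case the tighter threshold isolates an instance made up entirely of proposal points, while the looser one dilutes it with static points. In practice the label arrays could come from something like scikit-learn's `DBSCAN(eps=..., min_samples=...).fit_predict(points)`, called once per threshold.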

As we replied previously, we are currently still working on our LiDAR moving object segmentation benchmark, and therefore we will not release the full pipeline of our auto-mos soon. We also encourage users not to generate ground truth labels on our test set, which will kill the benchmark.

We thank you for your understanding!

patripfr commented 2 years ago

Hi @Chen-Xieyuanli, thanks a lot for your answer, this makes things much clearer already. I have a few follow-up questions:

1. When you say you use 200 frames, what does that mean exactly? Are you splitting every dataset into multiple sequences of 200 frames or are you selecting 200 frames equally spaced per dataset or something else?
2. Given the ERASOR map, how do you find the proposal points? Is it all points that have no neighbor in the map closer than the voxel diagonal?
3. Am I understanding it correctly that the clustering is performed on the whole point cloud with ground points removed? In the paper it sounds / looks like the clustering is only performed on the dynamic proposal points.
4. For the clustering, are you "manually" running DBSCAN multiple times instead of using the HDBSCAN implementation you've linked in the README?

Thanks, Patrick

Chen-Xieyuanli commented 2 years ago

Hey @patripfr, please find my answer below:

When you say you use 200 frames, what does that mean exactly? Are you splitting every dataset into multiple sequences of 200 frames or are you selecting 200 frames equally spaced per dataset or something else?

As introduced in ERASOR, we first need to generate the submap and then detect the dynamics. In our case, we want to find the dynamic objects in all scans and thus need to build the map of the whole sequence. However, the voxel map of the whole sequence is too large to hold in memory. We therefore divide it into submaps and run ERASOR on each submap to obtain the dynamic proposals.
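One possible reading of this, sketched below, is that the sequence is cut into consecutive, non-overlapping windows of 200 frames. The thread does not say whether the windows overlap or how the last partial window is handled, so both choices here, and the helper name `split_into_submaps`, are assumptions.

```python
def split_into_submaps(num_frames, frames_per_submap=200):
    """Yield (start, end) frame index ranges, with end exclusive.

    Assumes consecutive, non-overlapping windows; the final window may be
    shorter than frames_per_submap.
    """
    for start in range(0, num_frames, frames_per_submap):
        yield start, min(start + frames_per_submap, num_frames)


# Hypothetical sequence of 450 scans.
ranges = list(split_into_submaps(450))
```

Each range would then be used to build one voxel submap and run ERASOR on the scans it contains.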

Given the ERASOR map, how do you find the proposal points? Is it all points that have no neighbor in the map closer than the voxel diagonal?

Once we know which voxel is dynamic, we label the points inside that voxel as dynamic.
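A minimal sketch of that lookup follows, assuming each scan has already been transformed into the map frame and that voxels are indexed by flooring coordinates divided by the voxel size. The helper names and the example voxel set are hypothetical, and this is not the authors' implementation.

```python
import math


def voxel_key(point, voxel_size=0.2):
    """Map an (x, y, z) point in the map frame to its integer voxel index."""
    return tuple(math.floor(c / voxel_size) for c in point)


def label_points(points, dynamic_voxels, voxel_size=0.2):
    """Return one boolean per point: True if it falls inside a dynamic voxel."""
    return [voxel_key(p, voxel_size) in dynamic_voxels for p in points]


# Hypothetical set of voxel indices that ERASOR marked as dynamic.
dynamic_voxels = {(5, 0, 0)}

# Three points of an original (non-voxelized) scan, already in the map frame.
scan = [(1.05, 0.10, 0.05), (1.15, 0.02, 0.19), (3.00, 0.10, 0.05)]
labels = label_points(scan, dynamic_voxels)
```

Because every original point is looked up by voxel index and not matched to a nearest neighbor, all points inside a dynamic voxel receive the label, not only the voxel's representative point. For full scans the same lookup would normally be vectorized, for example with NumPy.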

Am I understanding it correctly that the clustering is performed on the whole point cloud with ground points removed? In the paper it sounds / looks like the clustering is only performed on the dynamic proposal points

We obtain the final instance segmentation results based on the dynamic proposal scores, while the instances themselves are segmented using DBSCAN on the point cloud with ground points removed.

For the clustering, are you "manually" running DBSCAN multiple times instead of using the HDBSCAN implementation you've linked in the README?

We followed the linked HDBSCAN method and modified it to fit our method by incorporating the dynamic proposals.

patripfr commented 2 years ago

Hi @Chen-Xieyuanli, sorry to bother you again, but unfortunately I'm still struggling to understand the clustering part. I'm a bit confused about how you determine which instance has the best dynamic proposal score. Is it calculated per point? Let's say we have two dynamic proposal points, a and b. For point a, we find the best score with cluster 1. For point b, we find the best score with cluster 2, but cluster 2 also includes point a. Since every point should only be in one instance in the end (according to the paper), how would you solve this case? Maybe I'm also completely misunderstanding something here...

Chen-Xieyuanli commented 2 years ago

Yes, in the end, one point can only be assigned with one instance ID, but not every point will be assigned with an ID since there are outliers, which is one of the good features of DBSCAN.

The problem you mentioned is exactly what HDBSCAN aims to solve. You may know that DBSCAN is a flat density-based clustering method, while HDBSCAN is a hierarchical one. HDBSCAN builds a hierarchy of connected components using tree structures. The difference between the two can be understood as follows: DBSCAN obtains a set of flat clusters by cutting the tree at one threshold, whereas HDBSCAN uses multiple thresholds and thus obtains instances at different density levels. We follow exactly the core concept of HDBSCAN and obtain dynamic proposals at varying levels by comparing the scores, where a score is the average ratio of dynamic points in the instances. If the score of the root/larger instance is the same as that of its leaf/smaller instance, the root/larger one is kept.
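The selection over the hierarchy could look roughly like the sketch below. Only the tie rule (equal scores keep the larger instance) comes from the thread. Comparing a parent against the best-scoring instance selected beneath it is an assumption made for this example, as are the `Node` structure and function names. Because a parent and its descendants are never returned together, each point ends up in at most one instance.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    points: frozenset                      # indices of the points in this cluster
    children: list = field(default_factory=list)


def score(node, dynamic):
    """Fraction of the node's points that are ERASOR proposals."""
    return len(node.points & dynamic) / len(node.points)


def select(node, dynamic):
    """Return non-overlapping instances chosen from the subtree under node."""
    if not node.children:
        return [node]
    chosen = [n for child in node.children for n in select(child, dynamic)]
    best_below = max(score(n, dynamic) for n in chosen)
    # Assumed rule: keep the parent unless something below scores strictly
    # higher. Ties go to the larger (parent) instance, as stated in the thread.
    return [node] if score(node, dynamic) >= best_below else chosen


dynamic = frozenset({0, 1, 2})             # hypothetical proposal point indices

# A moving object merged with static points at the coarse level.
a = Node(frozenset({0, 1, 2}))
b = Node(frozenset({3, 4, 5}))
root = Node(frozenset(range(6)), [a, b])
split_result = select(root, dynamic)

# An object that is fully dynamic at every level: the larger instance wins.
car = Node(frozenset({0, 1, 2}), [Node(frozenset({0, 1})), Node(frozenset({2}))])
tie_result = select(car, dynamic)
```

In the first case the finer level is preferred because it isolates the dynamic points, and the low-scoring sibling can be discarded later by a score threshold. In the second case all levels score the same, so the root instance is kept.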

tbigi1 commented 10 months ago

Hi @Chen-Xieyuanli, first of all I want to congratulate you on your excellent work. I'm struggling to retrieve the labels for all points of the original point cloud after obtaining the dynamic proposals from the voxelized global map. In other words, I found the dynamic points of each voxelized scan, but I can't obtain the dynamic labels for the original non-voxelized scan. I tried to implement a neighborhood search, but it doesn't work because it returns too few dynamic points. Do you have any suggestions for my problem? Thanks in advance!