PRBonn / semantic-kitti-api

SemanticKITTI API for visualizing dataset, processing data, and evaluating results.
http://semantic-kitti.org
MIT License

Multiple Scan Experiments #66

Closed anastasiia-kornilova closed 3 years ago

anastasiia-kornilova commented 3 years ago

Hi, thank you for the great work!

In the paper, it is said that "We exploit the sequential information by combining 5 scans into a single, large point cloud". My question is: what is meant by combining?

a. Put them in one point cloud without any transformations (which gives a dense but distorted map), or
b. put them in one point cloud after transforming each scan according to the poses from some odometry algorithm (which gives a dense map with distortions only on moving objects)?

jbehley commented 3 years ago

It's option b and we use the poses provided with the data, estimated with our SLAM approach, SuMa.
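For readers who want to see what option b amounts to, here is a minimal sketch, not the authors' implementation. It assumes each scan is an `(N, 3)` NumPy array in its own sensor frame and each pose is a 4x4 matrix that maps that sensor frame into a common world frame. The function name `aggregate_scans` and the `ref_index` parameter are hypothetical, and any calibration needed to bring the provided poses into the sensor frame is left out.

```python
import numpy as np


def aggregate_scans(scans, poses, ref_index=-1):
    """Merge several scans into the frame of one reference scan.

    scans: list of (N_i, 3) arrays, each in its own sensor frame.
    poses: list of 4x4 arrays mapping sensor frame i to a common world frame
           (assumption: already expressed for the sensor, no calibration step).
    ref_index: which scan defines the output frame (default: the last one).
    """
    ref_inv = np.linalg.inv(poses[ref_index])
    merged = []
    for pts, pose in zip(scans, poses):
        # Transform: sensor frame i -> world -> reference sensor frame.
        transform = ref_inv @ pose
        homogeneous = np.hstack([pts, np.ones((pts.shape[0], 1))])
        merged.append((homogeneous @ transform.T)[:, :3])
    return np.vstack(merged)


# Tiny example: the sensor moves 2 m forward between two scans.
pose_0 = np.eye(4)
pose_1 = np.eye(4)
pose_1[0, 3] = 2.0
scan_0 = np.array([[1.0, 0.0, 0.0]])
scan_1 = np.array([[0.0, 0.0, 0.0]])
cloud = aggregate_scans([scan_0, scan_1], [pose_0, pose_1])
```

Static structure lines up after this step, while anything that moved between the scans shows up as a smeared trail, which matches the "distortions only on moving objects" described in the question.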

anastasiia-kornilova commented 3 years ago

@jbehley, thanks a lot! That answers my question.

Another, more philosophical question: it is strange that methods based on the denser, well-aggregated point cloud perform worse than methods based on a single scan. Is that because the aggregation introduces a lot of distortions for projection-based methods, especially in scenes with highly dynamic objects? If you have observed any such effects, I would highly appreciate your comments, because I didn't find a discussion of this in the paper.

jbehley commented 3 years ago

Please note that in the multi-scan case the benchmark additionally requires methods to distinguish between moving and non-moving objects. Therefore, there are more classes, and distinguishing moving from non-moving objects is non-trivial even with multiple time steps, since it requires something like data association between scans to figure out that a car at one location at time t is actually the same car as at time t-1.

However, with the way we used the aggregated point clouds, we see exactly the effects you described: cars moving in the foreground of the scene basically hide all points behind them. In hindsight, I would therefore not suggest simply taking a single projection of the aggregated point cloud.
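The occlusion effect can be illustrated with a small sketch of a spherical (range-image) projection. This is only an illustration under assumptions, not the pipeline used in the paper: the function name `range_image`, the 64 x 1024 image size, and the +3 / -25 degree vertical field of view are placeholder values chosen here. Each pixel keeps only the closest point that falls into it, so everything behind a foreground object is lost.

```python
import numpy as np


def range_image(points, height=64, width=1024,
                fov_up_deg=3.0, fov_down_deg=-25.0):
    """Project (N, 3) points into a range image, keeping the nearest depth.

    Image size and field of view are assumed values for illustration.
    Pixels that receive no point stay at +inf.
    """
    fov_up = np.radians(fov_up_deg)
    fov_down = np.radians(fov_down_deg)

    depth = np.linalg.norm(points, axis=1)
    valid = depth > 0
    points, depth = points[valid], depth[valid]

    yaw = np.arctan2(points[:, 1], points[:, 0])
    pitch = np.arcsin(points[:, 2] / depth)

    # Map angles to pixel coordinates (column u, row v).
    u = 0.5 * (1.0 - yaw / np.pi) * width
    v = (1.0 - (pitch - fov_down) / (fov_up - fov_down)) * height
    u = np.clip(np.floor(u), 0, width - 1).astype(np.int64)
    v = np.clip(np.floor(v), 0, height - 1).astype(np.int64)

    image = np.full((height, width), np.inf)
    # Several points can land in one pixel; only the smallest depth survives.
    np.minimum.at(image, (v, u), depth)
    return image


# A near point (e.g. on a passing car) and a far point on the same ray.
pts = np.array([[5.0, 0.0, 0.0], [20.0, 0.0, 0.0]])
img = range_image(pts)
hit = img[np.isfinite(img)]
```

In this example both points fall into the same pixel and only the 5 m return is kept. With an aggregated cloud, a moving car leaves returns along its whole trajectory, so it can mask far more of the background than it would in a single scan.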

Lastly, I would agree that for static objects it should make the task easier if one is able to use the aggregated information without too much downsampling.

Hope that helps to answer your question.

anastasiia-kornilova commented 3 years ago

@jbehley, yes, thank you for such a detailed explanation; now I understand the whole picture. I wish you good luck with your research. Your lab is doing really great work!