ctu-mrs / RMS

Code for RA-L paper "RMS: Redundancy-Minimizing Point Cloud Sampling for Real-Time Pose Estimation"
MIT License

Questions on paper #1

Closed: cfneuhaus closed this issue 2 weeks ago

cfneuhaus commented 10 months ago

Hey, I have some small questions on the paper:

Thank you!

petrapa6 commented 10 months ago

Hey Frank, thanks for these very good questions!

Thanks for reading the paper. You've helped discover some bugs in the paper :) Let me know if something is still unclear, and I'll be happy to answer.

cfneuhaus commented 9 months ago

Hey Pavel,

thank you for your detailed reply. Also happy new year to you! Sorry, I haven't been able to reply earlier due to the holidays. Hope you enjoyed yours as well.

  • You've got it right from what is in the preprint [...]

Thank you for the clarification; however, I still do not quite understand how to compute it in practice. Eq. 35 is confusing to me. It iterates over all non-empty subsets of Omega. So given 10 bins, it would iterate over {1}, {1,2}, {1,2,3}, ..., {2}, {2,3}, {2,3,4}, ... Is that correct? And, still assuming there is at least one point within each bin of H_Delta, wouldn't this always result in the exact same numeric value? Can you provide some example pseudocode for computing this?

  • Each bin contains all points lying in the given GFH-norm interval. [...]

Technically, each individual point within a bin has an individual floating point GFH value, so the chances of two points having exactly the same GFH values are basically zero. For example, in bin 0.1-0.2, there could be points with GFH [0.123, 0.1230001, 0.15, 0.18]. Two values might be really similar, but never identical. Thus, it seems that a secondary key would never matter here? Maybe you assume that all points within a common bin have approximately the same GFH value, and thus only consider the secondary key for the pop operation within a bin?

  • Correct, NN search is the most expensive part of the algorithm [...]

Alright. Now I also understand the sensor-specific computation of lambda_p... Basically, you use knowledge about the point distribution to make a more informed NN query to the KD tree.
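
For my own understanding, here is a minimal sketch of what I think this means; the radius formula, the use of lambda_p, and the helper names are my guesses, not necessarily the paper's exact definition:

import numpy as np
from scipy.spatial import cKDTree

# Hypothetical sketch: derive a range-dependent neighborhood radius from the sensor's
# angular resolution so the KD-tree query adapts to the expected point spacing.
def neighborhoods(points, angular_res_rad, lambda_p=3.0):
    tree = cKDTree(points)
    ranges = np.linalg.norm(points, axis=1)        # distance of each point from the sensor origin
    radii = lambda_p * ranges * angular_res_rad    # expected point spacing grows with range
    return [tree.query_ball_point(p, r) for p, r in zip(points, radii)]

Is that roughly the idea?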

  • The GFH of these points is 0. We do not discard these points; [...]

Ok. I could imagine them to be potentially valuable for very low resolution (far away) areas... But of course, a large number of noise points satisfies the same criterion, so maybe that counteracts potential positive effects...

  • No deskewing is included in our experiments --- we did not want an unnecessary feature to influence the comparison. [...]

Well "unnecessary" - most likely, yes, I agree ;-) But if you think about it more like in Deschauds approach for a second, where some certain points reduce uncertainty in certain DoF, then I could imagine that the question of doing this before or after (rough) deskewing could significantly(?) influence the estimated normal of a point, and the algorithm's perception of which DoF this point helps with. I think this could be the case for extremely aggressive motion only, and I am not sure if the same argument applies to your algorithm, since your algorithm is more local as you said, and basically throws away this "directionality" that a normal would have.

  • Honestly, this is the first time I'm seeing this work [...]

Thank you for your explanations. Indeed it would be interesting to see a comparison.

As for the intuition - I think what I was confused about is that you do not use the GFH vector that you mentioned, but only the norm. I simply guess that currently, with the use of the norm, enough points are being sampled such that enough DoF of the problem are covered well. This causes it to work even though there is no notion of directionality or location of the point in the sampling.

I believe there are 3 factors for the improved performance: 1) the methodology, which maintains the optimum, 2) the lowered comp. latency, which provides faster (and in real-time systems, also better) convergence, and 3) stable compression rate, which reduces the number of comp. peaks and thus reduces the chance of real-time convergence faults.

Just to make sure, because you mention "real-time convergence faults": when computing APE, RPE, etc. and comparing sampling strategies, do you give the algorithms a fixed compute budget or something? No, right? Then what is a real-time convergence fault for you? And, say that runtime did not matter, would you still remain in the lead for most datasets in the V40+ LOAM+KISS-ICP columns of Table III?

Thank you for your answers!

Frank

petrapa6 commented 9 months ago

Hey Frank, Happy New Year to you, too :)

Thank you for the clarification, however, I still do not quite understand how to compute it in practice. ...

The mathematical notation in Eq. 35 might be confusing, but in practice, the max rate is computed after all the bins have been traversed exactly once. What the algorithm does in practice is:

H_old = build_gfh_histogram(points, K)   # input points binned by GFH norm into K bins (illustrative helper)
H_new = [[] for _ in range(K)]           # starts empty
rates = []
for i in range(K - 1, -1, -1):           # traverse every bin exactly once, highest GFH bin first
    H_old[i].sort()                      # ordering with the secondary key, more on that below
    if H_old[i]:
        pt = H_old[i].pop()              # take one point out of the bin
        H_new[i].append(pt)
    rates.append(entropy_rate(H_new))    # rate of H_new, Eq. 32
max_rate = max(rates)

We know that the maximum entropy rate of H_new is attained somewhere in the <1, K> interval. For a fixed K, the max rate is indeed the same value, as you say, but only if all bins are non-empty; if any bins are empty, it changes. Since we want to normalize to <0, 1> with at least one distribution reaching a normalized rate of 1, we compute this max rate on the fly, from the data. This conditions (Eq. 31) on the actual data distribution rather than on "an ideal case." [example image attached in the original comment]
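
If it helps, here is a small numeric sketch of that on-the-fly normalization. I'm assuming for illustration that Eq. 32 is the Shannon entropy of H_new's bin occupancies (the exact expression is in the paper), and the entropy_rate helper below is just that assumption written out:

import numpy as np

def entropy_rate(bins):
    # assumed Eq. 32: Shannon entropy of the bin-occupancy distribution of H_new
    counts = np.array([len(b) for b in bins], dtype=float)
    if counts.sum() == 0:
        return 0.0
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log2(p)).sum())

# K = 4 bins: every bin occupied vs. one empty bin ('x' stands in for a point)
for H_old in ([['x'], ['x'], ['x'], ['x']], [['x'], [], ['x'], ['x']]):
    H_new = [[] for _ in H_old]
    rates = []
    for i in reversed(range(len(H_old))):    # traverse each bin exactly once
        if H_old[i]:
            H_new[i].append(H_old[i].pop())
        rates.append(entropy_rate(H_new))
    print(max(rates))                        # 2.0 with all bins occupied, ~1.58 with one empty

With all four bins occupied, the max rate is log2(4) = 2; with one empty bin it drops to log2(3), which is exactly why we compute the normalization constant from the data instead of from the ideal case.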

Technically, each individual point within a bin has an individual floating point GFH value, so the chances of two points having exactly the same GFH values are basically zero. ...

Yes, in practice, the secondary key rarely triggers. Due to inaccuracies/errors and roughness of the surfaces, two GFH values are seldom identical. Nevertheless, it can happen, and the secondary key is there for completeness. Within a single bin, the points have differing GFH values - no artificial discretization happens.
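
Purely for illustration, a tie-break within a bin could look like the sketch below; the secondary key here (the point's range from the origin) is just a placeholder, not necessarily the key we use:

import numpy as np

def sort_bin(bin_points):
    # bin_points: list of (gfh_norm, xyz) tuples; sort by GFH norm first and,
    # only when two GFH values are identical, by the placeholder secondary key
    return sorted(bin_points, key=lambda p: (p[0], np.linalg.norm(p[1])))

bin_points = [(0.15, (1.0, 0.0, 0.0)), (0.15, (0.0, 2.0, 0.0)), (0.123, (3.0, 0.0, 0.0))]
print(sort_bin(bin_points).pop())   # pops the largest GFH; the tie is broken by the secondary key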

Ok. I could imagine them to be potentially valuable for very low resolution (far away) areas. ...

Yeah, you're right. Also, the faraway points are useful for constraining rotations. But only if they are accurate, which is usually not the case in projective devices (3D LiDARs), where error increases with distance from the origin. From what I've seen, the points where some local geometry can be deduced (i.e., the points with neighbors) are more important in the task. And yeah, on super noisy data (e.g., large dust clouds), the method will most probably not function. But that is the case for all noise-unaware sampling methods, IMO.

Well "unnecessary" - most likely, yes, I agree ;-) But if you think about it more like in Deschauds approach for a second ...

I have to say that the deskewing influence remains to be tested. I come from a UAV lab where we often omit deskewing to reduce latency and to lower comp. requirements. But from what I've seen in the KITTI dataset, there are plenty of aggressive rotations (in one axis only though) where the lack of deskewing could have had an effect. And if it did, it had the same effect on ours and compared methods (btw Rusinkiewicz's work uses per-DoF sampling similar to Deschaud's).

As for the intuition - I think what I was confused about is that you do not use the GFH vector that you mentioned, but only the norm.

But your intuition is correct. If you look into X-ICP, there's a parallel between normals and the residuals used in the optimization. In X-ICP for example, the sampling respects the parallelism between residuals and the DoFs.

I figured during the experimentation that the positives (having the directions) of using the 3D vector on the input do not outweigh the negatives (slower and non-trivial multi-directional and possibly combinatorial sampling). As you say, the technique samples enough points that the DoFs are covered --- the prioritization of edges and locally rough surfaces and a slight redundancy on hyperplanes cover the desired directions. You may think of the algorithm as an approx. 3D corner detector with redundancy on lines and planes that certainly samples points on all planes and lines in the data (and thus extracts the maximum information given some point-sampling limit). And if all hyperplanes are covered, do we really need to know how these are oriented? For this method, knowing that the hyperplanes are all sampled is enough.

Just to make sure, because you mention "real-time convergence faults" ...

When saying "real-time", we mean that the estimation computes before another data frame arrives (so for 10 Hz data, the real-time limit is 100 ms). We have this limit there, but it's soft -- we give the methods an unlimited time and just measure the delay (this would be an issue in online runs for both the compared methods, which exceeded the real-time limit quite a lot). But inside the iterative algorithms (LOAM, KISS-ICP), there is still a termination criterion (e.g., no. of iterations, inner timing loop) that might terminate before convergence. And if these termination criteria are triggered and the output estimate is inaccurate, we call this a convergence fault. You may see this in 4:49 where the number of points during the fast rotation change leads to a complex opt. problem, which overwhelms the estimation, which then fails.

And, say that runtime did not matter, would you still remain in the lead for most datasets in the V40+ LOAM+KISS-ICP columns of Table III?

Yes.

Thanks for asking :)

Pavel