Algorithm improvements to KD-Tree module: applications to ICP (Pranjal Kumar Rai)

jlblancoc commented 7 years ago

Initial description

Nanoflann is a C++ library provided by MRPT for building KD-Trees. The library is a fork of the widely-used FLANN C++ library and is mostly optimized for lower dimensional point clouds. Most ICP algorithms for robot SLAM and localization require finding nearest neighbor in 2D/3D point clouds. Support for dynamic point clouds and nearest neighbour query in non flat spaces are two paramount features which are missing in the current nanoflann library.

See attached PDF.

5589797390254080_1491229396_MRPT_GSOC_Proposal.pdf

Final results:

nanoflann PR 48: Pull request with all code contributions by @pranjal-rai for GSoC2017

jlblancoc commented 7 years ago

Hi @pranjal-rai !

After a more careful inspection of the new code, here are some new recommendations:

For consistency, I think it'll be better to do this renaming:
- metric_InnerProdQuat => metric_SO3_InnerProdQuat
- metric_acosInnerProdQuat => metric_SO3_acosInnerProdQuat
- (idem for the *_Adaptor classes)
I noticed that we still miss a metric for SO2 (!). Probably because you focused on the much harder 3D problem, but 2D is still really useful to have. Please, create one new metric for this case, with a name like: metric_SO2. No need to specify the name "postfix" of the metric, we can simply use a formula like this one, but without the previous "unwrap". Docs should mention that it's the user responsibility to provide us angles in the range [-pi,pi].

jlblancoc commented 7 years ago

At some point things should have gone out of control (not your fault, the code was already like that before GSoC!): you may notice that there are a set of structs named metric_L1, metric_L2,... which are NOT used at all... Instead, examples directly use as second templ. argument the *_Adaptor<num_t,...> type, which is long to type and obscures the declarations.

Please, change all examples to use the trait types metric_* as a much simpler second template argument. Inside KDTreeSingleIndexAdaptor, naturally, change the name of that template name from typename Distance to something like typename MetricTraits. Then, add inside the class:

typedef MetricTraits::traits<T,DataSource> Distance;

(note that's not the exact line, please refine it to make it work as expected). The idea is that "Distance" will be infered by the compiler with its proper arguments, instead of leaving that work to the user. Let me know if I should explain this point differently or if you got the general idea...

Please, create a new (empty) struct as a base class for all metric_* structs, so the are listed as children of the common base one in Doxygen docs, e.g:

struct Metric
{
};
...
struct metric_L1 : public Metric 
{
...
};

Also, please, update all comments like:

* \tparam Distance The distance metric to use: nanoflann::metric_L1, nanoflann::metric_L2, nanoflann::metric_L2_Simple, etc.

so the list of metric structs are not directly listed (because it's easy to forget updating all those comments). Instead, change them to something like "Existing metrics are all classes derived from nanoflann::Metric".

jlblancoc commented 7 years ago

According to the docs in nanoflann.hpp:

     *   // [Only if using the metric_L2_Simple type] Must return the Euclidean (L2) distance between the vector "p1[0:size-1]" and the data point with index "idx_p2" stored in the class:
     *   inline DistanceType kdtree_distance(const T *p1, const size_t idx_p2,size_t size) const { ... }

So, please, refactor the existing metric classes (including the new SO2 & SO3; forget about SE2/3 for now!) such that they are able to work without kdtree_distance(); even more, let's get rid of kdtree_distance(), since it seems to cause confusion:

Remove the mention to "kdtree_distance" in the nanoflann.hpp docs about the expected DatasetAdaptor API.
Remove kdtree_distance() of all your new metric classes.
Replace its call in metric_L2_Simple with a simple loop that iterates over the point dimensions using calls to kdtree_get_pt().

I admit that the operator() notation may be not the best, clearest way to invoke the metric... What do you think about changing the function from operator () to a regular, named method, e.g.
- DistanceType operator()(...) ==> DistanceType evalMetric(...)
- And its invokation: distance(vec, index,...) ==> distance.evalMetric(vec,index,...)

I know... a lot of changes and we're so close to the end of GSoC! But the errors in your new metric classes made me realize of how confusing the current code/API/method-names might be, so improving the overall library consistency is, IMO, a priority above adding new features.

Let's try to end all the points I mentioned above, and polish the code for SO(2) and SO(3). Then, we could finish the implementation with std::tuple for more complex spaces if in time, or otherwise, finish it after GSoC, if you still feel inclined to give us a hand with the project ;-)

pranjal-rai commented 7 years ago

I have removed the requirement of function kdtree_distance here. I still need to update it for new metric classes.

pranjal-rai commented 7 years ago

I have completed the following tasks :

Remove the mention to "kdtree_distance" in the nanoflann.hpp docs about the expected DatasetAdaptor API.

Remove kdtree_distance() of all your new metric classes.

Replace its call in metric_L2_Simple with a simple loop that iterates over the point dimensions using calls to kdtree_get_pt().

I admit that the operator() notation may be not the best, clearest way to invoke the metric... What do you think about changing the function from operator () to a regular, named method, e.g.

DistanceType operator()(...) ==> DistanceType evalMetric(...) And its invokation: distance(vec, index,...) ==> distance.evalMetric(vec,index,...)

Related commits: 1, 2, 3

pranjal-rai commented 7 years ago

I have added the support for SO2 here.

Now the task which remains is -

change all examples to use the trait types metric_* as a much simpler second template argument.

I will start working on this now.

jlblancoc commented 7 years ago

Hi @pranjal-rai !

I reviewed all your commits so far, and they all look perfect :+1:

One additional remaining issue: I can't right now (off the top of my head) recall in what context the KD-tree code calls to accum_dist(). But it seems to be wrong in many "Metric" structs (including the new ones for SO2,SO3), since it simply adds squares of differences (!). Have you already thought on how to handle this?

Cheers,

pranjal-rai commented 7 years ago

Hi @jlblancoc

But it seems to be wrong in many "Metric" structs (including the new ones for SO2,SO3), since it simply adds squares of differences (!)

Yes, I still need to update accum_dist() for our new metrics. I guess the function is correct for the existing metrics because it has been directly adapted from flann. I found something about accum_dist() here. But I do not know how do we modify this function to suit our new metrics. Right now I do not have any idea on this. I was not able to find any other helpful resource related to it.

pranjal-rai commented 7 years ago

From what I found, the distance metric has to be dimension wise additive, to implement accum_dist() function, but this does not hold for our new metrics. (Source)

Does this mean we will have to choose some other metric?

jlblancoc commented 7 years ago

Ok, I expected this... don't have time until tonight to look at this, but the best solution would be making up a way to refactor the places where the accum function is called to use the other, full metric eval().

Try to give it a think... or two ;-)... and let me know what you find by this night.

Best

jlblancoc commented 7 years ago

Hi @pranjal-rai ! I just added a couple of PRs to your fork, take a look at them.

Also, I noticed that the code refactoring to move all shared code down to the base class KDTreeBaseClass could be even more efficient: these are duplicated items in the derived classes (Single vs. Dynamic) that can be (if I don't miss something important?) moved to the base class:

std::vector<IndexType> vind;
size_t m_leaf_max_size;
DatasetAdaptor &dataset;
m_size, m_size_at_index_build, dim, ...
NodePtr root_node;
BoundingBox root_bbox;
PooledAllocator pool;
Distance distance; (which, by the way, could be renamed to a better name like metric)

In general, all functions (init_vind(), etc. etc.) and variables shared by the two children classes should be moved to the base to avoid maintenance becoming a hell, and to be efficient ;-)

jlblancoc commented 7 years ago

One fundamental thing that seems to be missing in your contributions so far is this: we have a great benchmark to test the computational efficiency of different metrics / algorithms; that's all good. But there are no unit tests that verify that the output of those methods is correct (!!).

Please, consider adding new unit tests to test_main.cpp for (see existing code there and use them as a template):

Comparing the nearest point according to KD-tree queries for:

SO2
SO3 (with SO3_InnerProdQuat_Adaptor only)
L2 with the dynamic algorithm.

against the output of the brute force algorithm (so, please use a relatively small number of points). As you can see, the current code checks this a number of times for random datasets to assess that the kd-tree always outputs the correct answer, for the L2 metric.

Duplicated code like definitions of auxiliary dataset classes, functions to generate a random point cloud (of the different kinds: XYZ, Quaternion,...) could be better refactored from their current examples/*.cpp files into a new examples/utils.h and examples/utils.cpp files, so those files could be #included in both, examples, and the unit tests. This can be done, naturally, adding those files to the CMakeLists.txt files.

Again, avoid code duplication at all cost!! It's evil! :smiling_imp:

jlblancoc commented 7 years ago

Again on code duplication: It seems to me that SO2_Adaptor::evalMetric() can be rewritten as a call to SO2_Adaptor::accum_dist(), right? (or the other way around)

jlblancoc commented 7 years ago

Now, regarding the topic of accum_dist(): the good news is that, unless I missed something, it seems we have a solution, but it will have a cost...

The point is that metrics must be able to be obtained element-wise for the kd-tree to work. You can see this in the last argument idx of:

inline DistanceType accum_dist(const U a, const V b, int idx) const

In short: summing accum_dist(a[i],b[i],i) for i=0...dim-1 should be equal to evalMetric(a,b,dim) for all metrics.

So, this unfortunately means that the "acos()" metric must be ruled out, since I can hardly think of a way to compute it element-wise...

For L1, L2, SO(2) the implementation of accum_dist() seems obvious.

For SO(3)-InnerProdQuat: I thought of this solution:

Impose the user to always use positive (qx,qy,qz) and reflect the rotation sense via the sign of qr. This means adding a comment about this constraint in the doxygen docs; we could add a "for()" checking for this condition and launching an exception if data are not compliant, but that could have a large runtime cost, so just let's forget about it for now...

Let's see the formula:

    inline DistanceType evalMetric(const T* a, const size_t b_idx, size_t size) const {
        DistanceType result = DistanceType();
        for (int i=0; i<size; ++i) {
            const DistanceType diff = a[i] * data_source.kdtree_get_pt(b_idx, i);
            result += diff;
        }
        result = 1 - std::abs(result);

The accum_dist() for idx=0 could include the 1-std::abs(diff[0]), then for 1:3 we could now assume that all numbers will be positive, so each accum_dist() should return -a[i]*b[i] (or alike). Please, give it a thought... it makes sense to me.

As a replacement for the acos-innerProduct metric, we could implement the Frobenius norm:

frobenius_norm

Please, unroll the expression (with paper and pencil... or with a symbolic package of Matlab, Mathematica or any other software) for the trace(R^t * Ri) you will see it's the sum of 3 terms, each of them a function of the 4 quaternion components. I think that this metric is suitable for being added up element by element, although some refactoring might be needed in the function parameters to pass not only the (a[i],b[i]) elements, but the entire vectors (a,b). Please, give it a thought too...

PS: If time permits, please take note of this task that would be another great add-on: add more unit tests, one per metric class (L1,L2,SO2,...) and that, instead of building a complex dataset, etc. it just verifies that the sum of accum_dist() is equal to evalMetric(), as expected, for some random "points". For this, use gtest's EXPECT_NEAR().

PS2: Ask in case you have doubts... which would be more than understandable at this point! ;-)

jbriales commented 7 years ago

Hi everyone! I must say I've been following this thread for the last days, curious on the accum_dist( ) issue and how you'd solve it. So far I didn't have any straight comment to do (specially since I don't understand too well the inner working of the KD-Tree and what's the role of accum_dist( )), but regarding the last suggestion of @jlblancoc in using the Frobenius norm: I think the actual relation is

$||R-R_i||_F^2 = 2\; tr(I) - 2\; tr(R^\top R_i) = 6 - 2\; tr(R^\top R_i)$

Then I wonder, must the distance be an actual distance (positive)? I ask this because if you use $tr(R^\top R_i)$ alone this value can go from -3 to +3. A simple solution if this is an issue would be then to take each term in $tr(I_3 - R^\top R_i)$ instead to have a distance that goes from 0 to +6.

Best, Jesus

pranjal-rai commented 7 years ago

HI @jlblancoc

It seems to me that SO2_Adaptor::evalMetric() can be rewritten as a call to SO2_Adaptor::accum_dist()

Fixed

Now moving on to SO(3)-InnerProdQuat

The accum_dist() for idx=0 could include the 1-std::abs(diff[0]), then for 1:3 we could now assume that all numbers will be positive, so each accum_dist() should return -a[i]*b[i] (or alike). Please, give it a thought... it makes sense to me.

I think there is a problem in this solution. Consider two quaternions (w1, x1, y1, z1) and (w2, x2, y2, z2). Now as per our metric, dis = 1 - abs(w1 * w2 + x1 * x2 + y1 * y2 + z1 * z2) Let us assume that x, y, z are positive for both quaternions. Now dis is not necessarily equal to 1 - abs(w1 * w2) + x1 * x2 + y1 * y2 + z1 * z2 because even though the terms involving x, y and z are positive they cannot be taken outside abs().

Example: q1 = (-.075, 0.12, 0.64, 0.07), q2 = (0.03, 0.80, 0.18, 0.56) In this case eval_metric() gives 0.77 while summation of accum_dist() as per the proposed solution gives 0.72

Please correct me if I am wrong.

pranjal-rai commented 7 years ago

I have added accum_dist() for SO(3)-InnerProdQuat, in case you need to test it.

pranjal-rai commented 7 years ago

Also, I noticed that the code refactoring to move all shared code down to the base class KDTreeBaseClass could be even more efficient: these are duplicated items in the derived classes (Single vs. Dynamic) that can be (if I don't miss something important?) moved to the base class

Fixed

jlblancoc commented 7 years ago

@jbriales thanks a lot! :+1: Yes, in fact I realize now that we need distances to be "good metrics", so they can't be negative...

@pranjal-rai :

Example: q1 = (-.075, 0.12, 0.64, 0.07), q2 = (0.03, 0.80, 0.18, 0.56)

I noticed your quaternions are not unitary (|q1| = 0.659 ??) and think I found a bug in your code generation for random quaternions, please check my comment on your commit.

Giving it a second thought... I spoke a senseless thing yesterday: we CAN'T force (qx,qy,qz) to always be positive! It's the axis of rotation, we must cover quadrants with negative values no matter qw (the first quaternion element) is allowed to be negative or not.

This means that, apparently, there is no straighforward valid SO3 metric that can be built up element-by-element, and still complying with the kd-tree algorithm of assuming that adding the distance increment for one dimension always make distances to increase.

There is a final, valid alternative for SO(3): using the L2 (square norm) of the unit quaternion itself. That is, working with the quaternion as if it was a common 4-length vector in Euclidean space. This implies enforcing qw to be always positive, so there is no chances of having the same rotation described by different quaternions.

Please, rename and rewrite the SO3 metric so it works like this (yes: it would be identical to L2 for now!!). We could think of something better in the future... Also, make sure of changing the generation of random quaternions such that the (x,y,z) part can be positive or negative (each component independently) while (qw) is always positive (or zero).

If you are done with this, then the next most important thing to do before the end of GSoC would be IMO the unit tests I mentioned above (including the new tests, and the refactoring of common code among examples and tests).

Keep pushing! ;-)

pranjal-rai commented 7 years ago

Hi @jlblancoc

I noticed your quaternions are not unitary (|q1| = 0.659 ??)

Actually I made a typo in comments, w in q1 is actually -0.75 not -0.075. Sorry. I referred this resource (page 3 - Algorithm 2) to generate unit quaternions.

I have now updated the quaternion generation code to use the approach you suggested. Right now I am using any random angle between [0, PI] so that w remains positive and x, y, z can be of any sign. I ensured that I am generating unit quaternions. I have added the support for SO3_adatptor using L2_Simple_adaptor and removed SO3_InnerProdQuat. Related Commits: 1, 2

I will now start working on following tasks.

Write Unit Tests (Earlier, I wrote some tests locally to test my implementation, but did not follow the template. I will now read the code in tests/ and write tests accordingly.)
Remove duplicate codes
Further I will go through our previous conversation so that we do not miss any discussed task.

jlblancoc commented 7 years ago

Great! Come back if you have any further doubt.

pranjal-rai commented 7 years ago

Hi @jlblancoc

I have added unit tests to validate the following:

I will now start working on removing the duplicate codes.

pranjal-rai commented 7 years ago

Duplicated code like definitions of auxiliary dataset classes, functions to generate a random point cloud (of the different kinds: XYZ, Quaternion,...) could be better refactored from their current examples/*.cpp files into a new examples/utils.h and examples/utils.cpp files, so those files could be #included in both, examples, and the unit tests.

Fixed

pranjal-rai commented 7 years ago

I have added unit tests and also removed duplicate codes from examples and unit tests. I will clean and document all the codes by tomorrow. What should I do next? What is the process of submission?

pranjal-rai commented 7 years ago

Hi @jlblancoc

I have cleaned the codes and moved some common functions to base class. Related commits: 1, 2

What do we do now?

jlblancoc commented 7 years ago

Great work! I'll review the code changes later today.

In the meanwhile, you can merge all the changes in your master branch, and open a pull request against the original repo.

pranjal-rai commented 7 years ago

Hi @jlblancoc

I have created a PR but building flann requires administrative permissions and the tests fail because of this reason. What should I do for this?

pranjal-rai commented 7 years ago

I fixed the previous issue. All other tests except this complete successfully now.

jlblancoc commented 7 years ago

Good!

I think it's a problem of the external project, so let's ignore it. To make travis-ci clean, please add an "if" like this one to the travis.sh function build() to add an argument to CMake to instruct it NOT to build the external projects (hence, neither the benchmarking app). I mean, something like:

  cmake $SRC_DIR -DBUILD_BENCHMARKING=OFF

Obviously, you should add some "if()"s as well inside the CMake scripts to detect that new CMake variable and skip the external projects and the benchmarking app if it's set.

pranjal-rai commented 7 years ago

Hi @jlblancoc

Thanks for the suggestion. Right now all the tests pass without any fail.

pranjal-rai commented 7 years ago

Hi @jlblancoc

Can you please add my contribution to the initial description of the project? I need to fill it in the final evaluations.

jlblancoc commented 7 years ago

Done! 👍

Thanks for all the hard work during GSoC! We mentors have some more days for filling our evaluations, so expect more comments and/or feedback as the code is tested. Cheers

jlblancoc commented 7 years ago

BTW: You can also create a gist (see for example this one) as a summary of your work, so you can be sure the content "belongs" to you and you can edit it at any time in the future...

It will be OK for me to use either this page, or a custom gist (in that case, please post a link to it here so we can quickly review it).

pranjal-rai commented 7 years ago

Hi @jlblancoc

I have created a report as you suggested. Please review it. I am adding this link in the evaluation for now.

jlblancoc commented 7 years ago

Good! If you are still in time (I think it's late?) you could use that report instead, since you can directly edit it at any time.

Anyway, feel free to come back and claim any change in the first comment's "summary of work"!

On Tue, Aug 29, 2017 at 2:27 PM, Pranjal Rai notifications@github.com wrote:

Hi @jlblancoc https://github.com/jlblancoc

I have created a report https://gist.github.com/pranjal-rai/307e178d023b9a4b5455ca9ca8383344 as you suggested. Please review it. I am adding this link in the evaluation for now.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/MRPT/GSoC2017-discussions/issues/1#issuecomment-325648180, or mute the thread https://github.com/notifications/unsubscribe-auth/AFPj2ksBMRChnL4X6YatGxuKldhMZm5mks5sdAPIgaJpZM4NRWSn .

--

Jose Luis Blanco-Claraco Universidad de Almería - Departamento de Ingeniería https://w3.ual.es/~jlblanco/ https://github.com/jlblancoc

jlblancoc commented 7 years ago

Hi @pranjal-rai ,

It was a pleasure to have you in GSOC-2017! :+1:

I'm now closing this issue ticket, since the changes were already merged upstream. New issues or ideas discussion could go on either here, or in the nanoflann repo, ok? I still have to do a more careful test of your recent changes, so expect feedback in the next week or so.

Cheers!

pranjal-rai commented 7 years ago

Hi @jlblancoc

I'm now closing this issue ticket, since the changes were already merged upstream. New issues or ideas discussion could go on either here, or in the nanoflann repo, ok?

:+1:

It was a pleasure to have you in GSOC-2017!

Thank you very much.

Thank you being an awesome mentor throughout and giving me the opportunity to work on this project. I learned a lot from you while working on this project. I would love to contribute to MRPT in future too.

MRPT / GSoC2017-discussions

Algorithm improvements to KD-Tree module: applications to ICP (Pranjal Kumar Rai) #1

Initial description

Final results: