attcs / Octree

Octree/Quadtree/N-dimensional linear tree
MIT License
113 stars 13 forks source link

KNN not finding the nearest neighbour with small K #9

Closed LeweC closed 9 months ago

LeweC commented 9 months ago

As the title says, the KNN function does not return the nearest neighbour for me in many cases.

I noticed this when I created N number of random points and a random searchpoint, all within the boundaries of the octree. Then I manually calculated each euclidean distance for the points to the search point. The KNN function of the tree sometimes does not find the closest point. How often the first value was wrong depended on a few parameters like K or the number of points I created.

It happened for every TreePoint (tested with Quadtree, Octree and TreePointND<6>). The Quadtree needed more than 20 points to get the error, I am not sure if higher dimensional trees have the same or a higher threshold for points to get the error. I have mostly tested with 1000+ points for Octree and TreePointND<6>.

I have copied one of the many configurations below. It also includes some debug output.

I have included the specific points only to have a reproducible example, it is very easy to get the error with different points. I mostly created random points, checked whether the tree found the correct answer or not, and repeated the test function with new random points. With some configurations 16% of the trees gave wrong results. Here I simply added points manually that I found through random generation.

double euclidean_distance(double x1, double y1, double x2, double y2)
{
    return sqrt(pow(x2- x1, 2) + pow(y2 - y1, 2));
}

void test_setup()
{
    array<double, 2> point0 = {78.2619, 77.843};
    array<double, 2> point1 = {90.3005, 90.5172};
    array<double, 2> point2 = {69.8652, 12.2467};
    array<double, 2> point3 = {48.4675, 48.4948};
    array<double, 2> point4 = {36.3226, 68.4619};
    array<double, 2> point5 = {98.8799, 42.7149};
    array<double, 2> point6 = {31.412, 38.6866};
    array<double, 2> point7 = {63.2748, 77.0524};
    array<double, 2> point8 = {62.8844, 17.0536};
    array<double, 2> point9 = {80.8931, 39.8099};
    array<double, 2> point10 = {77.426, 64.9844};
    array<double, 2> point11 = {81.9552, 25.009};
    array<double, 2> point12 = {87.6088, 51.319};
    array<double, 2> point13 = {78.5609, 80.4623};
    array<double, 2> point14 = {51.3967, 39.5269};
    array<double, 2> point15 = {32.2042, 81.8779};
    array<double, 2> point16 = {79.1739, 81.5467};
    array<double, 2> point17 = {95.2619, 40.4742};
    array<double, 2> point18 = {86.437, 92.4406};
    array<double, 2> point19 = {3.95388, 60.2327};
    array<double, 2> point20 = {31.1283, 44.4917};
    array<double, 2> point21 = {35.6778, 79.8545};
    array<double, 2> point22 = {50.9926, 66.1373};
    array<double, 2> point23 = {3.16271, 65.2519};
    array<double, 2> point24 = {56.3665, 45.3819};

    vector<array<double, 2>> poses = {
        point0,
        point1,
        point2,
        point3,
        point4,
        point5,
        point6,
        point7,
        point8,
        point9,
        point10,
        point11,
        point12,
        point13,
        point14,
        point15,
        point16,
        point17,
        point18,
        point19,
        point20,
        point21,
        point22,
        point23,
        point24,
        };

    array<double, 2> search_point = {43.6406, 57.5691};

    auto start = std::chrono::high_resolution_clock::now();
    vector<std::pair<double,int>> results;
    double distance = 10000.0;
    int winner_id;
    for (size_t i = 0; i < poses.size(); i++)
    {
        array<double, 2> pose = poses[i];
        double tranlation_distance = euclidean_distance(search_point[0], search_point[1], pose[0], pose[1]);

        if (tranlation_distance < distance)
        {
            distance = tranlation_distance;
            winner_id = i;
        }
    }

    std::array<double, 2> inspection_space_min = {0.0, 0.0};
    std::array<double, 2> inspection_space_max = {100.0, 100.0};
    OrthoTree::BoundingBox2D inspection_space;
    inspection_space.Min = inspection_space_min;
    inspection_space.Max = inspection_space_max;
    //Standard Tree
    auto tree = QuadtreePointC(poses, 9, inspection_space);

    auto neighbors = tree.GetNearestNeighbors(search_point, 1);

    if (neighbors[0] != winner_id)
    {
        std::cout << "Winner ID is: " << winner_id << " with distance: " << euclidean_distance(search_point[0], search_point[1], poses[winner_id][0], poses[winner_id][1]) << std::endl;
        std::cout << "Tree found ID: " << neighbors[0] << " with distance: " << euclidean_distance(search_point[0], search_point[1], poses[neighbors[0]][0], poses[neighbors[0]][1]) << std::endl;

        std::cout << "Search Point with coordinates:  X: " << search_point[0]<< " Y: " << search_point[1] << std::endl;
        for (size_t i = 0; i < poses.size(); i++)
        {
            std::cout<<"ID is: "<< i << " Point with coordinates:  X: " << poses[i][0]<< ", " << poses[i][1] << "  With Distance "<< euclidean_distance(search_point[0], search_point[1], poses[i][0], poses[i][1]) << std::endl;

        }
    }
}

int main()
{
   test_setup();
}

Here the tree returns ID 22, which has a distance of 11.2901 to the search point, although the winner should be ID 3 with a distance of 10.2782. I did a lot of testing to maybe find the bug myself, here are some interesting things I found:

I think this is somehow triggered too early, at a stage where the smallest distance is not yet inside setEntity. Here a bigger K would also make the code go in there at a later stage, which is probably why the bigger K sometimes fixes things.

I am not sure what the if function is for and why the distances we are comparing are distances to the wall. I am a bit out of my depth here and need help debugging this.

Im also not sure why this has not happened to anyone else or why it was not found by the unit test. Maybe because the unit tests dont have enough points in the tree, but thats just a guess because as I said I needed more than 20 for Quadtree and probably more for higher dimensional trees.

attcs commented 9 months ago

Hi! Thank you for the detailed description, it was very helpful! I pushed a bugfix, I hope you won't have any other problems with my library.

LeweC commented 9 months ago

Great! I have also tested some different configurations and larger sample sizes, it seems to be fixed. I will close the issue now.

Thanks for the quick work!

LeweC commented 9 months ago

Sorry I spoke too soon.

It seems that the bug persists in higher dimensions. As I wrote above, my quick tests with the quadtree went fine.

Here is another example where I get this error and a large K seems to fix it. This time for 6 dimensions:

void test_setup()
{
    array<double, 6> point1 = {50.2232, 0.276687, 37.7662, 41.2776, 26.3818, 74.0284};
    array<double, 6> point2 = {35.8946, 83.7503, 97.1127, 47.2895, 40.9232, 83.7666};
    array<double, 6> point3 = {33.3541, 63.0669, 3.86075, 47.7923, 19.8039, 87.5608};
    array<double, 6> point4 = {89.0684, 67.3278, 50.867, 49.8193, 72.6692, 54.0271};
    array<double, 6> point5 = {75.7099, 53.1241, 39.624, 40.4669, 13.6433, 88.6247};
    array<double, 6> point6 = {43.7641, 93.7985, 68.8286, 9.71882, 2.16644, 12.1925};
    array<double, 6> point7 = {82.5195, 30.6964, 71.0556, 48.4744, 99.2295, 70.4137};
    array<double, 6> point8 = {28.4568, 37.8366, 74.8597, 27.897, 60.6816, 0.247821};
    array<double, 6> point9 = {58.3617, 17.0165, 15.1021, 70.9832, 22.5325, 34.8085};
    array<double, 6> point10 = {33.1004, 72.6729, 35.7043, 35.2888, 94.9917, 17.652};
    array<double, 6> point11 = {80.9693, 7.41406, 2.18394, 40.2606, 1.63274, 65.8996};
    array<double, 6> point12 = {88.4851, 73.8366, 55.0264, 77.6467, 61.6751, 33.7444};
    array<double, 6> point13 = {48.1777, 90.1285, 75.9069, 49.1867, 39.7186, 74.2862};
    array<double, 6> point14 = {68.0826, 73.0622, 7.85933, 60.8879, 41.3229, 16.6928};
    array<double, 6> point15 = {19.1693, 49.1172, 91.055, 52.5548, 16.2326, 0.610075};
    array<double, 6> point16 = {19.3085, 81.9363, 55.8924, 1.77161, 29.4124, 72.3873};
    array<double, 6> point17 = {57.8801, 28.3918, 45.7062, 92.1774, 0.742745, 29.0499};
    array<double, 6> point18 = {74.2952, 84.6691, 46.3357, 76.7894, 68.8394, 24.1707};
    array<double, 6> point19 = {0.199418, 93.2845, 94.9628, 38.2295, 82.1998, 59.2499};
    array<double, 6> point20 = {68.4689, 0.261607, 75.5001, 58.427, 55.4243, 18.7392};
    array<double, 6> point21 = {53.9164, 95.4966, 59.657, 71.0292, 82.4362, 53.9452};

    vector<array<double, 6>> poses = {
        point1,
        point2,
        point3,
        point4,
        point5,
        point6,
        point7,
        point8,
        point9,
        point10,
        point11,
        point12,
        point13,
        point14,
        point15,
        point16,
        point17,
        point18,
        point19,
        point20,
        point21,
        };

    array<double, 6> search_point = {78.8658, 64.0361, 18.7755, 61.4618, 14.3312, 40.0196};

    std::array<double, 6> inspection_space_min = {0.0, 0.0, 0.0, 0.0, 0.0, 0.0};
    std::array<double, 6> inspection_space_max = {100.0, 100.0, 100.0, 100.0, 100.0, 100.0};
    OrthoTree::BoundingBoxND<6> inspection_space;
    inspection_space.Min = inspection_space_min;
    inspection_space.Max = inspection_space_max;
    //Standard Tree
    auto tree = TreePointND<6>();
    tree.Create(tree, poses, 10, inspection_space);

    vector<array<double, 6>> m_data = poses;

    auto neighbors = tree.GetNearestNeighbors(search_point, 1, m_data);

    using AD = OrthoTree::AdaptorGeneral<2, array<double, 6>, OrthoTree::BoundingBox2D>;
    auto const itMin = std::ranges::min_element(poses, [&search_point](auto const& lhs, auto const& rhs) { return AD::distance2(lhs, search_point) < AD::distance2(rhs, search_point); });
    auto real_tree_distance = std::distance(poses.begin(), itMin);

    if (neighbors[0] != real_tree_distance)
    {
        std::cout << "real distance ID is: " << real_tree_distance<< std::endl;
        std::cout << "Tree found distance ID: " << neighbors[0]<< std::endl;
    }
}

I also used the same method to check the results as your unit tests, instead of my own euclidean distance method.

Btw. in your most recent commit for the unit test https://github.com/attcs/Octree/blob/6dba0d31ac07997867a8aa354695679d83f0e5f0/unittests/general.tests.cpp#L1747 Shouldn't distance2 be distance? Because that is the distance metric used by the PointTree. For example: https://github.com/attcs/Octree/blob/6dba0d31ac07997867a8aa354695679d83f0e5f0/octree.h#L1781

Seems to me that its the same bug, so maybe its a problem with the same if codindition and break. But I am not sure why it seems to be fixed for quadtrees.

attcs commented 9 months ago

Hi! It was a different problem. As I checked the example I found out, if the given depth of the tree was set to lower (6 instead of 10), it worked correctly. (The recommended value of depth is 3-5 in a usual case., otherwise the deep hierarchy could slow down the searches.)

I found the bug in the MortonEncode function: uint32_t is the grid id variable, which could lead to integer-overflow during the shifting (with this dimension number and high depth), consequently resulted erroneous Morton-ids.

distance2 is almost the same metric, just the sqrt is not applied, this does not change on the order.

Thank you very much for your help, I really appreciate it.

LeweC commented 9 months ago

Works great! I am happy to help and thank you for your quick fixes. I will close the issue again :)