crvs / KDTree

Simple C++ KD-Tree implementation
BSD 3-Clause "New" or "Revised" License
181 stars 70 forks source link

KDTree built with MSVC (Microsoft Visual C++) does not achieve 100% accuracy #12

Open kn1cht opened 1 week ago

kn1cht commented 1 week ago

Environment

Issue

KDTree built with gcc is 100% accurate, but when built with MSVC (Visual Studio 2022), the accuracy somehow drops. I reproduced this phenomenon in error_test.cpp. (The original code aborts the test if 100% accuracy is not achieved, but I modified it to continue.)

 > . \bin\Release\error_test.exe
 Total number of iterations ran: 30
 Accuracy (tested with 50 datasets per iter): 92.9333%. Total Number of correct queries: 46 / 50
 Total query time: { bruteForce: 0.0052009, kdTree: 0.0023472 }
 Accuracy (tested with 500 datasets per iter): 96.8733%. Total Number of correct queries: 484 / 500
 Total query time: { bruteForce: 0.491479, kdTree: 0.0470063 }
 Accuracy (tested with 1000 datasets per iter): 97.17%. Total Number of correct queries: 971 / 1000
 Total query time: { bruteForce: 1.98447, kdTree: 0.120078 }

Minimal reproducible example

I modified toy_test.cpp to create a simple example Strangely, the error does not always occur on the same input. That is, a single query may give the correct result, but some of the results may be wrong after ~100 runs. (In the lucky case, the test program does not produce errors. If you want to reproduce it, please try running it several times.)

#include <iostream>
#include <vector>
#include "KDTree.hpp"

int main() {
    pointVec points;
    points.push_back(point_t{ 0.0, 0.0 });
    points.push_back(point_t{ 1.0, 0.0 });
    points.push_back(point_t{ 0.0, 1.0 });
    points.push_back(point_t{ 1.0, 1.0 });
    points.push_back(point_t{ 0.5, 0.5 });
    KDTree tree(points);
    point_t pt { 0.8, 0.2 };
    point_t ground_truth { 1.0, 0.0 };
    std::cout << "query point: " << pt[0] << ", " << pt[1] << std::endl;
    std::cout << "ground truth: " << ground_truth[0] << ", " << ground_truth[1] << std::endl;
    int correct = 0, incorrect = 0;
    for(int i = 0; i < 100; ++i)
        tree.nearest_point(pt) == ground_truth ? ++correct : ++incorrect;
    std::cout << "correct: " << correct << ", incorrect: " << incorrect << std::endl;
}
> .\bin\Release\toy_test.exe
correct: 79, incorrect: 21

I am quite confused because the KDTree implementation will unlikely include any random elements. I will try to figure out why by reading your code.

kn1cht commented 1 week ago

I think I have finally found the possible cause.

In toy_test, the program starts with (0.5, 0.5) and then considers whether (0.5, 0.5) or (1.0, 0.0) is closer to (0.8, 0.2). Of course the latter is closer, so the program tries to insert the data for (1.0, 0.0) into k_nearest_buffer.

Whether the insert is actually done or not is determined by insert_it == k_nearest_buffer.end() || insert_it->first ! = std::next(insert_it)->first. In my understanding, this condition prevents the same point from being added.

https://github.com/crvs/KDTree/blob/a1baa5bb907fc5221d62abc42a5f2c3fa71b9cc3/KDTree.cpp#L137-L140

I printed out the memory addresses of insert_it->first and std::next(insert_it)->first and found something interesting. Both of the following outputs are at the point where there is only (0.5, 0.5) in k_nearest_buffer and the program is trying to insert (1.0, 0.0).

std::next(insert_it)->first changes with each execution and in the unlucky case matches insert_it->first. As a result, although (1.0, 0.0) should be added, the program did not do so and output an incorrect answer.

Since k_nearest_buffer is 1 in length, next of the 0th element matches k_nearest_buffer.end() (lines 2 and 3 of the output above). Attempting to access iterator end() cause undefined behavior (from cppreference). Thus when you compare something to std::next(insert_it)->first without checking if it is an end iterator, there is no guarantee that you will get the correct result.

For your reference, here is a similar output by a program compiled with gcc. The result of k_nearest_buffer.end()->first is very different from the normal memory address. This may be the reason why 100% accuracy was obtained with gcc.

insert_it->first: 0x561817160720
std::next(insert_it)->first: 0x1
k_nearest_buffer.end()->first: 0x1
insert_it->first != std::next(insert_it)->first: true

In summary, if insert_it points to the last element of k_nearest_buffer, the contents of std::next(insert_it) should not be accessed. Adding a condition to check this particular condition will ensure that KDTree won't error if built with MSVC.

@@ -135,6 +135,7 @@ void KDTree::node_query_(
         std::upper_bound(k_nearest_buffer.begin(), k_nearest_buffer.end(),
                          node_distance, detail::compare_node_distance);
     if (insert_it == k_nearest_buffer.end() ||
+        std::next(insert_it) == k_nearest_buffer.end() ||
         insert_it->first != std::next(insert_it)->first) {
         k_nearest_buffer.insert(insert_it, node_distance);
     }