akuukka / quickhull

C++ implementation of the 3D QuickHull algorithm
265 stars 51 forks

Invalid Hull Output #23

Open Kushal-Shah-03 opened 2 months ago

Kushal-Shah-03 commented 2 months ago

The quickhull algorithm sometimes produces a non-convex output that contains internal triangles. I've attached some images and explained the issue in https://github.com/elalish/manifold/pull/781.

The gist of it is that while testing the algorithm on the Thingi10K dataset, I discovered two models with issues. 39202.stl:

*(Screenshots: 2024-06-27 19-16-47, 2024-06-27 20-56-05, 2024-06-27 21-13-10, 2024-06-28 13-56-12, 2024-06-28 13-56-38)*

From these images it is clear that the object has excess volume and surface area, caused by the extra piece contained inside it. This piece is produced by this quickhull implementation and makes the output non-convex; it clearly should not be present. Since my testing also compared several algorithms, I noticed that the VHACD implementation (https://github.com/kmammou/v-hacd/blob/master/include/VHACD.h#L2321) does not have this issue.

Similarly for 1750623.stl:

*(Screenshots: 2024-06-28 13-58-16, 2024-06-28 13-58-25, 2024-06-28 13-58-46, 2024-06-28 14-00-23)*

The extra piece is contained inside the object, and it is also produced by this implementation but not the VHACD implementation.

In order to better identify the cause, I tried narrowing down the input and reduced it to a set of 18 points that still produces a similar volume inside the "hull": *(Screenshot: 2024-07-04 10-48-36)*

The file containing the 18 points is called "18points.bin"; I've stored the points in binary to avoid loss of precision. The glb files of the outputs of the akuukka and VHACD implementations are also attached, for comparison.

To parse the points, you can use a function similar to:

```cpp
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

#include <glm/glm.hpp>

std::vector<glm::vec3> readBinaryData(const std::string& filename) {
    std::ifstream inFile(filename, std::ios::binary);
    if (!inFile) {
        std::cerr << "Error opening file for reading!" << std::endl;
        return {};
    }
    std::vector<glm::vec3> data;
    glm::vec3 vec;
    // Read one tightly packed glm::vec3 (3 floats) at a time until EOF.
    while (inFile.read(reinterpret_cast<char*>(&vec), sizeof(glm::vec3))) {
        data.push_back(vec);
    }
    inFile.close();
    return data;
}
```

Files.zip

I am looking to fix this issue, as we (@elalish, @pca006132, and I) want to continue using this implementation in our library, Manifold.

Kushal-Shah-03 commented 2 months ago

I did some testing to identify the issue. While comparing the two implementations, I printed, in order, the points added to the hull by each algorithm.

Note: "Base" refers to the points added to the base tetrahedron; Hull1 is the current implementation, Hull2 is the VHACD implementation.

```
Hull1 Point Base -18.2869 31.6738 2.175
Hull1 Point Base -17.6332 -17.342 89.9628
Hull1 Point Base 49.8656 -0 55.5071
Hull1 Point Base -20.4426 -35.4077 8.275
Hull1 Point -23.0164 39.8656 79.0839
Hull1 Point 10.2294 -14.7178 10.508
Hull1 Point -24.9832 43.2722 52.7107
Hull1 Point -18.6273 -39.5444 55.5071
Hull1 Point 13.2675 -19.98 28.1179
Hull1 Point -24.9832 -40.2722 52.7107
Hull1 Point 11.1761 -22.3575 45.2756
Hull1 Point -24.9492 1.5 54.5981
Hull1 Point -25 21.8857 49.9071
Hull1 Point 18.4196 -18.2153 52.4501
Hull1 Point -25 -12.7727 49.9071
Hull1 Point -24.9832 -43.2722 52.7107
Hull1 Point 9.20583 -23.4785 55.334
Hull1 Point -1.62324 -29.7942 48.3949
```

```
Hull2 Point Base -17.6332 -17.342 89.9628
Hull2 Point Base 49.8656 -0 55.5071
Hull2 Point Base -23.0164 39.8656 79.0839
Hull2 Point Base -18.2869 31.6738 2.175
Hull2 Point -20.4426 -35.4077 8.275
Hull2 Point -24.9832 43.2722 52.7107
Hull2 Point -24.9832 -43.2722 52.7107
Hull2 Point 10.2294 -14.7178 10.508
Hull2 Point 13.2675 -19.98 28.1179
Hull2 Point -18.6273 -39.5444 55.5071
Hull2 Point -25 21.8857 49.9071
Hull2 Point -24.9492 1.5 54.5981
Hull2 Point 11.1761 -22.3575 45.2756
Hull2 Point -25 -12.7727 49.9071
Hull2 Point 18.4196 -18.2153 52.4501
Hull2 Point -1.62324 -29.7942 48.3949
```

The first 5 points forming the hull are the same for both algorithms, so we can safely compare the points after them. Only two points differ between the algorithms, and these may be the cause of the issue. One of them, (-24.9832, -40.2722, 52.7107), is clearly collinear with two hull vertices, since (-24.9832, 43.2722, 52.7107) and (-24.9832, -43.2722, 52.7107) are part of the convex hull. The other, (9.20583, -23.4785, 55.334), appears to be almost coplanar with the convex hull produced by either algorithm. These images show where the two points are located:

*(Screenshots: 2024-07-03 12-31-55, 2024-07-03 12-31-52)*

I also noticed that these 5 points are coplanar in our case: (-24.9832, -43.2722, 52.7107), (-25, -12.7727, 49.9071), (-24.9832, -40.2722, 52.7107), (-25, 21.8857, 49.9071), (-24.9832, 43.2722, 52.7107).

Based on these observations, I suspected the issue could be related to points at equal distances, so I tried changing https://github.com/akuukka/quickhull/blob/master/QuickHull.hpp#L211: I replaced the comparison with `if (D >= f.m_mostDistantPointDist)`, thinking the relative order of those equal points might matter, and it appeared to fix the issue in both cases. Could there be some ordering or tie-breaker for such points that was missed and that could help fix this issue? It was pointed out to me that, since we are dealing with floating-point numbers, such a change of comparison has very little significance and is probably not the actual issue, but I still felt sharing it could help identify the underlying cause.

I now plan to identify the first iteration of the algorithm whose output is a non-convex hull and see what could have caused the extra triangles. I would appreciate ideas and input so that we can fix this bug.

akuukka commented 2 months ago

Interesting! I'm afraid I don't have time to investigate this issue thoroughly, but I would be happy to check & merge a pull request if you can fix the problem.

akuukka commented 2 months ago

If here https://github.com/akuukka/quickhull/blob/4ef66c68950cb4db11d3b75bfe4034d807485ad0/QuickHull.cpp#L163

instead of

if (d>0) {

we have

if (d>m_epsilon) {

then the problem seems to go away - at least in the 18 point case.

Kushal-Shah-03 commented 2 months ago

> If here https://github.com/akuukka/quickhull/blob/4ef66c68950cb4db11d3b75bfe4034d807485ad0/QuickHull.cpp#L163 instead of `if (d>0) {` we have `if (d>m_epsilon) {` then the problem seems to go away - at least in the 18 point case.

Yeah, I've tried that as well. It does fix the issue for this case, but for certain other cases the output varied compared to the other algorithm. I will check their renders to see if the hull is valid, investigate further, and share the results with you.

Also, could you explain the significance of the second condition in https://github.com/akuukka/quickhull/blob/4ef66c68950cb4db11d3b75bfe4034d807485ad0/QuickHull.hpp#L197?

I have been trying a couple of other things as well, and I think I might be close to a fix; give me a couple of days and I will get back to you about them.

I have also written a check for whether the intermediate polyhedron after each iteration is convex, which I will share in a draft PR. I have identified the iteration where the polyhedron becomes non-convex for the first time and am trying to pin down the cause. I will share some screenshots of the intermediate renders by tonight as well.

akuukka commented 2 months ago

That second condition simply tests if the distance to the plane is more than m_epsilon. It looks a bit ugly because we want to avoid doing square roots.

Kushal-Shah-03 commented 2 months ago

Alright, thanks for the quick reply. I have to run somewhere right now, I will post some of my results tonight.

akuukka commented 2 months ago

I think it's a good idea to use m_epsilon instead of zero as the limit on line 163 (QuickHull.cpp), because otherwise faces that are not visible may appear visible due to numerical inaccuracy. This can corrupt the horizon edge loop and thus, eventually, the convex hull too.

And actually, we should use squared epsilon and take the plane's normal length into account here too, so

if (d > m_epsilon)

is not the correct check, but

if (D > 0 && D*D > m_epsilonSquared * P.m_sqrNLength) {

where D is

const T D = mathutils::getSignedDistanceToPlane(activePoint, P);

Kushal-Shah-03 commented 1 month ago

Ok, so I ran some tests for several cases based on your suggestions and my understanding, but the results I got were not satisfactory, I have shared an example of the result below.

For the case you suggested, `if (D > 0 && D*D > m_epsilonSquared * P.m_sqrNLength)` at both lines 163 and 197:

```
                      mean     median      mode            max       min  Success_Count
VolHull           1.000000   1.000000  1.000000       1.000000  1.000000         9995.0
VolHull_hull2     0.999933   1.000000  1.000000       1.017576  0.790454         9993.0
AreaHull          1.000000   1.000000  1.000000       1.000000  1.000000         9995.0
AreaHull_hull2    0.998905   1.000000  1.000000       1.015486  0.636606         9993.0
Time              1.000000   1.000000  1.000000       1.000000  1.000000         9995.0
Time_hull2        4.730233   1.963518  0.052858     401.556505  0.052858         9993.0
```

These values are normalized with our implementation as the base, so we can interpret these results as: in the worst case the output of our algorithm is off by more than 30% in area and more than 20% in volume, if we assume the VHACD implementation to be correct (the outputs of VHACD and CGAL matched with a very small error percentage). Additionally, upon checking, I could still find cases where the output was an invalid hull.

And I obtained similar issues with the other variants I tried, such as changing only line 163 to `d > m_epsilon`, or replacing `d > 0` with `d > m_epsilon` on both lines 163 and 197. You can see some of those conditions commented out in the draft PR I created, to get an idea of what I tried.

Based on these results it seemed there had to be some other issue as well, so I went through the code and noticed that https://github.com/akuukka/quickhull/blob/4ef66c68950cb4db11d3b75bfe4034d807485ad0/MathUtils.hpp#L20-L23 assumes the normal has length 1 when computing a Euclidean distance. This function is used in quite a few places, and I don't think the normalization is always accounted for; since the distance should ideally be Euclidean whenever we compare against epsilon, I decided to investigate. I noticed that the normals computed by getTriangleNormal are not normalized, so I tried simply appending .getNormalized() to the output of that function, since this normal is then used directly as the plane normal. With no other changes to the code, I reran the dataset, and the results instantly became more promising. I've attached them below:

```
                       mean     median      mode            max       min  Success_Count
VolHull            1.000000   1.000000  1.000000       1.000000  1.000000         9995.0
VolHull_hull2      1.000000   1.000000  1.000000       1.000131  0.999430         9993.0
AreaHull           1.000000   1.000000  1.000000       1.000000  1.000000         9995.0
AreaHull_hull2     1.000000   1.000000  1.000000       1.000039  0.999279         9993.0
Time               1.000000   1.000000  1.000000       1.000000  1.000000         9995.0
Time_hull2         5.302331   2.291871  0.020692     386.963920  0.020692         9993.0
```

Clearly the outputs are now within 0.1% precision, which is a significant improvement. Upon checking for invalid convex hulls with my isMeshConvex function (attached in the draft PR), there were only two cases out of the 10,000 that gave an "invalid" convex hull. I haven't looked at those outputs yet, but it is possible they are flagged as invalid only due to precision errors; I suspect this because every other variant I tried produced hundreds of "invalid" values. So I feel this is a necessary change, and I wanted to verify with you whether it makes sense logically. I now plan to further investigate the two cases, along with other ideas about what could have caused the issue.

If you do think this change is needed, where would you suggest looking next, so that we can eliminate those two "invalid" cases as well?