RenderKit / embree

Embree ray tracing kernels repository.
Apache License 2.0
2.32k stars 383 forks source link

Fix triangle splitting crash where FP error causes binning differences #489

Closed jmsaunde closed 3 weeks ago

jmsaunde commented 1 month ago

Ultimately the root problem here is that when we do the binning stage we use a different method to test the first stage and the second. In the first stage we use bin() to get an integer bin associated with a point in space. In the second we get the floating point division point on the axis we have chosen to bin on and do FP comparisons instead. Under most circumstances this is fine.

Occasionally these binning differences will result in unprofitable splits happening but still building a usable tree.

In the test case I've given, the unprofitable splits we erroneously choose fill up the split limit and we don't end up splitting the triangles the previous stage had intended to choose. As a result, we can't split those triangles. Then when we do the real bifurcation, everything is on the same side. This resulted in an assertion failure in production (1).

(1) We never noticed that the RelWithDebInfo build in MSVC still has assertions enabled. I can provide another change disabling these since I think it's unintentional.

freibold commented 3 weeks ago

Hi! Thanks for this PR and especially for the small test! The changes look good. However, I've tried to verify that the test fails in the current Embree devel branch but it's always passing. It also passes in our internal CI running on various machines with many different configs. Can you double check that the test is really a reproducer that triggers asserts during the build (i.e, without the other changes you made). What config (VS version, OS version, CMake options, build type, ....) are you using?

jmsaunde commented 3 weeks ago

Hi! Thanks for this PR and especially for the small test! The changes look good. However, I've tried to verify that the test fails in the current Embree devel branch but it's always passing. It also passes in our internal CI running on various machines with many different configs. Can you double check that the test is really a reproducer that triggers asserts during the build (i.e, without the other changes you made). What config (VS version, OS version, CMake options, build type, ....) are you using?

Thanks for reviewing!

It was much easier to reproduce the issue by adding the feature in the first commit: supporting setting the max leaf size. The test needs and uses this feature to reproduce the issue. I believe an analogous test should exist without adding this but finding it would be difficult.

I'm running on MSVC2022/Win11. I'm off work today, but can follow up tomorrow with the other build options that I used. There were actually several CMake configuration settings we use internally so it's a good point.

freibold commented 3 weeks ago

Thanks, I was missing that the test uses the max_triangles_per_leaf setting and could now test the PR properly. I've merged this into our internal devel branch and it will be available publicly very soon.