AcademySoftwareFoundation / openvdb

OpenVDB - Sparse volume data structure and tools
http://www.openvdb.org/
Mozilla Public License 2.0
2.65k stars 654 forks source link

[BUG] ThreadSanitizer issue in meshToVolume conversion #1601

Open jdumas opened 1 year ago

jdumas commented 1 year ago

Environment

Operating System: Ubuntu 22.04 Version / Commit SHA: 10.0.1 Other: GCC 11 and LLVM 14

Describe the bug

I've run into a ThreadSanitizer issue in the meshToVolume() function. I have reproduced the issue in a minimal repository using GitHub Actions. Please see this log for details about the error.

To Reproduce

Simple head over to this repository and compile the code with ThreadSanitizer enabled on a Linux machine.

Expected behavior

No TSan issue :)

jdumas commented 1 year ago

@Idclip do you have an idea what could be the TSan issue with the meshToVolume() function?

Idclip commented 1 year ago

Hey @jdumas yeah I looked into this and can see at least one issue which would be causing this error, from the probeValue call in this line:

https://github.com/AcademySoftwareFoundation/openvdb/blob/9a57c7e68e5de9ecfd05a98364dcfb8063f71186/openvdb/openvdb/tools/MeshToVolume.h#L1472

This operator is responsible for negating the single band of voxel values which lie directly on the ISO crossing. At this point all other exterior values will be negative, but this single band of voxels will all be positive, so we need to flip the sign of the ones on the inside (if any). This op loops over this band and, if they have a negative neighbour, checks the dot prod of directions to the surface between them to determine if it needs to have its sign flipped.

The op first runs across neighbouring voxels in the same leaf node as the currently processing voxel before looking into voxels which reside in a different node - this latter case may result in the op reading from neighbour values which may be being written to from other threads at the same time.

However... I think that this will still result in deterministic and correct results because at least one negative voxel which isn't in the band will be accessible. The reason TSAN is complaining is because voxel (A) may have a neighbouring voxel (B) in a different leaf, which is also on the ISO crossing and whose having its sign flipped. We could in theory read garbage from trying to access (B)'s value - but because (B) is guaranteed to share a different, non ISO crossing negative neighbour with (A) (due to the fact that this is a single band of voxels) then this shouldn't cause any ill effects and will only potentially result in some unnecessary computation.

I think... In any case we should try and fix this, though I'm not sure how without introducing a separate pass. If my theory is correct then what exists today is the most performant implementation and the non-deterministic evaluation of nval<-0.75 for voxels in the single band has no side effects.

jdumas commented 1 year ago

Is it possible that a narrow-band voxel be surrounded with all its 26 neighbors by other narrow-band voxels though? E.g. if there are a lots of triangles crossing a single voxel. Would that affect the outcome of the algorithm? I see that the traceExteriorBoundaries() already does this sort of two-pass loop to propagate signs across leaf node boundaries.

In any case I think I'd rather have a slightly slower algorithm that makes TSan happy than the other way around (or at least have it be a compile option so I can sanitize the rest of my codebase...).