Multithreaded Render3 implementation, parallelizing any other Render3 implementation.

Yeicor commented 2 years ago

The new MTRenderer3 is capable of parallelizing the generation of a surface by splitting the space into partitions and running any other Render3 implementation in each of those at the same time.

It works with Marching Cubes Uniform (overlappingCells: 0) and Dual Contouring (overlappingCells: 1). I don't know why Marching Cubes Octree does not work (it sometimes creates the whole surface ignoring the bounding box).

It achieves up to x7.3 speedup using my 12 cores (6 physical cores, tested on cylinder_head with Dual Contouring and a total of 256 meshCells).

Demo of automatic space partition (calling MTRenderer3.AutoSplitsMinimum(12) generates these 12 partitions):

Screenshot_20211230_001620

The final output doesn't have holes:

Screenshot_20211230_001231

deadsy commented 2 years ago

The existing marching cubes (uniform cubes) is multi-threaded. Does the mutl-threaded render code provide any speed up? If so, why?

Also - are there any changes here that gives a better result for the dual contour rendering? I'd prefer to merge a single dc algorithm that worked properly and gave comparable results to the existing renderers. I don't mind if it's slower if the resulting mesh size is smaller.

Yeicor commented 2 years ago

MTRenderer3 is intended for easy (and efficient) parallelization of singlethreaded Render3 implementations, like my Dual-Contouring implementations. However, I tested it and, at least on my computer, using this MTRenderer3 is slightly faster than the MarchingCubesUniform in its multithreaded mode (I also added an option to disable it, although it keeps using multithreaded mode as the default). My test results with MarchingCubesUniform:

MarchingCubesUniform{goroutines=NumCPU}: 5.81s
MarchingCubesUniform{goroutines=1} + MTRenderer3{NumCPU}: 5.35s
MarchingCubesUniform{goroutines=NumCPU} + MTRenderer3{NumCPU}: 5.54s

My guess is that the internal parallelization of MarchingCubesUniform requires spending more synchronization time than MTRenderer3, as it processes batches of 100 evaluations in parallel, while MTRenderer3 runs the algorithm independently NumCPU times, only needing to synchronize for safe triangle generation.

As to your second point, I can remove the first DC implementation (which I did not change) if you want, as it is faster and provides better results to run a planar simplification algorithm (with any maximum angle) over the final mesh: Screenshot_20220105_123156 I tested this Go implementation of a simplification algorithm, but it fails, generating some overlapping and backwards faces for all algorithms.

P.S. While testing the simplification algorithms, I realized that the vertices linking partitions are very close together (<1e-6 distance) but not in the same positions, resulting in broken meshes. So don't accept this until I fix it by merging those vertices.

Yeicor commented 2 years ago

P.S. While testing the simplification algorithms, I realized that the vertices linking partitions are very close together (<1e-6 distance) but not in the same positions, resulting in broken meshes. So don't accept this until I fix it by merging those vertices.

I fixed this in my last commit, by including a parameter that merges vertices that are very close together. MTRenderer3 now generates manifold meshes for Marching Cubes, and generates no holes for Dual Contouring.

This post-processing step adds around 0.5 seconds to my previous tests, so it makes MarchingCubesUniform slightly slower but it is still useful for DualContouring and future singlethreaded Render3 implementations.

deadsy / sdfx

Multithreaded Render3 implementation, parallelizing any other Render3 implementation. #43