Closed soypat closed 2 years ago
So the basic idea is... render to a slice of triangles (in memory) rather than write it via a channel to the file? That's a speedup for octree and regular marching cubes right? (at the cost of allocated memory)
A few observations...
Building a bolt in the benchmark: the screw form doesn't render well with octree marching cubes because the signed distance field is actually wrong at far distances. Probably doesn't matter for the purposes of the benchmark, but still...
Octree marching cubes is single threaded. Getting some goroutines in there could give some xN speedups (N == number of cores).
What I really want to see work properly is the double contouring renderer. At the moment it's slow and the result is shitty. I'd be ok with it taking a little bit longer than marching cubes if the stl files were smaller and the quality of the mesh just as good.
I believe both would benefit from the same rewrite since the use of channels is inefficient in this case. I do have an idea for speeding it up using multiple cores, but it requires considerable concurrency infrastructure. Channels should ideally receive a large batch of triangles to write to file as a []Triangle3, since the channel's send overhead becomes a bottleneck when sending individual triangles.
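The batching idea above can be sketched roughly as follows. This is a minimal illustration, not code from the render package: `Triangle3` here is a local stand-in type, and `sendBatched` and `countBatched` are hypothetical helper names. The point is only that one channel send carries many triangles.

```go
package main

import "fmt"

// Triangle3 is a local stand-in for the render package's triangle type (assumption).
type Triangle3 struct{ V [3][3]float64 }

// sendBatched (hypothetical helper) forwards tris over out in slices of at
// most batchSize triangles, then closes out so the receiver knows it is done.
func sendBatched(tris []Triangle3, batchSize int, out chan<- []Triangle3) {
	defer close(out)
	if batchSize < 1 {
		batchSize = 1
	}
	for len(tris) > 0 {
		n := batchSize
		if n > len(tris) {
			n = len(tris)
		}
		out <- tris[:n] // one channel operation for n triangles
		tris = tris[n:]
	}
}

// countBatched (hypothetical helper) drains the channel and reports how many
// triangles arrived and how many channel sends it took.
func countBatched(in <-chan []Triangle3) (triangles, sends int) {
	for batch := range in {
		triangles += len(batch)
		sends++
	}
	return triangles, sends
}

func main() {
	tris := make([]Triangle3, 10000)
	ch := make(chan []Triangle3)
	go sendBatched(tris, 1024, ch)
	n, sends := countBatched(ch)
	fmt.Println(n, sends) // prints "10000 10"
}
```

With a batch size of 1024, ten thousand triangles cross the channel in ten sends instead of ten thousand.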
I think the best approach is to first have a working non-concurrent simple implementation of the renderer without fancy channels. Once you have a working single threaded renderer one can start thinking of reusing the single-threaded functions one has at hand to be multithreaded and developing a MultiCoreRenderer3 interface or something of the sort.
If I am given your blessing I will tear down the concurrent implementation as it stands today and rewrite all of it to be single-threaded with the possibility of an even faster multi-core version tomorrow.
As for the double contouring renderer, I can give that a go as well. Be warned: I'm not that well versed in computational geometry, if that's where the problem lies.
I am also in need of solving https://github.com/deadsy/sdfx/issues/35, which was the primary reason I forked this repository... I may end up shaving more yaks than I planned.
if I am given your blessing I will tear down the concurrent implementation as it stands today and rewrite all of it to be single-threaded with the possibility of a even faster multi core version tomorrow.
The normal marching cubes (march3.go) is concurrent. The octree marching cubes (march3x.go) is single-threaded.
btw - the reason the slower march3.go is kept around is because of the afore-mentioned issues with rendering objects with approximated distance fields (screw threads mostly).
Other than that, the octree renderer is equivalent to uniform cube decomposition of space; it just does less work.
If performance is a significant concern then there's a big gain to be had in throwing more cores at the octree renderer, i.e. xN where N is how many cores you have. In this case a channel implementation for streaming the triangles out is advantageous because it takes care of concurrency issues for you.
// External code writes triangles to this channel.
// This goroutine reads the channel and writes triangles to the file.
c := make(chan *Triangle3)
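As a rough sketch of that arrangement, several worker goroutines can each render a sub-region and send their triangles to one shared channel, with a single reader on the other end. Everything here is illustrative: `renderParallel`, the integer region index, and the placeholder render function are assumptions, not the octree renderer's actual structure.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// Triangle3 is a local stand-in for the render package's triangle type (assumption).
type Triangle3 struct{ V [3][3]float64 }

// renderParallel (hypothetical) runs render over region indices 0..regions-1
// on a pool of workers and funnels the results into a single channel.
// The returned channel is closed once every region has been rendered.
func renderParallel(regions, workers int, render func(region int) []Triangle3) <-chan []Triangle3 {
	if workers < 1 {
		workers = 1
	}
	jobs := make(chan int)
	out := make(chan []Triangle3, workers)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for r := range jobs {
				out <- render(r) // the channel serializes access for the reader
			}
		}()
	}
	go func() {
		for r := 0; r < regions; r++ {
			jobs <- r
		}
		close(jobs)
		wg.Wait()
		close(out)
	}()
	return out
}

func main() {
	// Placeholder render function: a real one would run marching cubes on the region.
	placeholder := func(region int) []Triangle3 { return make([]Triangle3, 3) }
	total, batches := 0, 0
	for batch := range renderParallel(8, runtime.NumCPU(), placeholder) {
		total += len(batch)
		batches++
	}
	fmt.Printf("rendered %d triangles in %d batches\n", total, batches)
}
```

The single reader (for example the file writer) never needs a lock, because the channel is the only point where the workers' output meets.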
It might be an idea to try some channel buffering experiments to see if there are any performance gains to be had from that. I suspect most of the gains you've seen with an in-memory slice could be gained by a bit more decoupling between the renderer and the file writer.
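A buffering experiment of that kind could start from something like the sketch below. It only measures raw channel transfer with different buffer capacities. `pump` is a hypothetical helper, `Triangle3` is a stand-in type, and the numbers it prints will vary by machine, so no results are claimed here.

```go
package main

import (
	"fmt"
	"time"
)

// Triangle3 is a local stand-in for the render package's triangle type (assumption).
type Triangle3 struct{ V [3][3]float64 }

// pump (hypothetical helper) sends n triangle pointers through a channel with
// the given buffer capacity and reports the elapsed time and the number received.
func pump(n, capacity int) (time.Duration, int) {
	c := make(chan *Triangle3, capacity)
	start := time.Now()
	go func() {
		defer close(c)
		for i := 0; i < n; i++ {
			c <- &Triangle3{}
		}
	}()
	received := 0
	for range c {
		received++
	}
	return time.Since(start), received
}

func main() {
	const n = 1000000
	for _, capacity := range []int{0, 16, 1024} {
		elapsed, got := pump(n, capacity)
		fmt.Printf("capacity=%4d received=%d elapsed=%v\n", capacity, got, elapsed)
	}
}
```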
io.Writer
That's a byte buffer oriented interface. Marching cubes creates a stream of triangles, and how you choose to marshal those into bytes is somewhat arbitrary, e.g. stl, 3mf, ....
Now nothing stops you from building a converter (triangle input channel, write to io.Writer with a marshalling format of your own design), but that doesn't belong in the renderer.
E.g.
object + renderer -> stl writer (to file)
object + renderer -> 3mf writer (to file)
object + renderer -> converter (to io.Writer)
The normal marching cubes (march3.go) is concurrent. The octree marching cubes (march3x.go) is single-threaded.
I think you are mixing them up? MarchingCubesUniform is very much single threaded from what I'm seeing here:
func (m *MarchingCubesUniform) Render(s sdf.SDF3, meshCells int, output chan<- *Triangle3) {
	// work out the region we will sample
	bb0 := s.BoundingBox()
	bb0Size := bb0.Size()
	meshInc := bb0Size.MaxComponent() / float64(meshCells)
	bb1Size := bb0Size.DivScalar(meshInc)
	bb1Size = bb1Size.Ceil().AddScalar(1)
	bb1Size = bb1Size.MulScalar(meshInc)
	bb := sdf.NewBox3(bb0.Center(), bb1Size)
	for _, tri := range marchingCubes(s, bb, meshInc) { // this is a slice, not a channel
		output <- tri
	}
}
MarchingCubesOctree, on the other hand, makes use of a dcache3 type which does send triangles over a channel. It is my understanding that MarchingCubesUniform (what you call normal marching cubes, no?) is single threaded, while MarchingCubesOctree does some multi-threaded work.
I wrote a single-threaded implementation for MarchingCubesOctree which was around twice as fast as the original version, which, from my understanding as outlined here, is multi-core? Maybe I'm not following you completely. Anyway, performance is a minor concern of mine; if it can be gained, so be it. My major concern is that the current implementation jumped the gun and tried to be concurrent before having a good single-threaded implementation. This makes for an unwieldy API for working with the STL render functions.
I want to write a real-time browser 3D renderer (the current sdf-ui implementation is slow and clunky) using three.js and Go WASM bindings to make it a pleasure to work with sdfx (this is for the CERN-organized hackathon/competition). As the render package exists today, I would much prefer to fork the repository and rewrite it from scratch to best fit my use case. This is because the current implementation gives little thought to how users would actually use it. There is also a concern that the API as it exists today is slow because of this premature concurrent optimization.
As a user, contributor and just random guy on the internet, I strongly suggest the render package be rethought from the ground up. Not only would it make it nicer to work with, it would also make the package as a whole much easier to contribute to! Having a single-threaded implementation is simple and easy to follow. New users could improve these functions and they'd also be improving the multi-core version, since ideally the multi-core renderer would also use these functions!
I think you are mixing them up
No. Read the code. The dcache3 stuff had locks put on it in preparation for a multi-threaded octree renderer, but it's not currently necessary.
real-time browser 3d renderer
That's a different problem than the renderer deals with, i.e. 3D preview concerns itself with visible faces while STL generation has to concern itself with the whole object.
No. Read the code. The dcache3 stuff had locks put on it in preparation for a multi-threaded octree renderer, but it's not currently necessary.
I was wrong. I think I managed to find the parallelization in normal marching cubes in an init() function which starts up a worker pool on evalProcessCh.
That's a different problem than the renderer deals with.
I'm not concerning myself with low-level 3D preview. three.js receives a 3D object, which in this case could be a bunch of triangles, and it does the 3D face culling and whatnot itself. I just need the whole set of triangles and three.js will provide a fluid 3D preview of the whole part.
That's a byte buffer oriented interface. Marching cubes creates a stream of triangles, and how you choose to marshal those into bytes is somewhat arbitrary, e.g. stl, 3mf, ....
For my application triangles become bytes when I send them over HTTP. I guess I could use the Render3 interface, but more on why I'm not a fan of the Render function signature below.
It might be an idea to try some channel buffering experiments to see if there are any performance gains to be had from that. I suspect most of the gains you've seen with an in-memory slice could be gained by a bit more decoupling between the renderer and the file writer.
Yes, this is true. The single-triangle queue given by Render3 is a huge bottleneck. A better signature for this could be
Render(sdf3 sdf.SDF3, meshCells int, output chan<- []Triangle3)
where output receives batches of triangles... though this is questionable design for several reasons: who closes the channel? Does Render block? Do I have to call it as a goroutine?
I'm not sure what form a "good" Render interface would have. I'd really have to think long about it. It would be awesome if there was no channel handling on the user's side, but rather that happened internally.
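One possible shape, offered only as a sketch and not as the package's API, is a pull-style interface modeled on io.Reader: the caller supplies a buffer and the renderer fills it, so there is no channel to close and blocking behaviour is just that of an ordinary function call. The names `Renderer`, `ReadTriangles`, `RenderAll`, and `sliceRenderer` are hypothetical.

```go
package main

import (
	"fmt"
	"io"
)

// Triangle3 is a local stand-in for the render package's triangle type (assumption).
type Triangle3 struct{ V [3][3]float64 }

// Renderer is a hypothetical pull-style interface modeled on io.Reader:
// ReadTriangles fills dst with up to len(dst) triangles and returns io.EOF
// once the model is exhausted. Any concurrency stays behind this call.
type Renderer interface {
	ReadTriangles(dst []Triangle3) (n int, err error)
}

// sliceRenderer is a toy Renderer that serves pre-computed triangles.
type sliceRenderer struct{ remaining []Triangle3 }

func (r *sliceRenderer) ReadTriangles(dst []Triangle3) (int, error) {
	if len(r.remaining) == 0 {
		return 0, io.EOF
	}
	n := copy(dst, r.remaining)
	r.remaining = r.remaining[n:]
	return n, nil
}

// RenderAll drains r into one slice, reusing a fixed-size buffer for each read.
func RenderAll(r Renderer) ([]Triangle3, error) {
	buf := make([]Triangle3, 256)
	var all []Triangle3
	for {
		n, err := r.ReadTriangles(buf)
		all = append(all, buf[:n]...)
		if err == io.EOF {
			return all, nil
		}
		if err != nil {
			return all, err
		}
	}
}

func main() {
	r := &sliceRenderer{remaining: make([]Triangle3, 1000)}
	tris, err := RenderAll(r)
	fmt.Println(len(tris), err) // prints "1000 <nil>"
}
```

A multi-core renderer could implement the same interface and keep its goroutines and channels internal, which would answer the "who closes the channel" question by never exposing one.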
Octree rendering got almost a x2 speed boost. With a little creative freedom I'd offer rewriting the STL part of the render package, as it seems to have plenty of room for improvement in both speed and API thoughtfulness, such as adopting the io.Writer convention, minimizing heap usage, and more.