The quest for better import times

cuddlyogre commented 1 year ago

I've been exploring multiple types of stud instancing. Linked duplicate objects, collection instances, and geometry nodes.

Initial testing seems to indicate instancing doesn't help performance at all. In a lot of cases it seems to hurt performance.

In the case of collection instances and linked duplicates, import times went up and RAM usage either also increased or stayed the same.

Blender starts to run into performance problems with large numbers of objects. Using linked duplicates on a large set like http://omr.ldraw.org/files/1052 is very laggy because there are so many studs. Replacing the studs with empties and instancing the studs using collections gave very similar results.

In the case of the geometry nodes strategy, I removed the edge split modifier and implemented a color replacement node so that a part is only loaded once, as opposed to once per material. Import times decreased slightly and RAM usage went down or stayed the same. The effect was more pronounced for http://omr.ldraw.org/files/1052, while http://omr.ldraw.org/files/338 was pretty close to the same. It appears that the more studs and parts share the same color, the more performance is gained.

Although the improvement isn't very notable at this scale, it may become relevant at the scales of projects like Datsville. I will have to test.

There is also the issue of how to handle instanced studs that have a texmap applied to them. I think the best approach is to not instance those and just treat them like normal. At the moment, the textures don't load at all in the instancing branch. But, I don't know if it's worth pursuing this much further considering the minimal effect on performance versus the complexity of the code.

It is also possible that this logic will work better in a different engine, so it may be worth fleshing out.

cuddlyogre commented 1 year ago

Adding a Realize Instances node improves import times by a small amount but has a mixed effect on RAM usage. The number of objects in the scene drops significantly which seems to improve viewport performance.

cuddlyogre commented 1 year ago

Pickling already processed items increases load times slightly. This happens because vectors can't be pickled, so they have to be processed more than once, and because the unpickling process itself takes time.

I'll see how using json works out.
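A minimal sketch of what the JSON route could look like, assuming the geometry boils down to vertex positions and face indices. The function names and the `"vertices"`/`"faces"` keys are hypothetical and not taken from the addon. Vectors are converted to plain lists of floats on the way out, so nothing unserializable reaches `json.dumps`.

```python
import json


def geometry_to_json(vertices, faces):
    """Serialize geometry to a JSON string.

    `vertices` is any iterable of 3-component vectors (for example
    mathutils.Vector inside Blender). Each one is converted to a plain
    list of floats, which JSON can store directly.
    """
    payload = {
        "vertices": [[float(c) for c in v] for v in vertices],
        "faces": [[int(i) for i in f] for f in faces],
    }
    return json.dumps(payload)


def geometry_from_json(text):
    """Rebuild (vertices, faces) as lists of tuples from a JSON string."""
    payload = json.loads(text)
    vertices = [tuple(v) for v in payload["vertices"]]
    faces = [tuple(f) for f in payload["faces"]]
    return vertices, faces
```

The tuples that come back would still need to be turned into vectors again if the rest of the importer expects them, so this only helps if that conversion is cheaper than reprocessing the part.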

Being able to read and write geometry_data objects to file may prove helpful with offloading processing to an external program in a faster language. At the moment, the only faster language I really know is C#. There doesn't appear to be an easy way to interface Python and C#. C++ may be a more suitable candidate, but I would have to learn it.

Only processing each part file once (accounting for whether or not they are texmapped) decreases load times slightly.
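As a rough illustration of processing each part only once, the sketch below keys a cache on the file name plus the texmap flag. `load_part`, `_part_cache`, and the `parse` callback are hypothetical names used for illustration; the addon's real caching code may look quite different.

```python
# Cache of already processed parts, keyed by (filename, texmapped).
_part_cache = {}


def load_part(filename, texmapped, parse):
    """Return the processed part, calling `parse` only on a cache miss.

    A texmapped part is cached separately from its plain version,
    because the two are processed differently.
    """
    key = (filename, bool(texmapped))
    if key not in _part_cache:
        _part_cache[key] = parse(filename, texmapped)
    return _part_cache[key]
```

With this shape, a stud that appears thousands of times in a model is parsed once (or twice, if it also appears texmapped).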

In the past, I flattened the part and wrote it to file. Between loading the file and being unable to cache the subfile lines, load times increased significantly.

ScanMountGoat commented 1 year ago

If you want to test out different instancing strategies, I would look at the world map. It should load almost instantly when instanced well despite having thousands of pieces. http://omr.ldraw.org/files/1707

> Being able to read and write geometry_data objects to file may prove helpful with offloading processing to an external program in a faster language.

My addon utilizes compiled Rust code with multithreading and other optimizations. The bottleneck is still the Blender Python code for almost all files. The biggest speedups in my experience working with Blender come from better usage of Python and Blender's API.

You can see the Rust processing time and total load times printed to the console for https://github.com/ScanMountGoat/ldr_tools_blender. The projects are all open source if you want to use them. I would identify major bottlenecks first before trying to use a native Python module with Rust, C++, etc.
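One way to look for bottlenecks before reaching for a native module is Python's built-in `cProfile`. The sketch below is only an example of the approach; `profile_call` is a hypothetical helper, and the function you pass in would be whatever entry point the importer actually uses.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, **kwargs):
    """Run func under cProfile and return (result, text report).

    The report lists the ten entries with the highest cumulative time.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(10)
    return result, stream.getvalue()
```

Printing the returned report from Blender's Python console should show whether the time goes to parsing, joining geometry, or the Blender API calls.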

> In the case of collection instances and linked duplicates, import times went up and RAM usage either also increased or stayed the same.

You'll need to apply modifiers ahead of time for linked duplicates to actually reduce memory usage.

> The number of objects in the scene drops significantly which seems to improve viewport performance.

Assuming equal total vertex count, a few large objects are generally easier to render than lots of small objects. Blender doesn't seem to use any kind of instanced rendering. Viewport performance will probably be pretty bad if it has to draw each stud individually.

ScanMountGoat commented 1 year ago

You can actually get close to the speed of compiled code in Blender just using Python and numpy. Setting properties using foreach_set and a flattened numpy array just needs to do a copy. You can create meshes just using bpy.types.Mesh and using foreach_set to quickly initialize all the data. It's even faster than bmesh. This avoids any looping that would otherwise slow things down.

You can ctrl+f for "foreach_set" to see how to initialize mesh data. It's a little hard to find information on it online. https://github.com/ScanMountGoat/ldr_tools_blender/blob/main/ldr_tools_blender/importldr.py
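For readers who don't want to dig through the file, here is a small sketch of the general pattern. It is not copied from ldr_tools_blender; `flatten_mesh_arrays` and `build_mesh` are hypothetical names. The first function only needs numpy, while the second must run inside Blender. Which polygon attributes have to be set differs between Blender versions, so treat that part as an assumption to check against the API docs for your version.

```python
import numpy as np


def flatten_mesh_arrays(vertices, faces):
    """Convert vertex and face lists into flat arrays for foreach_set."""
    co = np.asarray(vertices, dtype=np.float32).reshape(-1)
    loop_totals = np.array([len(f) for f in faces], dtype=np.int32)
    # Each face starts where the loops of the previous faces end.
    loop_starts = np.zeros(len(faces), dtype=np.int32)
    loop_starts[1:] = np.cumsum(loop_totals)[:-1]
    vertex_indices = np.array([i for f in faces for i in f], dtype=np.int32)
    return co, vertex_indices, loop_starts, loop_totals


def build_mesh(name, vertices, faces):
    """Create a bpy.types.Mesh without any per-vertex Python loop."""
    import bpy  # only available inside Blender

    co, vertex_indices, loop_starts, loop_totals = flatten_mesh_arrays(
        vertices, faces
    )
    mesh = bpy.data.meshes.new(name)
    mesh.vertices.add(len(co) // 3)
    mesh.vertices.foreach_set("co", co)
    mesh.loops.add(len(vertex_indices))
    mesh.loops.foreach_set("vertex_index", vertex_indices)
    mesh.polygons.add(len(loop_starts))
    mesh.polygons.foreach_set("loop_start", loop_starts)
    # Assumption: some older Blender releases also need "loop_total" set
    # from loop_totals here, while newer ones derive it from loop_start.
    mesh.update()
    mesh.validate()
    return mesh
```

Using `float32` and `int32` arrays is a deliberate choice: when the array dtype matches what Blender stores, `foreach_set` can copy the buffer instead of converting element by element.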

The main challenge with LDraw files is actually getting the data into a numpy array in the first place. The ImportLDraw addon spends a lot of time on join operations for subparts and primitives. It also caches a lot of intermediate results that end up being recalculated later. I found that it performed about the same just switching to numpy arrays. Assuming you could come up with a better caching and joining strategy, you could probably get a decent speedup.
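To make the joining step concrete, here is a minimal numpy sketch. It assumes every subpart has already been reduced to triangles and comes with a 4x4 transform, which is a simplification of real LDraw data (quads, lines, and winding are ignored). `join_geometry` is a hypothetical name, not a function from either addon.

```python
import numpy as np


def join_geometry(pieces):
    """Merge subparts into one vertex array and one face array.

    pieces: non-empty iterable of (vertices Nx3, faces Mx3, 4x4 transform).
    """
    all_vertices = []
    all_faces = []
    offset = 0
    for vertices, faces, transform in pieces:
        vertices = np.asarray(vertices, dtype=np.float32)
        faces = np.asarray(faces, dtype=np.int32)
        transform = np.asarray(transform, dtype=np.float32)
        # Rotation/scale is the upper-left 3x3, translation the last column.
        transformed = vertices @ transform[:3, :3].T + transform[:3, 3]
        all_vertices.append(transformed)
        # Shift indices so they point into the combined vertex array.
        all_faces.append(faces + offset)
        offset += len(vertices)
    return np.concatenate(all_vertices), np.concatenate(all_faces)
```

Collecting the pieces in lists and concatenating once at the end avoids reallocating the combined arrays for every subpart, which is where repeated join operations tend to get expensive.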

The renderer I wrote uses the same loading library as the Blender addon, so the techniques should still be applicable. I don't know how much performance you lose using this caching and instancing strategy in pure Python compared to Rust. https://github.com/ScanMountGoat/ldr_wgpu/blob/main/ARCHITECTURE.md#caching-and-instancing
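As a generic illustration of the instancing idea (not a reproduction of what ldr_wgpu does), placements can be grouped by part and color so that each unique combination keeps its geometry once and only the transforms are stored per instance. `group_instances` and the placement tuple layout are assumptions made for this sketch.

```python
from collections import defaultdict


def group_instances(placements):
    """Group placements by (part_name, color_code).

    placements: iterable of (part_name, color_code, transform).
    Returns a dict mapping each (part_name, color_code) pair to the
    list of transforms at which that part should be instanced.
    """
    groups = defaultdict(list)
    for part_name, color_code, transform in placements:
        groups[(part_name, color_code)].append(transform)
    return dict(groups)
```

Each key then needs one mesh, and the list of transforms can be handed to whatever instancing mechanism is being tested.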

cuddlyogre commented 1 year ago

Food for thought. Thank you!

cuddlyogre / ExportLDraw

The quest for better import times #40