Open PZerua opened 2 years ago
The largest cost in your numbers is casting a total of 10.739.327 individual rays using Embree. This process takes 33 seconds of the total time. Is there a way to improve this?
The smaller issue is the number of lods and the bigger issue is the normal reconstruction.
Sloppy simplify didn't give edge lengths, so I don't think we can use it. I can check again.
Did some thinking. One cheap thing we can do is start from the last lod rather than from the start. The code to do this is relatively small.
Do you want to make a pr for that?
Hi, sorry for the delay.
The largest cost in your numbers is casting a total of 10.739.327 individual rays using Embree. This process takes 33 seconds of the total time. Is there a way to improve this?
I agree that is the bigger problem, but I haven't researched enough to come up with a solution or alternative approach. I did test using rtcIntersect1M once for all the rays instead of rtcIntersect1 for each single ray, both with RTC_INTERSECT_CONTEXT_FLAG_COHERENT and RTC_INTERSECT_CONTEXT_FLAG_INCOHERENT, but I noticed no difference. I have no prior experience with Embree, so it might be worth trying again in case I did something wrong. Maybe @JFonS has some input on this and can propose some alternatives.
The smaller issue is the number of lods and the bigger issue is the normal reconstruction.
The thing is that the total number of rays is directly related to the number of LODs (and the number of indices in each LOD), so if we agree that >= 10 LODs are too many and aim for a maximum of 6 or 8, that would help with both issues.
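To make the relation above concrete, here is a rough sketch of how the ray total could scale with the set of LODs that is kept. It is not the importer's actual accounting: `estimate_ray_count` and `rays_per_index` are hypothetical names, and the "one fixed cost per index" model is only an assumption for illustration.

```cpp
#include <cstddef>
#include <vector>

// Assumption for illustration only: every index of every kept LOD costs
// `rays_per_index` rays. The real importer may count rays differently.
size_t estimate_ray_count(const std::vector<size_t> &lod_index_counts, size_t rays_per_index) {
    size_t total = 0;
    for (size_t index_count : lod_index_counts) {
        total += index_count * rays_per_index;
    }
    return total;
}
```

Under this model, comparing two candidate LOD sets is just a matter of calling the function on each list of index counts and looking at the difference.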
Did some thinking. One cheap thing we can do is start from the last lod rather than from the start. The code to do this is relatively small.
Do you want to make a pr for that?
Yeah, I also thought of the same thing. This will speed up LOD generation when calling meshopt_simplify (although I'm not sure by how much), but it won't help with the total ray count. I can give it a try in a few days.
I can't promise anything, but if you're around I can show you where the code for starting from the last LOD is.
https://github.com/godotengine/godot/blob/master/scene/resources/importer_mesh.cpp#L453
The theory is that, instead of the last merged_indices_ptr, you use the new_indices from the last while loop iteration.
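A minimal sketch of that idea, to show why chaining from the previous LOD is cheaper. This is not the code in importer_mesh.cpp: `toy_simplify` is a hypothetical stand-in for a real simplifier such as meshopt_simplify (it just truncates the index list), and `build_lods` and `LodChainResult` are made-up names for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a real simplifier: keeps the first
// `target_index_count` indices, rounded down to whole triangles.
std::vector<uint32_t> toy_simplify(const std::vector<uint32_t> &indices, size_t target_index_count) {
    assert(indices.size() % 3 == 0);
    size_t count = target_index_count - target_index_count % 3;
    if (count > indices.size()) {
        count = indices.size();
    }
    return std::vector<uint32_t>(indices.begin(), indices.begin() + count);
}

struct LodChainResult {
    std::vector<std::vector<uint32_t>> lods;
    size_t indices_processed = 0; // Total input indices fed to the simplifier.
};

// Halves the target index count for each LOD. When `chain_from_previous` is
// true, each LOD is simplified from the previous LOD's indices; otherwise
// every LOD is simplified from the full-resolution base mesh.
LodChainResult build_lods(const std::vector<uint32_t> &base, size_t min_indices, bool chain_from_previous) {
    LodChainResult result;
    std::vector<uint32_t> previous = base;
    size_t target = base.size() / 2;
    while (target >= min_indices) {
        const std::vector<uint32_t> &source = chain_from_previous ? previous : base;
        result.indices_processed += source.size();
        std::vector<uint32_t> lod = toy_simplify(source, target);
        if (lod.empty() || lod.size() >= previous.size()) {
            break; // The simplifier made no progress, so stop.
        }
        result.lods.push_back(lod);
        previous = lod;
        target /= 2;
    }
    return result;
}
```

With a 96-index base mesh and a 6-index minimum, both modes produce the same four LOD sizes in this toy setup, but the chained mode feeds far fewer indices through the simplifier. A real simplifier's output quality may differ between the two modes, which this sketch does not model.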
Hi, sorry for the delay, I've been quite busy at work.
Still want to work on this and I think I have an idea of how to implement it, but not sure when I'll have time to do it.
Also, I spent some time trying to better understand the context of the "normal reconstruction", and saw the discussion you had here: https://github.com/zeux/meshoptimizer/issues/158. So my understanding is that "normal reconstruction" is currently a workaround for that issue and we should just wait for it to be fixed on meshoptimizer's side, although it may be worth checking faster approaches in the meantime.
Looks like we might get a fix for the wrong normals after simplification: https://github.com/zeux/meshoptimizer/pull/524. Hopefully this will make it possible to remove all the calls to Embree and make import much faster.
I'll note that it's unclear if the pending work in the linked meshoptimizer PR will allow Godot to change its simplification strategy - meshoptimizer version used in Godot right now has some patches to enable attribute awareness, but they were likely insufficient to get good normal quality in certain cases which is why the reprojection code exists. My goal is to improve on the patches currently used in Godot (they have some quality bugs that are critical to resolve before I can merge anything), but I don't know if the improvement is going to be sufficient to just rely on output of meshoptimizer directly in all cases.
One of the bottlenecks is tangent space normal generation which is being worked on here https://github.com/godotengine/godot/pull/83648
Should be improved by #93727 (still some work to do in the future wrt reordering LOD generation from large to small).
Godot version
4.0.alpha14
System information
Windows 11, Intel i7-10750H, Nvidia RTX 2060 Laptop (511.65), Vulkan
Issue description
Godot takes a lot of time to import a scene with big meshes. To better understand the problem I've been doing some tests with the new "Colorful Curtains" (without Base Scene) from Intel's new Sponza scene. The scene has twelve 4K textures and several meshes that add up to a total of 1.059.862 vertices, and while Blender only takes ~7 seconds, Godot takes ~1 minute and 3 seconds to import. I've spent some time investigating the causes with a profiler and I've found that from that import:
So I'd say the main issue is with LOD generation. Two observations:
Some possible changes I can think of to make it faster:
meshopt_simplifySloppy, which seems to be 6.6x faster, and test whether it produces big perceptible differences and whether it's worth changing or not.
I explained the issue a bit over Rocket.chat a few days ago, but I'd like some discussion on this before I attempt a fix (if I'm capable).
Steps to reproduce
Download "Colorful Curtains" or any scene from the new Intel Sponza. Move the glTF scene to the project folder and observe that it takes a very long time to import.
Minimal reproduction project
No response