Davidster / ikari

Game engine project written in pure rust for learning about rust and game tech
MIT License

Performance issue #3

Closed CatCode79 closed 3 months ago

CatCode79 commented 1 year ago

We have already talked about it here: https://github.com/Davidster/wgpu-sandbox/pull/2, but I prefer to open an issue to better focus on the problem.

The shadows implementation is suspect number 1, but even disabling it in-game doesn't completely solve the slowness (on my notebook).

I'll try again with Nvidia Nsight, but a new version of Puffin was released just today, and as soon as I find some time I'll try it. If everything works, we will have CPU-side profiling (at least we will know for sure whether the engine is CPU-bound or GPU-bound).

Davidster commented 1 year ago

👍 integrating puffin sounds like a cool idea

CatCode79 commented 1 year ago

puffed.zip

I managed to do a small profiling of three interesting functions; in the next few days I may add some other suspicious functions.

To analyze the *.puffin file, you must use puffin_viewer: install it with `cargo install puffin_viewer`, then run `puffin_viewer`.

As you can see some frames last up to 140ms.

The integration with Puffin went well. I opted for a local server/client communication in order to avoid having to integrate egui within the engine.

Davidster commented 1 year ago

nice! I took a look at the profile. puffin seems like a cool little tool. makes sense that the render is taking up the most time; it's the most complicated part of the application and it's very likely that there are some performance bugs in there that I'm unaware of

CatCode79 commented 1 year ago

You have to take into account that, for now, I record the time of only 3 functions through puffin; there is no automatic way to get a flamegraph without inserting a macro in each function. It is still an interesting tool.

Now that I think about it, I've noticed quite a few validation errors and warnings from WGPU. I have no idea if they existed before, but they are certainly something to take into account. I wonder if validation errors and their handling can degrade performance; I read something about it, if I remember correctly. Unfortunately I don't know much about the topic, but as soon as I have a moment I'll take a look!

Davidster commented 1 year ago

that's a good idea, I should look at those validation errors

Davidster commented 1 year ago

gunna do a bit of work on the optimization side in the coming weeks when I find the time. First order of business is the fact that I'm generating a separate buffer and draw call for each mesh that I import from the gltf scene. For example, loading the free_low_poly_forest asset from get_gltf_path() results in ~2500 draw calls happening each frame and my GPU utilization is very low, under 50%. I'm wondering if it's possible to reduce the number of draw calls in that type of situation. Or maybe I just need to optimize the asset in blender so it only exports 1 single mesh. Then once that work is done I'll see what the next biggest bottleneck is. Hopefully at the end of it we can get the engine running at 60fps on your computer :D

CatCode79 commented 1 year ago

I see, I think you have to use a vertex buffer object (VBO): a buffer that remains resident in GPU memory and does not need updates every frame. Basically, you can load all the static information (walls and models) into the GPU, while the dynamic information (where the models move) is updated every frame.

Unfortunately I don't know exactly how to configure a VBO, but I guess you need to use layouts, bind groups and indices to tell the shaders where to find what.

This link may be helpful: https://github.com/gfx-rs/wgpu/wiki/Do%27s-and-Dont%27s

In the next few days I will publish the branch I'm working on for Puffin integration so you can experiment a bit if you want.

Davidster commented 1 year ago

So I am already making vertex buffers in the gltf_loader::build_geometry_buffers function; once the gltf file is loaded into a bunch of GpuBuffers, those buffers no longer get updated each frame. I believe that's conceptually the same as a VBO. The data that can get updated each frame is the corresponding instance buffer: each mesh in the scene gets a dynamically-growing (see GpuBuffer::write) instance buffer, allowing you to render more than 1 of them and move them around the scene.

Thanks for the do's and don'ts page, I forgot it existed :) The "group resource bindings by the change frequency, start from the lowest" part seems interesting; I'm going to try that out and see if it helps.

If you'd be interested in helping out, you could maybe try running that free_low_poly_forest branch (just checkout and run commit 0594a7), then try opening up the gltf file in blender and see if you can merge all the separate objects/meshes into one big single object, export it to a new gltf file, and see if it runs more smoothly after that. I wrote out some debug info that logs the number of meshes that get created (around 2500 for that gltf file) in the build_scene function.

Davidster commented 1 year ago

oops its 1250 meshes, not 2500

Davidster commented 1 year ago

Another thing to check out is whether the vertex buffers are using the correct type of memory. In vulkan it should be gpu-only memory, not host visible

Davidster commented 1 year ago

did some more careful inspection yesterday. found out there are quite a few slow parts in the renderer with the free_low_poly_forest scene loaded:

CatCode79 commented 1 year ago

> skinning::get_all_bone_data is taking 5ms, which is way too slow considering there are 0 skinned meshes in the scene lol

Weird. I remember that among the validation (or error) messages that wgpu (probably naga actually) gave were some regarding skin/bone indexes.

> RendererState::update is doing some nested loops which blow up with 1250 nodes like in that scene; that's 1,562,500 iterations where almost all of them are wasted. it needs to be optimized. it's also taking around 6-7ms in this scene

Oops! ^_^ Well, in the end this is good news: probably once you figure out how to optimize that part, everything will go much smoother.

> shadows are taking up something like 7ms per frame which is too slow, it's something I want to investigate a bit later but I found some interesting material about doing shadows efficiently here in the "Atlases" section

Afaik shadows are pretty hard to do right and fast, so this part will probably keep us busy.

From what I've read, it seems that Godot3 implements a dynamic shadow mapping system. These are all things I barely know and I could be wrong, but I got the idea that dynamic shadow mapping could be used for shadows that change only occasionally in-game, not continuously frame by frame. In our context, continuously recreating the atlas with shadows every frame might cost a lot (but I could be wrong about how they are created and used).

I was thinking about Carmack's Reverse technique but, after a little research, I discovered that it is patented (DOOM 3) and that it is probably slow with scenes with many vertices. Maybe there is some kind of usable Shadow Volume technique, but I'm starting to think that if Godot3 uses that method there is a good reason.

It should be taken into account that Godot 4 uses a very advanced dynamic global lighting system, the SDFGI: https://www.youtube.com/watch?v=DNJXkcQxXEg so advanced that I wouldn't know where to start O_O

I had come across an atlas of (static) shadow maps in a Bevy jam game, here it is: https://github.com/DGriffin91/BevyJam2022 I think the shadow maps are these: https://github.com/DGriffin91/BevyJam2022/tree/main/assets/textures/level1/bake/sm But I seem to have understood that Godot uses another technique through cubemapping, while these are shadows pre-baked onto textures.

CatCode79 commented 1 year ago

Ok, reading a bit about the shadow atlas in Godot 3, I discovered that it allows rendering multiple lights in a single pass.

Volumetric shadow techniques, instead, must calculate every single light in the scene. This is only feasible if there are few vertices and/or lights. They also give you sharp and precise shadows, but to make soft shadows you have to do extra calculations.

Godot3's shadow atlas has accuracy and size issues, which they corrected in Godot4: https://twitter.com/reduzio/status/1247702765100511239 independent bias, normal offsets and pancaking... I have no idea what these things mean :D

Davidster commented 1 year ago

> Weird. I remember that among the validation (or error) messages that wgpu (probably naga actually) gave were some regarding skin/bone indexes.

ah right, I still need to take a look at those validation messages, I keep forgetting. Either way, I fixed the problem with get_all_bone_data: it was another double-nested looping situation. I really need to be more careful to avoid those :D.

next I'll do RendererState::update.

> From what I've read it seems to me that Godot3 implements a dynamic shadow mapping system.

Yes, it's made to generate a new shadow map once per frame to account for moving objects. I would like for the shadow implementation to support moving objects but also be efficient. It should be doable! It's been standard in the gaming industry for a long time now if I'm not mistaken.

I agree godot3 is pretty old and has limitations so maybe looking at godot's latest shadow implementation is a better idea. Would be cool to look at bevy's too.

I've never heard of Carmack's Reverse before, I'm gunna read about it, sounds cool :).

> It should be taken into account that Godot 4 uses a very advanced dynamic global lighting system, the SDFGI.

Oh yeah, I've seen that video before, it's pretty amazing. Global illumination could be something to look at in the future; I don't know anything about that topic but am very interested in learning about it, as well as dynamic ambient occlusion and reflections. For GI I think a simpler method is voxel cone tracing, which is what they were doing in godot3, so maybe that's an easier starting point for learning before SDFGI, not sure tho.

Confluence of Futility seems to be on Godot version 4 or higher, so the engine comes with built-in dynamic shadows. I think the folder assets/textures/level1/bake/sm is just the smaller versions of the images in the parent folder assets/textures/level1/bake, so 'sm'='small'

CatCode79 commented 1 year ago

Could you enable the repository's Discussions tab? I would like to continue the discussion about graphics there, so as to keep this issue focused on performance.

Davidster commented 1 year ago

Should be done

Davidster commented 1 year ago

@CatCode79 could you check out the optimization branch and try running it on your machine? Curious to see what kind of performance you get with the optimizations I've added so far. I'm hoping you'll be able to get 60fps with shadows and bloom turned off (press 'm' and 'b'). Feel free to check the commit messages for details on what I've been working on this week.

Oh I should add some profile macros to that branch too so you can show me what you get in puffin!

CatCode79 commented 1 year ago

2022-11-28_optimization.zip

This is the Puffin log I recorded. I had to disable Bloom and Shadows programmatically because it's so slow that I'm not even sure whether I pressed the keys correctly in-game.

In general I can't tell you if it is better, since the scene is not the same as the master branch's.

But I updated the LunarG Vulkan SDK and played with the Vulkan Configurator: among the validation settings there is the possibility to choose a "Reduced-overhead preset", which more or less doubled the performance. I tried disabling the validation layers completely but didn't get any kind of gain (and the validations were still there), so I have yet to figure out how to disable them properly, if that's possible. Just for the record, the attached puffin log was recorded with the validation layers active, before I reduced the overhead; it's still a significant log.

Another odd thing: looking at a boulder, I noticed that the material is a bit transparent, letting me see the stars of the skybox through it. I don't know if it's a problem with the original model or something else, but given how much transparency can cost, it must be taken into account.

CatCode79 commented 1 year ago

More or less with this scene, i.e. the forest, I have an average of 9 FPS. With the master branch scene, I averaged 14 FPS.

Maybe it's useful for you to know that puffin also has a macro to profile individual scopes, not just whole functions: https://docs.rs/puffin/0.14.0/puffin/macro.profile_scope.html

Davidster commented 1 year ago

Damn, I was worried about it being too slow to press the keys; thanks for taking the time to check that.

You're on windows right? Could you try running it with dx12? It's weird that the validation layers are enabled. For me on Linux they stay disabled in a release build unless I set the environment variable (forgot the name of it). And on windows it automatically runs dx12 for me so I'm pretty confused.

Yeah, I noticed that the sky shows up. I think it's due to the material on the object causing the skybox to be reflected on it. I might have made a mistake with the default fallback materials that get picked when there's no material in the gltf file.

Davidster commented 1 year ago

that's so weird, on my computer the physicsstate::step is taking longer than on yours, what explains that??

(screenshot)

Davidster commented 1 year ago

it's almost as if somehow the project is getting compiled in debug mode except the dependencies are in release mode or something? or maybe puffin is misbehaving? very confused!

Davidster commented 1 year ago

I will test this on my older pc tomorrow, I think its cpu is slower than yours, maybe I made a mistake

CatCode79 commented 1 year ago

> You're on windows right? Could you try running it with dx12? It's weird that the validation layers are enabled. For me on Linux they stay disabled in a release build unless I set the environment variable (forgot the name of it). And on windows it automatically runs dx12 for me so I'm pretty confused.

Yes, I'm on Windows 10. I can't reproduce the problem, so probably I'm just confused and the layers have always been there; the lower-overhead configuration simply makes the frames go better.

> that's so weird, on my computer the physicsstate::step is taking longer than on yours, what explains that??

Yes, I tried again with some tests in both debug and release mode, and on my laptop the physics step is on the order of hundreds of microseconds (not milliseconds like on your pc).

My CPU is: (screenshot). But I have a cooling system that doesn't satisfy me, and I always suspect it goes into thermal throttling.

> it's almost as if somehow the project is getting compiled in debug mode except the dependencies are in release mode or something? or maybe puffin is misbehaving? very confused!

I think you need to set these lines in Cargo.toml: `[profile.release.package."*"]` with `opt-level = 3` (we already have the `[profile.dev.package."*"]` lines)

But do you actually mean that there is a mix of crates compiled in debug and others in release? Weird. I'd do a cargo clean, cargo update, and cargo build --release to be sure.

All this speech made me remember what I read here: https://doc.rust-lang.org/cargo/reference/profiles.html?highlight=nalgebra#overrides

> For example, nalgebra is a library which defines vectors and matrices making heavy use of generic parameters. If your local code defines concrete nalgebra types like Vector4 and uses their methods, the corresponding nalgebra code will be instantiated and built within your crate. Thus, if you attempt to increase the optimization level of nalgebra using a profile override, it may not result in faster performance.

it could be relevant here, since rapier uses nalgebra.
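For completeness, the override being discussed would look something like this in Cargo.toml (a sketch; note that 3 is already the default opt-level for release builds, so the release override mostly matters if something else changed it, and the nalgebra caveat quoted above still applies):

```toml
# Optimize all dependency crates even in dev builds,
# while keeping the local crate itself fast to compile.
[profile.dev.package."*"]
opt-level = 3

# Explicitly pin dependencies to full optimization in release builds
# (3 is already the release default, so this is a belt-and-suspenders check).
[profile.release.package."*"]
opt-level = 3
```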

Davidster commented 1 year ago

Yes, cargo clean and rebuild might be a good idea. I'd also double-check your system environment variables to see if you have set a Vulkan env var and forgotten about it.

https://youtu.be/_qRA-WS0aLA We can see in this video that 60fps should easily be possible with your CPU. And Valorant uses many cores of the CPU, whereas this project only uses 1 core, so thermal throttling shouldn't be an issue unless there's something wrong with your hardware.

I also know that dx12 performs better than vulkan on windows, so it would be important to check whether it runs well with dx12 on your laptop. If it runs badly, another thing we can do is run MSI Afterburner, or maybe just check the task manager, to see if the CPU/GPU reach high utilization values and whether they reach temperatures above ~75 degrees.

Davidster commented 1 year ago

just to double check, did you modify the number of physics_balls in the scene? (physics_ball_count). That could easily explain the physics step timing difference

CatCode79 commented 1 year ago

> just to double check, did you modify the number of physics_balls in the scene? (physics_ball_count).

no, I just changed enable_shadows to false and enable_bloom to false. However, from what little I could see (which at half a frame per second is a challenge :D), there are no balls in the forest scene (optimization branch)

> and valorant is using many cores of the cpu whereas this project only uses 1 core, so thermal throttling shouldn't be an issue unless there's something wrong with your hardware.

true, you're right, my fan makes so much noise sometimes that it makes me "uncomfortable"

Personally I suspect that there are hidden render draws that kill performance; however, we must take into account that perhaps we have run into a particular case in which wgpu's self-managed barriers have a bug and degrade performance. It shouldn't be the case, since with wgpu version 0.13 they improved performance and barrier handling a lot, but I know they still have more to do.

There's still that open task of merging the gltf models into one for the forest, isn't there? It could improve the performance of the rendering part considerably.

I'm also tempted to integrate tracy_full to get a full view of the messages bouncing between CPU and GPU; I suspect all the time spent is barriers waiting for work from the GPU, or vice versa.

Now I'll try to test with dx12; it should actually improve things, but only by a few percentage points, unless there are bugs in the vulkan drivers.

CatCode79 commented 1 year ago

DX12 backend, interesting result:

Running target\debug\wgpu-sandbox.exe
Serving demo profile data on 127.0.0.1:8585
Using NVIDIA GeForce MX150 (Dx12)
[2022-11-28T18:06:06Z ERROR wgpu_hal::dx12::descriptor] Unable to allocate descriptors: RangeAllocationError { fragmented_free_length: 0 }
[2022-11-28T18:06:06Z ERROR wgpu::backend::direct] Handling wgpu errors as fatal by default
thread 'main' panicked at 'wgpu error: Validation Error

Caused by:
    In Device::create_bind_group
    note: label = `InstancedMeshComponent textures_bind_group`
    not enough memory left

', C:\Users\Gatto.cargo\registry\src\github.com-1ecc6299db9ec823\wgpu-0.14.0\src\backend\direct.rs:2403:5
stack backtrace:
   0: std::panicking::begin_panic_handler
             at /rustc/b3bc6bf31265ac10946a0832092dbcedf9b26805/library\std\src\panicking.rs:575
   1: core::panicking::panic_fmt
             at /rustc/b3bc6bf31265ac10946a0832092dbcedf9b26805/library\core\src\panicking.rs:65
   2: core::ops::function::Fn::call<void ()(enum2$),tuple$<enum2$ > >
             at /rustc/b3bc6bf31265ac10946a0832092dbcedf9b26805\library\core\src\ops\function.rs:161
   3: wgpu::backend::direct::impl$3::device_create_bind_group
             at C:\Users\Gatto.cargo\registry\src\github.com-1ecc6299db9ec823\wgpu-0.14.0\src\backend\direct.rs:1298
   4: wgpu::Device::create_bind_group
             at C:\Users\Gatto.cargo\registry\src\github.com-1ecc6299db9ec823\wgpu-0.14.0\src\lib.rs:2170
   5: wgpu_sandbox::renderer::BaseRendererState::make_pbr_textures_bind_group
             at .\src\renderer.rs:587
   6: wgpu_sandbox::gltf_loader::build_scene
             at .\src\gltf_loader.rs:145
   7: wgpu_sandbox::game::init_scene
             at .\src\game.rs:1235
   8: wgpu_sandbox::start::async_fn$0::async_block$0
             at .\src\main.rs:113
   9: wgpu_sandbox::start::async_fn$0
             at .\src\main.rs:120
  10: pollster::block_on<enum2$<wgpu_sandbox::start::async_fn_env$0> >
             at C:\Users\Gatto.cargo\registry\src\github.com-1ecc6299db9ec823\pollster-0.2.5\src\lib.rs:125
  11: wgpu_sandbox::main
             at .\src\main.rs:134
  12: core::ops::function::FnOnce::call_once<void ()(),tuple$<> >
             at /rustc/b3bc6bf31265ac10946a0832092dbcedf9b26805\library\core\src\ops\function.rs:507

Davidster commented 1 year ago

@CatCode79 do you get the same error in a release build? I see you are running a debug build (target\debug\wgpu-sandbox.exe)

Davidster commented 1 year ago

this is definitely a problem though, I might have some ideas for how to mitigate

Davidster commented 1 year ago

> I'm also tempted to integrate tracy_full to have full view of message bouncing between cpu and gpu, I suspect all the time spent is barriers waiting for work from the gpu or vice versa.

this is a great idea btw! would be interested in seeing

> There's still that open job of merging the gltf models into one for the forest, isn't there? It could improve the performance of the rendering part considerably..

In a real game yes that's what should be done, but I actually think it's good to leave the gltf file as-is because it makes for a good benchmark

CatCode79 commented 1 year ago

Same error on release build.

The most useful WGPU issue I've found about it is this: https://github.com/gfx-rs/wgpu/issues/2857

Now that I remember, right now I'm compiling with nightly; the thiserror crate gives me a strange error telling me that it is using a nightly-only feature, and I worked around it by avoiding stable. But is wgpu-sandbox meant to be compiled on nightly or on stable?

Tomorrow I'll try to integrate tracy_full, so we'll clear up a lot of doubts once we have the complete picture.

Davidster commented 1 year ago

wgpu supports stable. I think it'd be a good idea to try running it on stable

CatCode79 commented 1 year ago

ok, I fixed that and compiled both in debug and in release on stable; I always get the same error

Davidster commented 1 year ago

ok it looks like this scene is allocating 2gb of vram LOL. I'll have to look into this ASAP

Davidster commented 1 year ago

turns out vulkan uses 1.8gb and dx12 uses 2gb, so it's just on the edge of crashing for your gpu

Davidster commented 1 year ago

hey, so it turns out most of the memory usage is from the many large textures in the scene. I've disabled them in a test scene that you can get from this commit: f194d568c83662466 (optimization_disable_textures branch). Let me know if you get a chance to test that out, I wonder if the memory usage was the problem this whole time

CatCode79 commented 1 year ago

testing the optimization_disable_textures branch, I get the same error as before with dx12; with vulkan instead I get this:

thread 'main' panicked at 'attempt to subtract with overflow', src\game.rs:679:27
stack backtrace:
   0: std::panicking::begin_panic_handler
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library\std\src\panicking.rs:584
   1: core::panicking::panic_fmt
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library\core\src\panicking.rs:142
   2: core::panicking::panic
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library\core\src\panicking.rs:48
   3: wgpu_sandbox::game::init_game_state
             at .\src\game.rs:679
   4: wgpu_sandbox::start::async_fn$0::async_block$0
             at .\src\main.rs:116
   5: core::future::from_generator::impl$1::poll<enum2$<wgpu_sandbox::start::async_fn$0::async_block_env$0> >
             at /rustc/897e37553bba8b42751c67658967889d11ecd120\library\core\src\future\mod.rs:91
   6: wgpu_sandbox::start::async_fn$0
             at .\src\main.rs:120
   7: core::future::from_generator::impl$1::poll<enum2$<wgpu_sandbox::start::async_fn_env$0> >
             at /rustc/897e37553bba8b42751c67658967889d11ecd120\library\core\src\future\mod.rs:91
   8: pollster::block_on<core::future::from_generator::GenFuture<enum2$<wgpu_sandbox::start::async_fn_env$0> > >
             at C:\Users\Gatto.cargo\registry\src\github.com-1ecc6299db9ec823\pollster-0.2.5\src\lib.rs:125
   9: wgpu_sandbox::main
             at .\src\main.rs:134
  10: core::ops::function::FnOnce::call_once<void (*)(),tuple$<> >
             at /rustc/897e37553bba8b42751c67658967889d11ecd120\library\core\src\ops\function.rs:248

Note that before the crash I have a lot of these messages:

[2022-11-29T09:51:10Z INFO wgpu_core::device] Created buffer Valid((6278, 1, Vulkan)) with BufferDescriptor { label: Some("GpuBuffer"), size: 5760, usage: INDEX, mapped_at_creation: true }
[2022-11-29T09:51:10Z INFO wgpu_core::device] Created buffer Valid((6279, 1, Vulkan)) with BufferDescriptor { label: Some("GpuBuffer"), size: 128, usage: COPY_DST | VERTEX, mapped_at_creation: true }
[2022-11-29T09:51:10Z INFO wgpu_core::device] Created buffer Valid((6280, 1, Vulkan)) with BufferDescriptor { label: Some("GpuBuffer"), size: 11520, usage: INDEX, mapped_at_creation: true }
[2022-11-29T09:51:10Z INFO wgpu_core::device] Created buffer Valid((6281, 1, Vulkan)) with BufferDescriptor { label: Some("GpuBuffer"), size: 80, usage: COPY_DST | VERTEX, mapped_at_creation: true }
[2022-11-29T09:51:10Z INFO wgpu_core::device] Created texture Valid((20, 1, Vulkan)) with TextureDescriptor { label: Some("crosshair_texture"), size: Extent3d { width: 512, height: 512, depth_or_array_layers: 1 }, mip_level_count: 1, sample_count: 1, dimension: D2, format: Rgba8UnormSrgb, usage: COPY_DST | TEXTURE_BINDING | RENDER_ATTACHMENT }
[2022-11-29T09:51:10Z INFO wgpu_core::device] Created texture Valid((21, 1, Vulkan)) with TextureDescriptor { label: Some("from_color texture"), size: Extent3d { width: 1, height: 1, depth_or_array_layers: 1 }, mip_level_count: 1, sample_count: 1, dimension: D2, format: Rgba8Unorm, usage: COPY_DST | TEXTURE_BINDING | RENDER_ATTACHMENT }
[2022-11-29T09:51:10Z INFO wgpu_core::device] Created texture Valid((22, 1, Vulkan)) with TextureDescriptor { label: Some("from_color texture"), size: Extent3d { width: 1, height: 1, depth_or_array_layers: 1 }, mip_level_count: 1, sample_count: 1, dimension: D2, format: Rgba8Unorm, usage: COPY_DST | TEXTURE_BINDING | RENDER_ATTACHMENT }
[2022-11-29T09:51:10Z INFO wgpu_core::device] Created buffer Valid((6282, 1, Vulkan)) with BufferDescriptor { label: Some("GpuBuffer"), size: 416, usage: VERTEX, mapped_at_creation: true }
[2022-11-29T09:51:10Z INFO wgpu_core::device] Created buffer Valid((6283, 1, Vulkan)) with BufferDescriptor { label: Some("GpuBuffer"), size: 12, usage: INDEX, mapped_at_creation: true }
[2022-11-29T09:51:10Z INFO wgpu_core::device] Created buffer Valid((6284, 1, Vulkan)) with BufferDescriptor { label: Some("GpuBuffer"), size: 128, usage: COPY_DST | VERTEX, mapped_at_creation: true }
[2022-11-29T09:51:10Z INFO wgpu_core::device] Created buffer Valid((6285, 1, Vulkan)) with BufferDescriptor { label: Some("GpuBuffer"), size: 24, usage: INDEX, mapped_at_creation: true }
[2022-11-29T09:51:10Z INFO wgpu_core::device] Created buffer Valid((6286, 1, Vulkan)) with BufferDescriptor { label: Some("GpuBuffer"), size: 80, usage: COPY_DST | VERTEX, mapped_at_creation: true }

They take up most of the game's loading time. Make sure you have the info log level active to see them (maybe the debug level can even give additional information).

CatCode79 commented 1 year ago

2022-11-29_optimization_disable_textures.zip

Wait! I did a cargo update and ran in release mode, and it works! Frames are between 45 and 50 per second. (The previous test was done in dev mode.)

All tests were done with Shadows and Bloom turned off

Davidster commented 1 year ago

Amazing news! Thank you very much for the help :). I'll check out debug mode and your Puffin file to see if I can catch any more issues

CatCode79 commented 1 year ago

Big party tonight! ^_^ It was a great feeling to be able to navigate the level more fluidly.

It occurred to me that there is this crate that implements a texture compression format (apparently the classic image compression algorithms work fine for storage but not for the GPU): https://github.com/BVE-Reborn/ktx2

Davidster commented 1 year ago

glad to hear it :)

yup, texture compression is something we need to add! I never heard of ktx2, sounds like a good option.

Davidster commented 1 year ago

average of 1.3ms of CPU time on your machine now, I'm pretty happy with that! I think texture compression and shadow optimization are the next two low-hanging fruit

CatCode79 commented 1 year ago

Nice!

Just today I received the Graphics Programming newsletter which contains a couple of articles on the topic (de)compression and ASTC: https://www.jendrikillner.com/post/graphics-programming-weekly-issue-263/

Another thing of interest: wgpu 0.14.2 is out with a bug fix, I don't think this bug affected us, but I would do a cargo update anyway.

Davidster commented 1 year ago

oh nice, will definitely make the 0.14.2 update asap

CatCode79 commented 1 year ago

(screenshot) Teaser: first screenshot of the tracy integration! It's a huge tool; it's like having to learn to fly a shuttle. Now I have to figure out how to get the information from the GPU...

CatCode79 commented 1 year ago

(screenshot) We have our first list of culprits. I think the acronym MTPC means mean time per call. If you look at the time of get_next_texture, it takes a long time each frame

Davidster commented 1 year ago

damn, 6 microseconds still seems way too slow to compute a node's transform; gotta find a better solution there I think. At least it's no longer a bottleneck.

I wonder if get_next_texture is actually the problem or if it's just waiting on some kind of lock from the graphics API; we might need more fine-grained detail to understand what's happening there

CatCode79 commented 1 year ago

Here's something interesting about it: Document when get_current_texture() etc. block (and provide alternative?) #3283

Davidster commented 1 year ago

I'm getting closer and closer with the frustum culling! Playing around with collision detection and an octree. It seems to be helping a lot with the forest scene, sometimes drawing only 200 objects instead of 1200 😀. Hope to be able to PR soon, although the holidays won't help with that lol.