verypleasentusername opened this issue 1 year ago
Can you please try this with a supported version, like 4.1.3 (only the latest patch version is supported, and it might already be fixed)
Isn't the 4.1 branch on GitHub the 4.1.3 one? If not, I will surely try out 4.1.3.
Yes, but you said "4.1 stable", which means "4.1.0", an old version. If you are referencing a branch, please add the commit hash as per the instructions in the bug report form 🙂
updated
(Do add the commit hash, right now it is [2d3b2ab], but whatever you have, as "latest" will be outdated in the future and can make it harder to check this.)
Since you are compiling locally, would you mind testing on master as well? Either way is okay, but it would help see if it is a bug that has been solved and might be cherry-picked.
> (Do add the commit hash, right now it is [2d3b2ab], but whatever you have, as "latest" will be outdated in the future and can make it harder to check this.)
Did I make it right? Sorry, my first big issue here.
You did 🙂 This might be a known issue, I can't remember which other issue report it was; it might be a different cause or issue though, as I couldn't find it right now.
> Since you are compiling locally, would you mind testing on master as well? Either way is okay, but it would help see if it is a bug that has been solved and might be cherry-picked.
The problem is, it probably will work on master, but only because the error itself is pretty random. Even just now, I was able to make it work by deleting and then re-adding meshes. Even if it does work on master, there is no reason to think the error was fixed, although I hope so. I will try a newer release and then update on the results here.
Quick update: I tried again and it failed again, but now at 62% of baking lights (baking probes).
The console showed Vulkan errors again, but now there were a couple of errors saying Out of Memory!, which makes me think that Godot's LightmapGI is trying to push baking faster than it can and goes beyond the memory limit. That's actually weird, because the rendering process should never be forced faster, and in most software it isn't. It's all just speculation though.
Which graphics card model are you using?
Also, please upload a minimal reproduction project[^1] to make this easier to troubleshoot.
[^1]: A small Godot project which reproduces the issue, with no unnecessary files included. Be sure to not include the `.godot` folder in the archive (but keep `project.godot`). Drag and drop a ZIP archive to upload it. Do not select another field until the project is done uploading. **Note for C# users:** If your issue is not Mono-specific, please upload a minimal reproduction project written in GDScript or VisualScript. This will make it easier for contributors to reproduce the issue locally as not everyone has a Mono setup available.
My graphics card: AMD Radeon(TM) Vega 8 Graphics. The issue might be computer-power related; in that case, the name of the issue should be changed.
I finally managed to recreate the issue in a minimal reproduction project, and updated the initial comment. Again, it might work fine on your computer; in that case it's 99% computing-power related.
In the MRP, lightmaps bake in 1 second with the denoiser enabled on my i9-13900K + RTX 4090 setup, so this is definitely hardware-specific. It could also be a driver bug.
How much system RAM do you have? The amount of video memory you can use with integrated graphics is determined by the amount of system RAM.
> In the MRP, lightmaps bake in 1 second with the denoiser enabled on my i9-13900K + RTX 4090 setup, so this is definitely hardware-specific. It could also be a driver bug.
Uhh.. what is MRP? Google shows weird answers.
> How much system RAM do you have? The amount of video memory you can use with integrated graphics is determined by the amount of system RAM.
4 GB. It might seem funny, trying to bake on such a low-spec computer; however, light-related operations in software such as Blender aren't forced, and they work fine for me. Just slow, as it should be when baking lights in Godot. I was able to avoid crashes by tweaking settings in Render -> Lightmapper, and it might be meant to be that way. In that case it's again another issue.
> Uhh.. what is MRP? Google shows weird answers.
MRP stands for Minimal reproduction project.
> 4 GB. It might seem funny, trying to bake on such a low-spec computer; however, light-related operations in software such as Blender aren't forced, and they work fine for me. Just slow, as it should be when baking lights in Godot. I was able to avoid crashes by tweaking settings in Render -> Lightmapper, and it might be meant to be that way. In that case it's again another issue.
Which exact settings did you tweak to get it to work?
Same GPU (Vega 8) but system memory is 8 GB. Godot v4.2.beta5. Baking the MRP project provided by the author reports the following error and freezes at 50% (direct light baking process):
```
Vulkan: Device lost!
ERROR: Condition "err" is true.
at: local_device_push_command_buffers (drivers/vulkan/vulkan_context.cpp:2796)
ERROR: Condition "!ld->waiting" is true.
at: local_device_sync (drivers/vulkan/vulkan_context.cpp:2803)
ERROR: Condition "err" is true. Returning: ERR_CANT_CREATE
at: _update_swap_chain (drivers/vulkan/vulkan_context.cpp:2135)
ERROR: Vulkan: Cannot submit graphics queue. Error code: VK_ERROR_DEVICE_LOST
at: (drivers/vulkan/vulkan_context.cpp:2536)
Vulkan: Device lost!
ERROR: Condition "err" is true.
at: local_device_push_command_buffers (drivers/vulkan/vulkan_context.cpp:2796)
ERROR: Condition "!ld->waiting" is true.
at: local_device_sync (drivers/vulkan/vulkan_context.cpp:2803)
```

10 Times

```
ERROR: Condition "err" is true.
at: local_device_push_command_buffers (drivers/vulkan/vulkan_context.cpp:2796)
ERROR: Condition "!ld->waiting" is true.
at: local_device_sync (drivers/vulkan/vulkan_context.cpp:2803)
ERROR: Condition "err" is true. Returning: ERR_CANT_CREATE
at: _update_swap_chain (drivers/vulkan/vulkan_context.cpp:2135)
```
After testing, changing the angular distance of the first DirectionalLight node from 15° back to the default value of 0°, it works.
My guess is that this (very high) angular distance causes too many rays to be thrown or allocated. There should probably be an upper clamp on the angular distance in the inspector and/or the lightmapper, or the sample count should be clamped to a maximum value so that higher values can be used without using too much memory (at the cost of having some visible banding).
Typical angular distance values are between 0° and 3° for real world renderings.
cc @DarioSamo
These are the settings changes I made, and I was able to avoid freezes with them (it even sped up the baking by 2 times):
(The first one was lowered by just one, it was 5 initially; the second two were divided by 2.)
Also a note that I used a high angular distance to mimic light scattering in the clouds and make blobby shadows.
> My guess is that this (very high) angular distance causes too many rays to be thrown or allocated
Aren't rays emitted in a fixed amount per pixel (ignoring such settings as angular distance)? Please feel free to correct me if I'm wrong.
Can you test the settings changes I showed above (with angular distance: 15°)? It would be helpful to know if the fix can be recreated as well.
Running long compute jobs on weak hardware is pretty much a recipe for trouble if you don't disable the TDR. "Region Size" is probably the setting you want to mess with to significantly reduce the amount of work that will be dispatched on each compute call to run below the timeout threshold.
About anything else mentioned so far sounds pretty irrelevant to me to be honest, it just sounds like you're on the very edge of the timeout so it'll randomly work or not depending on the complexity of the scene.
Why does the timeout exist anyway? And why do compute calls have it?
The timeout is at the driver level and applies to pretty much any GPU work, not just compute. We don't really control it from Godot's side; we can just make some estimates as to how much work should be dispatched, but obviously the amount we choose isn't going to take the same amount of time on all hardware.
Makes sense. So it's not an issue anymore? Maybe the documentation should be changed, as it's not very clear that the Lightmapper settings should be configured individually for a specific computer.
The most optimal way to avoid the issue is to increase the TDR duration, but this requires editing the registry with administrator privileges. We can provide a .reg file for doing so (or even make Godot execute the required task using PowerShell code when requested), but it won't be usable in every case.
This approach is also used by software like Substance Painter, which warns you on startup if the TDR isn't increased.
PS: This is a non-issue on Linux (and possibly macOS), since they don't have a concept of TDR in the first place. Drivers can happily hang forever there :slightly_smiling_face:
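For reference, a minimal sketch of what such a .reg file could contain, assuming the standard Windows TDR registry values documented by Microsoft under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers; the 60-second values below are only an illustration, not an officially recommended setting:

```reg
Windows Registry Editor Version 5.00

; Example only: raise the GPU timeout (TDR) to 60 seconds (0x3c).
; Requires administrator privileges and a reboot to take effect.
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers]
"TdrDelay"=dword:0000003c
"TdrDdiDelay"=dword:0000003c
```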
> Running long compute jobs on weak hardware is pretty much a recipe for trouble if you don't disable the TDR. "Region Size" is probably the setting you want to mess with to significantly reduce the amount of work that will be dispatched on each compute call to run below the timeout threshold.
After setting TdrLevel to 0 and TdrDelay to 1000, then rebooting the PC, it does complete baking when the angular distance is 15°.
PS: Disable TDR link
Yep, sounds about what I expected.
While this is an issue, I think we're safe to close this and instead boil it down to some general proposal on how we could handle this behavior. There are some different approaches that could work (e.g. running small benchmarks with incremental region sizes until it reaches a safe amount of time), but it's very much an area where there's no universal solution to fix it, due to the APIs giving no control over this timeout.
Could we default to a lower region size on integrated graphics automatically? Vulkan reports the device type via RenderingServer.get_video_adapter_type().
I suppose this will be best implemented once we have support for a low_end_gpu feature tag, so it can be added as a project setting feature tag override. This is something I discussed with reduz recently, so it should be good to implement.
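As a rough illustration of that idea (not actual engine code), here is a minimal GDScript sketch; the setting path rendering/lightmapping/bake_performance/region_size is assumed from the lightmapper settings discussed above and may differ between Godot versions:

```gdscript
@tool
extends EditorScript

# Hypothetical sketch: pick a smaller lightmapper region size on integrated GPUs.
# RenderingServer.get_video_adapter_type() returns a RenderingDevice.DeviceType value.
func _run() -> void:
	if RenderingServer.get_video_adapter_type() == RenderingDevice.DEVICE_TYPE_INTEGRATED_GPU:
		# Assumed setting path; check Project Settings > Rendering > Lightmapping in your Godot version.
		ProjectSettings.set_setting("rendering/lightmapping/bake_performance/region_size", 128)
		ProjectSettings.save()
```

Run from the script editor via File > Run; a built-in heuristic inside the engine would of course do this without a script.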
> Could we default to a lower region size on integrated graphics automatically? Vulkan reports the device type via RenderingServer.get_video_adapter_type().
I want to note that I did set region_size to a really small number. The output was still loaded with lots of errors and crashed.
When I tried a region_size of 2, the same happened. Deleting the .godot cache folder did not change the result.
Either the region_size property doesn't really work, or the error is not caused by the region_size value at all. Though I must note that the bake percentage was progressing more and more slowly as I decreased the region_size value.
The trouble might be caused by a corrupted model, or by more than 2 of the same model being baked.
Also, what does the src_tex error mean anyway?
I was able to successfully render a complex scene without crashes, which makes me think my problem is absolutely not computing-power related.
Sometimes the render with 4 house models works, sometimes not, but region_size changes do not help.
> I was able to successfully render a complex scene without crashes, which makes me think my problem is absolutely not computing-power related.
You're not approaching this test the right way in that case. You should look into disabling TDR to confirm whether it's computing-power related. If you're getting the same error the other user reported, then it's absolutely related to that. If it's something else, then you'd get a different kind of error than just device lost.
What errors are you getting in the output when it crashes?
I see why TDR is brought up again, but I want to clarify that I did set it to 60 and the random freezes no longer happen. I also want to clarify that the "DEVICE LOST" error is nowhere to be seen, and I'm on version 4.2.2 of Godot right now. I made a new issue about all these changes and how errors with lightmapping still appear, but I was redirected to this issue by @Calinou, which is fair.
What happens is errors, and the progress stops because of them. These errors are:
and "src_tex" is missing
A moment later, tons of UNIFORM SET errors spawn. Sometimes they don't, and at those times no crash happens, but the progress still stops. I still don't know why this error happens with the house models but not with the subdivided cube model. Maybe something in them makes Godot hiccup, I dunno. I lost all my nerves on this error.
@verypleasentusername Yeah it seems your original post lacked that information so it wasn't possible to determine what the actual cause was.
-2 is VK_ERROR_OUT_OF_DEVICE_MEMORY. Do you perhaps not have enough video memory to run the baker?
Admittedly, it could do a better job of deleting resources it doesn't need while it's baking, but that is sounding like the cause here.
It looks like the errors are actually caused by texture size. When I tried making the lightmap size value of the subdivided cubes (4) 2000x2000 pixels, it crashed with the same error.
Is there any way to avoid this? Use a pre-made texture maybe? Or at least, is there a way to automatically divide the texture size by some number? (If I remember correctly, @Calinou had a similar idea somewhere in the proposals, something like a preview bake.)
Or is the max_texture_size in LightmapGI what's causing the error? Can't it be set higher than 16384?
> Is there any way to avoid this?
The mesh's texel size directly controls what the size of the resulting lightmap will be; you can look up the Lightmap documentation on how to change this.
I feel whatever problem you're running into means you're just hitting the upper bound of what your video memory allows while baking. Like I said, there might be ways of looking into minimizing this as much as possible, but that'd depend on how reasonable the costs are here vs what the system allows. What's your total VRAM?
Not... much (second line). I thought baking light uses RAM instead.
> I thought baking light uses RAM instead.
Nope, it's a GPU baker, so it uses compute shaders and video memory textures. I don't think you'll get very far with it if you're limited on resources like that.
There could be more work on the engine's side to minimize the amount of memory used (and also be clearer what the true cause of the error is), but that is a pretty painfully low amount of memory to work with.
Wow, that's awful. I thought integrated graphics were somewhat okay with it? At least I have not experienced trouble working in programs like Blender, but maybe the problem is really about texture type optimization and error clarity.
I still don't know if keeping the texture on disk and using VRAM only to render rays is an option? I know it's too small of a feature for the devs to bother making, but I want to know if this is even possible to achieve. Or is there any way to add VRAM by memory-swapping, or should I ask other people to render my scenes?
Keeping it on disk would not work; it only loads what it requires at a time for rendering the lightmaps, which consists of quite a few versions of the same texture at full size (diffuse, normals, light accumulation, etc). All of that has to be in VRAM to be able to render it. It can probably be looked into at some point if there are some potential savings or inefficiencies, but you can also probably just increase the amount of RAM your system dedicates to video memory in your BIOS.
**Godot version**
4.1 branch [2d3b2ab], compiled locally

**System information**
Windows 10, 4.1 stable engine version, Forward+

**Issue description**
If Denoiser is enabled, the progress goes up to 68% and the editor freezes indefinitely. The console shows these errors.

**Steps to reproduce**
Open scene_3d.tscn, choose LightmapGI in the scene node list, and click "Bake Lightmaps".

**Minimal reproduction project**
LightmapError.zip