godotengine / godot

Godot Engine – Multi-platform 2D and 3D game engine
https://godotengine.org
MIT License

Crash when opening existing project in 3.2.4 beta1 with GLES3 batching (wrong indexing of RasterizerCanvasBatcher::BatchTex) #42994

Closed RPicster closed 4 years ago

RPicster commented 4 years ago

Godot version: 3.2.4 beta1 official

OS/device including version: Windows 10 GeForce RTX 2080/PCIe/SSE2

Issue description: When trying to open my existing 3.2.3 project, the engine crashes before the project opens, with the following error message in the terminal:

Godot Engine v3.2.4.beta1.official - https://godotengine.org
OpenGL ES 3.0 Renderer: GeForce RTX 2080/PCIe/SSE2
OpenGL ES Batching: ON

ERROR: get: FATAL: Index p_index = 65535 is out of bounds (size() = 32).
   At: ./core/cowdata.h:152

akien-mga commented 4 years ago

Are you able to provide either a project that triggers the crash, or compile Godot from source and provide a debugger stacktrace from the crash?

RPicster commented 4 years ago

Creating a project that would trigger the crash is not something I want to try as it could be days of work. The project is really huge and I am also not willing to share it.

If you could direct me to some resources explaining the process of compiling in a way you would need it for a debugger stacktrace, I can try my best to get it running.

akien-mga commented 4 years ago

Here are the instructions to set up a build environment and then compile Godot: https://docs.godotengine.org/en/stable/development/compiling/compiling_for_windows.html

If you already have Visual Studio installed, it should be straightforward; if not, it can take a while, as VS is huge.

I'll backport a recent change we did on the master branch to upload build artifacts from each commit, so I might be able to provide you a build with debug symbols to test with.

akien-mga commented 4 years ago

Here's a build of the Windows editor with debug symbols: https://github.com/godotengine/godot/suites/1382069509/artifacts/22749871 (note: 300 MB).

RPicster commented 4 years ago

I had tons of trouble building the engine (Have to look into it...), so thanks a lot for the build!

Here's the backtrace:

ERROR: CowData<struct RasterizerCanvasBatcher<class RasterizerCanvasGLES3,class RasterizerStorageGLES3>::BatchTex>::get: FATAL: Index p_index = 65535 is out of bounds (size() = 32).
   At: D:\a\godot\godot\core/cowdata.h:152
CrashHandlerException: Program crashed
Dumping the backtrace. Please include this when reporting the bug on https://github.com/godotengine/godot/issues
[0] <couldn't map PC to fn name>
[1] <couldn't map PC to fn name>
[2] <couldn't map PC to fn name>
[3] <couldn't map PC to fn name>
[4] <couldn't map PC to fn name>
[5] <couldn't map PC to fn name>
[6] <couldn't map PC to fn name>
[7] <couldn't map PC to fn name>
[8] <couldn't map PC to fn name>
[9] <couldn't map PC to fn name>
[10] <couldn't map PC to fn name>
[11] <couldn't map PC to fn name>
[12] <couldn't map PC to fn name>
[13] <couldn't map PC to fn name>
[14] <couldn't map PC to fn name>
[15] <couldn't map PC to fn name>
[16] <couldn't map PC to fn name>
[17] BaseThreadInitThunk
-- END OF BACKTRACE --

akien-mga commented 4 years ago

That's weird, the stacktrace doesn't include any information as if debug symbols were stripped :(

Edit: But at least it shows that the crash happens in the GLES3 batching code:

ERROR: CowData<struct RasterizerCanvasBatcher<class RasterizerCanvasGLES3,class RasterizerStorageGLES3>::BatchTex>::get: FATAL: Index p_index = 65535 is out of bounds (size() = 32).
   At: D:\a\godot\godot\core/cowdata.h:152

CC @lawnjelly

RPicster commented 4 years ago

I have no clue, so maybe I am completely wrong... but is it maybe because of this:

ERROR: CowData<struct RasterizerCanvasBatcher<class RasterizerCanvasGLES3,class RasterizerStorageGLES3>::BatchTex>::get: FATAL: Index p_index = 65535 is out of bounds (size() = 32).
   At: D:\a\godot\godot\core/cowdata.h:152
CrashHandlerException: Program crashed

That path does not exist on my machine. Sorry if this is totally noobish, but I have no clue about building the engine, just about building my game :D

akien-mga commented 4 years ago

That path is from the buildsystem and hardcoded in the build symbols, but it shouldn't prevent getting a stacktrace normally.

Anyway, even with just this one line, that's a good indication of what's going wrong for @lawnjelly:

lawnjelly commented 4 years ago

Indeed that is good info.

I'll try to have a look over the functions and see if I can work out what might be going wrong. This may have to wait a few days because I'm away from home again, so I can only look at the code and cannot compile. But this might be enough info to find the bug.

As usual a minimum reproduction project will help greatly to pin it down (even if I can't run it).

I do have lines such as

    // not sure if needed
    r_fill_state.batch_tex_id = -1;

In the draw batched line routine at least, so this was something I was not sure about at the time. It could possibly be a bug that occurs in rare extreme situations, such as when the buffer is full (try increasing the batch buffer size).

The setting to -1 is a safety measure because it both ensures the value is initialized and makes it error if used before being properly set (which is the problem here: either it was not set, or it was overwritten when it needed to be preserved).
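To illustrate why an unset id surfaces as exactly this error, here is a minimal, self-contained sketch. It is not Godot's code: `checked_get` is a hypothetical stand-in for the bounds check in core/cowdata.h (it throws instead of crashing), and `kInvalidBatchTex` is an assumed name for what -1 becomes when the id is held in an unsigned 16-bit field.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical name: -1 converted to an unsigned 16-bit id is 65535.
constexpr std::uint16_t kInvalidBatchTex = static_cast<std::uint16_t>(-1);

// Hypothetical stand-in for a bounds-checked container access.
// It throws rather than aborting so the failure can be observed.
template <typename T>
const T &checked_get(const std::vector<T> &items, std::size_t index) {
    if (index >= items.size()) {
        throw std::out_of_range(
            "Index p_index = " + std::to_string(index) +
            " is out of bounds (size() = " + std::to_string(items.size()) + ").");
    }
    return items[index];
}

// Uses the sentinel as an index into a table of the given size.
// Returns the error text, or an empty string if the access succeeds.
std::string use_unset_id(std::size_t table_size) {
    std::vector<int> batch_textures(table_size, 0);
    try {
        checked_get(batch_textures, kInvalidBatchTex);
    } catch (const std::out_of_range &e) {
        return e.what();
    }
    return "";
}
```

With a table of 32 entries, as in the report, `use_unset_id(32)` yields the same "Index p_index = 65535 is out of bounds (size() = 32)." text, which is the sentinel doing its job of failing loudly.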

RPicster commented 4 years ago

Increasing batch buffer size to the maximum had no effect.

Creating a minimum reproduction project is almost impossible for this. My project has hundreds of files and 60 shaders that could be the "culprit".

If you can't find it at home, I will try my best. But until then it would be fantastic if I could save that day of work ;)

lawnjelly commented 4 years ago

Yes, don't worry too much. I'm sure I will find it, especially once I'm back, now that I know the rough area. It could well be with one of the new primitives (polys, lines, ninepatchrects), if you are using any of these. These weren't batched before, only rects, and the rects have had a fair bit of testing since 3.2.2.

RPicster commented 4 years ago

I use all of those ... at least if you are talking about 2D.

lawnjelly commented 4 years ago

I think I've tracked this down. Pretty sure it is due to -1 being stored as 65535 in an unsigned 16-bit value, then being used as the last selected batch texture and loaded, because it is compared to >= 0 and that comparison always passes. There needs to be a check for this condition (65535), or a cast to a signed 16-bit integer or similar. I can't test this till I get back, but it seems likely.
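A minimal sketch of the suspected bug and the two fixes mentioned above. All names here (`kNoBatchTex`, the `has_batch_tex_*` functions) are hypothetical and only illustrate the arithmetic; they are not taken from the Godot source.

```cpp
#include <cstdint>

// Hypothetical sentinel: what -1 turns into in an unsigned 16-bit field.
constexpr std::uint16_t kNoBatchTex = 0xFFFF;

// Buggy check: the unsigned value widens to 65535, never to -1,
// so the "unset" id is accepted as a valid texture index.
bool has_batch_tex_buggy(std::uint16_t batch_tex_id) {
    const int widened = batch_tex_id;
    return widened >= 0;
}

// Option 1: compare against the sentinel explicitly.
bool has_batch_tex_sentinel(std::uint16_t batch_tex_id) {
    return batch_tex_id != kNoBatchTex;
}

// Option 2: reinterpret as signed 16-bit so 65535 reads back as -1.
// This only works while real ids stay below 32768, and the conversion
// is implementation-defined before C++20 (two's complement in practice).
bool has_batch_tex_signed(std::uint16_t batch_tex_id) {
    return static_cast<std::int16_t>(batch_tex_id) >= 0;
}
```

The explicit sentinel comparison is the more conservative choice, since it does not halve the usable id range.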

lawnjelly commented 4 years ago

I can see roughly where this is but I'll need a minimum reproduction project. I've tried to reproduce it but no luck so far.

akien-mga commented 4 years ago

If it's crashing when opening the project, I guess it crashes on something used in the main scene and/or default environment, right?

If so, to help narrow it down @RPicster, you could try to edit project.godot to remove the reference to the main scene and open the project in the editor. Then you could try to open different scenes and see which ones trigger a crash (the main scene most likely does, but I assume if it does it likely instantiates other subscenes, and maybe one of those triggers it and can be extracted as a minimal reproduction project).

lawnjelly commented 4 years ago

You can also add visible = false to the tscn file, to e.g. the root node. Then you should be able to open the scene in the editor, then you can make all the branches / nodes invisible, then unhide the root.

Then by a process of elimination work out which node is causing the crash. You can search through the major branches first, find the branch causing the problem, then the sub branch until you reach the node.

Then save this node as a branch on its own for a minimum reproduction project, or examine it to see what it is drawing so you can reproduce it. Something like that.
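The elimination process above is essentially a top-down search through the scene tree. A rough sketch, assuming a hypothetical `Node` struct (not a Godot type) and a caller-supplied `crashes` predicate that answers "does the editor crash when only this branch is visible?", which in practice is a manual check:

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for a scene node; not a Godot type.
struct Node {
    std::string name;
    std::vector<Node> children;
};

// Walks down from the root, always following the first child branch that
// still reproduces the crash on its own. Returns the deepest such node,
// or nullptr if the root itself does not reproduce it.
const Node *find_culprit(const Node &root,
                         const std::function<bool(const Node &)> &crashes) {
    if (!crashes(root)) {
        return nullptr;
    }
    const Node *current = &root;
    while (true) {
        const Node *next = nullptr;
        for (const Node &child : current->children) {
            if (crashes(child)) {
                next = &child;
                break;
            }
        }
        if (next == nullptr) {
            // No single child reproduces the crash: this node (or the
            // combination of its children) is the smallest failing case.
            return current;
        }
        current = next;
    }
}
```

The number of checks is bounded by the number of siblings inspected along one root-to-leaf path, which is why this tends to be much quicker than testing every node.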

This should only take a few mins to pin down.

It is very probably a combination of drawing polys and lines (perhaps with rects too). Unlikely to be nodes that draw rects only. Could well be custom drawing, if you are using that.

RPicster commented 4 years ago

OK, I think I've got the culprit. There could be more in my project, but this one is absolutely, 100% reproducible: the crash is caused by the particles_animation setting inside a CanvasItemMaterial when it is used on a Particles2D node.

Here is a minimal reproduction project: 3_2_4_ParticleAnimation_crash.zip

akien-mga commented 4 years ago

Thanks for pinning this down! I can reproduce the crash with 150f9ce80, here's a backtrace:

ERROR: get: FATAL: Index p_index = 65535 is out of bounds (size() = 32).
   At: ./core/cowdata.h:152.
--Type <RET> for more, q to quit, c to continue without paging--

Thread 1 "godot-3.2" received signal SIGILL, Illegal instruction.
0x0000000002097467 in CowData<RasterizerCanvasBatcher<RasterizerCanvasGLES3, RasterizerStorageGLES3>::BatchTex>::get (this=0x58c6018, p_index=65535) at ./core/cowdata.h:152
warning: Source file is more recent than executable.
152                     CRASH_BAD_INDEX(p_index, size());
(gdb) bt
#0  0x0000000002097467 in CowData<RasterizerCanvasBatcher<RasterizerCanvasGLES3, RasterizerStorageGLES3>::BatchTex>::get (this=0x58c6018, p_index=65535) at ./core/cowdata.h:152
#1  0x00000000020905f4 in Vector<RasterizerCanvasBatcher<RasterizerCanvasGLES3, RasterizerStorageGLES3>::BatchTex>::operator[] (this=0x58c6010, p_index=65535) at ./core/vector.h:85
#2  0x000000000208ce9a in RasterizerArray_non_pod<RasterizerCanvasBatcher<RasterizerCanvasGLES3, RasterizerStorageGLES3>::BatchTex>::operator[] (this=0x58c6010, ui=65535)
    at ./drivers/gles_common/rasterizer_array.h:226
#3  0x00000000020960a3 in RasterizerCanvasBatcher<RasterizerCanvasGLES3, RasterizerStorageGLES3>::_translate_batches_to_larger_FVF<RasterizerCanvasBatcher<RasterizerCanvasGLES3, RasterizerStorageGLES3>::BatchVertexLarge, true, true, true> (this=0x58c5f60) at ./drivers/gles_common/rasterizer_canvas_batcher.h:2691
#4  0x0000000002090177 in RasterizerCanvasBatcher<RasterizerCanvasGLES3, RasterizerStorageGLES3>::flush_render_batches (this=0x58c5f60, p_first_item=0xb48fa90, p_current_clip=0x0, 
    r_reclip=@0x7ffffffdc317: false, p_material=0x0) at ./drivers/gles_common/rasterizer_canvas_batcher.h:2251
#5  0x000000000208c997 in RasterizerCanvasBatcher<RasterizerCanvasGLES3, RasterizerStorageGLES3>::render_joined_item_commands (this=0x58c5f60, p_bij=..., p_current_clip=0x0, r_reclip=@0x7ffffffdc317: false, 
    p_material=0x0, p_lit=false) at ./drivers/gles_common/rasterizer_canvas_batcher.h:2347
#6  0x00000000020891ac in RasterizerCanvasGLES3::render_joined_item (this=0x58c5a00, p_bij=..., r_ris=...) at drivers/gles3/rasterizer_canvas_gles3.cpp:1469
#7  0x000000000208a768 in RasterizerCanvasGLES3::canvas_render_items_implementation (this=0x58c5a00, p_item_list=0x0, p_z=0, p_modulate=..., p_light=0x0, p_base_transform=...)
    at drivers/gles3/rasterizer_canvas_gles3.cpp:1944
#8  0x000000000208c45f in RasterizerCanvasBatcher<RasterizerCanvasGLES3, RasterizerStorageGLES3>::batch_canvas_render_items_end (this=0x58c5f60) at ./drivers/gles_common/rasterizer_canvas_batcher.h:752
#9  0x0000000002082b34 in RasterizerCanvasGLES3::canvas_render_items_end (this=0x58c5a00) at drivers/gles3/rasterizer_canvas_gles3.cpp:59
#10 0x00000000037e225f in VisualServerCanvas::render_canvas (this=0x5787190, p_canvas=0x6ed6d30, p_transform=..., p_lights=0x0, p_masked_lights=0x0, p_clip_rect=..., p_canvas_layer_id=0)
    at servers/visual/visual_server_canvas.cpp:273
#11 0x00000000037157aa in VisualServerViewport::_draw_viewport (this=0x588a2c0, p_viewport=0x807cee0, p_eye=ARVRInterface::EYE_MONO) at servers/visual/visual_server_viewport.cpp:241
#12 0x0000000003715ef5 in VisualServerViewport::draw_viewports (this=0x588a2c0) at servers/visual/visual_server_viewport.cpp:344
#13 0x00000000036e8029 in VisualServerRaster::draw (this=0x588bb80, p_swap_buffers=true, frame_step=1.7648379802703857) at servers/visual/visual_server_raster.cpp:108
#14 0x000000000371c447 in VisualServerWrapMT::draw (this=0x56bb080, p_swap_buffers=true, frame_step=1.7648379802703857) at servers/visual/visual_server_wrap_mt.cpp:102
#15 0x0000000001481133 in Main::iteration () at main/main.cpp:2120
#16 0x000000000144ef4c in OS_X11::run (this=0x7fffffffce10) at platform/x11/os_x11.cpp:3374
#17 0x000000000143e74c in main (argc=3, argv=0x7fffffffd668) at platform/x11/godot_x11.cpp:56

lawnjelly commented 4 years ago

Fantastic! :grin: I'll get right on it!

I never would have found this by trial and error. :smile: It only occurs in the editor and occurs because the editor is drawing a line (for debug maybe?) after the particle command, perhaps in the same item. The particle is wrongly triggering it to switch to large_fvf mode because it has found a custom shader. Particle commands are not batched so should not affect the FVF. Hopefully easy enough to fix.
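The intended rule can be sketched roughly as follows. This is an illustration of the idea only, not the actual change: `CommandType`, `Command`, `is_batched` and `needs_large_fvf` are hypothetical names, and the assumption that only particle commands bypass the batcher is a simplification for the example.

```cpp
#include <vector>

// Hypothetical, simplified command model; names are not from the Godot source.
enum class CommandType { Rect, Line, Polygon, Particles };

struct Command {
    CommandType type;
    bool uses_custom_shader;
};

// Assumption for this sketch: particles are drawn outside the batcher.
bool is_batched(CommandType type) {
    return type != CommandType::Particles;
}

// Request the larger vertex format only when a command that is actually
// batched uses a custom shader; unbatched commands are ignored.
bool needs_large_fvf(const std::vector<Command> &commands) {
    for (const Command &cmd : commands) {
        if (is_batched(cmd.type) && cmd.uses_custom_shader) {
            return true;
        }
    }
    return false;
}
```

Under this rule, an item containing a particle command with a custom shader followed by a plain editor line would stay on the regular vertex format, which is the case that crashed here.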

There will probably be a few teething problems like this with large_fvf, but it will really be worth getting over them, because batching custom shaders is one of the major advances in unified batching and will speed up many more games.

RPicster commented 4 years ago

That's good news :)

I'm looking forward to the next version, I think I use pretty much the whole arsenal that Godot has to offer, so I am happy to test everything out :)

akien-mga commented 4 years ago

Fixed by #43102.