godotengine / godot

Godot Engine – Multi-platform 2D and 3D game engine
https://godotengine.org
MIT License

Game Crash with AMD Processors #86681

Open LemonadeFlashbang opened 11 months ago

LemonadeFlashbang commented 11 months ago

Tested versions

Godot 3.5.3 (custom build)

System information

AMD Ryzen 7, AMD Ryzen 5

Issue description

Summary: My game crashes when run on computers with AMD Ryzen processors.

Reducing the file size, reverting to single-threaded loading, and running in compatibility mode have all failed to fix the issue. I believe the root cause is an engine incompatibility with AMD hardware.

The point at which the game crashes varies between computers. Most machines crash once the game has loaded and the title screen appears, but I've heard of one case where a user crashes when the main map is instanced (after character select).

Reducing the number of objects in the game's initial load lets it run on these machines, but it then crashes when trying to instance the next scenes.

Full context: I'm the developer behind Doomsday Paradise. In November I launched my game, and a user reported that it was crashing on startup. The user sent over a list of specs, which look like the following: (screenshots: ryzen1, ryzen2)

The user was on an older laptop and eventually switched devices; the new device had no issues. At the time, I thought the issue might be RAM-related.

Since then, three other users have contacted me about crashes, mostly on game start. Every one of them is using an AMD Ryzen processor. Specs are attached. Here is a discussion thread on the Steam Discussion Boards about the issue.

(screenshots: ryzen5, ryzen6, ryzen3, ryzen4)

While most of these devices have only 8 GB of RAM, one has 32 GB. The game shouldn't need more than about 300 MB of RAM. The VRAM requirement is higher: a little over 3 GB.

On the discussion board, I've tried my best to isolate the actual source by having some users test modified builds of the game. Things we've tried:

The issue seems to be related to object instancing, but because machines crash at different points, I can't replicate it. In addition, I don't have a machine with an AMD Ryzen processor, so I can't easily create a repro project.

For debugging purposes, I can supply a game key. Alternatively, if anyone can help me figure out which logging/testing steps to take next, I'm happy to run the relevant diagnostics myself, and I have a couple of users who have been extremely patient and willing to help.

Currently, no errors appear in the logs; users just experience a sudden termination.

Steps to reproduce

  1. Open my game on a computer with an AMD Ryzen processor

Minimal reproduction project (MRP)

Not available, since I can't test locally. I can try to create a project with the same size profile if necessary.

At the title screen, the game's stats are as follows:

rsubtil commented 11 months ago

Can you try compiling a debug export template and using it to export the game in debug mode? It may give you more useful information.

lawnjelly commented 11 months ago

What is in your custom build (i.e. what changes are there versus vanilla Godot)? How was it compiled, and what SCons args did you use? Are you compiling in any SSE/special instructions that might not be present on these CPUs? Any shared libraries that might not be present?

Different CPUs don't usually cause crashes. GPUs, on the other hand, often do, especially if the game is trying to use integrated graphics. Are you using custom shaders?

I agree with @rsubtil that compiling a debug export template and running it on the offending hardware will probably be the quickest way to pin down the bug, as bisecting by detective work can be quite tricky.

UPDATE: I also noticed that page 3 of the thread quotes a shader error:

```
1 | // NOTE: Shader automatically converted from Godot Engine 3.5.3.rc's ParticlesMaterial.
2 |
3 | shader_type particles;
4 | uniform vec3 direction;
5 | uniform float spread;
6 | uniform float flatness;
7 | uniform float initial_linear_velocity;
8 | uniform float initial_angle;
9 | uniform float angular_velocity;
10 | uniform float orbit_velocity;
11 | uniform float linear_accel;
12 | uniform float radial_accel;
13 | uniform float tangent_accel;
14 | uniform float damping;
15 | uniform float scale;
16 | uniform float hue_variation;
17 | uniform float anim_speed;
18 | uniform float anim_offset;
19 | uniform float initial_linear_velocity_random;
20 | uniform float initial_angle_random;
21 | uniform float angular_velocity_random;
22 | uniform float orbit_velocity_random;
23 | uniform float linear_accel_random;
24 | uniform float radial_accel_random;
25 | uniform float tangent_accel_random;
26 | uniform float damping_random;
27 | uniform float scale_random;
28 | uniform float hue_variation_random;
29 | uniform float anim_speed_random;
30 | uniform float anim_offset_random;
31 | uniform float lifetime_randomness;
32 | uniform float emission_sphere_radius;
33 | uniform vec4 color_value : hint_color;
34 | uniform int trail_divisor;
35 | uniform vec3 gravity;
36 |
37 |
38 | float rand_from_seed(inout uint seed) {
39 |     int k;
40 |     int s = int(seed);
41 |     if (s == 0)
42 |         s = 305420679;
43 |     k = s / 127773;
44 |     s = 16807 * (s - k * 127773) - 2836 * k;
45 |     if (s < 0)
46 |         s += 2147483647;
47 |     seed = uint(s);
48 |     return float(seed % uint(65536)) / 65535.0;
49 | }
50 |
51 | float rand_from_seed_m1_p1(inout uint seed) {
52 |     return rand_from_seed(seed) * 2.0 - 1.0;
53 | }
54 |
55 | uint hash(uint x) {
56 |     x = ((x >> uint(16)) ^ x) * uint(73244475);
57 |     x = ((x >> uint(16)) ^ x) * uint(73244475);
58 |     x = (x >> uint(16)) ^ x;
59 |     return x;
60 | }
61 |
62 | void vertex() {
63 |     uint base_number = NUMBER / uint(trail_divisor);
64 |     uint alt_seed = hash(base_number + uint(1) + RANDOM_SEED);
65 |     float angle_rand = rand_from_seed(alt_seed);
66 |     float scale_rand = rand_from_seed(alt_seed);
67 |     float hue_rot_rand = rand_from_seed(alt_seed);
68 |     float anim_offset_rand = rand_from_seed(alt_seed);
69 |     float pi = 3.14159;
70 |     float degree_to_rad = pi / 180.0;
71 |
72 |     bool restart = false;
73 |     float tv = 0.0;
74 |     if (CUSTOM.y > CUSTOM.w) {
75 |         restart = true;
76 |         tv = 1.0;
77 |     }
78 |
79 |     if (RESTART || restart) {
80 |         uint alt_restart_seed = hash(base_number + uint(301184) + RANDOM_SEED);
81 |         float tex_linear_velocity = 0.0;
82 |         float tex_angle = 0.0;
83 |         float tex_anim_offset = 0.0;
84 |         float spread_rad = spread * degree_to_rad;
85 |         {
86 |             float angle1_rad = rand_from_seed_m1_p1(alt_restart_seed) * spread_rad;
87 |             angle1_rad += direction.x != 0.0 ? atan(direction.y, direction.x) : sign(direction.y) * (pi / 2.0);
88 |             vec3 rot = vec3(cos(angle1_rad), sin(angle1_rad), 0.0);
89 |             VELOCITY = rot * initial_linear_velocity * mix(1.0, rand_from_seed(alt_restart_seed), initial_linear_velocity_random);
90 |         }
91 |         float base_angle = (initial_angle + tex_angle) * mix(1.0, angle_rand, initial_angle_random);
92 |         CUSTOM.x = base_angle * degree_to_rad;
93 |         CUSTOM.y = 0.0;
94 |         CUSTOM.w = (1.0 - lifetime_randomness * rand_from_seed(alt_restart_seed));
95 |         CUSTOM.z = (anim_offset + tex_anim_offset) * mix(1.0, anim_offset_rand, anim_offset_random);
96 |         float s = rand_from_seed(alt_restart_seed) * 2.0 - 1.0;
97 |         float t = rand_from_seed(alt_restart_seed) * 2.0 * pi;
98 |         float radius = emission_sphere_radius * sqrt(1.0 - s * s);
99 |         TRANSFORM[3].xyz = vec3(radius * cos(t), radius * sin(t), emission_sphere_radius * s);
100 |         VELOCITY = (EMISSION_TRANSFORM * vec4(VELOCITY, 0.0)).xyz;
101 |         TRANSFORM = EMISSION_TRANSFORM * TRANSFORM;
102 |         VELOCITY.z = 0.0;
103 |         TRANSFORM[3].z = 0.0;
104 |     } else {
105 |         CUSTOM.y += DELTA / LIFETIME;
106 |         tv = CUSTOM.y / CUSTOM.w;
107 |         float tex_linear_velocity = 0.0;
108 |         float tex_orbit_velocity = 0.0;
109 |         float tex_angular_velocity = 0.0;
110 |         float tex_linear_accel = 0.0;
111 |         float tex_radial_accel = 0.0;
112 |         float tex_tangent_accel = 0.0;
113 |         float tex_damping = 0.0;
114 |         float tex_angle = 0.0;
115 |         float tex_anim_speed = 0.0;
116 |         float tex_anim_offset = 0.0;
117 |         vec3 force = gravity;
118 |         vec3 pos = TRANSFORM[3].xyz;
119 |         pos.z = 0.0;
120 |         // apply linear acceleration
121 |         force += length(VELOCITY) > 0.0 ? normalize(VELOCITY) * (linear_accel + tex_linear_accel) * mix(1.0, rand_from_seed(alt_seed), linear_accel_random) : vec3(0.0);
122 |         // apply radial acceleration
123 |         vec3 org = EMISSION_TRANSFORM[3].xyz;
124 |         vec3 diff = pos - org;
125 |         force += length(diff) > 0.0 ? normalize(diff) * (radial_accel + tex_radial_accel) * mix(1.0, rand_from_seed(alt_seed), radial_accel_random) : vec3(0.0);
126 |         // apply tangential acceleration;
127 |         force += length(diff.yx) > 0.0 ? vec3(normalize(diff.yx * vec2(-1.0, 1.0)), 0.0) * ((tangent_accel + tex_tangent_accel) * mix(1.0, rand_from_seed(alt_seed), tangent_accel_random)) : vec3(0.0);
128 |         // apply attractor forces
129 |         VELOCITY += force * DELTA;
130 |         // orbit velocity
131 |         float orbit_amount = (orbit_velocity + tex_orbit_velocity) * mix(1.0, rand_from_seed(alt_seed), orbit_velocity_random);
132 |         if (orbit_amount != 0.0) {
133 |             float ang = orbit_amount * DELTA * pi * 2.0;
134 |             mat2 rot = mat2(vec2(cos(ang), -sin(ang)), vec2(sin(ang), cos(ang)));
135 |             TRANSFORM[3].xy -= diff.xy;
136 |             TRANSFORM[3].xy += rot * diff.xy;
137 |         }
138 |         if (damping + tex_damping > 0.0) {
139 |             float v = length(VELOCITY);
140 |             float damp = (damping + tex_damping) * mix(1.0, rand_from_seed(alt_seed), damping_random);
141 |             v -= damp * DELTA;
142 |             if (v < 0.0) {
143 |                 VELOCITY = vec3(0.0);
144 |             } else {
145 |                 VELOCITY = normalize(VELOCITY) * v;
146 |             }
147 |         }
148 |         float base_angle = (initial_angle + tex_angle) * mix(1.0, angle_rand, initial_angle_random);
149 |         base_angle += CUSTOM.y * LIFETIME * (angular_velocity + tex_angular_velocity) * mix(1.0, rand_from_seed(alt_seed) * 2.0 - 1.0, angular_velocity_random);
150 |         CUSTOM.x = base_angle * degree_to_rad;
151 |         CUSTOM.z = (anim_offset + tex_anim_offset) * mix(1.0, anim_offset_rand, anim_offset_random) + tv * (anim_speed + tex_anim_speed) * mix(1.0, rand_from_seed(alt_seed), anim_speed_random);
152 |     }
E 153->     float tex_scale = textureLod(scale_texture, vec2(tv, 0.0), 0.0).r;
154 |     float tex_hue_variation = 0.0;
155 |     float hue_rot_angle = (hue_variation + tex_hue_variation) * pi * 2.0 * mix(1.0, hue_rot_rand * 2.0 - 1.0, hue_variation_random);
156 |     float hue_rot_c = cos(hue_rot_angle);
157 |     float hue_rot_s = sin(hue_rot_angle);
158 |     mat4 hue_rot_mat = mat4(vec4(0.299, 0.587, 0.114, 0.0),
159 |             vec4(0.299, 0.587, 0.114, 0.0),
160 |             vec4(0.299, 0.587, 0.114, 0.0),
161 |             vec4(0.000, 0.000, 0.000, 1.0)) +
162 |         mat4(vec4(0.701, -0.587, -0.114, 0.0),
163 |             vec4(-0.299, 0.413, -0.114, 0.0),
164 |             vec4(-0.300, -0.588, 0.886, 0.0),
165 |             vec4(0.000, 0.000, 0.000, 0.0)) * hue_rot_c +
166 |         mat4(vec4(0.168, 0.330, -0.497, 0.0),
167 |             vec4(-0.328, 0.035, 0.292, 0.0),
168 |             vec4(1.250, -1.050, -0.203, 0.0),
169 |             vec4(0.000, 0.000, 0.000, 0.0)) * hue_rot_s;
170 |     COLOR = hue_rot_mat * color_value;
171 |
172 |     TRANSFORM[0] = vec4(cos(CUSTOM.x), -sin(CUSTOM.x), 0.0, 0.0);
173 |     TRANSFORM[1] = vec4(sin(CUSTOM.x), cos(CUSTOM.x), 0.0, 0.0);
174 |     TRANSFORM[2] = vec4(0.0, 0.0, 1.0, 0.0);
175 |     float base_scale = tex_scale * mix(scale, 1.0, scale_random * scale_rand);
176 |     if (base_scale < 0.000001) {
177 |         base_scale = 0.000001;
178 |     }
179 |     TRANSFORM[0].xyz *= base_scale;
180 |     TRANSFORM[1].xyz *= base_scale;
181 |     TRANSFORM[2].xyz *= base_scale;
182 |     VELOCITY.z = 0.0;
183 |     TRANSFORM[3].z = 0.0;
184 |     if (CUSTOM.y > CUSTOM.w) { ACTIVE = false;
185 |     }
186 | }
187 |
188 |
```

SHADER ERROR: Unknown identifier in expression: scale_texture at: (null) (:153)

If the shader fails to compile, that could cause a crash, and the failure could be silent if some GPU drivers don't report the error log correctly. In fact, we recently merged a change in 3.6 to handle drivers that output the error log incorrectly:

https://github.com/godotengine/godot/pull/84741

I don't know whether you were able to rule this out as the cause of the crash. If shaders might be the problem, I would also try running without asynchronous shader compilation.

LemonadeFlashbang commented 11 months ago

@lawnjelly: There are two modifications. One is a set of updates to labels to correct improper wrapping in CJK languages and with BBCode. You can see these changes here. The other is that it's compiled with the Spine runtimes; those changes are viewable in the Spine Godot GitHub repo.

I don't believe the core issue is shaders, but it's not impossible. There's a method at startup that loads all shaders in dummy scenes and then frees them, to avoid the shader instancing lag that's otherwise present. The crash is present in builds that still have this method enabled. There are a couple of whiteout shaders that aren't loaded at the start, but the fact that some players are able to progress past character select and only crash when the map is loaded implies to me that, if it's a shader issue, it's not consistent.

Edit: The SCons args are viewable in the build.sh script in Godot-Spine. It's the default platform=... with custom_modules=yes. There are no other changes in my build beyond the ones just discussed.

I found an end user on Reddit complaining of Godot-related crashes with an AMD card, in case that's helpful. It may not be the same issue, since they have a different GPU.

@rsubtil : I'll try compiling a version with debug templates turned on and I'll comment again once I've heard back. Given the nature of the crash, I'm not expecting any new information.

Zireael07 commented 11 months ago

@LemonadeFlashbang Would be nice to see your shader stuff PR'ed to the core repository!

LemonadeFlashbang commented 11 months ago


It's not something that generalizes to other projects, and it isn't an engine modification. It's a very clumsy "load all the combat particles, plus all the dialogue particles, plus all the ...." function that frees everything after an idle frame. It "solves" shader instancing lag by just moving it to startup.

It's a very hacky, brute-force solution but it works.

LemonadeFlashbang commented 10 months ago

Some updates: I believe the root cause is related to VRAM usage, but why it only causes problems on AMD machines remains a mystery to me. Using a debug build did not resolve the problem.

Here's a screenshot of the game's RAM usage on my local computer: (screenshot: ryzen9)

The game needs ~350 MB of RAM to function, however we can see that the committed memory, which includes VRAM, is substantially higher (~4 GB).

Now here's a screenshot of the game's RAM usage at startup on another computer, with an AMD Ryzen 5: (screenshot: ryzen8)

1-2 GB of RAM during the game's launch, and then a crash.

I decided to test the VRAM/committed RAM hypothesis by running every texture in the game through VRAM compression. The docs don't recommend this for 2D games because of the artifacting, and it has had a pretty large impact on the game's visual fidelity. The game's file size increased dramatically, but the game now uses only 2 GB of committed memory.
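To put rough numbers on why VRAM compression shrinks the footprint, here is a minimal Python sketch that estimates the GPU memory of a single 2D texture. It assumes uncompressed textures are stored as RGBA8 (4 bytes per pixel) and that desktop VRAM compression uses S3TC, where DXT1 packs each 4x4 pixel block into 8 bytes and DXT5 (typically used for images with alpha) into 16 bytes. The function names and the 2048x2048 size are hypothetical and are not taken from the game or from Godot's source.

```python
# Hypothetical estimate of the GPU memory used by one 2D texture.
# Assumptions: uncompressed = RGBA8 (4 bytes/pixel); S3TC/DXT stores each
# 4x4 pixel block in a fixed size (DXT1: 8 bytes, DXT5: 16 bytes).
BYTES_PER_PIXEL = {"rgba8": 4}
BYTES_PER_BLOCK = {"dxt1": 8, "dxt5": 16}


def level_bytes(width: int, height: int, fmt: str) -> int:
    """Bytes for a single mip level."""
    if fmt in BYTES_PER_PIXEL:
        return width * height * BYTES_PER_PIXEL[fmt]
    # Block formats round each dimension up to a whole number of 4x4 blocks.
    blocks_x = (width + 3) // 4
    blocks_y = (height + 3) // 4
    return blocks_x * blocks_y * BYTES_PER_BLOCK[fmt]


def texture_bytes(width: int, height: int, fmt: str, mipmaps: bool = False) -> int:
    """Bytes for the base level, plus every mip level down to 1x1 if requested."""
    total = level_bytes(width, height, fmt)
    while mipmaps and (width > 1 or height > 1):
        width = max(1, width // 2)
        height = max(1, height // 2)
        total += level_bytes(width, height, fmt)
    return total


if __name__ == "__main__":
    # Hypothetical 2048x2048 illustration without mipmaps.
    for fmt in ("rgba8", "dxt5", "dxt1"):
        mib = texture_bytes(2048, 2048, fmt) / (1024 * 1024)
        print(f"{fmt}: {mib:.1f} MiB")  # 16.0, 4.0 and 2.0 MiB respectively
```

Under these assumptions DXT5 is a 4:1 reduction and DXT1 an 8:1 reduction relative to RGBA8. The real savings in a given game depend on the actual texture dimensions, which formats the importer picks, whether mipmaps are enabled, and how much of the memory use is textures at all.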

After that I had a user test it. They were able to boot the game, but their Task Manager recorded the game taking up 2 GB of RAM. (screenshot: ryzen10)

Here are the machine's specs: (screenshot: ryzen11)

And here's the equivalent screenshot on my computer: (screenshot: ryzen12)

Zireael07 commented 10 months ago

Similar memory leak issues were reported with the ANGLE backend. Which backend is the problematic device using?

LemonadeFlashbang commented 10 months ago

I'm not sure I understand the question, forgive my ignorance. Google shows me ANGLE is a backend for web apps? This is a desktop application.

Calinou commented 10 months ago


Godot has had ANGLE support on Windows/macOS since 4.2, although since 4.2.1 it's only used by default on old AMD GPUs (older than GCN 4.0). This was done because OpenGL support on old AMD GPUs is pretty bad on Windows, so running through a Direct3D 11 translation layer gives more reliable results.

LemonadeFlashbang commented 10 months ago


Sorry, I'm still a bit confused.

The game was developed in Godot 3.5, which seems to predate ANGLE support. What should I ask the users so they can best answer your question? Where would they find that information?

The GPUs vary. I have one reported instance with an NVIDIA GPU and another with an AMD GPU. The CPU is the only thing all the devices seem to share.

rsubtil commented 10 months ago

Can you check the Video RAM panel for more details on what objects are filling up the most memory? There might be a clue as to why the usage is so high.

I'm not familiar with your game, but assuming it's essentially a 2D visual novel, 4 GB of VRAM usage looks unreasonably high. That would also explain the high RAM usage on other setups, since AFAIK regular RAM is used once VRAM is full (especially relevant on systems with integrated graphics, where there is no dedicated VRAM). (image attached)

LemonadeFlashbang commented 10 months ago

Ah, I see. That doesn't explain the one device with an NVIDIA 3080 GPU, but it might explain the others. The game loads everything into the relevant databases (item database, skill database, enemy database, etc.) for access. I'm guessing that's loading all the images into VRAM, and these machines might be more budget systems.

There are some 20 MB files, which are models for the Spine animations. Here's one example of such a mesh: (image: ryzen13)

And here's how it appears in combat:

Can't really cut that down without deforming the images.

There are a large number of 10 MB files for the endings. Each ending has its own unique art, and there are over 100 of them. An example of one of those files:

(image: ending)

All together those probably account for ~1.5 GB of VRAM. 1 GB is probably in the ending art.

The remaining half is just asset volume: backgrounds, characters, items, skills, VFX textures, etc.

I'm going to leave the ticket open until I get confirmation from the user with the 3080 that the issue is fixed, after which I'll close this. Until then, I'll start refactoring the database systems to see if there's a way to load some helper characteristics (things like IDs and whether the item is in the random pool) without loading the attached image data.
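One way to read IDs and random-pool flags without touching image data is to keep only lightweight metadata in the database and load each image the first time it is requested. The sketch below is a minimal, hypothetical illustration in Python: `ItemRecord`, `ItemDatabase`, and the `load_image` callback are invented names, not the game's code, and in a Godot project the callback would stand in for whatever actually loads the texture.

```python
# Hypothetical sketch: an item database that stores lightweight metadata
# up front and loads each item's image only when it is first requested.
# ItemRecord, ItemDatabase and load_image are illustrative names only.
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class ItemRecord:
    item_id: str
    in_random_pool: bool
    image_path: str  # only the path is kept; the image itself is not loaded
    image: Optional[object] = field(default=None, repr=False)


class ItemDatabase:
    def __init__(self, load_image: Callable[[str], object]) -> None:
        self._load_image = load_image  # stand-in for the real texture loader
        self._records: Dict[str, ItemRecord] = {}

    def add(self, record: ItemRecord) -> None:
        self._records[record.item_id] = record

    def random_pool(self) -> List[str]:
        # Uses metadata only, so no image is loaded here.
        return [r.item_id for r in self._records.values() if r.in_random_pool]

    def get_image(self, item_id: str) -> object:
        # Load on first access, then return the cached object.
        record = self._records[item_id]
        if record.image is None:
            record.image = self._load_image(record.image_path)
        return record.image

    def release(self, item_id: str) -> None:
        # Forget the cached image so its memory can be reclaimed.
        self._records[item_id].image = None


if __name__ == "__main__":
    loaded: List[str] = []

    def fake_loader(path: str) -> str:
        loaded.append(path)
        return f"<image {path}>"

    db = ItemDatabase(fake_loader)
    db.add(ItemRecord("sword", True, "items/sword.png"))
    db.add(ItemRecord("relic", False, "items/relic.png"))
    print(db.random_pool())  # ['sword']; nothing has been loaded yet
    print(db.get_image("sword"), loaded)
```

The design choice here is that building the random pool only reads metadata, so it never triggers a texture load; `release()` drops the database's cached reference so the memory can be reclaimed once nothing else is holding the image.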

LemonadeFlashbang commented 10 months ago

Compressing everything has allowed the game to work for one user's machine, but not another.

The user who cannot get the game to run has an NVIDIA 3080 GPU, 26 GB Memory, and 10 GB of VRAM. What DID work for this user was running a debug version of the game.

So it's possible there's two separate issues with similar impacts. Leaving this ticket open since we're seeing crashes that aren't purely memory related.