
Add ability to pass compute buffers to vertex/fragment shaders #6989


jtsorlinis commented 1 year ago

Describe the project you are working on

Procedural mesh generation in compute shaders.

Describe the problem or limitation you are having in your project

The buffers containing the mesh data have to be read back and processed on the CPU into vertex buffers.
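Roughly, the round-trip looks like this today (a minimal sketch; `rd` is a RenderingDevice and `positions_buffer` is a storage buffer RID a compute pass has already filled, both named here for illustration):

# Sketch of the current workflow: the compute output must travel
# GPU -> CPU -> GPU before it can be rendered as a mesh.
var bytes: PackedByteArray = rd.buffer_get_data(positions_buffer) # GPU -> CPU stall
var floats: PackedFloat32Array = bytes.to_float32_array()

var vertices := PackedVector3Array()
for i in range(0, floats.size(), 3):
    vertices.push_back(Vector3(floats[i], floats[i + 1], floats[i + 2]))

var arrays := []
arrays.resize(Mesh.ARRAY_MAX)
arrays[Mesh.ARRAY_VERTEX] = vertices

var mesh := ArrayMesh.new()
mesh.add_surface_from_arrays(Mesh.PRIMITIVE_TRIANGLES, arrays) # and CPU -> GPU again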

Describe the feature / enhancement and how it helps to overcome the problem or limitation

I think it would be really helpful to allow storage buffers to be bound to, and accessed from, vertex/fragment shaders.

This would greatly expand the potential uses for compute shaders, as the current implementation requires CPU readback, which can be a significant performance bottleneck.

Describe how your proposal will work, with code, pseudo-code, mock-ups, and/or diagrams

A storage buffer can be bound to a vertex/fragment shader with a function similar to set_shader_parameter, e.g.

var buffer := rd.storage_buffer_create(input_bytes.size(), input_bytes)
myShaderMaterial.set_shader_parameter("myBuffer", buffer) # something like this

The buffer could then be used in the shader to allow procedural drawing, e.g.

shader_type spatial;

layout(set = 0, binding = 0, std430) restrict buffer bufferType {
    vec4 positions[];
} myBuffer;

void vertex() {
  VERTEX += myBuffer.positions[VERTEX_ID];
}

void fragment() {
  COLOR = vec4(0.4, 0.6, 0.9, 1.0);
}

If this enhancement will not be used often, can it be worked around with a few lines of script?

I don't believe it's possible to implement this as a plugin, as it's part of the core rendering pipeline.

Is there a reason why this should be core and not an add-on in the asset library?

As above, I don't think it's possible.

Ali32bit commented 1 year ago

This should absolutely be a thing. It's such a huge bottleneck, and a disservice to what Godot can do in the right hands, if we don't have "render textures" and "compute buffers" that we can pass to shaders.

This would allow for: real-time interactive shaders such as dynamic wind and dynamic water; real-time mirrors or planar reflections; and much more efficient interactive TV screens, viewports, or anything else that requires rendering part of the game and using it as a texture somewhere else. Currently, implementing any of this is PAINFULLY slow. Adding just 3 viewports for passing data to shaders can kill the framerate, and it is super unreliable, since it depends on setting up the node paths every time you want to load assets that use such shaders, which tends to break completely when you load the scene somewhere else.

NomiChirps commented 1 year ago

+1! I'm working on a project that requires simulating an arbitrarily deformable/cuttable/etc object with high precision. I'm currently using a 512x512x512 3D texture to store an SDF representing the object's surface. Writing a fragment shader to render it was "no problem" (ha ha), but making changes to it from the CPU is extremely slow. I'd love it if Godot could make it easy to use a compute shader to directly modify such a buffer on the GPU.
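For what it's worth, more recent 4.x releases (4.2, if I recall correctly) expose Texture3DRD, which should let a material sample an RD texture that a compute shader edits in place; a rough sketch, with the format, size, and material names being illustrative:

# Rough sketch: allocate the SDF volume on the RenderingDevice so a compute
# shader can write it in place, then expose it to materials via Texture3DRD.
var rd := RenderingServer.get_rendering_device()

var fmt := RDTextureFormat.new()
fmt.texture_type = RenderingDevice.TEXTURE_TYPE_3D
fmt.format = RenderingDevice.DATA_FORMAT_R32_SFLOAT
fmt.width = 512
fmt.height = 512
fmt.depth = 512
fmt.usage_bits = RenderingDevice.TEXTURE_USAGE_STORAGE_BIT | RenderingDevice.TEXTURE_USAGE_SAMPLING_BIT

var sdf_rid := rd.texture_create(fmt, RDTextureView.new(), [])

# The material then samples the same GPU memory the compute pass edits.
var sdf_texture := Texture3DRD.new()
sdf_texture.texture_rd_rid = sdf_rid
material.set_shader_parameter("sdf_volume", sdf_texture)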

Daedalus1400 commented 1 year ago

Let's extend this to its logical conclusion: all shaders should be able to access compute buffers, and compute shaders should be able to access render buffers.

I'm currently working on a compute-shader-based particle simulation, and the frame rate is terrible for large simulations despite neither my CPU nor GPU being taxed. It's bottlenecked by writing the position data to a MultiMeshInstance3D from the CPU (see the sketch below). If I could access the position buffer from inside a particle shader, the particles would position themselves with no CPU overhead.

The complete memory isolation of gdshader scripts is a huge limitation.
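For reference, the fastest CPU-side path I know of is RenderingServer.multimesh_set_buffer, and it still re-uploads every instance each frame; a sketch of that per-frame cost (instance_count, positions, and multimesh are placeholders, and the buffer is assumed to hold plain 3D transforms with no color/custom data):

# Sketch of the per-frame CPU upload this proposal would eliminate: even the
# bulk API re-sends all instance transforms each frame. Each instance is 12
# floats (a 3x4 transform), identity basis at 0/5/10, origin at 3/7/11.
func _process(_delta: float) -> void:
    var data := PackedFloat32Array()
    data.resize(instance_count * 12)
    for i in instance_count:
        var base := i * 12
        data[base + 0] = 1.0 # identity basis
        data[base + 5] = 1.0
        data[base + 10] = 1.0
        data[base + 3] = positions[i].x # origin, read back from the compute pass
        data[base + 7] = positions[i].y
        data[base + 11] = positions[i].z
    RenderingServer.multimesh_set_buffer(multimesh.get_rid(), data)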

wojtekpil commented 1 year ago

I totally agree on the importance of this proposal. Without it, the uses of compute shaders are very limited. With the possibility of editing buffers and textures from compute shaders, we create opportunities for custom GPU-based culling systems and custom LOD systems (even for multimeshes; imagine using a LOD per instance). Add to that the possibility of reading render buffers from compute shaders, and even post-processing effects that are hard to achieve otherwise could easily be added and chained (e.g. a gaussian blur for some effects needs 2-3 viewports, which could instead be chained as 2 simple compute shaders; see the sketch below). It could also help avoid issues with the transparency of full-screen post-processing planes in front of the camera. It would also probably make life easier for things like GPU painting applications or terrain editors, as we would not be limited to one 8-bit viewport texture.
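The compute-to-compute half of that chaining already works on the RenderingDevice today; a compressed sketch of a two-pass blur sharing textures between dispatches (the pipelines and uniform sets are assumed to have been created elsewhere from two compute shaders):

# Sketch: two compute pipelines chained in one compute list, so the
# horizontal pass's output texture feeds the vertical pass with no CPU copy.
var compute_list := rd.compute_list_begin()

rd.compute_list_bind_compute_pipeline(compute_list, blur_h_pipeline)
rd.compute_list_bind_uniform_set(compute_list, blur_h_uniforms, 0)
rd.compute_list_dispatch(compute_list, groups_x, groups_y, 1)

# Needed on 4.2 and earlier so pass 2 sees pass 1's writes; 4.3+ tracks
# such dependencies automatically.
rd.compute_list_add_barrier(compute_list)

rd.compute_list_bind_compute_pipeline(compute_list, blur_v_pipeline)
rd.compute_list_bind_uniform_set(compute_list, blur_v_uniforms, 0)
rd.compute_list_dispatch(compute_list, groups_x, groups_y, 1)

rd.compute_list_end()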

Facundo15 commented 1 year ago

It must definitely be something integrated into Godot. I am working on a projectile simulation system in which I am computing the movement, plus some very primitive physics (rectangles, and a buffer of all the rectangular shapes) to detect collisions and send them to the CPU. But I use Particles to visualize the behavior, and having to send everything to the CPU and then back to the particle shader became a very slow process.

Facundo15 commented 1 year ago

@jtsorlinis I have a better way this could be implemented at the code level when you want to make that association.

Basically, the buffers would be "shared", so that the compute shader can directly write to uniforms within the Godot shader. It could be something like:

var buffer := rd.storage_buffer_create(input_bytes.size(), input_bytes)
material_shader.set_shared_uniform("uniform_data", buffer) # hypothetical API

shader_type canvas_item;

shared uniform vec2 uniform_data;

This might be a simpler way to tell the shader code that the buffer will be shared between the compute shader and the Godot shader (a simpler form, in keeping with Godot's philosophy of simplicity).

I'm not an expert in GLSL and shaders, so I don't know whether the shared keyword could be used for this case.

oxi-dev0 commented 10 months ago

I also agree with this proposal. I may be understanding this wrong, but in general I think it would be very beneficial for the engine to support sharing resources GPU -> GPU rather than requiring a copy through the CPU. For example, in my blood surface system, I render to a framebuffer in order to generate a mask to use in a material. However, in the current 4.1.2 build there is no way to create the framebuffer from an ImageTexture that the material can sample from. I have to use a custom build of the engine (based on this fork: https://github.com/huisedenanhai/godot/commit/32a05a59a0b0fce63eab6fe0e818b367acc11ac8) that allows me to get the render device ID of the ImageTexture. The current "supported" method would be to create a texture with the render device and use that for the framebuffer, then at some point sync and copy the data from that texture to the CPU, then copy it into the ImageTexture for the material to use, which is very slow and inefficient.
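(If I'm reading the docs right, newer 4.x releases expose this directly, which would make the fork unnecessary there; hedged one-liner, with image_texture as a placeholder:)

# Newer 4.x builds can fetch the RenderingDevice-side RID of a texture
# directly, covering the fork's use case (availability is version-dependent):
var rd_rid := RenderingServer.texture_get_rd_texture(image_texture.get_rid())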

kpietraszko commented 10 months ago

@Facundo15 Keep in mind that uniforms have a much lower size limit compared to storage buffers. So for many cases they're not a solution.
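The actual limits can be queried at runtime; on typical desktop drivers the uniform buffer cap is on the order of 16-64 KiB, while storage buffers allow hundreds of MiB:

# Query the driver's real limits rather than guessing.
var rd := RenderingServer.get_rendering_device()
print(rd.limit_get(RenderingDevice.LIMIT_MAX_UNIFORM_BUFFER_SIZE)) # often ~65536
print(rd.limit_get(RenderingDevice.LIMIT_MAX_STORAGE_BUFFER_SIZE)) # often >= 128 MiB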

BenMcLean commented 7 months ago

Because of this issue, I invested a good deal of time working out how to treat a texture uniform as if it were a storage buffer, to "smuggle" my raw byte data onto the GPU as a workaround: https://gist.github.com/BenMcLean/9327690b93690b8a92a921df003f7954 My eventual goal is to render from a sparse voxel octree in a fragment shader, and a storage buffer would be the ideal way to send the octree data from the CPU to the GPU for that, but in the meantime, I'm going to try to fit my octree data inside a texture uniform.
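For anyone curious, the idea boils down to something like this (my own stripped-down paraphrase, not the gist's exact code; octree_data is a placeholder, and a real version would wrap the data into rows to respect texture size limits):

# Raw bytes wrapped in an Image/ImageTexture so an ordinary shader uniform
# can carry them. FORMAT_RF stores one 32-bit float per texel, so 4 bytes
# map to 1 texel with no conversion.
var bytes: PackedByteArray = octree_data # hypothetical raw payload
var img := Image.create_from_data(bytes.size() / 4, 1, false, Image.FORMAT_RF, bytes)
var tex := ImageTexture.create_from_image(img)
material.set_shader_parameter("data_tex", tex)
# The shader declares `uniform sampler2D data_tex : filter_nearest;` and
# recovers the bits with: floatBitsToUint(texelFetch(data_tex, ivec2(i, 0), 0).r)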

critopadolf commented 7 months ago

> Because of this issue, I invested a good deal of time working out how to treat a texture uniform as if it were a storage buffer, to "smuggle" my raw byte data onto the GPU as a workaround: https://gist.github.com/BenMcLean/9327690b93690b8a92a921df003f7954
>
> My eventual goal is to render from a sparse voxel octree in a fragment shader, and a storage buffer would be the ideal way to send the octree data from the CPU to the GPU for that, but in the meantime, I'm going to try to fit my octree data inside a texture uniform.

Haha, nice to see someone else trying the same thing. I used a Texture2DArrayRD where my struct fits onto a 2x2xN float texture. It sounds like yours is more generalized, though I haven't looked it over.

TokisanGames commented 7 months ago

@BenMcLean You can do this more directly, reading and writing 32-bit uints on the CPU and in the shader. 32-bit floats are even easier. There's no need to use an rgba8 format and have to convert numbers and risk precision loss.

For a working example of transferring non-image data to the shader in production: we transfer a texture array holding up to 1GB (16k², 32-bit) of bit-packed uint data. You can look through the code for Terrain3D: our bit-packed control map format, en/decoders, CPU writer, and shader.
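To illustrate the idea (a minimal sketch of my own, not Terrain3D's actual code; width, height, and control_values are placeholders): write each uint's bit pattern verbatim into an RF-format image on the CPU, then reinterpret it in the shader.

# Bit-packed uint transfer: the bit pattern lands in a 32-bit float texel
# untouched, so nothing is quantized or precision-lost.
var words := PackedByteArray()
words.resize(width * height * 4)
for i in width * height:
    words.encode_u32(i * 4, control_values[i]) # control_values is illustrative
var img := Image.create_from_data(width, height, false, Image.FORMAT_RF, words)
# Shader side (sampled with filter_nearest so the bits survive):
#   uint bits = floatBitsToUint(texelFetch(control_map, uv, 0).r);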


Regarding this ticket (compute -> vertex/fragment without copying to the CPU), doesn't Bastiaan's water compute demo already do this using a Texture2DRD? He says "Instead of copying data from texture to texture to create this history, we simply cycle the RIDs," and in the code I don't see any functions that pull the texture from compute into an Image. It just gets the texture RID from the compute shader and assigns it to the shader material uniform. @BastiaanOlij?
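Paraphrasing what the demo's hookup appears to boil down to (names here are mine, not the demo's):

# The material samples the compute shader's output texture directly.
var water_tex := Texture2DRD.new()
water_tex.texture_rd_rid = simulation_output_rid # RID written by the compute pass
material.set_shader_parameter("wave_map", water_tex)

# "Cycling the RIDs" for history is then just repointing the texture each frame:
water_tex.texture_rd_rid = history_rids[frame_index % history_rids.size()]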

kb173 commented 7 months ago

> Regarding this ticket (compute -> vertex/fragment without copying to the CPU), doesn't Bastiaan's water compute demo already do this using a Texture2DRD? He says "Instead of copying data from texture to texture to create this history, we simply cycle the RIDs," and in the code I don't see any functions that pull the texture from compute into an Image. It just gets the texture RID from the compute shader and assigns it to the shader material uniform.

This issue (as I understand it) is about passing vertex data, i.e. meshes, directly from compute to vertex shader to facilitate procedural mesh generation, right? So workarounds with textures don't work because we still can't create new vertices without involving the CPU.

clayjohn commented 7 months ago

Just to add some context here: we already have plans to expose a way to create/update meshes using compute shaders without copying the data to the CPU and back: https://github.com/godotengine/godot-proposals/issues/7209

Using individual SSBOs for each channel would be a bit too cumbersome, I think, and would be pretty limiting, as it wouldn't allow you to create meshes entirely on the GPU and then use them in rendering.