floooh / sokol

minimal cross-platform standalone C headers
https://floooh.github.io/sokol-html5
zlib License

Define uniforms with introspection. #23

Closed Edwearth closed 5 years ago

Edwearth commented 6 years ago

Why are uniforms in sg_shader_uniform_block_desc set manually? Maybe they could be set automatically during the compilation of the shader with introspection?

Thank you in advance.

floooh commented 6 years ago

This is necessary because uniform data is provided as a single uniform block structure in the sokol_gfx API, but set through granular 'per-glUniform calls' in the GL backend (mainly because GLES2/WebGL don't support uniform blocks). The GL backend needs to know the name, offset and data of each member inside the applied uniform block to select the right glUniformxxx() call.

Only the GL backend needs the per-uniform definitions. On Metal and D3D11 only the uniform block size is required; the internal structure doesn't matter (as long as it matches the uniform block layout in the shaders).

Also, higher up, this data would usually be provided by a shader-code-generation system which extracts it during offline shader validation. For instance, in Oryol I'm using glslangValidator to compile GLSL v330 to SPIR-V, and then SPIRV-Cross to translate the SPIR-V back to various GLSL versions, Metal and D3D11 source code, and also to extract reflection data about shader uniform blocks, which is then used to generate C structures matching the shader uniform block layouts (this is described in detail here: http://floooh.github.io/2017/05/15/oryol-spirv.html).
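To make the reflection-to-C-struct step described above concrete, here is a minimal sketch of what such generated C code could look like for a uniform block holding a `mat4 mvp` and a `vec4 color`. The struct name `vs_params_t` and the constant `VS_PARAMS_SIZE` are hypothetical, not the actual output of Oryol's or sokol's tools; the idea shown is that a generator can emit compile-time checks so the C struct cannot silently drift from the reflected shader layout.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical generator output for the GLSL uniform block:
 *   uniform vs_params { mat4 mvp; vec4 color; };
 * All members are float vectors/matrices, so the C struct has no padding. */
typedef struct {
    float mvp[16];   /* mat4 */
    float color[4];  /* vec4 */
} vs_params_t;

/* Block size as reported by shader reflection (hypothetical constant). */
#define VS_PARAMS_SIZE 80

/* Compilation fails if the C struct and the reflected layout disagree. */
_Static_assert(sizeof(vs_params_t) == VS_PARAMS_SIZE,
               "vs_params_t does not match the shader uniform block size");
_Static_assert(offsetof(vs_params_t, color) == 64,
               "color must follow the 64-byte mvp matrix");
```

On D3D11 and Metal the size check alone would cover what the backend needs; the per-member offsets only matter for the GL path.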

JMLX42 commented 6 years ago

The GL backend needs to know the name, offset and data of each member inside the applied uniform block to select the right glUniformxxx() call.

But this data can be obtained automatically from OpenGL using glGetProgramiv and glGetActiveUniform (example here).

So why do we have to describe it manually?
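For reference, `glGetActiveUniform()` reports each active uniform's name, a `size` (the array element count) and a `type` enum. A rough sketch of turning that `type`/`size` pair into a byte size on the C side is shown below. The GL enum values are defined locally, with the values from the standard GL headers, so the sketch compiles without a GL context; the helper names and the "tightly packed floats" assumption are hypothetical.

```c
#include <assert.h>

/* Values as defined in the standard GL/GLES headers. */
#define GL_FLOAT       0x1406
#define GL_FLOAT_VEC2  0x8B50
#define GL_FLOAT_VEC3  0x8B51
#define GL_FLOAT_VEC4  0x8B52
#define GL_FLOAT_MAT4  0x8B5C

/* Hypothetical helper: byte size of one element of a uniform with the given
 * GL type, assuming tightly packed floats in the C struct.
 * Returns 0 for types this sketch does not handle (ints, samplers, ...). */
static int gl_uniform_type_size(unsigned int gl_type) {
    switch (gl_type) {
        case GL_FLOAT:      return 4;
        case GL_FLOAT_VEC2: return 8;
        case GL_FLOAT_VEC3: return 12;
        case GL_FLOAT_VEC4: return 16;
        case GL_FLOAT_MAT4: return 64;
        default:            return 0;
    }
}

/* Total bytes for a uniform; `array_size` is the `size` output of glGetActiveUniform(). */
static int gl_uniform_bytes(unsigned int gl_type, int array_size) {
    return gl_uniform_type_size(gl_type) * array_size;
}
```

Note that this only yields sizes; turning them into offsets still requires knowing the member order of the C struct.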

floooh commented 6 years ago

Good point, but I'm still not quite convinced :) I'll play around with this, but I have some concerns: since I'm basically asking the shader what a user-side C struct looks like, I think that may be too brittle.

My concerns:

The way it is at the moment the application code basically provides a description of the elements in the C structure, and the GL backend loops over those, and can ignore any uniforms that have been optimized out by the shader (see here: https://github.com/floooh/sokol/blob/90b6c532887713e86e575023507ede565aa84efa/_sokol_gfx_gl.impl.h#L1852).

At the very least, glGetActiveUniform() would provide better validation though (something like "hey, you provide this uniform as a vec4, but in the shader it is actually a vec2").

floooh commented 6 years ago

PS: if my experiments are successful I will probably still put this behind a define, something like "SOKOL_GL_INFER_UNIFORMBLOCK_LAYOUT", or optionally a runtime flag in shader_desc(?)

JMLX42 commented 6 years ago

is it guaranteed that on all platforms and drivers that glGetActiveUniform returns the uniforms in the same order as they appear in the shader code?

What is the point of an offset if everything is typed and in order?

In the end, you'll still end up mapping your C struct to the uniforms with multiple glUniform calls anyway, so I don't understand why the order is important.

a shader uniform may be removed by the GLSL compiler

How would the developer do a better job with a manual layout in this case?

do I get enough information from glGetActiveUniform() to identify the type and number of elements of uniform arrays?

IDK. But the code I linked is used in production on many platforms (iOS, Android, WebGL...) and it deals with arrays.
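On the array question, one detail worth knowing: for an array uniform, `glGetActiveUniform()` reports the element count in its `size` output, and the reported name typically carries a `[0]` suffix (e.g. `lights[0]`), so a reflection-based path would have to normalize names before matching them. A minimal sketch; `strip_array_suffix` is a hypothetical helper, not part of sokol or GL.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: copy `name` into `out`, dropping a trailing "[0]" that
 * GL may append to array uniform names. Returns 1 if a suffix was removed. */
static int strip_array_suffix(const char* name, char* out, size_t out_size) {
    assert(out_size > 0);
    size_t len = strlen(name);
    int had_suffix = (len >= 3) && (strcmp(name + len - 3, "[0]") == 0);
    if (had_suffix) {
        len -= 3;
    }
    if (len >= out_size) {
        len = out_size - 1;  /* truncate rather than overflow the output buffer */
    }
    memcpy(out, name, len);
    out[len] = '\0';
    return had_suffix;
}
```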

Let me know how we can help :)

floooh commented 6 years ago

What is the point of an offset if everything is typed and in order?

Let's say my shader uniforms look like this:

uniform vec4 bla;
uniform vec2 blub;
uniform vec3 foo;

...and my C struct looks like this:

typedef struct {
  hmm_vec4 bla;
  hmm_vec2 blub;
  hmm_vec3 foo;
} params;

If the calls to glGetActiveUniform() for indices 0, 1 and 2 return the uniforms bla, blub and foo in that order with the right sizes, everything will be alright: the update code knows that the first member bla starts at offset 0, then adds the size of bla to find the start of blub, and then adds the size of blub to find foo.

If the order is different (e.g. reversed: foo for uniform index 0, blub for index 1, and bla for index 2), the uniform update code would assume that foo starts at offset 0 in the C struct, followed by blub at offset 12 and bla at offset 20.
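The order dependence can be shown with a small sketch: if offsets are derived only by summing member sizes in enumeration order, the same three uniforms get different offsets when the driver enumerates them differently. The sizes assume 4-byte floats and no padding, matching the offsets quoted above; `compute_offsets` is a hypothetical helper.

```c
#include <assert.h>

/* Hypothetical helper: give each member the running sum of the sizes of the
 * members enumerated before it, i.e. assume the C struct is laid out in
 * exactly that order. */
static void compute_offsets(const int* sizes, int count, int* offsets) {
    int offset = 0;
    for (int i = 0; i < count; i++) {
        offsets[i] = offset;
        offset += sizes[i];
    }
}
```

With the enumeration order bla (16 bytes), blub (8), foo (12) this yields 0, 16, 24, which matches the C struct. With the reversed order foo, blub, bla it yields 0, 12, 20, so bla would be read from byte 20 of the struct instead of byte 0.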

How would the developer do a better job with a manual layout in this case?

The manual description only cares about what the C structure looks like. If any GLSL uniforms have been dropped by the GLSL compiler, the uniform update code can detect this (because glGetUniformLocation() returns -1 for that uniform block member) and can skip the member in the C struct.
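A minimal sketch of that skip logic, with the GL calls replaced by a hypothetical recording function so it runs without a GL context. Each member description (a hypothetical `uniform_member_t`) carries the location looked up once at shader creation; members whose location is -1 are skipped, while the offsets of the remaining members still come from the C-side description and are unaffected.

```c
#include <assert.h>

/* Hypothetical per-member description, filled in once at shader creation. */
typedef struct {
    int gl_loc;   /* from glGetUniformLocation(); -1 if optimized out */
    int offset;   /* byte offset of the member in the application's C struct */
} uniform_member_t;

/* Stand-in for the real glUniformNfv() calls, so the sketch runs without GL:
 * it records which location would be updated from which struct offset. */
#define MAX_APPLIED 8
static int num_applied = 0;
static int applied_locs[MAX_APPLIED];
static int applied_offsets[MAX_APPLIED];

static void record_uniform(int loc, int offset) {
    assert(num_applied < MAX_APPLIED);
    applied_locs[num_applied] = loc;
    applied_offsets[num_applied] = offset;
    num_applied++;
}

/* Walk the C-side description and update only members that still exist in
 * the linked program; skipped members don't shift later members' offsets. */
static void apply_members(const uniform_member_t* members, int count) {
    for (int i = 0; i < count; i++) {
        if (members[i].gl_loc == -1) {
            continue;
        }
        record_uniform(members[i].gl_loc, members[i].offset);
    }
}
```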

floooh commented 6 years ago

PS: One element in sg_shader_desc that's definitely redundant is the offset. I can remove it even without shader introspection, since the order and types already provide enough information to compute the offsets.

For the other things I will try to do some experiments tonight.

JMLX42 commented 6 years ago

everything will be alright

AFAIK there is no guarantee that the OpenGL driver will return the uniforms in order. There is also no guarantee that they will be contiguous in memory. So I really don't understand how you map a C struct to OpenGL uniforms.

Could you please explain that?

floooh commented 6 years ago

ok pseudo-code time :)

The sg_apply_uniform_block() function essentially takes a pointer and a size of a "uniform block" C struct provided by the application code, and the content of this struct needs to be split into one or several glUniformXXX() calls.

For this I need to "loop over" the members in the C struct and call the right glUniformXXX():

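const uint8_t* base = (const uint8_t*) uniform_block;  /* pointer passed to sg_apply_uniform_block() */
for (int i = 0; i < num_uniforms; i++) {
    const uniform_t* uniform = &uniforms[i];
    /* the offset is in bytes, so add it to a byte pointer, not to the struct pointer */
    const GLfloat* uniform_ptr = (const GLfloat*) (base + uniform->offset);
    switch (uniform->type) {
        case SG_UNIFORMTYPE_FLOAT:
            glUniform1fv(uniform->gl_location, uniform->array_count, uniform_ptr);
            break;
        case SG_UNIFORMTYPE_FLOAT2:
            glUniform2fv(uniform->gl_location, uniform->array_count, uniform_ptr);
            break;
        case SG_UNIFORMTYPE_FLOAT3:
            /* ... */
            break;
        default:
            break;
    }
}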

What I would really need to really make this "waterproof" is reflection information from the C/C++ compiler, not reflection information from the GLSL compiler (although the latter helps).

The best case would be to get reflection information from the C compiler about the members inside the uniform block struct, and reflection information from the GLSL shader, so that I could validate both against each other (does the C struct provide all the uniforms required by the shader, and do their types match).

Since there is no C/C++ reflection, I provide this information manually (or in higher level libs like Oryol, via "offline" shader code generation).
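The validation described above could look roughly like this sketch: a handwritten C-side description stands in for the missing C reflection, the shader-side list would come from glGetActiveUniform() or offline reflection, and every uniform the shader needs must be present on the C side with the same type. All type and function names are hypothetical. Because matching is by name, the shader-side list may be in any order and may omit uniforms the compiler dropped.

```c
#include <assert.h>
#include <string.h>

typedef enum { UT_FLOAT, UT_FLOAT2, UT_FLOAT3, UT_FLOAT4, UT_MAT4 } utype_t;

/* One named, typed uniform, from either the C side or the shader side. */
typedef struct {
    const char* name;
    utype_t type;
} uniform_info_t;

/* Returns 1 if every uniform reported by the shader exists in the C-side
 * description with the same type, 0 otherwise. */
static int validate_block(const uniform_info_t* c_side, int num_c,
                          const uniform_info_t* shader_side, int num_shader) {
    for (int i = 0; i < num_shader; i++) {
        int found = 0;
        for (int j = 0; j < num_c; j++) {
            if (strcmp(shader_side[i].name, c_side[j].name) == 0) {
                if (shader_side[i].type != c_side[j].type) {
                    return 0;  /* e.g. vec4 on the C side, vec2 in the shader */
                }
                found = 1;
                break;
            }
        }
        if (!found) {
            return 0;  /* the shader needs a uniform the C struct doesn't provide */
        }
    }
    return 1;
}
```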

amerkoleci commented 6 years ago

It would be nice to have UBO support under OpenGL too. I'm working on a sokol-like library for my game engine (https://github.com/amerkoleci/alimer).

floooh commented 6 years ago

I tinkered with OpenGL uniform buffer support in Oryol a while ago but didn't see any performance advantage over traditional uniforms, so I scrapped the code again: it was more complex, and I had to maintain two code paths (for WebGL/GLES2 vs. GL versions with UBO support).

What I'm doing in Oryol now (with the help of SPIRVCross) is to convert the uniforms from the input GLSL to a single vec4 array in the output GLSL (SPIRVCross has a feature for this). That way applying a uniform block via sokol is always a single call to glUniform4fv() instead of multiple calls to glUniformxxx().
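With the whole block flattened to a `vec4` array, the GL side only needs the block size to issue a single `glUniform4fv()` call, since the element count is just the size divided by 16 bytes. A sketch, assuming the C struct is padded to a multiple of 16 bytes; the struct and helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical C struct for a uniform block that the cross-compiler has
 * flattened to `uniform vec4 vs_params[6];` in the generated GLSL.
 * Explicit padding keeps the size a multiple of 16 bytes (one vec4). */
typedef struct {
    float mvp[16];   /* vec4 elements 0..3 */
    float color[4];  /* vec4 element 4 */
    float time;      /* vec4 element 5, x component */
    float _pad[3];   /* vec4 element 5, yzw (unused) */
} flat_params_t;

/* Number of vec4 elements to pass as the `count` argument of glUniform4fv(). */
static int vec4_count(size_t block_size) {
    assert(block_size % 16 == 0);  /* a flattened vec4 array has no partial elements */
    return (int)(block_size / 16);
}

/* With a GL context, applying the block would then be a single call like:
 *   glUniform4fv(loc, vec4_count(sizeof(flat_params_t)), (const GLfloat*)&params);
 */
```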

I think the best uniform update strategy is implemented in the Metal backend right now, this uses one big uniform buffer for the entire frame, copies the uniform data in there and records the uniform buffer offset into the Metal command list. Unfortunately this strategy doesn't work in base-D3D11, or in OpenGL without implementing a render command list (I did just this for GL when I played around with GL UBOs in Oryol). I wrote a bit about this here (search for Uniform Buffer Support): http://floooh.github.io/2016/10/06/oryol-webgl2.html
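A minimal, API-independent sketch of the "one big uniform buffer per frame" idea: each uniform block is copied into a per-frame byte buffer at an aligned offset, and that offset is what would be recorded into the command list. This is not sokol's actual Metal backend code. The 64 KB capacity and the 256-byte alignment are assumptions (the real alignment requirement depends on backend and platform), the byte array stands in for a GPU buffer, and a real implementation would also rotate several such buffers so the CPU never overwrites data the GPU is still reading.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define UB_SIZE  (64 * 1024)  /* hypothetical per-frame buffer size */
#define UB_ALIGN 256          /* assumed offset alignment, must be a power of two */

typedef struct {
    uint8_t data[UB_SIZE];  /* stands in for the per-frame GPU buffer */
    size_t pos;             /* next free byte */
} frame_ubuf_t;

/* Copy one uniform block into the frame buffer and return the aligned offset
 * that would be recorded into the command list, or -1 if the buffer is full. */
static long ubuf_append(frame_ubuf_t* ub, const void* data, size_t size) {
    size_t offset = (ub->pos + (UB_ALIGN - 1)) & ~(size_t)(UB_ALIGN - 1);
    if (offset + size > UB_SIZE) {
        return -1;
    }
    memcpy(ub->data + offset, data, size);
    ub->pos = offset + size;
    return (long)offset;
}

/* Called at the start of a frame, once the GPU no longer reads this buffer. */
static void ubuf_reset(frame_ubuf_t* ub) {
    ub->pos = 0;
}
```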

amerkoleci commented 6 years ago

I see. In my engine I plan to add Vulkan support (with a single CommandBuffer initially) and will probably investigate UBO support; at the moment it is not possible in sokol to share uniform buffers between multiple shaders.

floooh commented 6 years ago

Yes, the way sokol is currently designed basically expects that uniform data is copied somewhere when calling sg_apply_uniform_block(); there is no concept of uniform buffers for static data (I'm not convinced that static uniform buffers make much sense, at least for small updates like material parameters or transform matrices). For large data, vertex buffers or images would be better suited.

JMLX42 commented 6 years ago

What I would really need to really make this "waterproof" is reflection information from the C/C++ compiler, not reflection information from the GLSL compiler (although the latter helps).

Why do you need C reflection at all? Just use the GLSL reflection on an array of char and an array of void. Or have an array of unions with a field for every GLSL type; there aren't that many...
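The "array of unions" suggestion could look roughly like this sketch: each uniform value is a tagged union, so a GL backend could pick the glUniformNfv() variant from the tag instead of from a separately described struct layout. All names are hypothetical. The trade-off is that the application no longer fills a plain C struct, and every value occupies the size of the largest union member.

```c
#include <assert.h>

typedef enum { VAL_FLOAT, VAL_VEC2, VAL_VEC3, VAL_VEC4 } val_type_t;

/* One uniform value; the tag tells the backend which glUniformNfv() to use. */
typedef struct {
    int gl_loc;       /* location from glGetUniformLocation() or reflection */
    val_type_t type;
    union {
        float f;
        float v2[2];
        float v3[3];
        float v4[4];
    } u;
} uniform_value_t;

/* Number of floats the backend would upload for this value. */
static int value_components(const uniform_value_t* v) {
    switch (v->type) {
        case VAL_FLOAT: return 1;
        case VAL_VEC2:  return 2;
        case VAL_VEC3:  return 3;
        case VAL_VEC4:  return 4;
    }
    return 0;
}
```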

If I understand properly it means:

That sounds very inconvenient.

It kinda means you have a "hidden" dependency on some offline tool. But that very specific choice is a very heavy burden for a project which aims at being light and dependency-free...

Kinda weird, and in contrast with the rest of your API, which is very well thought out.

PS: I would love to see how you're using SPIR-V or another tool to generate this data. Have you documented it anywhere? Thanks!

floooh commented 6 years ago

Ok, let's start from the API side:

The only important thing is that the application provides a single memory chunk for each call to sg_apply_uniform_block(), which must contain data for each valid uniform in a shader, everything else is debatable.

For GLES2/WebGL, the single call to sg_apply_uniform_block() must be split into multiple calls to glUniformxxx(), so sg_apply_uniform_block() must know the data type and offset of the data associated with each shader uniform in the application-provided memory chunk.

Assuming that the loop over glGetActiveUniform() returns all the uniforms listed in the shader source in the right order (so the shader compiler doesn't remove unused uniforms or reorder them), it might be possible to have a special 'reflection mode' only for GL (D3D11 and Metal only need to know the uniform block sizes, not their content). I think SPIRVCross was messing up the uniform order, so such a special 'reflection mode' would need to be optional.

Btw, you don't need code generation to provide uniform blocks, since you don't need to have C structs; that's just a convenience to simplify filling the uniform data on the CPU side. You only need a pointer to the uniform data and its size, but the uniform data in that block needs to have the right layout (which is 'implicit' for D3D11 and Metal, but currently must be provided explicitly for GL, since the GL backend splits the data over several glUniform calls; this could probably be provided by reflection from GL).
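To illustrate that the C struct is only a convenience, this sketch fills the same tightly packed layout as the earlier bla/blub/foo example (vec4, vec2, vec3; 36 bytes) once through a struct and once through a plain float array written at fixed offsets. Both produce identical bytes, which is all a pointer-and-size API sees. The type and function names are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Convenience struct version of the layout (hypothetical). */
typedef struct {
    float bla[4];   /* vec4 at byte offset 0 */
    float blub[2];  /* vec2 at byte offset 16 */
    float foo[3];   /* vec3 at byte offset 24 */
} params_t;

/* Struct-free version: the same bytes written into a raw float array of at
 * least 9 floats. The float index is the byte offset divided by 4. */
static void fill_raw(float* block) {
    const float bla[4] = { 1.0f, 2.0f, 3.0f, 4.0f };
    const float blub[2] = { 5.0f, 6.0f };
    const float foo[3] = { 7.0f, 8.0f, 9.0f };
    memcpy(block + 0, bla, sizeof(bla));
    memcpy(block + 4, blub, sizeof(blub));
    memcpy(block + 6, foo, sizeof(foo));
}
```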

For the SPIRVCross usage:

JMLX42 commented 6 years ago

I think the main issue here is that I see this from a GLES perspective, where we have to make multiple glUniform* calls anyway. But the general case is more of a single uniform data structure. Correct?

so the shader compiler doesn't remove unused uniforms

Meaning D3D11 and Metal still expect unused uniforms to be defined?

For the SPIRVCross usage:

Thanks for all the interesting intel! Very helpful.

From your blog post, I understand GLSL is your only shader language, and you get HLSL/Metal through x-compilation. Correct?

floooh commented 6 years ago

I think the main issue here is that I see this from a GLES perspective, where we have to make multiple glUniform* calls anyway. But the general case is more of a single uniform data structure. Correct?

Yes, both in the D3D11 and Metal backends, uniform data update is a single operation (in Oryol with the shader code generator I'm "emulating" that for GL with a single uniform array, even with a sokol-gfx backend, so an sg_apply_uniform_block() call is also always a single glUniformxxx() call).

Meaning D3D11 and Metal still expect unused uniforms to be defined?

AFAIK yes: both in D3D11 and Metal the uniform data is defined as a struct, and I think unused struct members are not removed by the shader compiler.

From your blog post, I understand GLSL is your only shader language, and you get HLSL/Metal through x-compilation. Correct?

Yes, shaders are written in GLSL v330 and then translated via glslangValidator + SPIRVCross to GLSL v100 (GLES2/WebGL), GLSL v300es (GLES3/WebGL2), GLSL v330 (desktop GL), HLSL5 or MetalSL.

code-disaster commented 6 years ago

FYI I hacked in UB support in my fork here.

I'm also working on a glslang+spirv-cross toolchain which consumes GL45 shaders and spits out HLSL/MSL/GL33/GL45. I couldn't figure out yet how to get them compiled to both MSL (spirv-cross insists on using uniform blocks) and GL33 (sokol can't update uniforms this way), so I figured it's easier to just add UB support.

floooh commented 6 years ago

When I played around with GL uniform blocks in Oryol, I wasn't convinced that they would really be an advantage over the "update as single vec4 array" approach. I was using one big uniform block per frame, the same way I'm doing it on Metal. This approach should theoretically be the fastest, but I had to record Oryol rendering commands in my own command queue, since I could only start rendering once all uniform updates had been recorded into the uniform buffer. And I didn't see performance improvements (I think traditional glUniform updates are very fast because GL drivers are essentially doing the same thing under the hood).

Also see: http://floooh.github.io/2017/04/04/oryol-webgl2-merge.html

floooh commented 6 years ago

Btw, this is how I'm doing uniform updates in Oryol when using sokol-gfx as the backend; I'm assuming that the uniform block is a vec4 array:

https://github.com/floooh/oryol/blob/sokol-gfx/code/Modules/Gfx/private/sokol/sokolGfxBackend.cc#L753

And this is how to tell SPIRVCross to 'flatten' each uniform block into a single vec4 array:

https://github.com/floooh/oryol-tools/blob/e15ae8406bce381c9b848485731b1968a8df4bdc/src/oryol-shdc/main.cc#L263-L270

This is only called when producing GLSL:

https://github.com/floooh/oryol-tools/blob/e15ae8406bce381c9b848485731b1968a8df4bdc/src/oryol-shdc/main.cc#L283-L301

code-disaster commented 6 years ago

Yes, I've seen the 'flatten' part, and briefly tried it, but it didn't seem to work. Not sure if it's me doing something wrong, or using more recent versions of glslang and spirv-cross.

By the way, how do you produce the .spv's? I don't see that code anywhere.

As mentioned, the reason I'm currently going with GL UB is that I couldn't stop spirv-msl from insisting on uniform blocks. Instead of doing something shady in the shader pipeline, it feels simpler to just solve it this way.

floooh commented 6 years ago

Hmm, maybe I should finally update SPIRVCross. I also have a couple of hacks in my fork which should be fixed in the main version (something about matrix row/column-major order for HLSL).

I'm generating the .spv by separately calling glslangValidator, controlled by a fips code generation script: https://github.com/floooh/oryol/blob/sokol-gfx/fips-files/generators/Shader.py, and also see here: https://github.com/floooh/oryol/tree/sokol-gfx/fips-files/generators/util

Not sure if you're aware, but @pjako has ported the Oryol shader cross-compiler to sokol (although not everything is supported yet AFAIK): https://github.com/pjako/shd

Eventually I want to provide a more generic solution as well, which doesn't depend so much on Python plumbing (ideally it would be a self-contained exe which takes a GLSL source file and emits different GLSL versions, HLSL and MetalSL, along with a JSON reflection-info file... and maybe optionally even a full sokol-gfx compatible C header).

pjako commented 6 years ago

Shd is not sokol-specific (anymore); it just compiles the shaders and provides you the shader reflection data. But it's really easy from there to build the sokol shader description structs in your code.

The generated code is currently broken though, I will fix it in the next days.

code-disaster commented 6 years ago

ideally it would be a self-contained exe which takes a GLSL source file and emits different GLSL versions, HLSL and MetalSL, along with a JSON reflection-info file...

Yes, this is what I currently do, minus the reflection info (didn't do that yet). It's a small(*) C executable statically linked with glslang and spirv-cross, and optionally invoking fxc and metallib to precompile binaries.

Some of this data processing stuff is a little cumbersome to do in C/C++, but I enjoy it more than writing Python scripts.

(*) 4.4 MB executable in release builds, thanks to these libraries

Addendum: tried again, and got the 'flatten block' option (and the associated sg_shader_desc) working now.

floooh commented 5 years ago

Closing this to unclutter the ticket list. I think with the various shader cross-compilers there are now various options for getting shader reflection data (although in sokol-shdc this would need a new output option).