Gl es 300 layout std140

solan-solan commented 11 months ago

axmol version: dev
devices test on: windows
developing environments
- NDK version: r23c
- Xcode version: 14.2
- Visual Studio:
  - VS version: 2019 (16.11), 2022 (17.6+)
  - MSVC version: 1929, 1934, 19.35, 19.36, 19.37
  - Windows SDK version: 10.0.22621.0
- cmake version:

Steps to Reproduce:

I started porting the shaders in my project from GLES 1 to GLES 3 and got stuck on an issue similar to this one: https://stackoverflow.com/questions/15616558/regarding-arrays-in-layout-std140-uniform-block-for-opengl For example, the stride between the elements of a float array equals sizeof(vec4) if the array is declared in a std140 block. Sorry, I can't check testcpp right now, but there are some arrays, like _pointLightUniformRangeInverseValues, which are simply passed to the uniform block as they are. Can someone confirm that they are handled properly?

P.S. I chose to give every array intended for a shader a vec4 base type, to minimize calculation on the CPU side.

aismann commented 11 months ago

Steps to Reproduce:
@solan-solan Could you provide some more info to help reproduce this issue?

solan-solan commented 11 months ago

@aismann There is a Terrain class with the following code to update u_detailSize:

And the shader code to process this uniform (_terrainfs):

If this issue exists, you should observe the following after the uniform is updated on the CPU side:

shader           | cpu
u_detailSize[0]  | detailMapSize[0]
0                | detailMapSize[1]
0                | detailMapSize[2]
0                | detailMapSize[3]
u_detailSize[1]  |
0                |
0                |
0                |
u_detailSize[2]  |
0                |
0                |
0                |
u_detailSize[3]  |

This means that the scaled v_texCoord will always equal vec2(0, 0) for u_tex1, u_tex2, and u_tex3. Please check cpp_test: you should see just one texture on the terrain and three simple spots. P.S. I will do it a little later.
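To make the mismatch concrete, here is a minimal C++ sketch of what the CPU side would have to upload if the shader keeps float u_detailSize[4] inside a std140 block. The helper name padFloatsToStd140 is hypothetical and not part of axmol; the sketch only assumes the std140 rule that each array element occupies its own vec4-sized slot.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not an axmol API): expands a tightly packed float
// array into the std140 array layout, where every element occupies one
// vec4-sized slot (4 floats) and only the first float of the slot is used.
std::vector<float> padFloatsToStd140(const float* src, std::size_t count)
{
    assert(src != nullptr || count == 0);
    std::vector<float> dst(count * 4, 0.0f);  // 3 padding floats per element
    for (std::size_t i = 0; i < count; ++i)
        dst[i * 4] = src[i];
    return dst;
}
```

Uploading the four floats without this padding places detailMapSize[1..3] in the padding of u_detailSize[0], which matches the table above.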

solan-solan commented 11 months ago

I built cpp_test and can confirm the issue (for some reason there is no Content folder in cpp_test in my rebased repo).

Now, the picture in the first terrain test is:

If the code is changed to this:

The picture becomes:

halx99 commented 11 months ago

The detailMapSize is float[4], not vec3, which confuses me.

halx99 commented 11 months ago

The detailMapSize is float[4], not vec3, which confuses me.

I see: the array base element size is that of a vec4, so the layout of float[4] is the same as that of vec4[4].

Editing terrain.frag to change float u_detailSize[4] to vec4 u_detailSize also solves this issue.

NOTE: float u_detailSize[4] is a poor fit for GLES3, and this should be documented in the 2.1 release notes. Developers should avoid it, because every float then occupies a full vec4 slot and wastes GPU memory.

halx99 commented 11 months ago

A possible compatible solution in base.glsl:

#if !defined(GLES2)
#define vfloat_def(var, count) vec4 var[(count + 3) / 4]
#define vfloat_at(x,y) x[y/4][y%4]
#else
#define vfloat_def(var,count) float var[count]
#define vfloat_at(x,y) x[y]
#endif

usage:

#version 310 es
precision highp float;
precision highp int;

#include "base.glsl"

layout(location = TEXCOORD0) in vec2 v_texCoord;
layout(location = NORMAL) in vec3 v_normal;
layout(binding = 0) uniform sampler2D u_alphaMap;
layout(binding = 1) uniform sampler2D u_tex0;
layout(binding = 2) uniform sampler2D u_tex1;
layout(binding = 3) uniform sampler2D u_tex2;
layout(binding = 4) uniform sampler2D u_tex3;
layout(binding = 5) uniform sampler2D u_lightMap;
layout(std140) uniform fs_ub {
    int u_has_alpha;
    int u_has_light_map;
    vfloat_def(u_detailSize, 4);
    vec3 u_lightDir;
};

layout(location = SV_Target0) out vec4 FragColor;

void main()
{
    vec4 lightColor;
    if(u_has_light_map<=0)
    {
        lightColor = vec4(1.0,1.0,1.0,1.0);
    }
    else
    {
        lightColor = texture(u_lightMap,v_texCoord);
    }
    float lightFactor = dot(-u_lightDir,v_normal);
    if(u_has_alpha<=0)
    {
        FragColor = texture(u_tex0, v_texCoord)*lightColor*lightFactor;
    }
    else
    {
        vec4 blendFactor =texture(u_alphaMap,v_texCoord);
        vec4 color = vec4(0.0,0.0,0.0,0.0);
        color = texture(u_tex0, v_texCoord*vfloat_at(u_detailSize, 0))*blendFactor.r
            + texture(u_tex1, v_texCoord*vfloat_at(u_detailSize, 1))*blendFactor.g
            + texture(u_tex2, v_texCoord*vfloat_at(u_detailSize, 2))*blendFactor.b
            + texture(u_tex3, v_texCoord*vfloat_at(u_detailSize, 3))*(1.0 - blendFactor.a);
        FragColor = vec4(color.rgb*lightColor.rgb*lightFactor, 1.0);
    }
}
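The reason this macro pair can leave the CPU side untouched is that floats packed four per vec4 sit at the same offsets as a plain contiguous float array. Below is a small C++ sketch of the vfloat_at indexing; Vec4 and vfloatAt are illustrative stand-ins, not axmol or GLSL names.

```cpp
#include <cassert>

// Stand-in for a GLSL vec4 so the indexing can be checked on the CPU.
struct Vec4 { float v[4]; };

// C++ equivalent of vfloat_at(x, y) in the non-GLES2 branch: x[y / 4][y % 4].
float vfloatAt(const Vec4* packed, int i)
{
    assert(packed != nullptr && i >= 0);
    return packed[i / 4].v[i % 4];
}
```

Element i of the packed array is the i-th float in memory, so a float[4] written by the CPU lands in a single vec4 and reads back in order.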

solan-solan commented 11 months ago

The macros are a good idea, but to my understanding the definition should look something like this:

vfloat_def(var, count) vec4 var[count/4 + 1 ]

And the following for vec3, which should also be taken into account:

vvec3_def(var, count) vec4 var[(count*3)/4 + 1 ]
vvec3_at(x,y) x[(y*3)/4][(y*3)%4]

halx99 commented 11 months ago

The macros are a good idea, but to my understanding the definition should look something like this:

vfloat_def(var, count) vec4 var[count/4 + 1 ]

It should be #define vfloat_def(var, count) vec4 var[(count + 3) / 4].

#define vfloat_at(x,y) x[y/4][y%4] is correct.
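The difference between the two sizing formulas only shows up when count is a multiple of 4. The following C++ sketch compares them; the function names slotsCeil and slotsPlusOne are made up for illustration.

```cpp
#include <cassert>

// Number of vec4 slots for `count` tightly packed floats, using
// (count + 3) / 4, which is integer ceiling division by 4.
int slotsCeil(int count)
{
    assert(count >= 0);
    return (count + 3) / 4;
}

// The alternative count / 4 + 1 proposed earlier in the thread.
int slotsPlusOne(int count)
{
    assert(count >= 0);
    return count / 4 + 1;
}
```

For count = 4, slotsCeil gives 1 while slotsPlusOne gives 2, so count / 4 + 1 reserves one unused vec4 whenever count divides evenly by 4. For other counts the two agree.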

solan-solan commented 11 months ago

vvec3_at(x,y,z) x[(y*3)/4][((y*3)%4) + z] The vec3 component index needs to be passed explicitly as z to allow component-wise access.

Unfortunately, this approach should not work, since z will sometimes step from one vec4 into the next.

Maybe something like this, but I can't check it right now: vvec3_at(x,y,z) x[(y*3+z)/4][((y*3+z)%4)]

halx99 commented 11 months ago

Is vvec2 also required?

solan-solan commented 11 months ago

Yes, in std140 every array has its element alignment and stride rounded up to the size of a vec4.
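As a rough illustration of that rule, the C++ sketch below computes the std140 array stride for float, vec2, vec3 and vec4 elements. It covers scalar and vector float elements only (no matrices or structs), assumes 4-byte floats, and uses hypothetical function names.

```cpp
#include <cassert>
#include <cstddef>

// Byte stride between consecutive elements of a std140 array whose element
// has `components` floats (1 = float, 2 = vec2, 3 = vec3, 4 = vec4).
// std140 rounds the element's base alignment up to that of a vec4.
std::size_t std140ArrayStride(int components)
{
    assert(components >= 1 && components <= 4);
    const std::size_t floatBytes = 4;  // assumed size of a GLSL float
    // A vec3 has the same base alignment as a vec4.
    const std::size_t baseAlign = (components == 3 ? 4 : components) * floatBytes;
    const std::size_t vec4Align = 4 * floatBytes;
    return baseAlign < vec4Align ? vec4Align : baseAlign;
}

// Total bytes occupied by such an array with `count` elements.
std::size_t std140ArraySize(int components, std::size_t count)
{
    return std140ArrayStride(components) * count;
}
```

Every case returns a 16-byte stride, so float[4] occupies 64 bytes in a std140 block, the same as vec4[4].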

halx99 commented 11 months ago

Also, Apple Metal works with neither the original code nor the vxxx_def macros; no idea why yet. EDIT: confirmed, it is a bug in glslcc: #1520

halx99 commented 11 months ago

#if !defined(GLES2)
#  define vfloat_def(x, y) vec4 x[(y + 3) / 4]
#  define vfloat_at(x, y) x[y / 4][y % 4]

#  define vvec2_def(x, y) vec4 x[(y * 2 + 3) / 4]
#  define vvec2_at(x, y) vec2(x[(y / 2)][y % 2], x[(y / 2)][y % 2 + 1])

#  define vvec3_def(x, y) vec4 x[(y * 3 + 3) / 4]
#  define vvec3_at(x, y) vec3(x[(y / 3)][y % 3], x[(y / 3)][y % 3 + 1], x[(y / 3)][y % 3 + 2])
#else
#  define vfloat_def(x, y) float x[y]
#  define vfloat_at(x, y) x[y]

#  define vvec2_def(x, y) vec2 x[y]
#  define vvec2_at(x, y) x[y]

#  define vvec3_def(x, y) vec3 x[y]
#  define vvec3_at(x, y) x[y]
#endif

solan-solan commented 11 months ago

@halx99 Looks good, but how is vvec3_at supposed to work? What if I want to get vec3[1].x, which should be at vec4[0].w?

vec3(x[(1 / 3)][1 % 3], ...) == vec3(x[0][1], ...) == vec3(vec4[0].y, ... )
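The objection can be checked with a little index arithmetic. This C++ sketch compares where a tightly packed vec3 component actually lives with what the two-argument vvec3_at quoted above computes for the .x component; SlotIndex and both function names are hypothetical.

```cpp
#include <cassert>

// vec4 index and component (0..3 for x, y, z, w) inside a packed vec4 array.
struct SlotIndex { int slot; int comp; };

// Location of component c (0..2) of packed vec3 number i: the components
// are laid out back to back, so the flat float index is i * 3 + c.
SlotIndex packedVec3Location(int i, int c)
{
    assert(i >= 0 && c >= 0 && c < 3);
    const int flat = i * 3 + c;
    return { flat / 4, flat % 4 };
}

// What the two-argument vvec3_at(x, y) computes for .x: x[y / 3][y % 3].
SlotIndex macroVec3XLocation(int i)
{
    assert(i >= 0);
    return { i / 3, i % 3 };
}
```

For i = 1 the packed location is slot 0, component 3 (vec4[0].w), while the macro yields slot 0, component 1 (vec4[0].y).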

halx99 commented 11 months ago

x[(y*3)/4][((y*3)%4) + z]

I checked: both vvec3_at variants, x[(y * 3) / 4][((y * 3) % 4) + z] and x[(y * 3 + z) / 4][((y * 3 + z) % 4)], are correct. Test code:

#include <cstdio>    // printf
#include <cstring>   // memcmp
#include <iostream>  // std::cout

typedef float vec2[2];
typedef float vec3[3];
typedef float vec4[4];

#if !defined(GLES2)
#  define vfloat_def(x, y) vec4 x[(y + 3) / 4]
#  define vfloat_at(x, y) x[y / 4][y % 4]

#  define vvec2_def(x, y) vec4 x[(y * 2 + 3) / 4]
#  define vvec2_at(x, y, z) x[(y / 2)][y % 2 * 2 + z]

#  define vvec3_def(x, y) vec4 x[(y * 3 + 3) / 4]
#  define vvec3_at(x, y, z) x[(y * 3) / 4][((y * 3) % 4) + z]
#  define vvec3_at2(x, y, z) x[(y * 3 + z) / 4][((y * 3 + z) % 4)]
#else
#  define vfloat_def(x, y) float x[y]
#  define vfloat_at(x, y) x[y]

#  define vvec2_def(x, y) vec2 x[y]
#  define vvec2_at(x, y, z) x[y][z]

#  define vvec3_def(x, y) vec3 x[y]
#  define vvec3_at(x, y, z) x[y][z]
#endif

int main()
{
    vvec2_def(vec2_points, 20);
    for (int i = 0; i < 20; ++i) {
        vvec2_at(vec2_points, i, 0) = 100 * i + 1; // vec2.x
        vvec2_at(vec2_points, i, 1) = 100 * i + 2; // vec2.y
    }

    vvec3_def(vec3_points, 20);
    for (int i = 0; i < 20; ++i) {
        vvec3_at(vec3_points, i, 0) = 100 * i + 1; // vec3.x
        vvec3_at(vec3_points, i, 1) = 100 * i + 2; // vec3.y
        vvec3_at(vec3_points, i, 2) = 100 * i + 3; // vec3.z
    }
    vvec3_def(vec3_points2, 20);
    for (int i = 0; i < 20; ++i) {
        vvec3_at2(vec3_points2, i, 0) = 100 * i + 1; // vec3.x
        vvec3_at2(vec3_points2, i, 1) = 100 * i + 2; // vec3.y
        vvec3_at2(vec3_points2, i, 2) = 100 * i + 3; // vec3.z
    }

    printf("vec3 points: \n");
    for (auto& v : vec3_points) {
        printf("%g,", v[0]);
        printf("%g,", v[1]);
        printf("%g,", v[2]);
        printf("%g,", v[3]);
    }
    printf("\n");

    bool verify_success = memcmp(vec3_points, vec3_points2, sizeof(vec3_points)) == 0;
    std::cout << "verify vvec3_at vvec3_at2 result: " << verify_success << "\n";
}

solan-solan commented 11 months ago

I do not know how vvec3_at would behave on a real device, since vec3[1].y == x[(1 * 3) / 4][((1 * 3) % 4) + 1] == x[0][4], which overruns the vec4. vvec3_at2 gives x[1][0], which looks safer.
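To see how often that overrun happens, here is a C++ sketch that evaluates only the inner (component) index of each macro. The function names are illustrative. The likely reason the host-side test still passes is that consecutive vec4 rows of a C array are contiguous in memory, which GLSL does not promise for indexing a vec4 past component 3.

```cpp
#include <cassert>

// Inner (component) index produced by vvec3_at: ((y * 3) % 4) + z.
int innerIndexAt(int y, int z) { return ((y * 3) % 4) + z; }

// Inner index produced by vvec3_at2: (y * 3 + z) % 4.
int innerIndexAt2(int y, int z) { return (y * 3 + z) % 4; }

// Counts the (element, component) pairs among `count` vec3 elements for
// which the given inner-index function leaves the valid range 0..3.
int countOutOfRange(int (*inner)(int, int), int count)
{
    assert(inner != nullptr && count >= 0);
    int bad = 0;
    for (int y = 0; y < count; ++y)
        for (int z = 0; z < 3; ++z)
            if (inner(y, z) > 3)
                ++bad;
    return bad;
}
```

With 20 elements, vvec3_at produces an inner index above 3 for 15 of the 60 accesses, while vvec3_at2 never does.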

halx99 commented 11 months ago

I do not know how vvec3_at would behave on a real device, since vec3[1].y == x[(1 * 3) / 4][((1 * 3) % 4) + 1] == x[0][4], which overruns the vec4. vvec3_at2 gives x[1][0], which looks safer.

See the final version: https://github.com/simdsoft/axmol/tree/fix-gles3-shader-layout

halx99 commented 11 months ago

@solan-solan Please help review #1523

axmolengine / axmol

Gl es 300 layout std140 #1510