beyond-all-reason / spring

A powerful free cross-platform RTS game engine
https://beyond-all-reason.github.io/spring/

Investigate why map edge extension looks so weird on the latest versions #489

Closed: lhog closed this issue 1 week ago

lhog commented 2 years ago

Now: (image)

vs.

Before: (image)

Beherith commented 2 years ago

This map, crystalized plains, has strange fog values:

        fogEnd = 1.2,
        fogStart = 1,
        fogColor = {
          0.07,
          0.05922,
          0.0504,
        },
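
One possible reading of why these values look strange, as a sketch only: assuming fogStart and fogEnd are fractions of the view distance and that fog is blended linearly between them (an assumption about how the engine consumes them, not something stated in this thread), a start of 1 and an end of 1.2 would leave everything inside the view range unfogged. The names fogWeight and dist01 are hypothetical.

```glsl
// Hypothetical helper, not engine code: linear fog weight for one fragment.
// dist01 = fragment distance divided by the view range (assumed convention).
float fogWeight(float dist01, float fogStart, float fogEnd)
{
    // 0.0 = no fog, 1.0 = fully fogColor
    return clamp((dist01 - fogStart) / (fogEnd - fogStart), 0.0, 1.0);
}

// With fogStart = 1.0 and fogEnd = 1.2, any dist01 <= 1.0 yields 0.0,
// so under these assumptions nothing within the view range receives fog.
```
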
Beherith commented 1 year ago

When I straight up just set:

fragColor.rgba = vec4(0.5, 0.5, 0.5, 0.5);

at the end of the fragment shader, it seems the skybox isn't drawn behind the map edge extension:

image

lhog commented 1 year ago

Yes, function widget:DrawWorldPreUnit() is called too early to draw the extension. The sky is drawn after the terrain and unit/feature draw passes nowadays. So we should move the drawing to function widget:DrawWorld(), unless this will break something else...

lhog commented 1 year ago

I also noticed that fancy selection shapes are not rendered when there's no solid ground beneath them. I assume they're also drawn pre-unit. Can they be moved to draw in DrawWorld()?

Beherith commented 1 year ago

There was a reason selection shapes are drawn before units, but I forgot what it was.

Beherith commented 1 year ago

OK, I tried tracking this down and can summarize my results as follows: a lot of stuff, including selection platters, team platters, ground AO plates, decals, grass, lava, all sensor ranges, etc., is drawn in function widget:DrawWorldPreUnit(). The reasons are as follows:

Drawing map borders in function widget:DrawWorld() breaks all of these by drawing over them no matter what. Unless sky is that expensive, I think it would be best to keep drawing it right after map, before DrawWorldPreUnit.

Beherith commented 2 weeks ago

I'm resurrecting this issue, because I believe that due to the reasons above, and terrain being the biggest occluder of sky, that sky should be rendered earlier, after terrain, but before DrawWorldPreUnit

lhog commented 2 weeks ago

> I'm resurrecting this issue, because I believe that due to the reasons above, and terrain being the biggest occluder of sky, that sky should be rendered earlier, after terrain, but before DrawWorldPreUnit

I think we can do a z-prepass for terrain and then restore the original order of sky rendering, while still discarding most of it as the sky is quite expensive on the fragment shader side.

Beherith commented 1 week ago

I'm stupid, but on the right track!

Beherith commented 1 week ago

I'm triggered by: "sky is quite expensive on the fragment shader side". Thus I'm going to use pixel quad messaging to compute 4 octaves of FBM noise per pixel quad instead of per fragment.
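
A minimal sketch of the pixel-quad idea, for readers unfamiliar with it: each fragment in a 2x2 quad computes one term, and the screen-space derivatives are used to read the other three. It assumes dFdx/dFdy are evaluated per row and per column of the quad and that quads start at even pixel coordinates; GLSL 1.30 guarantees neither. The names expensiveTerm, lane and q are hypothetical and do not appear in the actual shader.

```glsl
#version 130

out vec4 fragColor;

// Hypothetical stand-in for one costly term (e.g. one noise octave).
float expensiveTerm(float lane)
{
    return (lane + 1.0) * 0.1; // lanes 0..3 give 0.1, 0.2, 0.3, 0.4
}

void main()
{
    // Placement inside the 2x2 quad: -1 for the lower-coordinate pixel, +1 for the other.
    vec2 q = mod(floor(gl_FragCoord.xy), 2.0) * 2.0 - 1.0;

    // Give each of the four fragments a different lane index (0, 1, 2, 3).
    float lane = (q.x * 0.5 + 0.5) + (q.y + 1.0);
    float own = expensiveTerm(lane);

    // dFdx(own) is (right - left) within the quad, so subtracting it scaled by
    // q.x turns this fragment's value into its horizontal neighbour's value.
    float horiz = own - dFdx(own) * q.x;
    // Same trick vertically.
    float vert = own - dFdy(own) * q.y;
    // The vertical neighbour's "horiz" is the diagonal fragment's own value.
    float diag = horiz - dFdy(horiz) * q.y;

    // Every fragment now holds all four terms while having computed only one.
    fragColor = vec4(vec3(own + horiz + vert + diag), 1.0);
}
```

Under those assumptions, all four fragments of a quad end up with the same sum, each having evaluated expensiveTerm only once. The trade-off is that the four terms are sampled at slightly different positions, one per pixel of the quad.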

Beherith commented 1 week ago

Pre-opt, UI off, looking at sky only:

500 fps SKY OFF (2.0 ms)
250 fps vanilla (4.0 ms) (dt 2.0 ms)
360 fps PQM (2.77 ms) (dt 0.77 ms)

Can you tell the difference in looks? I can't. Roughly 2.6x cheaper sky (2.0 ms down to 0.77 ms) by calculating 4 octaves of FBM noise per pixel quad instead of per fragment.
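
For reference, the arithmetic behind these figures can be worked through as follows. It takes the frame time as 1000 / fps and the sky cost as the difference from the sky-off baseline; the symbols are introduced here for illustration only.

```math
\begin{aligned}
t_{\text{off}} &= \tfrac{1000}{500} = 2.0\ \text{ms}, \qquad
t_{\text{vanilla}} = \tfrac{1000}{250} = 4.0\ \text{ms}, \qquad
t_{\text{PQM}} = \tfrac{1000}{360} \approx 2.78\ \text{ms} \\
\Delta_{\text{vanilla}} &= t_{\text{vanilla}} - t_{\text{off}} = 2.0\ \text{ms}, \qquad
\Delta_{\text{PQM}} = t_{\text{PQM}} - t_{\text{off}} \approx 0.78\ \text{ms} \\
\frac{\Delta_{\text{vanilla}}}{\Delta_{\text{PQM}}} &\approx 2.6, \qquad
\frac{t_{\text{vanilla}}}{t_{\text{PQM}}} = \frac{360}{250} = 1.44
\end{aligned}
```

So the sky pass itself is roughly 2.6x cheaper, while the whole frame in this sky-only view is about 1.44x faster.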


#version 130

in vec3 dir;

uniform float time;

uniform vec4 cloudInfo;
uniform vec3 skyColor;
uniform vec3 fogColor;
uniform vec4 planeColor; // .w signals if enabled
uniform vec3 sunDir;

const float cirrus1  = 0.9;
const float cumulus1 = 1.8;

out vec4 fragColor;

//  https://github.com/BrianSharpe/Wombat/blob/master/Value3D.glsl
float Value3D( vec3 P )
{
    // establish our grid cell and unit position
    vec3 Pi = floor(P);
    vec3 Pf = P - Pi;
    vec3 Pf_min1 = Pf - 1.0;

    // clamp the domain
    Pi.xyz = Pi.xyz - floor(Pi.xyz * ( 1.0 / 69.0 )) * 69.0;
    vec3 Pi_inc1 = step( Pi, vec3( 69.0 - 1.5 ) ) * ( Pi + 1.0 );

    // calculate the hash
    vec4 Pt = vec4( Pi.xy, Pi_inc1.xy ) + vec2( 50.0, 161.0 ).xyxy;
    Pt *= Pt;
    Pt = Pt.xzxz * Pt.yyww;
    vec2 hash_mod = vec2( 1.0 / ( 635.298681 + vec2( Pi.z, Pi_inc1.z ) * 48.500388 ) );
    vec4 hash_lowz = fract( Pt * hash_mod.xxxx );
    vec4 hash_highz = fract( Pt * hash_mod.yyyy );

    //  blend the results and return
    vec3 blend = Pf * Pf * Pf * (Pf * (Pf * 6.0 - 15.0) + 10.0);
    vec4 res0 = mix( hash_lowz, hash_highz, blend.z );
    vec4 blend2 = vec4( blend.xy, vec2( 1.0 - blend.xy ) );
    return dot( res0, blend2.zxzx * blend2.wwyy );
}

// Pre-opt , UI off, looking at sky only
// 500 fps SKY OFF (2.0 ms)
// 250 fps vanilla (4.0 ms) (dt 2.0 ms)
// 360 fps PQM (2.77 ms), (dt 0.77 ms)
// goal: use PQM for way better perf at 4 octave FBM
#define PQM 1

#if (PQM == 1)

    // QUAD MESSAGE PASSING LIBRARY
    // https://github.com/libretro/common-shaders/blob/master/include/quad-pixel-communication.h 
    vec4 get_quad_vector_naive(vec4 output_pixel_num_wrt_uvxy)
    {
        //  Requires:   Two measures of the current fragment's output pixel number
        //              in the range ([0, IN.output_size.x), [0, IN.output_size.y)):
        //              1.) output_pixel_num_wrt_uvxy.xy increase with uv coords.
        //              2.) output_pixel_num_wrt_uvxy.zw increase with screen xy.
        //  Returns:    Two measures of the fragment's position in its 2x2 quad:
        //              1.) The .xy components are its 2x2 placement with respect to
        //                  uv direction (the origin (0, 0) is at the top-left):
        //                  top-left     = (-1.0, -1.0) top-right    = ( 1.0, -1.0)
        //                  bottom-left  = (-1.0,  1.0) bottom-right = ( 1.0,  1.0)
        //                  You need this to arrange/weight shared texture samples.
        //              2.) The .zw components are its 2x2 placement with respect to
        //                  screen xy direction (IN.position); the origin varies.
        //                  quad_gather needs this measure to work correctly.
        //              Note: quad_vector.zw = quad_vector.xy * float2(
        //                      ddx(output_pixel_num_wrt_uvxy.x),
        //                      ddy(output_pixel_num_wrt_uvxy.y));
        //  Caveats:    This function assumes the GPU driver always starts 2x2 pixel
        //              quads at even pixel numbers.  This assumption can be wrong
        //              for odd output resolutions (nondeterministically so).
        vec4 pixel_odd = fract(output_pixel_num_wrt_uvxy * 0.5) * 2.0;
        vec4 quad_vector = pixel_odd * 2.0 - vec4(1.0);
        return quad_vector;
    }

    vec4 get_quad_vector(vec4 output_pixel_num_wrt_uvxy)
    {
        //  Requires:   Same as get_quad_vector_naive() (see that first).
        //  Returns:    Same as get_quad_vector_naive() (see that first), but it's
        //              correct even if the 2x2 pixel quad starts at an odd pixel,
        //              which can occur at odd resolutions.
        vec4 quad_vector_guess =
            get_quad_vector_naive(output_pixel_num_wrt_uvxy);
        //  If quad_vector_guess.zw doesn't increase with screen xy, we know
        //  the 2x2 pixel quad starts at an odd pixel:
        vec2 odd_start_mirror = 0.5 * vec2(dFdx(quad_vector_guess.z),
                                                    dFdy(quad_vector_guess.w));
        return quad_vector_guess * odd_start_mirror.xyxy;
    }

    // 4 octave FBM in a single quad, return averaged results
    vec4 quadFBM(vec3 Pos, vec4 frequencies, vec4 amplitudes, vec2 screenUV){
        // identify each fragment
        vec4 quad_vector = get_quad_vector(vec4(screenUV, floor(screenUV)));

        vec4 threadMask = vec4(0);
        // The thread mask is a vec4 that has a single 1 in it at different places for each thread
        threadMask = step(vec4(quad_vector.zw,0,0),vec4( 0,0,quad_vector.zw));
        threadMask = threadMask.spsp * threadMask.ttqq;

        // rotate each position with a matrix based on the thread mask
        // not free, doesn't improve much
        #if 0
            // we need to build some sort of matrix selector based on thread mask. 
            // The thread mask is a vec4 of one 1 and three 0's
            mat3 rot1 = mat3( -0.1028762, -0.1149348, -0.9880316,   0.9056579,  0.3999406, -0.1408231,    0.4113394, -0.9093060,  0.0629473 );
            mat3 rot2 = mat3(  0.9056579,  0.3999406, -0.1408231,    0.4113394, -0.9093060,  0.0629473,   -0.1028762, -0.1149348, -0.9880316 );
            mat3 rot3 = mat3(   0.8437217,  0.5364888,  0.0177019, -0.0774229,  0.1542616, -0.9849919 ,  -0.5311679,  0.8296885,  0.1716904 );
            mat3 rot4 = mat3( 0.2584462,  0.5781857,  0.7738907,    0.5668851, -0.7394425,  0.3631338,   0.7822064,  0.3448565, -0.5188709  );

            // choose a rotation matrix
            mat3 rotselect = rot1 * threadMask.x + rot2 * threadMask.y + rot3 * threadMask.z + rot4 * threadMask.w;

            // apply the rotation to the octave
            Pos = rotselect * Pos;
        #endif

        // establish a common position as the average?
        //  Not always needed... and expensive too!
        #if 0
            vec3 adjx = Pos - dFdx(Pos) * quad_vector.z;
            vec3 adjy = Pos - dFdy(Pos) * quad_vector.w;
            vec3 diag = adjx - dFdy(adjx) * quad_vector.w;
            Pos = Pos + adjx + adjy + diag;
            Pos = Pos * 0.25;
        #endif

        float pqm_noise = Value3D(Pos * dot(threadMask, frequencies)) * dot(threadMask, amplitudes);
        //gather each octave from each neighbour
        float nadjx = pqm_noise - dFdx(pqm_noise) * quad_vector.z;
        float nadjy = pqm_noise - dFdy(pqm_noise) * quad_vector.w;
        float ndiag = nadjx - dFdy(nadjx) * quad_vector.w;

        vec4 octaves = vec4(pqm_noise , nadjx, nadjy, ndiag);
        return octaves;
    }

#endif

#define noise(x) (Value3D(x))

const mat3 m = mat3(0.0, 1.60,  1.20, -1.6, 0.72, -0.96, -1.2, -0.96, 1.28);

#if (PQM == 1)
    float fbm(vec3 p)
    {
        float f = 0.0; 
        vec4 frequencies = vec4(1.0, 3.1, 7.2, 22.0);
        vec4 amplitudes = vec4(1.2,0.5,0.3,0.1);
        vec4 result =  quadFBM(p , frequencies, amplitudes, gl_FragCoord.xy);
        f = dot(result, vec4(0.5));
        return f;
    }
#else
    #if SIMPLIFIED_RENDERING == 0
    float fbm(vec3 p)
    {
        float f = 0.0; 

        f += noise(p) / 2 ; p = m * p * 1.1;
        f += noise(p) / 4 ; p = m * p * 1.2;
        f += noise(p) / 6 ; p = m * p * 1.3;
        f += noise(p) / 12; p = m * p * 1.4;
        f += (noise(p) / 24); // one less octave to make it perf comparable to old

        return f;
    }
    #else
    float fbm(vec3 p)
    {
        float f = 0.0;
        f += noise(p) / 2 ; p = m * p * 1.1;
        f += noise(p) / 4 ; p = m * p * 2.2;
        f += noise(p) / 12 ;
        f *= 1.25;
        return f;
    }
    #endif
#endif
float csstep(float m0, float m1, float n0, float n1, float v)
{
    return smoothstep(m0, m1, v) * (1.0 - smoothstep(n0, n1, v));
}

void main()
{
    vec3 pos = normalize(dir);

    float cirrus  = cloudInfo.w * cirrus1;
    float cumulus = cloudInfo.w * cumulus1;

    const vec3 sunColor = vec3(0.992, 0.985, 0.827);
    float sunContrib = pow(max(0.0, dot(pos, normalize(sunDir))), 64.0);

    float wpContrib = (1.0 - smoothstep(-0.5, -0.2, pos.y)) * planeColor.w;
    fragColor.rgb = mix(skyColor, planeColor.rgb, wpContrib);
    fragColor.rgb = mix(fragColor.rgb, sunColor * 1.3, sunContrib);

    vec3 day_extinction = vec3(1.0);
    vec3 night_extinction = vec3(1.0 - exp(sunDir.y)) * 0.2;
    vec3 extinction = mix(day_extinction, night_extinction, -sunDir.y * 0.2 + 0.5);

    // Cirrus Clouds
    float density = smoothstep(1.0 - cirrus, 1.0, fbm(pos.xyz / pos.y * 2.0 + time * 0.05)) * 0.3;
    fragColor.rgb = mix(fragColor.rgb, cloudInfo.rgb * extinction * 4.0, density * max(pos.y, 0.0));

    // Cumulus Clouds
    #if SIMPLIFIED_RENDERING == 0
    for (int i = 0; i < 2; i++)
    {
        //vec3 cpos = pos; cpos.y = smoothstep(-0.5, 1.5, pos.y); cpos.xz /= cpos.y; cpos *= 2.0;
        vec3 cpos = pos; cpos.y = smoothstep(-0.5, 1.5, pos.y); cpos.xz /= cpos.y;
        float density = smoothstep(1.0 - cumulus, 1.0, fbm((0.7 + float(i) * 0.01) * cpos + time * 0.3));
        fragColor.rgb = mix(fragColor.rgb, cloudInfo.rgb * extinction * density * 5.0, min(density, 1.0) * (max(pos.y, 0.0)));
    }
    #else
    {
        //vec3 cpos = pos; cpos.y = smoothstep(-0.5, 1.5, pos.y); cpos.xz /= cpos.y; cpos *= 2.0;
        vec3 cpos = pos; cpos.y = smoothstep(-0.5, 1.5, pos.y); cpos.xz /= cpos.y;
        float density = smoothstep(1.0 - cumulus, 1.0, fbm((0.7) * cpos + time * 0.3));
        fragColor.rgb = mix(fragColor.rgb, cloudInfo.rgb * extinction * density * 5.0, min(density, 1.0) * (max(pos.y, 0.0)));
    }
    #endif

    fragColor.a = (0.5 - csstep(-0.8, -0.0, -0.5, 0.3, pos.y));
    #if 0
        float n = fbm((pos.xyz) / pos.y * 2.0 + time * 0.05);
        fragColor = vec4(n, n, n, 1.0);
    #endif      

}