dhewm / dhewm3

dhewm 3 main repository
https://dhewm3.org/
GNU General Public License v3.0

GLSL backend #15

Open dhewg opened 12 years ago

dhewg commented 12 years ago

There're two GLSL branches floating around: 1) https://git.iodoom.org/~raynorpat/iodoom3/raynorpats-glsl_iodoom3/commits/master and a continuation 2) https://github.com/LogicalError/doom3.gpl/commits/master

Both are based on different trees, and I merged those a while back on top of my tree: http://static.hackmii.com/dhewg/0001-Add-GLSL-backend.patch

My lack of GL foo is disturbing, but maybe someone wants to finish this backend?

revelator commented 4 years ago

Aye fhDoom has been silent for some time now, would be pretty sad to let all his good work go to waste. Hope someone picks up on finishing it.

Already adding stuff from TDM :) The multicore enhancements, for one, I already have in my test engine, and they give a nice boost. I also ported the AVX and AVX2 SIMD paths, but since they are only used for light and shadow culling the boost is negligible (might be different on other gfx cards). Doom3 runs smooth as butter after adding these, so yay :)

Next would be the framebuffer and depthbuffer code, but these rely heavily on GLSL support so it might take a while.

Arl90 commented 4 years ago

Very exciting! Is this MHDoom?

ghost commented 4 years ago

Just in case you struggle to come up with a name, may we suggest BHoom3? ;)

revelator commented 4 years ago

My test engine is based on MHDoom, aye :) I use it to check whether something would potentially break other engines before adding modifications.

The stuff that works will eventually be added to dhewm too if daniel approves.

Hehe, that name almost sounds like an explosion :) If keeping in line with old Doom modifications it could actually be called Boom3 :-P since there was a Boom engine based on the old Doom 1.

DanielGibson commented 4 years ago

The stuff that works will eventually be added to dhewm too if daniel approves.

Sure, I'm always happy about pull requests for improvements(*) and the changes you mentioned sound quite interesting! :-)

(*) As long as they don't break existing features/mods/... and as long as visual changes remain optional (=> only used if set in material or globally enabled via cvar). Ideally I'd like to keep the interface to the game.dll stable, but if a great improvement can't be done otherwise I'd be willing to break the game API for it.

revelator commented 4 years ago

Nothing game breaking as of yet :-) Most changes relate to modernizing some of the older methods, like multithreading, SIMD optimizations, and drawing the world with GLSL.

The hybrid backend lives quite fine mixed in with ARB shaders, unless a mod specifically requests the use of ARB interactions (say, if someone actually created a mod that uses parallax occlusion mapping), in which case it's possible to turn off the GLSL interactions.
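A minimal sketch of that fallback rule, for illustration only. The names `InteractionPath`, `kInvalidProgram`, `ChooseInteractionPath` and the `forceArb` switch are hypothetical; the only thing taken from the code posted later in this thread is the convention that a program handle of 0 means "no usable GLSL program".

```cpp
#include <cstdint>

// Hypothetical names throughout. Only the "0 means no usable GLSL program"
// convention mirrors INVALID_PROGRAM in the posted backend.
enum class InteractionPath { ARB, GLSL };

constexpr std::uint32_t kInvalidProgram = 0;

// forceArb stands in for whatever switch a mod or user would flip to
// request the ARB interaction programs.
constexpr InteractionPath ChooseInteractionPath( std::uint32_t glslProgram, bool forceArb ) {
    return ( forceArb || glslProgram == kInvalidProgram ) ? InteractionPath::ARB
                                                          : InteractionPath::GLSL;
}
```

With this shape, a failed shader compile and an explicit request for ARB take the same code path, so the ARB renderer stays the single fallback.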

The SMP multithreading changes from darkmod actually fix an old bug Doom3 has had for some time with newer cards causing microstuttering, by lowering the timer resolution to a fraction. It runs smooth as butter after this change, so this might actually be the first item you might consider.

The SIMD changes just add a few newer extensions CPUs have gained over the years, like AVX and AVX2. Atm the only code that uses these is the culling of light and shadow interactions. Not sure how much gain this gives in the long run, but it seems Doom3 actually uses the CPU to make these calculations, so it might be worth it on lower end machines.
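One way such extensions can be kept optional is to pick the code path at runtime, roughly as sketched below. This is an assumption-laden illustration, not the dhewm3 or Dark Mod implementation: `SimdLevel`, `CpuFeatures`, `SelectSimdLevel` and `DetectCpuFeatures` are made-up names, and detection relies on the GCC/Clang x86 builtin `__builtin_cpu_supports`, so other compilers and non-x86 CPUs simply get the generic path.

```cpp
// Hypothetical sketch: none of these names come from dhewm3 or The Dark Mod.
enum class SimdLevel { Generic, SSE2, AVX, AVX2 };

struct CpuFeatures {
    bool sse2;
    bool avx;
    bool avx2;
};

// Pick the widest instruction set the CPU reports; otherwise use plain C++.
constexpr SimdLevel SelectSimdLevel( CpuFeatures f ) {
    return f.avx2 ? SimdLevel::AVX2
         : f.avx  ? SimdLevel::AVX
         : f.sse2 ? SimdLevel::SSE2
                  : SimdLevel::Generic;
}

// Runtime detection. __builtin_cpu_supports is a GCC/Clang builtin for x86 only,
// so any other compiler or architecture reports no features here.
inline CpuFeatures DetectCpuFeatures() {
    CpuFeatures f = { false, false, false };
#if ( defined( __GNUC__ ) || defined( __clang__ ) ) && ( defined( __x86_64__ ) || defined( __i386__ ) )
    __builtin_cpu_init();
    f.sse2 = __builtin_cpu_supports( "sse2" ) != 0;
    f.avx  = __builtin_cpu_supports( "avx" ) != 0;
    f.avx2 = __builtin_cpu_supports( "avx2" ) != 0;
#endif
    return f;
}
```

A caller would evaluate `SelectSimdLevel( DetectCpuFeatures() )` once at startup and install the matching culling routines, so older CPUs never execute an AVX instruction.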

I can upload a build of my test engine so you can draw your own conclusions.

DanielGibson commented 4 years ago

That sounds pretty cool!

Is usage of AVX(2) optional so older CPUs (and non-x86 CPUs) still work?

revelator commented 4 years ago

yup :)

revelator commented 3 years ago

Final version of the hybrid backend ->

/*
===========================================================================

Doom 3 GPL Source Code
Copyright (C) 1999-2011 id Software LLC, a ZeniMax Media company.

This file is part of the Doom 3 GPL Source Code ("Doom 3 Source Code").

Doom 3 Source Code is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

Doom 3 Source Code is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with Doom 3 Source Code.  If not, see <http://www.gnu.org/licenses/>.

In addition, the Doom 3 Source Code is also subject to certain additional terms.
You should have received a copy of these additional terms immediately following
the terms and conditions of the GNU General Public License which accompanied the
Doom 3 Source Code.  If not, please request a copy in writing from id Software
at the address below.

If you have questions concerning this license or the applicable additional terms,
you may contact in writing id Software LLC, c/o ZeniMax Media Inc., Suite 120,
Rockville, Maryland 20850 USA.

===========================================================================
*/

#include "sys/platform.h"
#include "renderer/VertexCache.h"
#include "renderer/tr_local.h"

/*
===========================================================================

DEFAULT GLSL SHADER

===========================================================================
*/
#define GLSL_VERSION_ATTRIBS \
    "#version 130\n"

#define GLSL_INPUT_ATTRIBS \
    "in vec4 attrTexCoords;\n" \
    "in vec3 attrTangents0;\n" \
    "in vec3 attrTangents1;\n" \
    "in vec3 attrNormal;\n" \
    "mat3x3 u_lightMatrix = mat3x3 (attrTangents0, attrTangents1, attrNormal);\n\n"

#define GLSL_UNIFORMS \
    "uniform vec4 u_light_origin;\n" \
    "uniform vec4 u_view_origin;\n" \
    "uniform vec4 u_color_modulate;\n" \
    "uniform vec4 u_color_add;\n" \
    "uniform mat2x4 u_diffMatrix;\n" \
    "uniform mat2x4 u_bumpMatrix;\n" \
    "uniform mat2x4 u_specMatrix;\n" \
    "uniform mat4x4 u_projMatrix;\n" \
    "uniform mat4x4 u_fallMatrix;\n" \
    "uniform sampler2D bumpImage;\n" \
    "uniform sampler2D lightFalloffImage;\n" \
    "uniform sampler2D lightProjectImage;\n" \
    "uniform sampler2D diffuseImage;\n" \
    "uniform sampler2D specularImage;\n" \
    "uniform vec4 u_constant_diffuse;\n" \
    "uniform vec4 u_constant_specular;\n\n"

#define GLSL_VARYINGS \
    "varying vec2 diffCoords;\n" \
    "varying vec2 bumpCoords;\n" \
    "varying vec2 specCoords;\n" \
    "varying vec4 projCoords;\n" \
    "varying vec4 fallCoords;\n" \
    "varying vec3 lightDir;\n" \
    "varying vec3 halfAngle;\n" \
    "varying vec4 Color;\n"

// these are our GLSL interaction shaders
#define interaction_vs \
    GLSL_VERSION_ATTRIBS \
    GLSL_INPUT_ATTRIBS \
    GLSL_UNIFORMS \
    GLSL_VARYINGS \
    "void main ()\n" \
    "{\n" \
    "   // we must use ftransform as Doom 3 needs invariant position\n" \
    "   gl_Position = ftransform ();\n" \
    "\n" \
    "   diffCoords = attrTexCoords * u_diffMatrix;\n" \
    "   bumpCoords = attrTexCoords * u_bumpMatrix;\n" \
    "   specCoords = attrTexCoords * u_specMatrix;\n" \
    "\n" \
    "   projCoords = gl_Vertex * u_projMatrix;\n" \
    "   fallCoords = gl_Vertex * u_fallMatrix;\n" \
    "\n" \
    "   Color = (gl_Color * u_color_modulate) + u_color_add;\n" \
    "\n" \
    "   vec3 OffsetViewOrigin = (u_view_origin - gl_Vertex).xyz;\n" \
    "   vec3 OffsetLightOrigin = (u_light_origin - gl_Vertex).xyz;\n" \
    "\n" \
    "   lightDir = OffsetLightOrigin * u_lightMatrix;\n" \
    "   halfAngle = (normalize (OffsetViewOrigin) + normalize (OffsetLightOrigin)) * u_lightMatrix;\n" \
    "}\n\n"

#define interaction_fs \
    GLSL_VERSION_ATTRIBS \
    GLSL_UNIFORMS \
    GLSL_VARYINGS \
    "void main ()\n" \
    "{\n" \
    "   vec3 normalMap = texture2D (bumpImage, bumpCoords).agb * 2.0 - 1.0;\n" \
    "   vec4 lightMap = texture2DProj (lightProjectImage, projCoords);\n" \
    "\n" \
    "   lightMap *= dot (normalize (lightDir), normalMap);\n" \
    "   lightMap *= texture2DProj (lightFalloffImage, fallCoords);\n" \
    "   lightMap *= Color;\n" \
    "\n" \
    "   vec4 diffuseMap = texture2D (diffuseImage, diffCoords) * u_constant_diffuse;\n" \
    "   float specularComponent = clamp ((dot (normalize (halfAngle), normalMap) - 0.75) * 4.0, 0.0, 1.0);\n" \
    "\n" \
    "   vec4 specularResult = u_constant_specular * (specularComponent * specularComponent);\n" \
    "   vec4 specularMap = texture2D (specularImage, specCoords) * 2.0;\n" \
    "\n" \
    "   gl_FragColor = (diffuseMap + (specularResult * specularMap)) * lightMap;\n" \
    "}\n\n"

/* 32 bit hexadecimal 0, BFG had this set to a negative value which is illegal on unsigned */
static const GLuint INVALID_PROGRAM = 0x00000000;

static GLuint u_light_origin = INVALID_PROGRAM;
static GLuint u_view_origin = INVALID_PROGRAM;

static GLuint u_color_modulate = INVALID_PROGRAM;
static GLuint u_color_add = INVALID_PROGRAM;

static GLuint u_constant_diffuse = INVALID_PROGRAM;
static GLuint u_constant_specular = INVALID_PROGRAM;

static GLuint u_diffMatrix = INVALID_PROGRAM;
static GLuint u_bumpMatrix = INVALID_PROGRAM;
static GLuint u_specMatrix = INVALID_PROGRAM;

static GLuint u_projMatrix = INVALID_PROGRAM;
static GLuint u_fallMatrix = INVALID_PROGRAM;

static GLuint rb_glsl_interaction_program = INVALID_PROGRAM;

/*
==================
RB_GLSL_MakeMatrix
==================
*/
static float *RB_GLSL_MakeMatrix( const float *in1 = 0, const float *in2 = 0, const float *in3 = 0, const float *in4 = 0 ) {
    static float m[16];

    if ( in1 ) {
        SIMDProcessor->Memcpy( &m[0], in1, sizeof( float ) * 4 );
    }

    if ( in2 ) {
        SIMDProcessor->Memcpy( &m[4], in2, sizeof( float ) * 4 );
    }

    if ( in3 ) {
        SIMDProcessor->Memcpy( &m[8], in3, sizeof( float ) * 4 );
    }

    if ( in4 ) {
        SIMDProcessor->Memcpy( &m[12], in4, sizeof( float ) * 4 );
    }
    return m;
}

/* Calculate matrix offsets */
#define DIFFMATRIX( ofs ) din->diffuseMatrix[ofs].ToFloatPtr ()
#define BUMPMATRIX( ofs ) din->bumpMatrix[ofs].ToFloatPtr ()
#define SPECMATRIX( ofs ) din->specularMatrix[ofs].ToFloatPtr ()
#define PROJMATRIX( ofs ) din->lightProjection[ofs].ToFloatPtr ()

/*
=========================================================================================

GENERAL INTERACTION RENDERING

=========================================================================================
*/

/*
==================
RB_ARB2_BindTexture
==================
*/
void RB_ARB2_BindTexture( int unit, idImage *tex ) {
    backEnd.glState.currenttmu = unit;
    qglActiveTextureARB( GL_TEXTURE0_ARB + unit );
    tex->Bind();
}

/*
==================
RB_ARB2_UnbindTexture
==================
*/
void RB_ARB2_UnbindTexture( int unit ) {
    backEnd.glState.currenttmu = unit;
    qglActiveTextureARB( GL_TEXTURE0_ARB + unit );
    globalImages->BindNull();
}

/*
==================
RB_ARB2_BindInteractionTextureSet
==================
*/
void RB_ARB2_BindInteractionTextureSet( const drawInteraction_t *din ) {
    // texture 1 will be the per-surface bump map
    RB_ARB2_BindTexture( 1, din->bumpImage );

    // texture 2 will be the light falloff texture
    RB_ARB2_BindTexture( 2, din->lightFalloffImage );

    // texture 3 will be the light projection texture
    RB_ARB2_BindTexture( 3, din->lightImage );

    // texture 4 is the per-surface diffuse map
    RB_ARB2_BindTexture( 4, din->diffuseImage );

    // texture 5 is the per-surface specular map
    RB_ARB2_BindTexture( 5, din->specularImage );
}

/*
==================
RB_GLSL_DrawInteraction
==================
*/
static void RB_GLSL_DrawInteraction( const drawInteraction_t *din ) {
    /* Constant rows used to fill out the projection and falloff matrices */
    static const float whalf[] = { 0.0f, 0.0f, 0.0f, 0.5f };
    static const float wzero[] = { 0.0f, 0.0f, 0.0f, 0.0f };
    static const float wone[] = { 0.0f, 0.0f, 0.0f, 1.0f };

    // load all the vertex program parameters
    qglUniform4fv( u_light_origin, 1, din->localLightOrigin.ToFloatPtr() );
    qglUniform4fv( u_view_origin, 1, din->localViewOrigin.ToFloatPtr() );

    qglUniformMatrix2x4fv( u_diffMatrix, 1, GL_FALSE, RB_GLSL_MakeMatrix( DIFFMATRIX( 0 ), DIFFMATRIX( 1 ) ) );
    qglUniformMatrix2x4fv( u_bumpMatrix, 1, GL_FALSE, RB_GLSL_MakeMatrix( BUMPMATRIX( 0 ), BUMPMATRIX( 1 ) ) );
    qglUniformMatrix2x4fv( u_specMatrix, 1, GL_FALSE, RB_GLSL_MakeMatrix( SPECMATRIX( 0 ), SPECMATRIX( 1 ) ) );

    qglUniformMatrix4fv( u_projMatrix, 1, GL_FALSE, RB_GLSL_MakeMatrix( PROJMATRIX( 0 ), PROJMATRIX( 1 ), wzero, PROJMATRIX( 2 ) ) );
    qglUniformMatrix4fv( u_fallMatrix, 1, GL_FALSE, RB_GLSL_MakeMatrix( PROJMATRIX( 3 ), whalf, wzero, wone ) );

    /* Vertex color modulate/add constants */
    static const float zero[4] = { 0.0f, 0.0f, 0.0f, 0.0f };
    static const float one[4] = { 1.0f, 1.0f, 1.0f, 1.0f };
    static const float negOne[4] = { -1.0f, -1.0f, -1.0f, -1.0f };

    switch ( din->vertexColor ) {
    case SVC_IGNORE:
        qglUniform4fv( u_color_modulate, 1, zero );
        qglUniform4fv( u_color_add, 1, one );
        break;

    case SVC_MODULATE:
        qglUniform4fv( u_color_modulate, 1, one );
        qglUniform4fv( u_color_add, 1, zero );
        break;

    case SVC_INVERSE_MODULATE:
        qglUniform4fv( u_color_modulate, 1, negOne );
        qglUniform4fv( u_color_add, 1, one );
        break;
    }

    // set the constant colors
    qglUniform4fv( u_constant_diffuse, 1, din->diffuseColor.ToFloatPtr() );
    qglUniform4fv( u_constant_specular, 1, din->specularColor.ToFloatPtr() );

    // set the textures
    RB_ARB2_BindInteractionTextureSet( din );

    // draw it
    RB_DrawElementsWithCounters( din->surf->geo );
}

/*
==================
RB_ARB2_DrawInteraction
==================
*/
static void RB_ARB2_DrawInteraction( const drawInteraction_t *din ) {
    // load all the vertex program parameters
    qglProgramEnvParameter4fvARB( GL_VERTEX_PROGRAM_ARB, PP_LIGHT_ORIGIN, din->localLightOrigin.ToFloatPtr() );
    qglProgramEnvParameter4fvARB( GL_VERTEX_PROGRAM_ARB, PP_VIEW_ORIGIN, din->localViewOrigin.ToFloatPtr() );
    qglProgramEnvParameter4fvARB( GL_VERTEX_PROGRAM_ARB, PP_LIGHT_PROJECT_S, din->lightProjection[0].ToFloatPtr() );
    qglProgramEnvParameter4fvARB( GL_VERTEX_PROGRAM_ARB, PP_LIGHT_PROJECT_T, din->lightProjection[1].ToFloatPtr() );
    qglProgramEnvParameter4fvARB( GL_VERTEX_PROGRAM_ARB, PP_LIGHT_PROJECT_Q, din->lightProjection[2].ToFloatPtr() );
    qglProgramEnvParameter4fvARB( GL_VERTEX_PROGRAM_ARB, PP_LIGHT_FALLOFF_S, din->lightProjection[3].ToFloatPtr() );
    qglProgramEnvParameter4fvARB( GL_VERTEX_PROGRAM_ARB, PP_BUMP_MATRIX_S, din->bumpMatrix[0].ToFloatPtr() );
    qglProgramEnvParameter4fvARB( GL_VERTEX_PROGRAM_ARB, PP_BUMP_MATRIX_T, din->bumpMatrix[1].ToFloatPtr() );
    qglProgramEnvParameter4fvARB( GL_VERTEX_PROGRAM_ARB, PP_DIFFUSE_MATRIX_S, din->diffuseMatrix[0].ToFloatPtr() );
    qglProgramEnvParameter4fvARB( GL_VERTEX_PROGRAM_ARB, PP_DIFFUSE_MATRIX_T, din->diffuseMatrix[1].ToFloatPtr() );
    qglProgramEnvParameter4fvARB( GL_VERTEX_PROGRAM_ARB, PP_SPECULAR_MATRIX_S, din->specularMatrix[0].ToFloatPtr() );
    qglProgramEnvParameter4fvARB( GL_VERTEX_PROGRAM_ARB, PP_SPECULAR_MATRIX_T, din->specularMatrix[1].ToFloatPtr() );

    // testing fragment based normal mapping
    if ( r_testARBProgram.GetBool() ) {
        qglProgramEnvParameter4fvARB( GL_FRAGMENT_PROGRAM_ARB, 2, din->localLightOrigin.ToFloatPtr() );
        qglProgramEnvParameter4fvARB( GL_FRAGMENT_PROGRAM_ARB, 3, din->localViewOrigin.ToFloatPtr() );
    }
    static const float zero[4] = { 0, 0, 0, 0 };
    static const float one[4] = { 1, 1, 1, 1 };
    static const float negOne[4] = { -1, -1, -1, -1 };

    switch ( din->vertexColor ) {
    case SVC_IGNORE:
        qglProgramEnvParameter4fvARB( GL_VERTEX_PROGRAM_ARB, PP_COLOR_MODULATE, zero );
        qglProgramEnvParameter4fvARB( GL_VERTEX_PROGRAM_ARB, PP_COLOR_ADD, one );
        break;

    case SVC_MODULATE:
        qglProgramEnvParameter4fvARB( GL_VERTEX_PROGRAM_ARB, PP_COLOR_MODULATE, one );
        qglProgramEnvParameter4fvARB( GL_VERTEX_PROGRAM_ARB, PP_COLOR_ADD, zero );
        break;

    case SVC_INVERSE_MODULATE:
        qglProgramEnvParameter4fvARB( GL_VERTEX_PROGRAM_ARB, PP_COLOR_MODULATE, negOne );
        qglProgramEnvParameter4fvARB( GL_VERTEX_PROGRAM_ARB, PP_COLOR_ADD, one );
        break;
    }

    // set the constant colors
    qglProgramEnvParameter4fvARB( GL_FRAGMENT_PROGRAM_ARB, 0, din->diffuseColor.ToFloatPtr() );
    qglProgramEnvParameter4fvARB( GL_FRAGMENT_PROGRAM_ARB, 1, din->specularColor.ToFloatPtr() );

    // set the textures
    RB_ARB2_BindInteractionTextureSet( din );

    // draw it
    RB_DrawElementsWithCounters( din->surf->geo );
}

/*
=============
RB_ARB2_SharedSurfaceSetup
=============
*/
static void RB_ARB2_SharedSurfaceSetup( const drawSurf_t *surf ) {
    // set the vertex pointers
    idDrawVert *ac = ( idDrawVert * ) vertexCache.Position( surf->geo->ambientCache );
    qglColorPointer( 4, GL_UNSIGNED_BYTE, sizeof( idDrawVert ), ac->color );
    qglVertexAttribPointerARB( 11, 3, GL_FLOAT, false, sizeof( idDrawVert ), ac->normal.ToFloatPtr() );
    qglVertexAttribPointerARB( 10, 3, GL_FLOAT, false, sizeof( idDrawVert ), ac->tangents[1].ToFloatPtr() );
    qglVertexAttribPointerARB( 9, 3, GL_FLOAT, false, sizeof( idDrawVert ), ac->tangents[0].ToFloatPtr() );
    qglVertexAttribPointerARB( 8, 2, GL_FLOAT, false, sizeof( idDrawVert ), ac->st.ToFloatPtr() );
    qglVertexPointer( 3, GL_FLOAT, sizeof( idDrawVert ), ac->xyz.ToFloatPtr() );
}

/*
=============
RB_ARB2_CreateDrawInteractions
=============
*/
static void RB_ARB2_CreateDrawInteractions( const drawSurf_t *surf ) {
    if ( !surf ) {
        return;
    }

    // perform setup here that will be constant for all interactions
    GL_State( GLS_SRCBLEND_ONE | GLS_DSTBLEND_ONE | GLS_DEPTHMASK | backEnd.depthFunc );

    // enable the vertex arrays
    qglEnableVertexAttribArrayARB( 8 );
    qglEnableVertexAttribArrayARB( 9 );
    qglEnableVertexAttribArrayARB( 10 );
    qglEnableVertexAttribArrayARB( 11 );
    qglEnableClientState( GL_COLOR_ARRAY );

    // check for enabled GLSL program first, if it fails go back to ARB
    if ( rb_glsl_interaction_program != INVALID_PROGRAM ) {
        // enable GLSL programs
        qglUseProgram( rb_glsl_interaction_program );

        // texture 0 is the normalization cube map for the vector towards the light
        if ( backEnd.vLight->lightShader->IsAmbientLight() ) {
            RB_ARB2_BindTexture( 0, globalImages->ambientNormalMap );
        } else {
            RB_ARB2_BindTexture( 0, globalImages->normalCubeMapImage );
        }

        // no test program in GLSL renderer
        RB_ARB2_BindTexture( 6, globalImages->specularTableImage );

        for ( /**/; surf; surf = surf->nextOnLight ) {
            // perform setup here that will not change over multiple interaction passes
            RB_ARB2_SharedSurfaceSetup( surf );

            // this may cause RB_GLSL_DrawInteraction to be executed multiple
            // times with different colors and images if the surface or light have multiple layers
            RB_CreateSingleDrawInteractions( surf, RB_GLSL_DrawInteraction );
        }

        // back to fixed (or ARB program)
        qglUseProgram( INVALID_PROGRAM );
    } else { // Do it the old way
        // enable ASM programs
        qglEnable( GL_VERTEX_PROGRAM_ARB );
        qglEnable( GL_FRAGMENT_PROGRAM_ARB );

        // texture 0 is the normalization cube map for the vector towards the light
        if ( backEnd.vLight->lightShader->IsAmbientLight() ) {
            RB_ARB2_BindTexture( 0, globalImages->ambientNormalMap );
        } else {
            RB_ARB2_BindTexture( 0, globalImages->normalCubeMapImage );
        }

        // bind the vertex program
        if ( r_testARBProgram.GetBool() ) {
            RB_ARB2_BindTexture( 6, globalImages->specular2DTableImage );

            qglBindProgramARB( GL_VERTEX_PROGRAM_ARB, VPROG_TEST );
            qglBindProgramARB( GL_FRAGMENT_PROGRAM_ARB, FPROG_TEST );
        } else {
            RB_ARB2_BindTexture( 6, globalImages->specularTableImage );

            qglBindProgramARB( GL_VERTEX_PROGRAM_ARB, VPROG_INTERACTION );
            qglBindProgramARB( GL_FRAGMENT_PROGRAM_ARB, FPROG_INTERACTION );
        }

        for ( /**/; surf; surf = surf->nextOnLight ) {
            // perform setup here that will not change over multiple interaction passes
            RB_ARB2_SharedSurfaceSetup( surf );

            // this may cause RB_ARB2_DrawInteraction to be executed multiple
            // times with different colors and images if the surface or light have multiple layers
            RB_CreateSingleDrawInteractions( surf, RB_ARB2_DrawInteraction );
        }

        // need to disable ASM programs again
        qglBindProgramARB( GL_VERTEX_PROGRAM_ARB, PROG_INVALID );
        qglBindProgramARB( GL_FRAGMENT_PROGRAM_ARB, PROG_INVALID );

        // back to fixed (or GLSL program)
        qglDisable( GL_VERTEX_PROGRAM_ARB );
        qglDisable( GL_FRAGMENT_PROGRAM_ARB );
    }

    // disable vertex arrays
    qglDisableVertexAttribArrayARB( 8 );
    qglDisableVertexAttribArrayARB( 9 );
    qglDisableVertexAttribArrayARB( 10 );
    qglDisableVertexAttribArrayARB( 11 );
    qglDisableClientState( GL_COLOR_ARRAY );

    // disable features
    RB_ARB2_UnbindTexture( 6 );
    RB_ARB2_UnbindTexture( 5 );
    RB_ARB2_UnbindTexture( 4 );
    RB_ARB2_UnbindTexture( 3 );
    RB_ARB2_UnbindTexture( 2 );
    RB_ARB2_UnbindTexture( 1 );

    backEnd.glState.currenttmu = -1;
    GL_SelectTexture( 0 );
}

/*
==================
RB_ARB2_InteractionPass
==================
*/
static void RB_ARB2_InteractionPass( const drawSurf_t *shadowSurfs, const drawSurf_t *lightSurfs ) {
    // save on state changes by not bothering to setup/takedown all the messy states when there are no surfs to draw
    if ( shadowSurfs ) {
        // these are always enabled since we do not yet use GLSL shaders for the shadows.
        qglEnable( GL_VERTEX_PROGRAM_ARB );
        qglBindProgramARB( GL_VERTEX_PROGRAM_ARB, VPROG_STENCIL_SHADOW );

        RB_StencilShadowPass( shadowSurfs );

        // need to disable ASM programs again, we do not check for GLSL here since we do not use it for shadows.
        qglBindProgramARB( GL_VERTEX_PROGRAM_ARB, PROG_INVALID );
        qglDisable( GL_VERTEX_PROGRAM_ARB );
    }

    if ( lightSurfs ) {
        RB_ARB2_CreateDrawInteractions( lightSurfs );
    }
}

/*
==================
RB_ARB2_DrawInteractions
==================
*/
void RB_ARB2_DrawInteractions( void ) {
    viewLight_t *vLight;

    GL_SelectTexture( 0 );

    // for each light, perform adding and shadowing
    for ( vLight = backEnd.viewDef->viewLights; vLight; vLight = vLight->next ) {
        backEnd.vLight = vLight;

        // do fogging later
        if ( vLight->lightShader->IsFogLight() ) {
            continue;
        }

        if ( vLight->lightShader->IsBlendLight() ) {
            continue;
        }

        // nothing to see here; these aren't the surfaces you're looking for; move along
        if ( !vLight->localInteractions &&
                !vLight->globalInteractions &&
                !vLight->translucentInteractions ) {
            continue;
        }

        // clear the stencil buffer if needed
        if ( vLight->globalShadows || vLight->localShadows ) {
            backEnd.currentScissor = vLight->scissorRect;

            if ( r_useScissor.GetBool() ) {
                qglScissor( backEnd.viewDef->viewport.x1 + backEnd.currentScissor.x1,
                            backEnd.viewDef->viewport.y1 + backEnd.currentScissor.y1,
                            backEnd.currentScissor.x2 + 1 - backEnd.currentScissor.x1,
                            backEnd.currentScissor.y2 + 1 - backEnd.currentScissor.y1 );
            }
            qglClear( GL_STENCIL_BUFFER_BIT );
        } else {
            // no shadows, so no need to read or write the stencil buffer
            // we might in theory want to use GL_ALWAYS instead of disabling
            // completely, to satisfy the invariance rules
            qglStencilFunc( GL_ALWAYS, 128, 255 );
        }

        // run our passes for global and local
        RB_ARB2_InteractionPass( vLight->globalShadows, vLight->localInteractions );
        RB_ARB2_InteractionPass( vLight->localShadows, vLight->globalInteractions );

        // translucent surfaces never get stencil shadowed
        if ( r_skipTranslucent.GetBool() ) {
            continue;
        }
        qglStencilFunc( GL_ALWAYS, 128, 255 );

        backEnd.depthFunc = GLS_DEPTHFUNC_LESS;
        RB_ARB2_CreateDrawInteractions( vLight->translucentInteractions );
        backEnd.depthFunc = GLS_DEPTHFUNC_EQUAL;
    }

    // disable stencil shadow test
    qglStencilFunc( GL_ALWAYS, 128, 255 );

    GL_SelectTexture( 0 );
}

//===================================================================================

typedef struct {
    GLenum          target;
    GLuint          ident;
    char            name[64];
} progDef_t;

static  const int   MAX_GLPROGS = 256;

// a single file can have both a vertex program and a fragment program
// removed old invalid shaders, ARB2 is default nowadays and we override the interaction shaders with GLSL anyway if available.
static progDef_t    progs[MAX_GLPROGS] = {
    {GL_VERTEX_PROGRAM_ARB,   VPROG_TEST, "test.vfp"},
    {GL_FRAGMENT_PROGRAM_ARB, FPROG_TEST, "test.vfp"},
    {GL_VERTEX_PROGRAM_ARB,   VPROG_INTERACTION, "interaction.vfp"},
    {GL_FRAGMENT_PROGRAM_ARB, FPROG_INTERACTION, "interaction.vfp"},
    {GL_VERTEX_PROGRAM_ARB,   VPROG_BUMPY_ENVIRONMENT, "bumpyEnvironment.vfp"},
    {GL_FRAGMENT_PROGRAM_ARB, FPROG_BUMPY_ENVIRONMENT, "bumpyEnvironment.vfp"},
    {GL_VERTEX_PROGRAM_ARB,   VPROG_AMBIENT, "ambientLight.vfp"},
    {GL_FRAGMENT_PROGRAM_ARB, FPROG_AMBIENT, "ambientLight.vfp"},
    {GL_VERTEX_PROGRAM_ARB,   VPROG_STENCIL_SHADOW, "shadow.vp"},
    {GL_VERTEX_PROGRAM_ARB,   VPROG_ENVIRONMENT, "environment.vfp"},
    {GL_FRAGMENT_PROGRAM_ARB, FPROG_ENVIRONMENT, "environment.vfp"},
    // additional programs can be dynamically specified in materials
};

/*
=================
R_LoadARBProgram
=================
*/
void R_LoadARBProgram( int progIndex ) {
    int     ofs;
    int     err;
    idStr   fullPath = "glprogs/";
    fullPath += progs[progIndex].name;
    char    *fileBuffer;
    char    *buffer;
    char    *start = NULL, *end;

    common->Printf( "%s", fullPath.c_str() );

    // load the program even if we don't support it, so
    // fs_copyfiles can generate cross-platform data dumps
    fileSystem->ReadFile( fullPath.c_str(), ( void ** ) &fileBuffer, NULL );

    if ( !fileBuffer ) {
        common->Printf( ": File not found\n" );
        return;
    }

    // copy to stack memory and free
    buffer = static_cast<char *>( _alloca( strlen( fileBuffer ) + 1 ) );
    strcpy( buffer, fileBuffer );
    fileSystem->FreeFile( fileBuffer );

    if ( !glConfig.isInitialized ) {
        return;
    }

    // submit the program string at start to GL
    if ( progs[progIndex].ident == 0 ) {
        // allocate a new identifier for this program
        progs[progIndex].ident = PROG_USER + progIndex;
    }

    // vertex and fragment programs can both be present in a single file, so
    // scan for the proper header to be the start point, and stamp a 0 in after the end
    if ( progs[progIndex].target == GL_VERTEX_PROGRAM_ARB ) {
        if ( !glConfig.ARBVertexProgramAvailable ) {
            common->Printf( ": GL_VERTEX_PROGRAM_ARB not available\n" );
            return;
        }
        start = strstr( ( char * ) buffer, "!!ARBvp" );
    }

    if ( progs[progIndex].target == GL_FRAGMENT_PROGRAM_ARB ) {
        if ( !glConfig.ARBFragmentProgramAvailable ) {
            common->Printf( ": GL_FRAGMENT_PROGRAM_ARB not available\n" );
            return;
        }
        start = strstr( ( char * ) buffer, "!!ARBfp" );
    }

    if ( !start ) {
        common->Printf( ": !!ARB not found\n" );
        return;
    }
    end = strstr( start, "END" );

    if ( !end ) {
        common->Printf( ": END not found\n" );
        return;
    }
    end[3] = 0;

    qglBindProgramARB( progs[progIndex].target, progs[progIndex].ident );
    qglGetError();

    qglProgramStringARB( progs[progIndex].target, GL_PROGRAM_FORMAT_ASCII_ARB, strlen( start ), ( unsigned char * ) start );

    err = qglGetError();
    qglGetIntegerv( GL_PROGRAM_ERROR_POSITION_ARB, ( GLint * ) &ofs );

    if ( err == GL_INVALID_OPERATION ) {
        const GLubyte *str = qglGetString( GL_PROGRAM_ERROR_STRING_ARB );

        common->Warning( "\nGL_PROGRAM_ERROR_STRING_ARB: %s\n", str );

        if ( ofs < 0 ) {
            common->Warning( "GL_PROGRAM_ERROR_POSITION_ARB < 0 with error\n" );
        } else if ( ofs >= ( int ) strlen( ( char * ) start ) ) {
            common->Warning( "error at end of program\n" );
        } else {
            common->Warning( "error at %i:\n%s", ofs, start + ofs );
        }
        return;
    }

    if ( ofs != -1 ) {
        common->Warning( "\nGL_PROGRAM_ERROR_POSITION_ARB != -1 without error\n" );
        return;
    }
    common->Printf( "\n" );
}

/*
==================
R_FindARBProgram

Returns a GL identifier that can be bound to the given target, parsing
a text file if it hasn't already been loaded.
==================
*/
int R_FindARBProgram( GLenum target, const char *program ) {
    int     i;
    idStr   stripped = program;

    stripped.StripFileExtension();

    // see if it is already loaded
    for ( i = 0; i < MAX_GLPROGS && progs[i].name[0]; i++ ) {
        if ( progs[i].target != target ) {
            continue;
        }
        idStr   compare = progs[i].name;
        compare.StripFileExtension();

        if ( !idStr::Icmp( stripped.c_str(), compare.c_str() ) ) {
            return progs[i].ident;
        }
    }

    if ( i == MAX_GLPROGS ) {
        common->Error( "R_FindARBProgram: MAX_GLPROGS" );
    }

    // add it to the list and load it
    progs[i].ident = ( program_t ) 0;   // will be gen'd by R_LoadARBProgram
    progs[i].target = target;
    strncpy( progs[i].name, program, sizeof( progs[i].name ) - 1 );

    R_LoadARBProgram( i );

    common->Printf( "Finding program %s\n", program );
    return progs[i].ident;
}

/*
==================
GL_GetShaderInfoLog
==================
*/
static void GL_GetShaderInfoLog( GLuint sh, GLchar *src, bool isprog ) {
    static GLchar   infolog[4096];
    GLsizei         maxLength = 0;

    // terminate it.
    infolog[0] = 0;

    if ( isprog ) {
        qglGetProgramInfoLog( sh, 4095, &maxLength, infolog );
        qglDeleteProgram( sh );
    } else {
        qglGetShaderInfoLog( sh, 4095, &maxLength, infolog );
        qglDeleteShader( sh );
    }
    common->Printf( "Shader Source:\n\n%s\n\n%s\n\n", src, infolog );
}

/*
==================
GL_CompileShader
==================
*/
static bool GL_CompileShader( GLuint sh, GLchar *src ) {
    if ( sh && src ) {
        GLint result = GL_FALSE;

        qglGetError();

        qglShaderSource( sh, 1, ( const GLchar ** )&src, 0 );
        qglCompileShader( sh );
        qglGetShaderiv( sh, GL_COMPILE_STATUS, &result );

        if ( result != GL_TRUE ) {
            GL_GetShaderInfoLog( sh, src, false );
            return false;
        } else if ( qglGetError() != GL_NO_ERROR ) {
            // GL_GetShaderInfoLog also deletes the shader, so it must not be reported as usable
            GL_GetShaderInfoLog( sh, src, false );
            return false;
        }
    }
    return true;
}

/*
==================
GL_ValidateProgramStatus
==================
*/
static bool GL_ValidateProgramStatus( GLuint prg, GLchar *src ) {
    GLint valid = GL_FALSE;

    // validate shader program
    qglValidateProgram( prg );
    qglGetProgramiv( prg, GL_VALIDATE_STATUS, &valid );

    // the program was invalid.
    if ( valid != GL_TRUE ) {
        GL_GetShaderInfoLog( prg, src, true );
        return false;
    }
    return true;
}

/*
==================
GL_LinkProgramStatus
==================
*/
static bool GL_LinkProgramStatus( GLuint prg, GLchar *src ) {
    GLint linked = GL_FALSE;

    // test link status
    qglLinkProgram( prg );
    qglGetProgramiv( prg, GL_LINK_STATUS, &linked );

    // the program did not link.
    if ( linked != GL_TRUE ) {
        GL_GetShaderInfoLog( prg, src, true );
        return false;
    }
    return true;
}

//===================================================================================

struct glsltable_t {
    GLuint slot;
    GLchar *name;
};

// doom actually emulates immediate function modes with quite a bit of the vertex attrib calls, like glVertex3f = attrPosition or glColor3/4f = attrColor etc.
// this is also the reason our first attempts at replacing them with vertex array pointers failed,
// because those index positions are not declared in the shader at all.
// the commented-out ones below are the ones missing from the shaders,
// i only left them in in case someone wanted to make an effort in that regard.
glsltable_t interactionAttribs[] = {
    /*{ 0, "attrPosition" },    // does not exist in shader
    { 2, "attrNormal" },        // ditto and we have two normal indexes (one is used to get texture coordinates for skyportals)
    { 3, "attrColor" },*/       // and neither does color sigh...
    { 8, "attrTexCoords" },
    { 9, "attrTangents0" },
    { 10, "attrTangents1" },
    { 11, "attrNormal" }
};

/*
==================
GL_CreateGLSLProgram

Checks and creates shader programs for GLSL
Modified to throw invalid program if ANYTHING! fails.
==================
*/
static GLuint GL_CreateGLSLProgram( GLchar *vssrc, GLchar *fssrc, glsltable_t *attribs, GLuint numattribs ) {
    GLuint  progid = 0;
    GLuint  vs = 0;
    GLuint  fs = 0;

    // check error
    qglGetError();

    // we got a source create a vertex shader for it.
    if ( vssrc ) {
        vs = qglCreateShader( GL_VERTEX_SHADER );
    }

    // we got a source create a fragment shader for it.
    if ( fssrc ) {
        fs = qglCreateShader( GL_FRAGMENT_SHADER );
    }

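    // vertex shader failed to compile
    if ( vs && vssrc && !GL_CompileShader( vs, vssrc ) ) {
        // vs was already deleted by GL_GetShaderInfoLog, release fs too (deleting 0 is a no-op)
        qglDeleteShader( fs );
        // mark it as invalid
        common->Warning( "GL_CompileShader: vertex shader failed to compile\n" );
        return INVALID_PROGRAM;
    }

    // fragment shader failed to compile
    if ( fs && fssrc && !GL_CompileShader( fs, fssrc ) ) {
        // fs was already deleted by GL_GetShaderInfoLog, release vs too (deleting 0 is a no-op)
        qglDeleteShader( vs );
        // mark it as invalid
        common->Warning( "GL_CompileShader: fragment shader failed to compile\n" );
        return INVALID_PROGRAM;
    }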
    progid = qglCreateProgram();

    if ( vs && vssrc ) {
        qglAttachShader( progid, vs );
    }

    if ( fs && fssrc ) {
        qglAttachShader( progid, fs );
    }

    // bind attrib index numbers
    // we could actually bind the emulated ones here as well and then vertex attribs should work.
    if ( attribs && numattribs ) {
        for ( GLuint i = 0; i < numattribs; i++ ) {
            qglBindAttribLocation( progid, attribs[i].slot, attribs[i].name );
        }
    }

    // GLSL vertex program linking failed.
    if ( vs && vssrc && !GL_LinkProgramStatus( progid, vssrc ) ) {
        // mark it as invalid
        common->Warning( "GL_LinkProgramStatus: vertex shader program failed to link\n" );
        return INVALID_PROGRAM;
    }

    // GLSL fragment program linking failed.
    if ( fs && fssrc && !GL_LinkProgramStatus( progid, fssrc ) ) {
        // mark it as invalid
        common->Warning( "GL_LinkProgramStatus: fragment shader program failed to link\n" );
        return INVALID_PROGRAM;
    }

    // GLSL vertex program validation failed.
    if ( vs && vssrc && !GL_ValidateProgramStatus( progid, vssrc ) ) {
        // mark it as invalid
        common->Warning( "GL_ValidateProgramStatus: vertex shader program is invalid\n" );
        return INVALID_PROGRAM;
    }

    // GLSL fragment program validation failed.
    if ( fs && fssrc && !GL_ValidateProgramStatus( progid, fssrc ) ) {
        // mark it as invalid
        common->Warning( "GL_ValidateProgramStatus: fragment shader program is invalid\n" );
        return INVALID_PROGRAM;
    }

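    // Always detach shaders after a successful link.
    // qglDetachShader raises GL_INVALID_VALUE for shader 0, so only detach what was attached.
    if ( vs ) {
        qglDetachShader( progid, vs );
    }
    if ( fs ) {
        qglDetachShader( progid, fs );
    }

    // delete the shader objects (deleting 0 is silently ignored)
    qglDeleteShader( vs );
    qglDeleteShader( fs );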

    return progid;
}

//===================================================================================

struct sampleruniforms_t {
    GLchar  *name;
    GLint   binding;
};

sampleruniforms_t rb_interactionsamplers[] = {
    { "bumpImage", 1 },
    { "lightFalloffImage", 2 },
    { "lightProjectImage", 3 },
    { "diffuseImage", 4 },
    { "specularImage", 5 }
};

/*
==================
GL_SetupSamplerUniforms
==================
*/
static void GL_SetupSamplerUniforms( GLuint progid, sampleruniforms_t *uniForms, GLuint numUniforms ) {
    // setup texture uniform locations - this is needed even on nvidia
    qglUseProgram( progid );

    for ( GLuint i = 0; i < numUniforms; i++ ) {
        qglUniform1i( qglGetUniformLocation( progid, uniForms[i].name ), uniForms[i].binding );
    }

    // No, the uniforms below are not yet part of any shader program, DON'T move this !!!
    // They are handled in RB_GLSL_DrawInteraction.
    qglUseProgram( INVALID_PROGRAM );

    // setup shader uniforms for the main renderer
    u_light_origin = qglGetUniformLocation( progid, "u_light_origin" );
    u_view_origin = qglGetUniformLocation( progid, "u_view_origin" );

    u_color_modulate = qglGetUniformLocation( progid, "u_color_modulate" );
    u_color_add = qglGetUniformLocation( progid, "u_color_add" );

    u_constant_diffuse = qglGetUniformLocation( progid, "u_constant_diffuse" );
    u_constant_specular = qglGetUniformLocation( progid, "u_constant_specular" );

    u_diffMatrix = qglGetUniformLocation( progid, "u_diffMatrix" );
    u_bumpMatrix = qglGetUniformLocation( progid, "u_bumpMatrix" );
    u_specMatrix = qglGetUniformLocation( progid, "u_specMatrix" );

    u_projMatrix = qglGetUniformLocation( progid, "u_projMatrix" );
    u_fallMatrix = qglGetUniformLocation( progid, "u_fallMatrix" );
    /* attrPosition attrNormal and attrColor could be set here */
}

/*
==================
GL_GetGLSLFromFile
==================
*/
static GLchar *GL_GetGLSLFromFile( const GLchar *name ) {
    idStr   fullPath = "glprogs130/";
    fullPath += name;
    GLchar  *fileBuffer;
    GLchar  *buffer;

    if ( !glConfig.isInitialized ) {
        return NULL;
    }
    fileSystem->ReadFile( fullPath.c_str(), reinterpret_cast<void **>( &fileBuffer ), NULL );

    if ( !fileBuffer ) {
        common->Printf( "%s: File not found, using internal shaders\n", fullPath.c_str() );
        return NULL;
    }

    // copy to heap memory (Mem_Alloc) so the file buffer can be freed right away
    buffer = reinterpret_cast<char *>( Mem_Alloc( strlen( fileBuffer ) + 1 ) );
    strcpy( buffer, fileBuffer );
    fileSystem->FreeFile( fileBuffer );

    common->Printf( "%s: loaded\n", fullPath.c_str() );

    return buffer;
}

/*
==================
R_ReloadARBPrograms_f
==================
*/
void R_ReloadARBPrograms_f( const idCmdArgs &args ) {
    common->Printf( "----- R_ReloadARBPrograms -----\n" );

    for ( int i = 0; progs[i].name[0]; i++ ) {
        R_LoadARBProgram( i );
    }

    // load GLSL interaction programs if enabled
    if ( r_testGLSLProgram.GetBool() ) {

        // according to khronos this might not actually delete the shader program.
        // even worse, it will throw an error in case the program is 0 or some other obscure value.
        // so make sure it is actually active coming in here and only unload it when reloading shaders.
        if ( rb_glsl_interaction_program != INVALID_PROGRAM ) {
            qglDeleteProgram( rb_glsl_interaction_program );
        }

        // try to load from file, use internal shader if not available.
        GLchar *vs = GL_GetGLSLFromFile( "interaction.vsh" );
        GLchar *fs = GL_GetGLSLFromFile( "interaction.fsh" );

        // replace ARB interaction shaders with GLSL counterparts, it is possible to use external GLSL shaders as well.
        rb_glsl_interaction_program = GL_CreateGLSLProgram( ( vs != NULL ) ? vs : interaction_vs, ( fs != NULL ) ? fs : interaction_fs, interactionAttribs, sizeof( interactionAttribs ) / sizeof( interactionAttribs[0] ) );

        // free externally loaded vertex shader.
        if ( vs != NULL ) {
            Mem_Free( vs );
            vs = NULL;
        }

        // free externally loaded fragment shader.
        if ( fs != NULL ) {
            Mem_Free( fs );
            fs = NULL;
        }

        // if the shader did not run into problems load it up.
        if ( rb_glsl_interaction_program != INVALID_PROGRAM ) {
            // made sure shaders are valid coming in here
            common->Printf( "----- Using GLSL interactions -----\n" );
            GL_SetupSamplerUniforms( rb_glsl_interaction_program, rb_interactionsamplers, sizeof( rb_interactionsamplers ) / sizeof( rb_interactionsamplers[0] ) );
        } else {
            // no shit sherlock turn it off if it fails.
            common->Warning( "----- GLSL interactions failed to compile, reverting to ARB interactions -----\n" );
            r_testGLSLProgram.SetBool( false );
        }
    }
    common->Printf( "-------------------------------\n" );
}

/*
==================
R_ARB2_Init
==================
*/
void R_ARB2_Init( void ) {
    common->Printf( "---------- R_ARB2_Init ----------\n" );

    if ( !glConfig.ARBVertexProgramAvailable ||
            !glConfig.ARBFragmentProgramAvailable ||
            !glConfig.ARBShadingLanguageAvailable ) {
        common->Warning( "Not available.\n" );
        return;
    }
    common->Printf( "Available.\n" );
    common->Printf( "---------------------------------\n" );
}

This now works 100%; I tested it extensively. You need to create a new cvar, r_testGLSLProgram, but then you are good to go.

It switches seamlessly between ARB and GLSL with reloadARBPrograms now. No visual glitches anymore either.

revelator commented 3 years ago

INVALID_PROGRAM can just be replaced by 0; I forgot I had added this.

revelator commented 3 years ago

P.S. Do not use the above code with unmodified Doom3 source code; it will not work. It was written specifically for dhewm3, and in standard Doom3 it causes all sorts of graphical glitches, like a viewport mirror effect. This is because it is only compatible with pure ARB2, whereas standard Doom3 might actually use parts of the other backends, like the R200 backend for AMD/ATI cards. In fact, the shaders for those backends are loaded in draw_arb2.cpp even if the main ARB2 renderer is not used. If you use a port like MHDoom it will work fine, though MH removed all the mapping tools, so mappers won't get to enjoy it.

revelator commented 3 years ago

The main reason the backend was somewhat wonky at times before I fixed it was that MH forgot to detach the shaders once the program was linked. Neither of us had much experience with GLSL at the time he wrote this code, so he also made the mistake of trying to forcefully kill the shader program when switching backends, which btw is impossible :P and also not needed, since both backends use the same render chain. Tbh it might even be prudent to get rid of the glUseProgram(INVALID_PROGRAM); call, but I left it in because at the time my R9 390 did not like having a potentially dangling GLSL program running when switching to ARB2. Either way it does not seem to hurt anything, so whether it should be removed is up for discussion. I still cannot write shaders for the life of me, but at least I'm getting a good grasp of the backend features needed hehe.
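
For anyone following along, here is a rough sketch of the shader object lifecycle described above: attach, link, then detach and delete the shader objects whether or not the link succeeded. It does not call OpenGL at all. FakeGL and BuildProgram are made-up names for this illustration, and the fake only records which calls were made, so treat it as a picture of the call order rather than as the code from the post.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-in for the GL entry points. It only records the call order.
// FakeGL is hypothetical, it is not part of dhewm3 or OpenGL.
struct FakeGL {
    std::vector<std::string> calls;
    unsigned next = 1;
    bool linkOk = true;

    unsigned CreateShader()  { calls.push_back( "CreateShader" );  return next++; }
    unsigned CreateProgram() { calls.push_back( "CreateProgram" ); return next++; }
    void AttachShader( unsigned, unsigned ) { calls.push_back( "AttachShader" ); }
    bool LinkProgram( unsigned )            { calls.push_back( "LinkProgram" ); return linkOk; }
    void DetachShader( unsigned, unsigned ) { calls.push_back( "DetachShader" ); }
    void DeleteShader( unsigned )           { calls.push_back( "DeleteShader" ); }
    void DeleteProgram( unsigned )          { calls.push_back( "DeleteProgram" ); }
};

// Returns the program id, or 0 if linking failed.
unsigned BuildProgram( FakeGL &gl ) {
    unsigned vs = gl.CreateShader();
    unsigned fs = gl.CreateShader();
    unsigned prog = gl.CreateProgram();

    gl.AttachShader( prog, vs );
    gl.AttachShader( prog, fs );
    bool ok = gl.LinkProgram( prog );

    // the shader objects are not needed once linking has been attempted
    gl.DetachShader( prog, vs );
    gl.DetachShader( prog, fs );
    gl.DeleteShader( vs );
    gl.DeleteShader( fs );

    if ( !ok ) {
        gl.DeleteProgram( prog );
        return 0;
    }
    return prog;
}
```

With the real API the same order applies to glAttachShader, glLinkProgram, glDetachShader and glDeleteShader; the failure path is where leaks tend to hide.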

revelator commented 3 years ago

Here's a shot from dhewm3 running sikkmod with my hybrid ARB2/GLSL backend.

This one's from the expansion ->

https://ibb.co/Lrp4tr8

Looks like a million dollars, so it is pretty sad that I cannot release the ported sikkmod code :/

revelator commented 3 years ago

Tbh if we ever get hold of Sikkpin we should probably remove the wonkier parts of his mod, like SSAO and soft shadows, since those are the main culprits when it comes to heavy engine loads and they also have a tendency to break often. SSAO is especially vile in that regard, since it draws the bounding boxes as semi-transparent entities, which looks ghastly when outside (you can actually see the skybox borders).

It could be fixed if someone could modify the shader to use the depth capture function I wrote together with stevel at Darkmod instead of passing it through captureframe, which is not really geared for this, but the performance hit would probably still be brutal unless we also add FBO support. In that case soft shadows would also be moot, since we could then use shadow maps instead of shadow volumes.

DanielGibson commented 3 years ago

Looks cool, great work! Not sure when I'll merge it - I have a vague BGFX idea in the back of my head, and it uses a dialect of GLSL (so I'd probably use your work here as a base for that at least) and it would be unfortunate to have incompatible glsl shaders for dhewm3 in the wild.. but right now it's just an idea and I have no time for it at all. (Having a BGFX backend would be useful because it also works on platforms with bad or no OpenGL support, like macOS)

Regarding Sikkpin: I've tried to contact him by E-Mail in the past, without any success.. I hope that eventually someone turns up who knows him and is able to get in touch.

motorsep commented 3 years ago

I still don't get why not just get rid of ARB and replace the renderer with the one from Doom 3 BFG (RBDoom3 fork). Surely in this day and age someone can make sikkmod 2.0 using GLSL.

DanielGibson commented 3 years ago

To quote my old reply:

Just a note here: if this (or any other new rendering-backend) is gonna happen eventually, I'd like

  • rendering backends in DLLs so they can be switched without recompiling (ok, this is not a super-hard requirement)

  • automatic translation from ARB shader code to GLSL - I think this is the only feasible way to support Mods, that often also have their own shaders (this is a hard requirement)

I also want to preserve the look and feel of the game so the RBDoom3BFG renderer is not an option.

motorsep commented 3 years ago

I also want to preserve the look and feel of the game so the RBDoom3BFG renderer is not an option.

They are the same picture o.O (granted, BFG had some tweaks, but there is nothing stopping you from making it look exactly like Doom 3 vanilla)

revelator commented 3 years ago

BGFX sounds like a cool project. One could also look into using Open Shading Language, as used in Blender, which compiles shaders to bytecode, much like GLSL is compiled into ARB assembly internally in OpenGL.

Writing a decompiler for ARB assembly will probably be tough as hell, and I suspect that is why no one has done it before.

DanielGibson commented 3 years ago

To be clear, with ARB assembly I mean these assembly-like text shaders used by Doom3, not some GPU-internal bytecode.

I don't think the converter would be super hard, at least when sticking to the "standard" ARB shader syntax of ARB_vertex_program and ARB_fragment_program (and ignoring NVIDIA's extensions like NV_gpu_program4). I gathered some information in https://github.com/dhewm/dhewm3/issues/15#issuecomment-491915857

I mean, the grammar seems pretty simple and it can't do anything fancy, not even loops or if/else (unless using NVIDIA's extensions, which I hope Doom3 and the common mods don't do) ..
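
To give a feel for how small such a converter could start out, here is a toy sketch that turns a single ARB instruction line into a GLSL statement. It is only an illustration under heavy assumptions: Tokenize, WriteMask and TranslateArbLine are made-up names, only MOV, ADD, MUL, MAD and DP3 are handled, and things a real converter needs (declarations, mapping names like result.color to GLSL outputs, single-component swizzle replication, _SAT suffixes, texture instructions) are ignored.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Split "OP dst, a, b;" into { "OP", "dst", "a", "b" }.
static std::vector<std::string> Tokenize( const std::string &line ) {
    std::vector<std::string> out;
    std::string cur;
    for ( char c : line ) {
        if ( c == ' ' || c == '\t' || c == ',' || c == ';' ) {
            if ( !cur.empty() ) { out.push_back( cur ); cur.clear(); }
        } else {
            cur += c;
        }
    }
    if ( !cur.empty() ) { out.push_back( cur ); }
    return out;
}

// Returns ".xyz" style write mask of a destination, or "" if there is none.
static std::string WriteMask( const std::string &dst ) {
    std::string::size_type dot = dst.rfind( '.' );
    if ( dot == std::string::npos || dot + 1 == dst.size() ) { return ""; }
    std::string mask = dst.substr( dot + 1 );
    if ( mask.find_first_not_of( "xyzw" ) != std::string::npos ) { return ""; }
    return "." + mask;
}

// Translate one ARB instruction into one GLSL statement (all registers assumed vec4).
std::string TranslateArbLine( const std::string &line ) {
    std::vector<std::string> t = Tokenize( line );
    std::string expr;
    if ( t.size() == 3 && t[0] == "MOV" ) {
        expr = t[2];
    } else if ( t.size() == 4 && t[0] == "ADD" ) {
        expr = t[2] + " + " + t[3];
    } else if ( t.size() == 4 && t[0] == "MUL" ) {
        expr = t[2] + " * " + t[3];
    } else if ( t.size() == 5 && t[0] == "MAD" ) {
        expr = t[2] + " * " + t[3] + " + " + t[4];
    } else if ( t.size() == 4 && t[0] == "DP3" ) {
        // ARB replicates the scalar result to all components
        expr = "vec4(dot(" + t[2] + ".xyz, " + t[3] + ".xyz))";
    } else {
        return "// unhandled: " + line;
    }
    // a masked destination needs the same swizzle on the right-hand side
    std::string mask = WriteMask( t[1] );
    if ( mask.empty() ) {
        return t[1] + " = " + expr + ";";
    }
    return t[1] + " = (" + expr + ")" + mask + ";";
}
```

For example, "MAD R0.xyz, R1, R2, R3;" comes out as "R0.xyz = (R1 * R2 + R3).xyz;". Anything the sketch does not know is passed through as a comment so it is easy to spot.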

revelator commented 3 years ago

I hope you are right :)

DanielGibson commented 3 years ago

I hope I ever have enough time to try it out =)

revelator commented 3 years ago

I could upload my own compile to let you try it out :) Or, if you prefer, the code is pretty much plug and go: just yank the code piece above into draw_arb2.cpp, make a cvar named r_testGLSLProgram in rendersystem_init.cpp and an extern for it in tr_local.h (or locally in draw_arb2.cpp), compile and fire up Doom3.

Test it by using reloadARBPrograms after you have enabled or disabled r_testGLSLProgram. It should look pretty much the same, but with better lighting when using the GLSL backend.

revelator commented 3 years ago

Btw, there was once a code piece in MH's backend that would spit out ARB-compatible shaders from the GLSL backend. It was pretty simple as I recall, but the other way around I'm not so sure about. The converted GLSL shaders ran quite fine in standard Doom3 as well.

Atm I'm looking at replacing the old GLimp gamma correction with an internal GLSL version. I'll leave the old one intact as a fallback in case the user has a very old card.

First test looks quite good tbh.

revelator commented 3 years ago

Ugh, just discovered something nasty... MSVC cannot detect CPU intrinsics at runtime, so if you actually want AVX or AVX2 optimization you have to pass it to the compiler by hand, since CMake does not have a native function for determining this either. So you need to use /arch:AVX or /arch:AVX2 for those processors that support it, and not enable it on those that don't, or you will crash. The cpuid code in idtech4 is not enough, since even if it detects AVX or AVX2 it cannot pass these macros to the compiler. So unless someone has some code for CMake to detect these, it will be up to the user to enable these flags. Otherwise we would have to rewrite the AVX code as inline SIMD assembly like it is done with SSE*, and that is a pretty tall order.

DanielGibson commented 3 years ago

IIRC inline assembly is 32bit (x86, not x64) only in MSVC so it's (basically) useless?

Does AVX gain big speedups in the game? Shouldn't it be enough to set /arch:AVX(2) for SIMD_AVX.cpp and SIMD_AVX2.cpp?

(BTW, what C/C++ compiler does support runtime detection and dispatching? I think GCC doesn't support it either, so you gotta do the detection and calling the right function yourself.)

DanielGibson commented 3 years ago

According to https://stackoverflow.com/a/13639476 something like this should work:

if(cpu STREQUAL "x86" OR cpu STREQUAL "amd64" OR cpu STREQUAL "x86_64")
  if(MSVC)
    set_source_files_properties(idlib/math/Simd_AVX.cpp PROPERTIES COMPILE_FLAGS /arch:AVX)
  else() # GCC/clang
    set_source_files_properties(idlib/math/Simd_AVX.cpp PROPERTIES COMPILE_FLAGS -mavx) 
  endif()
endif()

(untested; there might be better ways to match the target CPU)

revelator commented 3 years ago

Aye, you can get CMake to set the AVX flags, but if your CPU does not support them it would crash the game :S (I tried on an older Intel Core 2; if I set AVX support on that one it would crash Doom3). So what we need CMake to do is check capabilities with cpuid and then pass the correct flags to the solution, since MSVC does not support this internally. GCC does support this with -march=native, so there is no need there.

You can use inline assembly with x64, though some asm structures are named differently; that is why I meant that it might be quite troublesome to do.

The speed gain is not super great but noticeable on those CPUs that do support it. My guess is that TDM tried to squeeze every iota of extra speed out of idtech4 so that it would not run like total crap with all the new additions to their game.

DanielGibson commented 3 years ago

I don't know what you're doing with AVX or how TDM is using it (though it works fine on CPUs without AVX support so it can't be that essential).

But if you stick to how the existing Doom3 SIMD support works, there should be no crashes.

There is that abstract idSIMDProcessor class in https://github.com/dhewm/dhewm3/blob/master/neo/idlib/math/Simd.h and generic code (class idSIMD_Generic : public idSIMDProcessor) implementing that with plain C++ in https://github.com/dhewm/dhewm3/blob/master/neo/idlib/math/Simd_Generic.h (and .cpp). Then there are classes for the SIMD backends (MMX, SSE, ...) that derive from idSIMD_Generic or other SIMD backends (for example, idSIMD_SSE is derived from idSIMD_MMX and idSIMD_SSE2 is derived from idSIMD_SSE), see https://github.com/dhewm/dhewm3/blob/master/neo/idlib/math/Simd_SSE.h (and .cpp) etc. Those derived classes override some of the methods of their superclasses (the generic backend or the one for MMX or SSE or whatever) with ones for their specific instructions (like SSE2).

When the game starts, idSIMD::InitProcessor() (https://github.com/dhewm/dhewm3/blob/master/neo/idlib/math/Simd.cpp#L76) checks what the user's CPU actually supports and sets the global idSIMDProcessor* SIMDProcessor to a suitable implementation. So all the code in Doom3 that wants to use SIMD optimization calls SIMDProcessor->MatX_LowerTriangularSolveTranspose() or whatever, which then uses an implementation suitable for the CPU - so if a CPU doesn't support SSE2, it gets the SSE or MMX or generic implementation.

So if you add idSIMD_AVX : public idSIMD_SSE3 that implements its methods in neo/idlib/math/Simd_AVX.cpp (and make sure to build that file with /arch:AVX or -mavx), it should work without breaking older CPUs.
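
A stripped-down sketch of that dispatch pattern, in case it helps. The class and function names here (SimdProcessor, SimdGeneric, SimdAVX, SelectProcessor) are simplified stand-ins, not the real idlib classes, and the "AVX" class just inherits the generic loop so the sketch builds on any compiler. In a real port its overrides would live in their own .cpp built with /arch:AVX or -mavx, and the cpuHasAVX flag would come from the CPUID check at startup.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Abstract interface, standing in for idSIMDProcessor (hypothetical, heavily reduced).
class SimdProcessor {
public:
    virtual ~SimdProcessor() {}
    virtual const char *GetName() const = 0;
    virtual void Add( float *dst, const float *a, const float *b, std::size_t n ) const = 0;
};

// Plain C++ fallback that works on every CPU.
class SimdGeneric : public SimdProcessor {
public:
    const char *GetName() const override { return "generic"; }
    void Add( float *dst, const float *a, const float *b, std::size_t n ) const override {
        for ( std::size_t i = 0; i < n; i++ ) {
            dst[i] = a[i] + b[i];
        }
    }
};

// Would override Add() with AVX intrinsics in a separately compiled file.
// Here it only changes the name, so no AVX instruction is ever emitted.
class SimdAVX : public SimdGeneric {
public:
    const char *GetName() const override { return "AVX"; }
};

// Pick the implementation once at startup, based on what the CPU reports.
SimdProcessor *SelectProcessor( bool cpuHasAVX ) {
    static SimdGeneric generic;
    static SimdAVX avx;
    if ( cpuHasAVX ) {
        return &avx;
    }
    return &generic;
}
```

The point is that AVX instructions only ever sit behind a pointer that is selected after the CPU check, so a CPU without AVX never executes them, as long as the rest of the code base is not compiled with the AVX flag.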

revelator commented 3 years ago

The problem on MSVC is that it sets that flag globally (at least on MSVC 2013), so all the code needs to support AVX, not just the AVX code for shadow volumes in idlib, no matter what idtech's CPUID returns.

Going to try with msvc 2017 and see what happens if enabled with the /arch:AVX flag there.

Might also be a borked msvc install though i somehow doubt that.

revelator commented 3 years ago

test.zip: could you try this one and see what happens? It is my test engine.

turol commented 3 years ago

@DanielGibson GCC does support runtime dispatch. They call it function multiversioning. I'm not sure if it's possible to also apply different autovectorization options to different functions but I suspect it can be done with the optimization control pragmas.
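
For reference, a minimal sketch of what function multiversioning can look like with GCC's target_clones attribute. SumArray and the MULTIVERSION macro are made-up names for this example, and the preprocessor guard is an assumption about where the attribute is safe to use (GCC on x86-64 Linux with glibc); elsewhere the macro expands to nothing and a single plain version is built.

```cpp
#include <cassert>
#include <cstddef>

// target_clones makes the compiler emit one copy of the function per listed
// target plus a resolver that picks one at load time based on the CPU.
// The guard is conservative: other setups get an ordinary function.
#if defined(__GNUC__) && !defined(__clang__) && defined(__x86_64__) && defined(__linux__) && defined(__GLIBC__)
#define MULTIVERSION __attribute__(( target_clones( "avx2", "avx", "default" ) ))
#else
#define MULTIVERSION
#endif

// The body is ordinary C++. Any vectorization is left to the compiler.
MULTIVERSION
float SumArray( const float *v, std::size_t n ) {
    float sum = 0.0f;
    for ( std::size_t i = 0; i < n; i++ ) {
        sum += v[i];
    }
    return sum;
}
```

Callers just call SumArray() as usual; there is no hand-written dispatch. Whether this pays off for Doom3's SIMD paths is a separate question from whether it compiles.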

revelator commented 3 years ago

Hmm, setting the AVX flag with MSVC 2017 does indeed work even when the build is run on older hardware, as opposed to the older MSVC 2013. Not sure what changed in later MSVC compilers.

DanielGibson commented 3 years ago

@turol: Oh nice, I didn't know about that GCC feature!

@revelator: glad to hear it works with VS2017

I just remembered that using AVX2 can be dangerous: Several generations of Intel CPUs clock down when encountering AVX2 instructions (or, for some newer generations, "heavy" AVX2 instructions), and additionally have some kind of delay when doing the downclocking: https://gist.github.com/rygorous/32bc3ea8301dba09358fd2c64e02d774

This can significantly slow down games (both because of the delay and because clocking the CPU down will outweigh any speedup AVX gets you, unless like 99% of your code is optimized AVX code, which is not realistic for games). Sounds like one must be super careful when using AVX :-/

revelator commented 3 years ago

Knights end or AVX 512 is even worse in that regard :P

revelator commented 3 years ago

Old AVX or AVX1 works just fine. My CPU cannot handle AVX2, so I cannot benchmark it (Core i7 3930K), but the AVX1 path does give a few extra FPS.

The biggest difference came from the SMP changes in Darkmod: while not giving any uber FPS boost, they certainly helped with the game feeling sluggish at times with some of the heavier mods.

I could actually keep FPS at 60 with all but soft shadows on in sikkmod at 1920x1080x32.

Sadly I haven't gotten around to porting the SMP changes to Doom3 again after GitHub destroyed my previous working code port :S

revelator commented 3 years ago

I'm also going to try porting BFG's multithreading to old idtech4. That should get pretty interesting, as Doom3 as-is only uses 2 threads at most: one for the main engine and one for background downloads (kind of an odd place).

DanielGibson commented 3 years ago

Doom3 also has an extra thread for (mostly) soundmixing (this AsyncTimer() thing frequently calling common->Async()), which is a lot less useful when using OpenAL

But yeah, almost all of the work is done in the main thread.

An extra thread for background downloads totally makes sense though, so things can be downloaded in the background without blocking the mainthread

revelator commented 3 years ago

That makes sense, but does it use that many resources?

Hmm, since it seems to have been planned by id at some point (loads of disabled code for it), I'm going to do a test and put the render backend on its own thread.

DanielGibson commented 3 years ago

It's not about using CPU resources, it's about waiting for I/O

Remember that OpenGL doesn't really like threads - that's the reason they didn't put the renderer in its own thread, https://fabiensanglard.net/doom3/renderer.php has details on that

Especially https://fabiensanglard.net/doom3/interviews.php#qrenderer

Interestingly, we only just found out last year why it was problematic (the same thing applied to Rage’s r_useSMP option, which we had to disable on the PC) – on windows, OpenGL can only safely draw to a window that was created by the same thread. We created the window on the launch thread, but then did all the rendering on a separate render thread. It would be nice if doing this just failed with a clear error, but instead it works on some systems and randomly fails on others for no apparent reason.

The Doom 4 codebase now jumps through hoops to create the game window from the render thread and pump messages on it, but the better solution, which I have implemented in another project under development, is to leave the rendering on the launch thread, and run the game logic in the spawned thread.
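
A bare-bones sketch of the arrangement described in that last paragraph, with rendering kept on the launch thread and game logic on a spawned thread. Nothing here is engine code: FrameQueue, RunGameLogic and RenderOnLaunchThread are made-up names, the "frames" are just integers, and the place where GL calls would go is marked with a comment.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical hand-off between the two threads.
struct FrameQueue {
    std::mutex lock;
    std::condition_variable ready;
    std::queue<int> frames;
    bool done = false;
};

// Runs on the spawned thread: produce one entry per simulated frame.
static void RunGameLogic( FrameQueue &q, int numFrames ) {
    for ( int i = 0; i < numFrames; i++ ) {
        std::lock_guard<std::mutex> guard( q.lock );
        q.frames.push( i );
        q.ready.notify_one();
    }
    std::lock_guard<std::mutex> guard( q.lock );
    q.done = true;
    q.ready.notify_one();
}

// Call this from the thread that created the window.
std::vector<int> RenderOnLaunchThread( int numFrames ) {
    FrameQueue q;
    std::vector<int> rendered;
    std::thread game( RunGameLogic, std::ref( q ), numFrames );

    for ( ;; ) {
        std::unique_lock<std::mutex> guard( q.lock );
        q.ready.wait( guard, [&q] { return !q.frames.empty() || q.done; } );
        if ( q.frames.empty() ) {
            break;  // producer finished and everything was drawn
        }
        // this is where the GL calls for the frame would be issued
        rendered.push_back( q.frames.front() );
        q.frames.pop();
    }
    game.join();
    return rendered;
}
```

Because all drawing happens on the thread that owns the window, the "works on some systems, randomly fails on others" problem from the quote does not come up in this layout.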

revelator commented 3 years ago

Hehe yeah, I read about async :) It's not even multithreading in the usual sense, as you can run any number of processes on the same core with it. Looking at BFG's code it was kind of understandable, as it takes quite a bit more code to split things between the cores.

It's been a while since I toyed with Intel's tools for finding out where in the code we might get a benefit from doing that; it will be interesting to see what they find.

I've read most of Fabien's docs, but that one slipped past me... ugh

DanielGibson commented 3 years ago

Oh wait, "It's not about using CPU resources, it's about waiting for I/O" was about the download thread

Regarding sound mixing: I can imagine that it indeed is quite expensive when mixing lots of sound sources in software, and doing it in an extra thread wasn't that hard apparently, so I think it made sense (though the way they implemented it was pretty bad and resulted in up to 100ms delay until a sound actually started playing, which made machine guns sound like they stutter; I fixed that recently).

revelator commented 3 years ago

Cool, I think I already stumbled upon your fix in the code.

revelator commented 3 years ago

Ok, so async does best with I/O, while sync does best when we don't have to care about I/O, as it can run the tasks in parallel. Damn, I have some work to do xD

revelator commented 3 years ago

Enabling SMP also acts up in Quake 4, I noticed... things look mighty weird with it on, at least.