Cazadorro commented 3 years ago

There are currently many outstanding issues for feature requests for GLSL, and during the Vulkanize event, many people were curious about the status of such features. It was brought up that khronos doesn't have the bandwidth right now to spend a lot of time on this. However many of us could in theory aid with pull requests to this repo our selves, or otherwise aid in developing these features.

This however is not an easy task, it is not only hard to navigate the codebase, it's very difficult to understand. It speaks to the difficulty of working with the glslang codebase that many of such features have been implemented in compilers the transpile to GLSL from a superset of GLSL (including templates!) instead of extensions to glslang. I myself have opted to write transpilers instead of extending glslang when trying to add my own features because of just how difficult it is to understand where to even find where I'd add something.

A comment made in the Vulkan Discord talks about this difficulty and where it lies here:

I'm not honestly sure you need to refactor code as much as explain where exactly I would change things to make new feature adjustments. It's basically impossible to know what effects what in that repo. Even something as simple as adding a new built-in, where would I even do that? I'd expect a simple "a file with a set of built ins processed, and SPIR-V mapped to those names", but that doesn't appear to be the case. There needs to be something more a long the lines of an architectural overview, some high level thing that tells people exactly where things exist, and where you would make changes to add features, and proper documentation of the code explaining things

Some sort of architectural overview, that shows where exactly certain language processing takes place would be really nice. where AST parsing takes place, semantic analysis, tokenization etc... (though I know some of this is in a separate parser generator [bison]? How to modify that for use in GLSL would also be good information) Information like "pitfalls", dos and don'ts top level comments per file helping understand the general idea of what each section does.

For example, how do I navigate the codebase as a completely new developer? Some of this information exists in the very bottom of the page here https://github.com/KhronosGroup/glslang#basic-internal-operation but it doesn't give near enough information to start actual substantive development. Heck it doesn't even answer questions like "Where does the compiler infrastructure even live?", or "What is inside all the top level directories?" As I go down in the glslang folder, I'm met with more questions most of the files I'm looking at don't have top level comments explaining exactly the purpose of the file that's there, the only comment is the liscense at the top. There are some comments inline with code, but they all assume that you understand the context of what is going on in the first place. Some of these files do have top level comments, but the information is important enough to be available outside the header as a part of a much wider context, such as in glslang/MachineIndependent/SymbolTable.h, and it still doesn't explain how the code is accomplishing the above tasks in the header documentation.

It would be extremely helpful to have examples of (maybe in separate example repos) of how extend glsl, for example:

How to change the grammar
How to add a keyword
How to add a built-in function
How to add a new type
How to add a an operator
How to add a new statement type
How to reconcile these new features with the reflection capabilities of glslang.

Along similar lines It would also be helpful to have a sort of walkthrough of how your GLSL text goes through glslang, where each part of the text touches, when and why.

These things should be straight forward for the maintainers to demonstrate, as these things have already had to happen to even have glslang exist in the first place. Some sort of text instruction similar to the style of Kelidiscope would be great

greg-lunarg commented 3 years ago

Agree that improvements can be made in this regard. I do receive some funding for maintaining this repo and I will try to make such improvements as time allows. I also welcome such improvements from others.

A couple suggestions until then:

The glslang github repository has a long history (git log -p) of features and extensions added to it. These commits can be used as an outline for doing similar additions. Find a recent extension similar to what you want to add and start coding.
There has been some recent additions along the line of your request. See the addition and use of the GL_EXT_spirv_intrinsics extension in #2625 and #2749. This allows for easier addition of "simple" extensions to glslang using new SPIR-V capabilities. By "simple" I mean extensions which can be expressed in terms of addition of new intrinsics, utilizing existing syntax.

mbechard commented 3 years ago

I'll add that in my experience the way the codebase is formatted makes it quite hard to learn by interactively debugging (in MSVC in particular). The fact that lots of code as well as all single line functions in header files are collapsed onto single lines makes breakpoints and variable watching quite hard to work with consistently in MSVC due to how it requires you to step one line 'into' a function before you can get the context. Often when I'm trying to debug things the first thing I do is go through the codepaths I'm interested in and add some newlines into the functions so I can debug them more nicely.

greg-lunarg commented 3 years ago

I am open to getting rid of single-line functions. I dislike them for the same reason.

mbechard commented 3 years ago

Feels like it'd be quick to do. Adjust the clang-format and force it to re-format the code. I can make a PR if you want to go ahead with this.

greg-lunarg commented 3 years ago

Can you give an example of one file / class / group of functions that you would like to change? Just so we are on the same page before you make a PR.

Cazadorro commented 3 years ago

I attempted to try to see one more time if I could add operator my self, I managed to (I think?) successfully add a keyword into glslang, Below outlines that, and attempts to outline that process, and contains information I hope will fill some of the holes I outlined above about project information.

example:

Creating Keywords:

You'll need to look at the following files:

MachineIndependent/Scan.cpp Contains the keywords and scanning, (scanning is not done via lex, but instead manually)
MachineIndependent/glslang.m4 to change grammar and create new expressions. Token literals are generated with Bison (output glslang_tab.cpp.h, glslang_tab.cpp for the tokens, and MachineIndependent/glslang.y for the grammar file generation)

Easiest way to get dependencies for bison and m4 is actually to just use Msys2 on windows (on linux you can just install), otherwise use https://github.com/lexxmark/winflexbison for more up-to-date binaries for bison on windows.

Adding a new terminal symbol (such as a keyword), simply add a new

%token <lex> [your terminal name here]

You'll also see and . Basically a union is defined in bison (top of .m4 and .y file):

%union {
    struct {
        glslang::TSourceLoc loc;
        union {
            glslang::TString *string;
            int i;
            unsigned int u;
            long long i64;
            unsigned long long u64;
            bool b;
            double d;
        };
        glslang::TSymbol* symbol;
    } lex;
    struct {
        glslang::TSourceLoc loc;
        glslang::TOperator op;
        union {
            TIntermNode* intermNode;
            glslang::TIntermNodePair nodePair;
            glslang::TIntermTyped* intermTypedNode;
            glslang::TAttributes* attributes;
            glslang::TSpirvRequirement* spirvReq;
            glslang::TSpirvInstruction* spirvInst;
            glslang::TSpirvTypeParameters* spirvTypeParams;
        };
        union {
            glslang::TPublicType type;
            glslang::TFunction* function;
            glslang::TParameter param;
            glslang::TTypeLoc typeLine;
            glslang::TTypeList* typeList;
            glslang::TArraySizes* arraySizes;
            glslang::TIdentifierList* identifierList;
        };
        glslang::TArraySizes* typeParameters;
    } interm;
}

which basically says

Define a union of two structs:

If the union represents the first struct it is 'lex' (terminal symbol), which represents
- a source location (where it is in the code)
- One and only one of a set number of literals (string, i32, u32, i64, u64, bool, or double).
If the union represents the second struct, it is 'interm' (intermediate node), which represents:
- a source location (where it is in the code)
- an operator (if it has one, ie if it is a binary operator node)
- One and only one of a set number of different intermediate node types (another intermediate node, pair, typed, attributes etc...
- One and only one of a set number of typed auxiliary information (type, function, param etc...).

to affect what the parsing of that terminal actually is, go to:

MachineIndependent/Scan.cpp

Here you will find where all the manual parsing being done (since glslang only uses bison, not flex). If you want to add a keyword, you'll need to edit

fillInKeywordMap()

and add your keyword:

 (*KeywordMap)["your_keyword"] = [your KEYWORD];

in addition you'll need to add another case to tokenizeIdentifier(), which is meant to handle code versions that don't include modern vulkan glsl (so just looking at valid keywords for any version of GLSL is not valid for all programs). Usually you'll do something like:

 case [your KEYWORD]:
    if (parseContext.spvVersion.vulkan > 0) {
        return keyword;
    }

Summarize organizational structure:

bison files are glslang.m4 (what you actually edit), which generates glslang.y, which is then used to generate glslang_tab.cpp and glslang_tab.cpp.h
ParseHelper.h/.cpp, for helper base classes for parser. ParseContextBase for the object passed around to bison that we get our information from parsing from.
Scanning is dones in Scan.cpp, Scan.h, and ScanContext.h, both are included in Scan.cpp.

other things I noticed but didn't know where to put:

where parse context is the higher scoped "context" object which stores meta information about the compilation process (target language version, errors, etc...).
initialize.cpp builds the symbol table of built-ins etc, fills in default, derivative, and extension built-ins.

Cazadorro commented 3 years ago

@greg-lunarg But after I added the operator keyword, I noticed that making any kind of significant change to the grammar was going to be a frustrating experience.

A couple of things got in the way of adding operators to the grammar. I originally planned to have operator definitions simply be free functions that were part of look up resolution to unary and binary operators (at least at first, and expand them to be able to be used for conversions and built-ins). I thought the process would simply be: Create a operator function definition, add a partially mangled name to the list of functions to look up, something like ___operator__add(...). When an operator is encountered, instead of doing normal operator resolution, first check for the existence of an user defined operator, then let normal overload resolution work its magic here if found, otherwise continue with normal operator resolution. Effectively just syntactic sugar for functions.

I quickly found that what I thought would be simple "Function declaration" "Function Definition" expressions... didn't exist in a way I could easily discern what was what. There are way too many "function" expressions to make sense to me, the difference between a "function header", "function call header", "function prototype", "function declarator" and a "function declaration" are lost on me semantically. A lot of those terms are used interchangeably outside the context of this grammar, so it is not clear what is what.

In addition to the documentation of the code itself above, I feel it's important to document the grammar. If You need internal grammar structures for expression creation reasons, then that should be documented. For example, if you're trying to create a foo expression, but need foo_with_x and foo_but_y, it would be helpful if declared like this at the top:

//explanation of what foo represents
%type <interm> foo
//why foo_with_x is needed
%type <interm> foo_with_x
//why foo_but_y is needed
%type <interm> foo_but_y
//probably a line break between the next thing

mbechard commented 3 years ago

Can you give an example of one file / class / group of functions that you would like to change? Just so we are on the same page before you make a PR.

Forked to a new issue to avoid continuing to step on the core problem of this issue.

marstaik commented 2 years ago

Could you also please look into fixing up the include paths? It's currently a nightmare integrating it into game engines...

There is glslang/Include, good this is all standard and nice Then there is glslang/Public, which if the one source file in there is public, why isn't it in Include? Then there is SPIRV and OGLCompilersDLL which don't have include directories and instead has headers to mixed with source files...

Can we get one consistent include directory? Or at least an include directory inside SPIR-V and the other folders so we know where to look for?

Honestly I would expect an ideal structure to just look like:

glslang/
    include/glslang/
        headers.h
    src/
        code.cpp

ogl_compiler/
    include/ogl_compiler/
        headers.h
    src/
        code.cpp

spirv/
    include/spirv/
        headers.h
    src/
        code.cpp

and then those of us that incorporate glsl, spirv, or ogl directly in our builds can add each include/ to the include path and call it a day.

KhronosGroup / glslang

Please provide a better path for glslang contributors #2784

example: