Open Cazadorro opened 3 years ago
Agree that improvements can be made in this regard. I do receive some funding for maintaining this repo and I will try to make such improvements as time allows. I also welcome such improvements from others.
A couple suggestions until then:
The glslang github repository has a long history (git log -p) of features and extensions added to it. These commits can be used as an outline for doing similar additions. Find a recent extension similar to what you want to add and start coding.
There has been some recent additions along the line of your request. See the addition and use of the GL_EXT_spirv_intrinsics extension in #2625 and #2749. This allows for easier addition of "simple" extensions to glslang using new SPIR-V capabilities. By "simple" I mean extensions which can be expressed in terms of addition of new intrinsics, utilizing existing syntax.
I'll add that in my experience the way the codebase is formatted makes it quite hard to learn by interactively debugging (in MSVC in particular). The fact that lots of code as well as all single line functions in header files are collapsed onto single lines makes breakpoints and variable watching quite hard to work with consistently in MSVC due to how it requires you to step one line 'into' a function before you can get the context. Often when I'm trying to debug things the first thing I do is go through the codepaths I'm interested in and add some newlines into the functions so I can debug them more nicely.
I am open to getting rid of single-line functions. I dislike them for the same reason.
Feels like it'd be quick to do. Adjust the clang-format and force it to re-format the code. I can make a PR if you want to go ahead with this.
Can you give an example of one file / class / group of functions that you would like to change? Just so we are on the same page before you make a PR.
I attempted to try to see one more time if I could add operator my self, I managed to (I think?) successfully add a keyword into glslang, Below outlines that, and attempts to outline that process, and contains information I hope will fill some of the holes I outlined above about project information.
Creating Keywords:
You'll need to look at the following files:
MachineIndependent/Scan.cpp
Contains the keywords and scanning, (scanning is not done via lex, but instead manually)MachineIndependent/glslang.m4
to change grammar and create new expressions. Token literals are generated with Bison (output glslang_tab.cpp.h
, glslang_tab.cpp
for the tokens, and MachineIndependent/glslang.y
for the grammar file generation)Easiest way to get dependencies for bison and m4 is actually to just use Msys2 on windows (on linux you can just install), otherwise use https://github.com/lexxmark/winflexbison for more up-to-date binaries for bison on windows.
Adding a new terminal symbol (such as a keyword), simply add a new
%token <lex> [your terminal name here]
You'll also see
%union {
struct {
glslang::TSourceLoc loc;
union {
glslang::TString *string;
int i;
unsigned int u;
long long i64;
unsigned long long u64;
bool b;
double d;
};
glslang::TSymbol* symbol;
} lex;
struct {
glslang::TSourceLoc loc;
glslang::TOperator op;
union {
TIntermNode* intermNode;
glslang::TIntermNodePair nodePair;
glslang::TIntermTyped* intermTypedNode;
glslang::TAttributes* attributes;
glslang::TSpirvRequirement* spirvReq;
glslang::TSpirvInstruction* spirvInst;
glslang::TSpirvTypeParameters* spirvTypeParams;
};
union {
glslang::TPublicType type;
glslang::TFunction* function;
glslang::TParameter param;
glslang::TTypeLoc typeLine;
glslang::TTypeList* typeList;
glslang::TArraySizes* arraySizes;
glslang::TIdentifierList* identifierList;
};
glslang::TArraySizes* typeParameters;
} interm;
}
which basically says
Define a union of two structs:
to affect what the parsing of that terminal actually is, go to:
MachineIndependent/Scan.cpp
Here you will find where all the manual parsing being done (since glslang only uses bison, not flex). If you want to add a keyword, you'll need to edit
fillInKeywordMap()
and add your keyword:
(*KeywordMap)["your_keyword"] = [your KEYWORD];
in addition you'll need to add another case to tokenizeIdentifier()
, which is meant to handle code versions that don't include modern vulkan glsl (so just looking at valid keywords for any version of GLSL is not valid for all programs). Usually you'll do something like:
case [your KEYWORD]:
if (parseContext.spvVersion.vulkan > 0) {
return keyword;
}
Summarize organizational structure:
other things I noticed but didn't know where to put:
@greg-lunarg But after I added the operator
keyword, I noticed that making any kind of significant change to the grammar was going to be a frustrating experience.
A couple of things got in the way of adding operators to the grammar. I originally planned to have operator definitions simply be free functions that were part of look up resolution to unary and binary operators (at least at first, and expand them to be able to be used for conversions and built-ins). I thought the process would simply be: Create a operator function definition, add a partially mangled name to the list of functions to look up, something like ___operator__add(...)
. When an operator is encountered, instead of doing normal operator resolution, first check for the existence of an user defined operator, then let normal overload resolution work its magic here if found, otherwise continue with normal operator resolution. Effectively just syntactic sugar for functions.
I quickly found that what I thought would be simple "Function declaration" "Function Definition" expressions... didn't exist in a way I could easily discern what was what. There are way too many "function" expressions to make sense to me, the difference between a "function header", "function call header", "function prototype", "function declarator" and a "function declaration" are lost on me semantically. A lot of those terms are used interchangeably outside the context of this grammar, so it is not clear what is what.
In addition to the documentation of the code itself above, I feel it's important to document the grammar. If You need internal grammar structures for expression creation reasons, then that should be documented. For example, if you're trying to create a foo
expression, but need foo_with_x
and foo_but_y
, it would be helpful if declared like this at the top:
//explanation of what foo represents
%type <interm> foo
//why foo_with_x is needed
%type <interm> foo_with_x
//why foo_but_y is needed
%type <interm> foo_but_y
//probably a line break between the next thing
Can you give an example of one file / class / group of functions that you would like to change? Just so we are on the same page before you make a PR.
Forked to a new issue to avoid continuing to step on the core problem of this issue.
Could you also please look into fixing up the include paths? It's currently a nightmare integrating it into game engines...
There is glslang/Include
, good this is all standard and nice
Then there is glslang/Public
, which if the one source file in there is public, why isn't it in Include
?
Then there is SPIRV
and OGLCompilersDLL
which don't have include directories and instead has headers to mixed with source files...
Can we get one consistent include directory? Or at least an include directory inside SPIR-V
and the other folders so we know where to look for?
Honestly I would expect an ideal structure to just look like:
glslang/
include/glslang/
headers.h
src/
code.cpp
ogl_compiler/
include/ogl_compiler/
headers.h
src/
code.cpp
spirv/
include/spirv/
headers.h
src/
code.cpp
and then those of us that incorporate glsl, spirv, or ogl directly in our builds can add each include/
to the include path and call it a day.
There are currently many outstanding issues for feature requests for GLSL, and during the Vulkanize event, many people were curious about the status of such features. It was brought up that khronos doesn't have the bandwidth right now to spend a lot of time on this. However many of us could in theory aid with pull requests to this repo our selves, or otherwise aid in developing these features.
This however is not an easy task, it is not only hard to navigate the codebase, it's very difficult to understand. It speaks to the difficulty of working with the glslang codebase that many of such features have been implemented in compilers the transpile to GLSL from a superset of GLSL (including templates!) instead of extensions to glslang. I myself have opted to write transpilers instead of extending glslang when trying to add my own features because of just how difficult it is to understand where to even find where I'd add something.
A comment made in the Vulkan Discord talks about this difficulty and where it lies here:
Some sort of architectural overview, that shows where exactly certain language processing takes place would be really nice. where AST parsing takes place, semantic analysis, tokenization etc... (though I know some of this is in a separate parser generator [bison]? How to modify that for use in GLSL would also be good information) Information like "pitfalls", dos and don'ts top level comments per file helping understand the general idea of what each section does.
For example, how do I navigate the codebase as a completely new developer? Some of this information exists in the very bottom of the page here https://github.com/KhronosGroup/glslang#basic-internal-operation but it doesn't give near enough information to start actual substantive development. Heck it doesn't even answer questions like "Where does the compiler infrastructure even live?", or "What is inside all the top level directories?" As I go down in the glslang folder, I'm met with more questions most of the files I'm looking at don't have top level comments explaining exactly the purpose of the file that's there, the only comment is the liscense at the top. There are some comments inline with code, but they all assume that you understand the context of what is going on in the first place. Some of these files do have top level comments, but the information is important enough to be available outside the header as a part of a much wider context, such as in glslang/MachineIndependent/SymbolTable.h, and it still doesn't explain how the code is accomplishing the above tasks in the header documentation.
It would be extremely helpful to have examples of (maybe in separate example repos) of how extend glsl, for example:
Along similar lines It would also be helpful to have a sort of walkthrough of how your GLSL text goes through glslang, where each part of the text touches, when and why.
These things should be straight forward for the maintainers to demonstrate, as these things have already had to happen to even have glslang exist in the first place. Some sort of text instruction similar to the style of Kelidiscope would be great