alixinne / glsl-lang

LALR parser for GLSL
https://alixinne.github.io/glsl-lang/glsl_lang/
BSD 3-Clause "New" or "Revised" License

Parser seems to swallow all preprocessor directives #54

Open moritzvoelker opened 9 hours ago

moritzvoelker commented 9 hours ago

When trying to parse some shader code, I noticed that any preprocessor directives were missing from the AST.

Minimal reproducible example

fn main() {
    use glsl_lang::{
        ast::TranslationUnit,
        parse::DefaultParse,
        transpiler::glsl::{show_translation_unit, FormattingSettings},
    };

    // Define and parse shader code
    let source = "#version 460\n#define TEST 3\nvoid main() {\n}\n";
    let ast = TranslationUnit::parse(&source).unwrap();

    // Directly print it out again
    let mut transpiled = String::new();
    show_translation_unit(
        &mut transpiled,
        &ast,
        (&FormattingSettings::default()).into(),
    )
    .unwrap();
    println!("{transpiled}");
}

Expected behavior

This program should print the shader code unchanged, like this:

#version 460
#define TEST 3
void main() {
}

Actual behavior

The actual output looks like this:

void main() {
}

The preprocessor directives just get swallowed. If I add them manually to the AST like so:

fn main() {
    use glsl_lang::{
        ast::{
            ExternalDeclarationData, IdentifierData, Node, PreprocessorData,
            PreprocessorDefineData, PreprocessorVersionData, TranslationUnit,
        },
        parse::DefaultParse,
        transpiler::glsl::{show_translation_unit, FormattingSettings},
    };

    let source = "#version 460\n#define TEST 3\nvoid main() {\n}\n";
    let mut ast = TranslationUnit::parse(&source).unwrap();

    let version = Node::new(
        ExternalDeclarationData::Preprocessor(Node::new(
            PreprocessorData::Version(Node::new(
                PreprocessorVersionData {
                    version: 460,
                    profile: None,
                },
                None,
            )),
            None,
        )),
        None,
    );
    let define = Node::new(
        ExternalDeclarationData::Preprocessor(Node::new(
            PreprocessorData::Define(Node::new(
                PreprocessorDefineData::ObjectLike {
                    ident: Node::new(IdentifierData("TEST".into()), None),
                    value: String::from("3"),
                },
                None,
            )),
            None,
        )),
        None,
    );
    ast.0.insert(0, version);
    ast.0.insert(1, define);

    let mut transpiled = String::new();
    show_translation_unit(
        &mut transpiled,
        &ast,
        (&FormattingSettings::default()).into(),
    )
    .unwrap();
    println!("{transpiled}");
}

Then the preprocessor directives are printed as expected, so this must be a problem with parsing, not with transpiling/printing.

Non-bug explanations

Maybe I have overlooked a feature flag or parsing setting that enables or disables the parsing of preprocessor directives, but I couldn't find anything.

I hope this helps in any way :)

alixinne commented 2 hours ago

Thanks for the detailed bug report! Unfortunately there isn't an easy answer to this, but I'll still try to explain.

The main reason this is an issue is that the main crate, glsl-lang, only represents an AST (Abstract Syntax Tree) for historical reasons. I added position information to the nodes in the AST, so technically with node positions and the original source you could rebuild the source, including comments (they are stripped by the preprocessing/lexing stage), etc.
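To illustrate the idea of rebuilding source from node positions, here is a minimal, self-contained sketch. It does not use glsl-lang's actual types: `Span` and `uncovered` are hypothetical names invented for this example, and the span is computed by string search rather than taken from a parser. It only shows that the text swallowed by parsing is still recoverable as the parts of the original source that no node covers.

```rust
/// Hypothetical byte range of a top-level AST node in the original source.
/// This is an illustration only, not glsl-lang's real position type.
#[derive(Debug, Clone, Copy)]
struct Span {
    start: usize,
    end: usize,
}

/// Returns the non-empty slices of `source` that no span covers.
/// Assumes `spans` is sorted by `start`.
fn uncovered<'a>(source: &'a str, spans: &[Span]) -> Vec<&'a str> {
    let mut gaps = Vec::new();
    let mut cursor = 0;
    for span in spans {
        if span.start > cursor {
            gaps.push(&source[cursor..span.start]);
        }
        cursor = cursor.max(span.end);
    }
    if cursor < source.len() {
        gaps.push(&source[cursor..]);
    }
    gaps
}

fn main() {
    let source = "#version 460\n#define TEST 3\nvoid main() {\n}\n";

    // Pretend the parser reported a single node: the function definition.
    let start = source.find("void").unwrap();
    let end = source.rfind('}').unwrap() + 1;
    let gaps = uncovered(source, &[Span { start, end }]);

    // The directives sit in the gap before the function definition.
    assert_eq!(gaps, ["#version 460\n#define TEST 3\n", "\n"]);
    println!("{gaps:?}");
}
```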

The transpiling part is based solely on the AST (again, historical reasons) so as you noticed with your second example, if the preprocessor directives are present in the AST, they'll get printed out as expected.

The issue is that the AST enforces a rigid structure that follows GLSL's specification, which means having those Preprocessor external declarations (i.e. top-level declarations) is more of a hack than a faithful representation of the spec. If we want accurate parsing (per the spec), we need to follow the whole lexer -> preprocessor -> parser pipeline. However, this staging is critical to the preprocessor functioning as expected, and it is also why we can't propagate preprocessor directives into the AST (at least in its current form).

Your example is the easy case: we can parse it into the following AST:

TranslationUnit
+ VersionDirective
+ DefineDirective
+ FunctionDecl
  + Block

However, preprocessor directives can be placed literally anywhere:

#version 460 core
void main()
#define TEST 3
{
}

This would result in the same AST after preprocessing, but representing it needs an entirely different data structure (something more along the lines of a concrete syntax tree that supports preprocessing), so it needs some thought to get right.
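A rough sketch of why the position of a directive is lost: the toy function below (the hypothetical `split_directives`, which is not part of glsl-lang and uses naive whitespace tokenization) separates directive lines from everything else. Both placements of `#define TEST 3` leave the same tokens for the parser, and only the recorded line index differs, which a tree of declarations has no slot for.

```rust
/// Splits a source into directive lines (with their 0-based line index)
/// and the whitespace-separated tokens of all other lines.
/// A toy model: real GLSL lexing is far more involved.
fn split_directives(source: &str) -> (Vec<(usize, String)>, Vec<String>) {
    let mut directives = Vec::new();
    let mut tokens = Vec::new();
    for (index, line) in source.lines().enumerate() {
        let trimmed = line.trim_start();
        if trimmed.starts_with('#') {
            directives.push((index, trimmed.to_string()));
        } else {
            tokens.extend(line.split_whitespace().map(str::to_string));
        }
    }
    (directives, tokens)
}

fn main() {
    let top = "#version 460 core\n#define TEST 3\nvoid main()\n{\n}\n";
    let middle = "#version 460 core\nvoid main()\n#define TEST 3\n{\n}\n";

    let (top_directives, top_tokens) = split_directives(top);
    let (middle_directives, middle_tokens) = split_directives(middle);

    // What the parser would see is identical in both cases...
    assert_eq!(top_tokens, middle_tokens);
    // ...but the directives sit on different lines.
    assert_ne!(top_directives, middle_directives);

    println!("tokens: {top_tokens:?}");
    println!("top: {top_directives:?}");
    println!("middle: {middle_directives:?}");
}
```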

Aside from appearing in arbitrary places, preprocessor directives can also be essential to parsing. As a contrived example that is nonetheless legal per the spec, this should parse to the same AST as well:

#version 460 core
#define BEGIN {
#define END }
void main()
BEGIN
END

To parse this correctly we need a preprocessing step to happen before parsing, but we'd also need to carry a lot of extra information to keep the transformation lossless when transpiling.
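As a minimal sketch of that ordering, the toy expander below handles only object-like `#define`s over whitespace-separated tokens. It is not how glsl-lang's preprocessor is implemented (the function name `expand_object_macros` is made up for this example, and there is no support for function-like macros, `#undef`, conditionals, or rescanning). It shows that the braces only exist after expansion, and that the expanded token stream no longer remembers that they came from `BEGIN` and `END`.

```rust
use std::collections::HashMap;

/// Expands object-like `#define NAME replacement` macros and returns the
/// resulting tokens. All other directives are dropped. Toy model only.
fn expand_object_macros(source: &str) -> Vec<String> {
    let mut macros: HashMap<String, Vec<String>> = HashMap::new();
    let mut output = Vec::new();
    for line in source.lines() {
        let trimmed = line.trim_start();
        if let Some(rest) = trimmed.strip_prefix("#define") {
            // First word is the macro name, the rest is its replacement.
            let mut parts = rest.split_whitespace();
            if let Some(name) = parts.next() {
                macros.insert(name.to_string(), parts.map(str::to_string).collect());
            }
        } else if trimmed.starts_with('#') {
            // Other directives (e.g. #version) are consumed here.
        } else {
            for token in line.split_whitespace() {
                match macros.get(token) {
                    Some(replacement) => output.extend(replacement.iter().cloned()),
                    None => output.push(token.to_string()),
                }
            }
        }
    }
    output
}

fn main() {
    let source =
        "#version 460 core\n#define BEGIN {\n#define END }\nvoid main()\nBEGIN\nEND\n";
    let tokens = expand_object_macros(source);

    // The parser only ever sees the expanded braces.
    assert_eq!(tokens, ["void", "main()", "{", "}"]);
    println!("{}", tokens.join(" "));
}
```

Keeping the transformation lossless would mean recording, for each output token, which macro and which source location produced it; this sketch deliberately throws that information away.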

Another issue with dealing with concrete syntax trees is that they're harder to mutate than the current AST in the glsl-lang crate, and the results of mutations might be undefined depending on what preprocessor directives were used to generate this input.

As a workaround, what's supported is injecting preprocessor directives into the AST after parsing (see https://github.com/alixinne/glsl-lang/blob/master/lang-cli/src/main.rs for a full example). This only supports:

Changing your example a bit to use this results in the following:

fn main() {
    use std::path::Path;

    use glsl_lang::{
        lexer::v2_full::fs::PreprocessorExt,
        parse::IntoParseBuilderExt,
        transpiler::glsl::{show_translation_unit, FormattingSettings},
    };

    // Define and parse shader code
    let source = "#version 460\n#define TEST 3\nvoid main() {\n}\n";

    let mut processor = glsl_lang_pp::processor::fs::StdProcessor::new();
    let ast = processor
        .open_source(
            source,
            Path::new("inline.glsl")
                .parent()
                .unwrap_or_else(|| Path::new(".")),
        )
        .builder()
        .parse()
        .map(|(mut tu, _, iter)| {
            iter.into_directives().inject(&mut tu);
            tu
        }).unwrap();

    // Directly print it out again
    let mut transpiled = String::new();
    show_translation_unit(
        &mut transpiled,
        &ast,
        (&FormattingSettings::default()).into(),
    )
    .unwrap();
    println!("{transpiled}");
}

With the transpiled result being:

#version 460
void main() {
}

Maybe this is enough for your use case?