alixinne / glsl-lang

LALR parser for GLSL
https://alixinne.github.io/glsl-lang/glsl_lang/
BSD 3-Clause "New" or "Revised" License
23 stars 4 forks source link

fix(lang-lexer/inject): better support injection of pp. dirs. from multiple files #24

Closed AlexTMjugador closed 11 months ago

AlexTMjugador commented 1 year ago

As of now, injecting preprocessor directives found in included files back into the AST when using the v2-full lexer is not supported. Attempting to do so will trap the Directives::inject method in the following loop forever, as a different source file ID will cause neither directive_idx to be incremented or break to be reached:

https://github.com/vtavernier/glsl-lang/blob/881c0e472bfa990150c7a3fcd8b3efc787b66ccc/lang-lexer/src/v2_full/directives.rs#L134-L158

After giving the situation some thought, I think it's more appropriate to inject the preprocessor directives that we are currently capable of injecting, regardless of whether they come from a different file:

By now, I think it is clear that #pragma and #extension can be injected as they are without further ado. However, handling #version in combination with #include requires further work, in my view.

Let's consider a shader A that is valid on its own, but does not contain a void main() function (this is allowed by the GLSL specification, as long as the shader is linked together with another shader that defines this function). Now, consider a shader B that #includes shader A and defines void main(). Let's assume that the GLSL application compiles shader B. In this scenario, I'd argue that it makes sense for both A and B to have a #version directive in the beginning of their respective sources, because both are shaders, according to the specification. A could define a void main() function only if a preprocessor directive set by B before the #include is unset, allowing it to be linked as a completely standalone shader, where the #version directive can't be placed anywhere else. Remarkably, the specifications for #include extensions like ARB_shading_language_include do not mention how to handle multiple #version directives at all.

Given the above scenario, I think that a good solution is to make sure that Directives::inject only emits at most one #version directive, with its version and profile set to the highest version and most featureful profile among those specified in all #version directives. As far as I know, GLSL versions are backwards compatible, and even if they weren't, several #version directives are likely to violate the GLSL specification, so there would be no way to reconcile GLSL sources targeting incompatible GLSL versions anyway.

These changes let #pragma and #extension directives coming from included files be injected anyway. #version directives are injected too, but only one #version directive is ever injected, according to the technique stated in the previous paragraph.

(By the way, I appreciate your prompt PR reviews. As you can see, I'm taking these crates for a good spin, trying to improve what I find in the process. I think that Rust deserves a rock-solid GLSL parser library :heart:)

AlexTMjugador commented 1 year ago

While I think I do understand the issue, this seems like a rather complex case and I'm worried it could show up again later if refactor was to come.

Indeed, I had to scratch my head quite a bit to understand what was going on 😄

Do you think you could add a regression test for this issue so we make sure this is fixed and stays this way?

Sure, I could give it a shot! I didn't add a regression test because I wasn't sure how to go about it, given that the modified files don't contain any tests. What's the rough outline of how you'd like such test to look and where would you put it?

alixinne commented 1 year ago

Sure, I could give it a shot! I didn't add a regression test because I wasn't sure how to go about it, given that the modified files don't contain any tests. What's the rough outline of how you'd like such test to look and where would you put it?

They do not contain any tests because unfortunately those components are currently hard to unit test, since they use the SyntaxNode structures from rowan. I think the best course of action is to have the test in the global parser tests folder (https://github.com/vtavernier/glsl-lang/tree/master/lang/tests), and since it relies on multiple files these could be residing in https://github.com/vtavernier/glsl-lang/tree/master/lang/data/tests. It's not in the same crate as the changes but right now it's easier to get a full parsing context ready from the glsl-lang crate to target this bug.

AlexTMjugador commented 1 year ago

Alright, I can try to cobble up together an integration test. I'll let you know when I get to it :smile:

(As a completely unrelated side note, I recently came across the proptest crate. I think it can be very useful for testing components like the transpiler, but that's out of the scope for this PR anyway.)

alixinne commented 1 year ago

(As a completely unrelated side note, I recently came across the proptest crate. I think it can be very useful for testing components like the transpiler, but that's out of the scope for this PR anyway.)

This looks interesting, another thing that works well for compilers is fuzzing, and it's already setup (using afl) in this repository (here). It helped find a bunch of bugs in the v2 parser in early development, and from experience it's also pretty good at finding infinite loops (as well as panics). The hard part is getting an interesting corpus to run fuzzing on, which I haven't got around to yet.

AlexTMjugador commented 11 months ago

I went busy after these last comments, and then totally forgot about this PR until I opened another now :sweat_smile:

I will see if I can get some tests done soon; if not, and if anyone is interested in seeing this PR through to completion, please feel free to pick up this branch where I left it.

alixinne commented 11 months ago

Finally got some time to spend on this, I've added a regression test for the issue that this PR fixes.

If you rebase your PR on the latest master and cherry-pick e137fef72b4cbd302e6da276131b0afc666b2956 and df5f33cdae636b29490f16283f2aa042444fe005 we can get this merged and released!

AlexTMjugador commented 11 months ago

Sorry for the delay - I finally had time to get around the rebase and cherry-picking. Thank you a bunch for adding the tests. Let me know if you there is anything else left :heart: