Maybe Naga IR should include unresolved identifiers and applications

jimblandy commented 1 month ago

(I'm not sure about this at all, I just wanted to get some thoughts down. "Epistemic status: daydreaming")

Loosening up naga::Module to allow "unresolved" types, functions, and global variables might 1) simplify the WGSL front end, 2) make the WGSL front end faster, and 3) make Naga more useful for experimentation like #5791.

The idea would be to add new variants to TypeInner, Expression, and so on that represent identifiers whose definitions we don't know yet, stored as plain strings. Validation would forbid these from being passed to backends, so backends wouldn't need to do more than ignore them. But these variants' presence would let people write much more interesting things that run prior to validation, including experimental stuff like #5791 (see https://github.com/gfx-rs/wgpu/pull/5791#issuecomment-2183088767).
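To make the shape of this concrete, here is a minimal sketch using toy stand-ins for the IR types. These are not Naga's real definitions: the variant names Unresolved, UnresolvedIdent, and UnresolvedApply, the ValidationError type, and the validate function are all hypothetical, and arenas are simplified to plain slices.

```rust
// Toy stand-ins for IR types. These are NOT naga's real definitions.
#[derive(Debug, Clone, PartialEq)]
enum TypeInner {
    Scalar { width: u8 },
    // Hypothetical variant: a type named in the source but not yet looked up.
    Unresolved(String),
}

#[derive(Debug, Clone, PartialEq)]
enum Expression {
    Literal(f32),
    // Hypothetical variant: an identifier whose definition is unknown.
    UnresolvedIdent(String),
    // Hypothetical variant: `name(args...)`, which may turn out to be a call
    // or a constructor. Arguments are indices into the expression list.
    UnresolvedApply { name: String, args: Vec<usize> },
}

#[derive(Debug, PartialEq)]
enum ValidationError {
    UnresolvedType(String),
    UnresolvedExpression(String),
}

/// Rejects any module that still contains unresolved variants, so that
/// backends downstream of validation never have to handle them.
fn validate(types: &[TypeInner], exprs: &[Expression]) -> Result<(), ValidationError> {
    for ty in types {
        if let TypeInner::Unresolved(name) = ty {
            return Err(ValidationError::UnresolvedType(name.clone()));
        }
    }
    for expr in exprs {
        match expr {
            Expression::UnresolvedIdent(name)
            | Expression::UnresolvedApply { name, .. } => {
                return Err(ValidationError::UnresolvedExpression(name.clone()));
            }
            Expression::Literal(_) => {}
        }
    }
    Ok(())
}
```

With something like this in place, a backend's match arms for the unresolved variants could simply be unreachable, since nothing that fails validation would be handed to it.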

If I recall correctly, the main reason the WGSL front end has an internal AST type, front::wgsl::parse::ast::TranslationUnit, that is then lowered to a Module by front::wgsl::lower, is that WGSL allows identifiers to be used before they're defined. This means that when the compiler sees an expression like x(y, z), it doesn't know whether that's:

a call to a function named x,
a construction of a struct type named x that has two members, or
a construction of a vec, because x is a type alias for vec2<f32>.

These are all represented quite differently in Module, so we have to wait until we have processed the entire source and have all definitions in hand before we can decide what Naga IR to produce for x(y, z).

It seems almost inescapable that a WGSL front end should have two passes like this. But if the first pass were able to produce Naga IR directly, using these "unresolved" variants, then resolution could simply be a matter of patching up references, following through on details like struct member indices in AccessIndex expressions, and generally replacing loose IR with something more specific. This should be much faster than lowering, which effectively performs a copy of the entire module's representation.
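As a rough illustration of what "patching up" could look like, the sketch below resolves the x(y, z) case in place once all definitions are known. Everything here is a toy: Definition, ResolveError, resolve, and the UnresolvedApply variant are hypothetical, and Call is modeled as an expression purely for brevity (I believe real Naga represents calls as statements and constructions as Compose expressions).

```rust
use std::collections::HashMap;

// What a name turned out to be once the whole source has been processed.
// Hypothetical type, for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum Definition {
    Function { index: usize },
    Struct { type_index: usize },
    // A type alias, e.g. for vec2<f32>.
    Alias { type_index: usize },
}

// Toy expression type; arguments are indices into the expression list.
#[derive(Debug, Clone, PartialEq)]
enum Expression {
    Literal(f32),
    UnresolvedApply { name: String, args: Vec<usize> },
    Call { function: usize, args: Vec<usize> },
    Compose { ty: usize, components: Vec<usize> },
}

#[derive(Debug, PartialEq)]
enum ResolveError {
    Undefined(String),
}

/// Rewrites every `UnresolvedApply` in place. No second copy of the
/// expression list is built; only the affected entries are replaced.
fn resolve(
    exprs: &mut [Expression],
    defs: &HashMap<String, Definition>,
) -> Result<(), ResolveError> {
    for expr in exprs.iter_mut() {
        let Expression::UnresolvedApply { name, args } = expr else {
            continue;
        };
        // Look the name up first, so a failure leaves the entry untouched.
        let def = defs
            .get(name.as_str())
            .ok_or_else(|| ResolveError::Undefined(name.clone()))?;
        let args = std::mem::take(args);
        *expr = match def {
            Definition::Function { index } => Expression::Call { function: *index, args },
            // Struct constructors and vector aliases both become a construction.
            Definition::Struct { type_index } | Definition::Alias { type_index } => {
                Expression::Compose { ty: *type_index, components: args }
            }
        };
    }
    Ok(())
}
```

The argument lists are moved rather than cloned, which is the point of the comparison with lowering: the first pass's output is reused rather than rebuilt.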

Then, it seems like this might make Naga's WGSL front end a lot more useful for programming-in-the-large research: a linker system could ask Naga to parse modules, leaving external identifiers unresolved, and then patch up the Modules with definitions at link time, according to whatever semantics it liked. Type checking could be performed at validation time, after identifiers had been resolved, as it is now.
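A linker along these lines might look roughly like the sketch below. It is only a toy under stated assumptions: ToyModule, Callee, and link are hypothetical names, modules are reduced to a list of defined function names plus a list of call targets, and "first definition wins" is an arbitrary policy picked for the example.

```rust
use std::collections::HashMap;

// A call target: either still a bare name, or patched to a concrete location.
#[derive(Debug, Clone, PartialEq)]
enum Callee {
    // External name left as a string by the parser.
    Unresolved(String),
    // Filled in at link time.
    Linked { module: usize, function: usize },
}

// Hypothetical, heavily reduced module: just names and call targets.
#[derive(Debug, Clone)]
struct ToyModule {
    // Names of the functions this module defines.
    functions: Vec<String>,
    // Targets of the calls this module makes.
    calls: Vec<Callee>,
}

/// Patches unresolved callees against the functions exported by all modules.
/// Returns the names that no module defines; those callees stay unresolved.
fn link(modules: &mut [ToyModule]) -> Vec<String> {
    // Export table: name -> (module index, function index).
    // The first definition of a name wins; this is a policy choice.
    let mut exports: HashMap<String, (usize, usize)> = HashMap::new();
    for (m, module) in modules.iter().enumerate() {
        for (f, name) in module.functions.iter().enumerate() {
            exports.entry(name.clone()).or_insert((m, f));
        }
    }

    let mut missing = Vec::new();
    for module in modules.iter_mut() {
        for callee in module.calls.iter_mut() {
            let Callee::Unresolved(name) = callee else {
                continue;
            };
            match exports.get(name.as_str()) {
                Some(&(m, f)) => *callee = Callee::Linked { module: m, function: f },
                None => missing.push(name.clone()),
            }
        }
    }
    missing
}
```

Because each module is parsed on its own and only the link step looks across modules, re-parsing one changed module and re-running link would be enough to pick up an edit.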

We already have some ideas about how to move constant evaluation later, so it wouldn't need to understand these unresolved variants.

Generally, our initial assumption in Naga's architecture has been that the best Module is one tailored for the backends' interests: anything that a backend isn't concerned with should be "boiled away" in the front ends, and never appear in a Module.

But we're seeing various places where this may be too restrictive:

Abstract types should never be seen by a backend, but incorporating them into ScalarKind was a huge help, and it hasn't caused backends much trouble.
Part of the motivation for introducing compaction was to allow validation to continue to be strict, but in https://github.com/gfx-rs/wgpu/pull/6308#pullrequestreview-2329595461 I'm wondering whether compaction throws away so much that we can't do the validation required by WGSL. (Not sure about this.)
It may make Module less useful for tools, as explained above.
It may be forcing more complexity on the WGSL front end, as explained above.

jimblandy commented 1 month ago

cc @stefnotch

stefnotch commented 1 month ago

This sounds very interesting, thank you for pinging me! That would make building a basic importing system that uses naga in the background a lot easier. Not just because we could compile and link modules separately, but because we could re-compile a single module, and re-link with everything else. In other words, incremental compilation at the level of entire files.

I'll ask the other people that are working on WESL what their thoughts are.

teoxoy commented 1 month ago

I think it would be worth trying it out and seeing how it goes, as we have been moving in this direction with the addition of overrides, where we now have two sets of rules for the IR (IR with overrides and IR without overrides). We should document what flavors are generated by frontends and which ones are required by backends. If we go with this more flexible IR, I think the approach in https://github.com/gfx-rs/wgpu/pull/6310 would be fine since WGSL natively supports overrides.

ncthbrt commented 1 month ago

From my perspective as someone who is building extensions to wgsl, that would be very useful.

One thing that might be important to prioritise if this change is made is the ability to provide custom source spans.

I'm going to open a separate issue for it but it is somewhat related so thought I'd mention it here.

The main reason for this is that extensions to the language need to map back errors from naga so they can be correlated with input source code.
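For illustration, mapping an error span back to the extension's input could work something like the sketch below. This is an assumption about one possible approach, not an existing Naga API: Span, Segment, and map_back are hypothetical, and the sketch assumes the extension records which runs of generated WGSL were copied verbatim from its own source.

```rust
// A byte range in some source text. Toy type for this example.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Span {
    start: u32,
    end: u32,
}

// One contiguous run of generated text that was copied verbatim from the
// original (extended-language) source. Hypothetical bookkeeping record.
#[derive(Debug, Clone, Copy)]
struct Segment {
    generated_start: u32,
    original_start: u32,
    len: u32,
}

/// Translates a span in the generated WGSL back to the original source.
/// Returns None if the span is not fully contained in a single segment,
/// for example when it covers text the extension synthesized.
fn map_back(span: Span, segments: &[Segment]) -> Option<Span> {
    segments.iter().find_map(|seg| {
        let seg_end = seg.generated_start + seg.len;
        if span.start >= seg.generated_start && span.end <= seg_end {
            let delta = span.start - seg.generated_start;
            let start = seg.original_start + delta;
            Some(Span { start, end: start + (span.end - span.start) })
        } else {
            None
        }
    })
}
```

If the front end accepted spans supplied by the caller, this translation step would not be needed at all, since errors would already carry positions in the original input.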

gfx-rs / wgpu

Maybe Naga IR should include unresolved identifiers and applications #6359