bytecodealliance / wasm-tools

CLI and Rust libraries for low-level manipulation of WebAssembly modules
Apache License 2.0
1.21k stars 224 forks source link

Enable binary encoding and printing of multi-package wit files #1609

Open macovedj opened 3 weeks ago

macovedj commented 3 weeks ago

Wit files that leverage multiple packages should be able to be encoded into binary and printed back out with the changes in this PR

alexcrichton commented 2 weeks ago

Ok reading over this some more I don't think that this is going to work as-is. It looks like this is basically adding a few loops in a few places, but I think that the changes required are going to be more significant than that. For example each interface/world in a package, when encoded today, is exported as the kebab-cased-name of the interface/world. Each kebab-cased export is a standalone component type which has full context for just that interface/world.

With multiple packages something will need to change about this format. Otherwise it's not possible to encode two packages which contains interface foo for example since that would generate two kebab-cased exports named "foo". I haven't thought much about what the format should be, so I don't have an answer off the top of my head.

alexcrichton commented 2 weeks ago

cc @lukewagner for possible brainstorming on the format here. The question being that today we have:

$ wasm-tools component wit -t <<< "package a:b; interface foo {}"
(component
  (type (;0;)
    (component
      (type (;0;)
        (instance)
      )
      (export (;0;) "a:b/foo" (instance (type 0)))
    )
  )
  (export (;1;) "foo" (type 0))
  (@custom "package-docs" "\00{}")
  (@producers
    (processed-by "wit-component" "0.210.0")
  )
)

What's the best way to represent this WIT package:

package a:b1 { interface foo {} }
package a:b2 { interface foo {} }

I'm tempted to say:

(component
  (component $a:b1 (; output of `wasm-tools component wit -t for just `package a:b1` ;) )
  (component $a:b2 (; output of `wasm-tools component wit -t for just `package a:b2` ;) )

  (export "a:b1" (component $a:b1))
  (export "a:b2" (component $a:b2))
)

This isn't currently valid though as "a:b1" isn't a valid export string, right now it needs to be fully qualified. This could be seen as motivation for relaxing those restrictions, but that's balooning the change a good deal so I'm curious if you have thoughts on this.

lukewagner commented 2 weeks ago

Ooh yeah, really good question. Agreed that the natural first step seems to be to put each nested WIT package into its own nested component. Then to your question: maybe this is weird but: what if we just didn't export those nested WIT packages at all? That would imply that when WIT tooling was slurping up a binary WIT package, it would need to recurse through all nested components and populate its internal namespace with all interfacename-d type exports of exported component types (rather than only doing so for the root component). Would that work?

What's neat about this is that it preserves the distinction between:

package ns:a;
package ns:b { interface i {} }
interface i {}

and

package ns:a { interface i {} }
package ns:b { interface i {} }

The former you could say is defining the package ns:a (which exports i) and bundling ns:b as an external dependency whereas the latter is defining an empty unnamed package that bundles external packages ns:a and ns:b. I think that distinction makes sense?

alexcrichton commented 2 weeks ago

I think that could work yeah and that seems reasonable to me. For the start I think it'd just walk the root component looking for subcomponents rather than spidering the whole thing, but either probably wouldn't be too hard to implement. I'm thinking though it'd be best to only accept a conservative shape of components to keep our options open in the future to different organizations if we want.

Otherwise though this WIT:

package ns:a;
package ns:b { interface i {} }
interface i {}

currently isn't valid where WIT text files are either package foo:bar; mode or exclusively package foo:bar { ... } mode. That being said being able to express the distinction there eventually I think is good regardless if we want to support it in the future.

lukewagner commented 2 weeks ago

Ah right, good point; unless and until we add package { ... } blocks inside package { ... } blocks, you shouldn't need to spider.

macovedj commented 2 weeks ago

So if I'm following correctly then for the scope of this PR, since mixed package declaration styles isn't yet supported, then in Alex's example where we have:

package a:b1 { interface foo {} }
package a:b2 { interface foo {} }

this would end up encoded with no exports:

(component
  (component $a:b1 (; output of `wasm-tools component wit -t for just `package a:b1` ;) )
  (component $a:b2 (; output of `wasm-tools component wit -t for just `package a:b2` ;) )
)

Is that right?

Then whenever we end up with mixed declaration styles being supported (maybe I"ll do that as a follow up or if it doesn't seem so bad I may try to include it in this PR) we could see both this and the current day encodings appear side by side in a single encoded wit file.

alexcrichton commented 2 weeks ago

Indeed! So the encoder will basically say if there's one package do what it does today but with multiple packages being encoded they're encoded in this new format. Similarly the decoding process would need to recognize the new multi-package format and decode that in addition to supporting the previous single-package format.

There's still sort of a question of defaults, but I think that's reasonable to tackle at a later date. Specifically I'm wondering for example what the output of wasm-tools component wit would be for wasi:cli. Whether that by default transitively includes all dependency packages I'm not sure.

macovedj commented 2 weeks ago

So I have implemented these changes locally for encoding multiple wit packages. When decoding though, I found that some modifications will be have to made here as the current tooling will try to parse the binary as a component instead of a wit package if no exports are present.

I think a convention needs to be decided on how to discern whether these are nested components in a component implementation vs being used to describe a wit package? I was thinking that maybe the component names can can provide a mechanism? If not then we could go back to expecting exports for these cases, and the convention could be established in how they're exported?

macovedj commented 2 weeks ago

After some more thought... I'll actually probably investigate refactoring ComponentInfo into ComponentsInfo, with multiple packages in it. Then we could perform the existing check on each package, and it could also provide a clearer route to mixing package declaration styles.

alexcrichton commented 2 weeks ago

That's a good point! I agree though that we can probably update how metadata is tracked and extend the "is empty" check to exclude the case of "no exports but has a bunch of components which all look like WIT packages"

macovedj commented 1 week ago

Cool. I think I've essentially accomplished the thrust of what's involved in order to do to that now (locally)... As I'm tidying it up before requesting additional review, I'll be evaluating whether or not to include support for mixing package declaration styles or have that be a follow up.

macovedj commented 6 days ago

Hmm seems I botched something up switching branches... should be able to look into what went wrong shortly...

macovedj commented 1 day ago

I think this should be ready for review now... just investigating why it's struggling to get gcc with yum in CI/CD.