WebAssembly / component-model

Repository for design and specification of the Component Model
Other
899 stars 75 forks source link

Support inline packages #313

Open tschneidereit opened 4 months ago

tschneidereit commented 4 months ago

@alexcrichton brought up the idea of supporting packages in-line in a file in addition to the package statement. This would help with bundling world definitions into single files as part of things like runtime / language SDKs, which is often very useful.

As a concrete example, wasmtime is considering to include binary representations of the two worlds it implements instead of the deps folder to ease artifact management. That however makes handling these artifacts more difficult in other ways, including simple human inspection.

If we were able to bundle all the wit files into a single one, containing inline packages, that'd be a much better solution.

The syntax for this seems pretty straightforward to me, but I might be missing something:

package wasi:http@0.2.0 {
  world proxy { ... }
  interface incoming-handler { ... }
  interface types { ... }
}

package wasi:random@0.2.0 {
  world imports { ... }
  interface random { ... }
  interface insecure { ... }
  interface insecure-seed { ... }
}
[... other packages as needed]
lukewagner commented 4 months ago

Yep, that use case and syntax makes sense to me. I think we can even encode the WIT syntax you wrote today as:

(component
  (type (export "wasi:http/proxy@0.2.0") ...)
  (type (export "wasi:http/incoming-handler@0.2.0") ...)
  (type (export "wasi:http/types@0.2.0") ...)
  (type (export "wasi:random/imports@0.2.0") ...)
  (type (export "wasi:random/random@0.2.0") ...)
  (type (export "wasi:random/insecure@0.2.0") ...)
  (type (export "wasi:random/insecure-seed@0.2.0") ...)
)

Rendering that back to WIT would then just group interface/world definitions by package. So this could be purely a WIT-level addition and not even require a change in encoding scheme.

Another closely-related use case that I think wants the same feature is "show me a single self-contained (no deps) .wit for a given component". Right now this isn't possible because you can't put multiple packages into a single .wit, so this feature would also unlock this use case. But there is an interesting question:

Let's say I'm rendering the WIT from a component.wasm that looks like:

;; component.wasm
(component
  (import "wasi:http/outgoing-handler" ...)
  (import "custom-callback" (func (param "s" string) (result string)))
  ...
  (export "wasi:http/incoming-handler" ...)
)

where I'm importing or exporting interfaces that aren't published anywhere and thus I'm targeting a world that isn't published anywhere. While we could just force the tool to invent a synthetic package and world name, we've generally tried hard not to have to invent spurious names. Instead, it seems like what I'd like to see is roughly (modulo bikeshedding):

package wasi:http {
  interface outgoing-handler { ... }
  interface incoming-handler { ... }
}

world {
  import wasi:http/outgoing-handler;
  import callback: func(s: string) -> string;
  export wasi:http/incoming-handler;
}

The idea is to allow a single unnamed world (or perhaps we could (re)introduce default world syntax...) instead of (exclusive with) requiring a package definition and a world name.

And then, going the other direction: when I'm authoring a custom component, this is the syntax I'd be able to write and feed into, e.g., wit-bindgen or cargo component or jco componentize. This also seems nice for tutorials, since it avoids the need to explain why the made up mypackage/myworld names don't mean anything so you can ignore them.

WDYT?

alexcrichton commented 4 months ago

I think it's a good goal to make names optional myself, and I agree that an optional-names-package, if it existed, would be perfect for "print the WIT of this component". There's at least a bit to figure out how it all interacts though because at some point a package annotation is required, for example in the case of:

interface foo {}

world {
    import foo; // what's the ID of this interface?
}

Also having an unnamed world within a package may not be desirable but having an unnamed world without a package seems reasonable.

tschneidereit commented 4 months ago

Yeah, it seems to me like this feature would bring WIT more up to parity with the binary encoding.

Regarding the world name, that seems like a bit of an orthogonal question? It seems like that problem exists with the current wasm-to-wit output just the same. And regardless, maybe we can treat it as orthogonal by for now having tools use the .wasm file's name and/or support passing in the world name as a parameter?

lukewagner commented 4 months ago

Also having an unnamed world within a package may not be desirable but having an unnamed world without a package seems reasonable.

Yes, I think it's a good idea to say you can have a package declaration XOR an unnamed world and trying to do both would be an error (analogous to if you had two conflicting package declarations).

When using an unnamed world, I think the ground truth that we need to reflect in the WIT syntax is: there are no interfacenames (we're in the plainname case of importname or exportname).

Thus, as a starting point, I think we could say that it's an error to have a named interface definition when you're also defining an unnamed world (just like having a package declaration is an error, and for the same reason). Thus, you'd have to use inline anonymous interface definitions (alongside package { ... } scoped interfaces), e.g.:

package wasi:http {
  interface incoming-handler { ... }
}
world {
  import wasi:http/incoming-handler;
  import foo: interface {
     ...
  }
}

Assuming we extend use appropriately so that unnamed interfaces can refer to resource types defined by preceding interfaces (as discussed in wit-bindgen/#822), I think this should be roughly co-expressive to what you can do in the WAT with plainnamed instance types.

However, this would force repetition in cases where WAT could define the instance type once and reuse it multiple times (via typeidx). At the moment, the only way to factor out this type info is to have a separate named interface definition. So what I've been wondering for a while now is if we need a new syntax that intentionally does not define a name that is part of the public interface but is instead just a private detail (like a typeidx) which is erased (or preserved only via custom-name section, similar to the situation with doc-comments).

Incidentally, this question also came up in a WAC setting, and the idea there was to use let for internal names (that ultimately turn into indices in the C-M and have no associated import or export name). So, I don't know if this is too weird, but maybe then a syntax to factor out any type definition that you would otherwise be forced to repeat inline is:

let foo = interface { ... }
world {
  import bar: foo;
  import baz: foo;
}

and to be clear, this would be the type for a component:

(component
    (type (instance ...))
    (import "bar" (instance (type 0)))
    (import "baz" (instance (type 0)))
    ...
)

Incidentally, the import bar: foo; syntax dovetails nicely with CM/#287: if foo refers to a named interface definition, you would get a plainname along with an implements attribute (thereby preserving the semantics of the interfacename), whereas if foo referred to an unnamed (let-bound) interface definition, you get just the plainname (no implements).

WDYT?

alexcrichton commented 4 months ago

I'd personally be tempted to not bite off too much here. Once we're in the realm of deduplicating types that feels pretty far afield of the OP here of inline packages, and AFAIK there's not really an issue with duplicating types today anyway.

(I'd also agree that a single anonymous world is a bit orthogonal from inline packages too)

So going back to inline packages and pondering a bit, some questions I can come up with (perhaps with obvious answers but figure I should list them out):

That's at least the initial questions I can think of.

This sort of gives rise to a first-class concept of merging packages. Right now the Rust tooling already handles that as it comes up in a number of other places, for example if you've got two object files both with bindings they might both refer to WASI things. More commonly the main module may refer to WASI but the adapter also refers to WASI. Right now the package merging step is pretty dumb and probably not the algorithm we'd want. With WIT packages being able to be defined inline we'd probably want to formally define what it means to merge packages.

lukewagner commented 4 months ago

Yes, I'm quite happy to not focus on de-duplicating types; I mostly just wanted to preemptively address the issue with the solution to the question of what to do with interfaces in the context of anonymous worlds. Anonymous worlds, do seem rather timely and closely related and enabled by nested packages, but I'm also fine if we want to punt on them as well.

What happens if package foo:bar { ... } is in two WIT files of the same package? (probably an error)

Given that what really matters is the distinctness of the names of the individual interface and world definitions inside a package (at least if we use the encoding I mentioned above), I think it would be fine to allow the same package to appear multiple times and they all effectively get unioned together, erroring if they intersect (like a C++ namespace).

Is a WIT package still (for now at least) required to have a package foo:bar; header in one of its files? (probably yes, for now)

If we use the encoding I mentioned above, the following two WIT documents would encode to the exact same thing:

package ns:foo;
interface i {}
package ns:bar {
  interface j {}
}
package whocares:whatever;
package ns:foo {
  interface i {}
}
package ns:bar {
  interface j {}
}

so why require the vestigial package statement in the latter case? Thus, a slightly looser rule could be: if you have any un-packaged interface or world definitions, you need a package declaration, but otherwise you don't. And thus in the latter WIT document, you could delete the first line (but in the former, you need the first line).

Are recursive packages supported?

I don't think it'd provide any value, so I'd go with no.

We'd probably want to document that a complete package { ... } is not required, for example if your WIT has package a:b { interface i1 { ... } } and another package has package a:b { interface i2 { ... } } I think that should be valid, not an error. [...] This sort of gives rise to a first-class concept of merging packages.

Agreed. This all sortof goes along with the "there can be multiple package definitions for the same package name as long as they don't intersect" suggestion I made in my first answer above.

alexcrichton commented 4 months ago

I think it would be fine to allow the same package to appear multiple times and they all effectively get unioned together, erroring if they intersect (like a C++ namespace).

Oh I didn't know that C++ worked like that! But yeah I think what you say makes sense, and especially if we define what it means to merge packages then it sort of naturally falls out without too much work.

Thus, a slightly looser rule could be: if you have any un-packaged interface or world definitions, you need a package declaration, but otherwise you don't.

That sounds reasonable to me, yeah

azaslavsky commented 3 months ago

Hi! I've been attempting to implement the narrow version of this (that is, no anonymous worlds, no attempt at addressing type duplication issues, etc) over the last few evenings.

It's been fairly tractable so far, though one consequence of this change is that the UnresolvedPackage struct's interface in wit-parser/lib.rs would need to change such that all of it's return types go from Result<UnresolvedPackage> to Result<Vec<UnresolvedPackage>>, since any given set of WIT file(s) can now return an arbitrary number of packages.

This spiders outward pretty quickly: by my rough count there are transitively a three dozen or so call sites that would need to change to accept a list of packages rather than a single instance. My work so far (most, though not all, tests passing) is at https://github.com/azaslavsky/wasm-tools/tree/component-model-313.

The basic setup right now has two "modes": explicit mode, where the top-level of the file is solely package ... {} declaration, and the existing "implicit" mode, where the file contains one or zero package ...; statements, with other declarations at the same level. Once we do the anonymous world extension mentioned above, some selective mixing can be allowed, but right now the explicit/implicit split happens as soon as possible when parsing.

Addressing @alexcrichton's questions above:

Is a WIT package still (for now at least) required to have a package foo:bar; header in one of its files?

I haven't altered this. "Implicit" mode packages have the same rules as before, and "explicit" mode of course don't need a standalone package ...; statement.

Are recursive packages supported? e.g. package foo:bar { package a:b { ... } }

Not in the current draft. This felt like it would add a lot complexity.

We'd probably want to document that a complete package { ... } is not required, for example if your WIT has package a:b { interface i1 { ... } } and another package has package a:b { interface i2 { ... } } I think that should be valid, not an error.

I went the other way on this: package definitions must be in a single block in the current draft. I found it hard to think of examples (besides the aforementioned C++ namespaces) where blocks in language syntax work this way, and it felt very unnatural to me. Also, it makes implementation easier to not worry about it during resolution, and just treat every package tree as its own, self-contained whole. If folks feel strongly about going the other way, I don't think the current setup blocks adding this capability later.

What happens if package foo:bar { ... } is in two WIT files of the same package?

This is admittedly untested in the current draft, but I think it just goes in the HashMap<PackageId>, and can be freely referenced by as many other packages as need be?

I'm going to polish up the above branch by resolving the remaining TODOs and adding some more tests, particularly of the parse-fail variety. This is my first contribution, so please feel free to look through what I have so far and let me know if I'm barking way up the wrong tree! :)

alexcrichton commented 2 months ago

Thanks for working on this @azaslavsky!

In terms of implementation I might recommend replacing Result<UnresolvedPackage> with Result<UnresolvedPackages> and moving all the helper methods to UnresolvedPackages instead of UnresolvedPackage. That'd probably still require a similar number of changes/refactorings but might make adding anonymous worlds in the future a bit easier by having a "package struct" that they can be placed into.

As for all the consequences of your implementation, they sound good to me 👍. Only allowing one package block is at least a conservative starting point and we can always relax it later as necessary.

What happens if package foo:bar { ... } is in two WIT files of the same package?

This is admittedly untested in the current draft, but I think it just goes in the HashMap, and can be freely referenced by as many other packages as need be?

I think that's ok yeah, I mostly want to make sure that these sorts of cases are handled and at the very least don't ignore what's written in WIT files. Whether it unions or errors I'm ambivalent about myself but sounds like it'll get handled one way or another so I think that's fine.