Open lukewagner opened 1 year ago
Nice write-up! The cargo component example really makes sense and makes it easy for developers to express configs. I have a few questions regarding the later part of the write-up in this proposal:
*: func() -> string
? If not, I'd be interested to understand what *: func(param1, param2, ...) -> (ret1, ret2)
would mean for a configuration. ConfigMap
in Kubernetes?
This looks great @lukewagner, thanks for capturing all of this!
+1 to not encoding the rule "* would only be allowed inside an interface block as the only item" directly into the grammar. Plenty of languages allow more expressiveness in their grammar while enforcing particular constraints during parsing.
I might not be making the connection here, but how is the RHS string type of the parameter in Cargo.toml used during the substitution, for instance when substituting in the above *: func() -> string?
Ah, I think my Cargo.toml example was confusing. Because wasi:config/values as-written has a fixed string result type, there is no right-hand side necessary. Since TOML doesn't allow an empty right-hand side, maybe I should have written this instead:
[package.metadata.component.target]
path = "my-world.wit"
[package.metadata.component.parameters]
config = [ "a", "b", "c" ]
What I wrote would make sense once type parameters were possible and wasi:config/values was generalized (to interface { *: func() -> _ }), so I sort of jumped straight there in my example, where I'm substituting string for the _.
I think that answers @fibonacci1729 and partially @Mossaka's question 1. Answering the rest:
The * should be usable with any function type (the "string" in the Cargo.toml was a red herring).
For JS, I think you'd write the analogous thing but in package.json. I can't speak to Python though.
Yes, great point! I think we could definitely have a language-agnostic TOML (or JSON or ...) file that was fed into wit-bindgen. Maybe this is even our starting point, so it sortof sets up a "reference" format, and then Cargo.toml and package.json are just developer conveniences that avoid one extra config file.
FYI, @fibonacci1729 and I have this pretty much implemented; we'll demo it at tomorrow's Component Model meeting, then clean things up and start opening PRs. I'm sure @alexcrichton will have opinions :)
https://github.com/bytecodealliance/wasm-tools/compare/main...fermyon:wasm-tools:wit-templates https://github.com/bytecodealliance/wit-bindgen/compare/main...dicej:wit-templates https://github.com/bytecodealliance/wasmtime/compare/main...dicej:wasmtime:wit-templates
I unfortunately won't be able to make the meeting tomorrow, but I look forward to digging into these bits on Monday!
First draft PR for feedback: https://github.com/bytecodealliance/wasmtime/pull/5925 I anticipate that one will generate the most discussion, since there are a variety of ways host bindings could be generated, and I picked the one that seemed most flexible to me. @Kylebrown9 FYI
@lukewagner Brian and I ran into what seems to be a pretty fundamental issue while implementing this.
Consider this world:
interface foo {
*: func() -> u32
}
default world wildcards {
import imports1: self.foo
import imports2: self.foo
}
Now imagine we want to build a component that targets that world, using ["a", "b", "c"] to expand imports1's wildcard and ["x", "y", "z"] to expand imports2's wildcard. AFAICT there's currently no way to represent that component type, whether as WIT or in binary -- we can say foo has functions a, b, and c or we can say it has functions x, y, and z, but we can't say both at the same time.
Of course, we could split foo into two separate interfaces, but how would we name them, and how would you do a subtype comparison between the original WIT template and the concretized WIT?
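To make the ambiguity concrete, here is a minimal Python sketch. It is not how wasm-tools represents anything; the dictionaries and the `expand` helper are hypothetical, chosen only to show that expansions keyed by import name coexist while expansions keyed by interface name collide.

```python
# Hypothetical model: an interface template is just the signature of its lone
# wildcard; a substitution maps each import name to the names replacing `*`.
TEMPLATE = {"foo": "func() -> u32"}  # interface name -> signature of `*`


def expand(signature, names):
    """Replace the lone wildcard with one concrete function per name."""
    return {name: signature for name in names}


substitutions = {
    "imports1": ["a", "b", "c"],
    "imports2": ["x", "y", "z"],
}

# Keyed by import name: each expansion keeps its own function set.
by_import = {
    imp: expand(TEMPLATE["foo"], names) for imp, names in substitutions.items()
}

# Keyed by interface name: the second expansion overwrites the first,
# which is the "can't say both at the same time" problem described above.
by_interface = {}
for names in substitutions.values():
    by_interface["foo"] = expand(TEMPLATE["foo"], names)
```

In this toy model the information survives only when the expansion is attached to the place the interface is used, not to the interface's own name.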
I just realized my example above is rejected by wasm-tools anyway since you can't import the same interface more than once (I'd be curious to know the reason for that, BTW; EDIT: Brian pointed me to https://github.com/bytecodealliance/wit-bindgen/issues/529 for context). It does accept importing and exporting the same interface, though, so it's still a problem in that scenario.
Also, perhaps I was incorrect in claiming there's no way to represent the component type of the "concretized" world, given that wasm-tools component wit -t wildcards.wit currently duplicates interface types when generating a component type, so you could imagine a substitutions-aware version of wasm-tools doing something like this:
input wildcards.wit:
interface foo {
*: func() -> u32
}
default world wildcards {
import imports: self.foo
export exports: self.foo
}
input substitutions.toml:
[wildcards]
imports = ["a", "b", "c"]
exports = ["x", "y", "z"]
output concrete.wat:
(component
(type (;0;)
(component
(type (;0;)
(instance
(type (;0;) (func (result u32)))
;; TODO: this part won't validate -- still need proper representation for wildcards:
;; (export (;0;) "*" (func (type 0)))
)
)
(export (;0;) "foo" "pkg:/wildcards/foo" (instance (type 0)))
(type (;1;)
(component
(type (;0;)
(instance
(type (;0;) (func (result u32)))
(export (;0;) "a" (func (type 0)))
(export (;0;) "b" (func (type 0)))
(export (;0;) "c" (func (type 0)))
)
)
(import "imports" "pkg:/wildcards/foo" (instance (type 0)))
(type (;1;)
(instance
(type (;0;) (func (result u32)))
(export (;0;) "x" (func (type 0)))
(export (;0;) "y" (func (type 0)))
(export (;0;) "z" (func (type 0)))
)
)
(export (;0;) "exports" "pkg:/wildcards/foo" (instance (type 1)))
)
)
(export (;0;) "wildcards" "pkg:/wildcards/wildcards" (component (type 1)))
)
)
(export (;1;) "wildcards" "pkg:/wildcards" (type 0))
)
The above component validates just fine, but there's no way we could convert it back into WIT, since we've given an inconsistent definition of pkg:/wildcards/foo. So I guess you could argue this is just a WIT limitation, but even though the above component validates, I'd still consider it malformed given the inconsistency. In the end, I think we'd want to change both the component model and WIT if we want to support instantiating the same interface more than once in a given world.
So where does that leave us? I see two ways forward:
Assuming we do want to modify WIT and the CM to enable multiple instantiation of wildcard interfaces, I think it might be helpful to compare it to the existing "genericity" in WIT/CM: list, result, option, and tuple. Note that none of those things are types -- they're type constructors, i.e. they are "functions" that take a type (or more than one in the cases of result and tuple) and produce a type. In Haskell parlance, list has kind * -> *, while u8 has kind *.
Analogously, I would suggest that an interface containing a wildcard is not an interface at all -- it's an interface constructor, i.e. a "function" that takes a list of names and produces an interface. And just as you can't say type foo = list, I'd contend that you shouldn't be able to say import imports: self.foo if foo has a wildcard because you can't instantiate an interface constructor -- you need to monomorphize it first. And I think the key to allowing multiple instantiation of wildcard interfaces is to make an explicit distinction between interfaces and interface constructors in both WIT and the CM.
We can bikeshed the syntax, but for the moment let's imagine we used a similar syntax for interface constructors (and world constructors!) to what we use for type constructors:
// Polymorphic interface constructor:
interface foo<W...> {
<W...>: func() -> u32 // expand W into zero or more functions of this type
}
// Monomorphic world: needs no substitutions
world monomorphic {
import imports: self.foo<a, b, c>
export exports: self.foo<x, y, z>
}
// Polymorphic world constructor; substitutions needed to monomorphize:
world polymorphic<W1..., W2...> {
import imports: self.foo<W1>
export exports: self.foo<W2>
}
The main virtue of this syntax is that you can see at a glance which things are worlds or interfaces (monomorphic) vs. which ones are world or interface constructors (polymorphic). And crucially for our purposes, we can invoke a given interface or world constructor multiple times without ambiguity.
To be clear, I'm not necessarily advocating for that specific syntax -- just that we make the distinction between interfaces/worlds and interface/world constructors clear and unambiguous in both WIT and the CM, so we can round-trip between them losslessly -- both before and after monomorphization. This will be even more important when we introduce type holes (which are really just type parameters to interface/world constructors).
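The "interface constructor" reading can be sketched in Python, purely as an analogy: `foo` below is an ordinary function from names to an interface value, and `polymorphic` is a function from two name lists to a world. The `Interface` class, the dictionary-shaped world, and both function names are assumptions made for illustration, not part of WIT or the CM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interface:
    """A monomorphic interface: a fixed sequence of (name, signature) pairs."""
    functions: tuple


def foo(*names):
    """Hypothetical interface constructor: names in, interface out."""
    return Interface(tuple((name, "func() -> u32") for name in names))


def polymorphic(w1, w2):
    """Hypothetical world constructor: applies `foo` once per name list."""
    return {"imports": foo(*w1), "exports": foo(*w2)}


# A monomorphic world needs no substitutions: the constructor is already applied.
monomorphic = {"imports": foo("a", "b", "c"), "exports": foo("x", "y", "z")}
```

Under this analogy, `foo` itself can never appear where an interface is expected; only the result of calling it can, and calling it twice with different names yields two distinct, unambiguous interfaces.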
Nice job digging into this. I think you're right that foo in your example isn't simply an interface. I'm not sure whether it's technically a "type constructor" or a row-polymorphic type, but in any case, we could say that it's an "interface template" that can be instantiated multiple times. I'll need to think some more about how this should look as a component type in Wat. However, I don't necessarily think we need to make the Wit format any more verbose than in the original example; it seems like these type parameters should be implicit (either always or at least by default).
Other than the commented-out bit about the "*" export, I think the above Wit is valid because the interface template isn't directly used by any subsequent types; is that right?
I generally agree with the discussion here and that these interface templates are not themselves interfaces, but I don't know that that means they need a different syntax than what was initially proposed. I think trying to enumerate all of the parameters used in templates at the beginning of the declaration will become quite verbose.
If the main concern is going backwards to wit, I think the simplest answer (though this may break/test other assumptions/constraints) would be to generate a different concrete interface definition that references the original template for each unique parameterization.
In our current scope, the following 2-level example (and other k-level examples) will be possible. I'm not clear on how the explicit parameters syntax would work here without a lot of syntax. e.g.
interface foo {
*: bar
}
interface bar {
*: func() -> u32
}
default world wildcards {
import imports: self.foo
export exports: self.foo
}
I think we should treat nested interfaces seriously in this initial version, because they are prototypical of the kind of nesting that will arise from templated parameters, fields, etc.
For nested interface templates and other future kinds of nesting, I'm not sure how some of the presented TOML syntaxes will work, and frankly TOML isn't very good at arbitrary nesting with its constraints on e.g. inline tables. I'm interested in experimenting with what a nested template instantiation configuration might look like and considering using e.g. JSON or KDL.
As a random idea: if we want to distinguish between interfaces/worlds and their templates, what about labeling them with keywords interface* and world* in their definitions instead of interface and world?
@Kylebrown9 AFAICT k-level nesting of interfaces is not supported anywhere in the CM today (except maybe the binary format?); it's not currently possible to parse/resolve or bindgen the WIT you mentioned above (see WIT.md). I'm wary of using this proposal to motivate nested interfaces (if that's what you are proposing) because I don't see the connection. However, if nested interfaces were actively planned, we would absolutely reflect that in the substitutions DSL.
Having said that, there is still plenty of work to be done in planning and designing how substitutions integrate into language toolchains (e.g. package.json, Cargo.toml, etc.). The TOML used here is purely demonstrative so as to not distract from discussing the semantics of templated interfaces and how they are reflected in the CM.
If I am misunderstanding the k-level support piece above, can you link me to any resources discussing this that I may have missed? Thanks!
That's my mistake, I didn't realize k-level interfaces were still purely hypothetical.
@lukewagner can you weigh in on the direction of interface nesting?
Other than the commented-out bit about the "*" export, I think the above Wit is valid because the interface template isn't directly used by any subsequent types; is that right?
Right -- wasm-tools component wit splits the interface into multiple instances when generating a component, which is useful in this case because we can use that opportunity to insert our wildcard substitutions. The result validates just fine (given the current scope of validation), but there's no way to convert it back to WIT since it has three completely different definitions of the foo interface:
I would contend that such a thing shouldn't validate since we're using the same interface URI to mean three separate things, which makes no sense given the semantics we've ascribed to these URIs.
I think we could resolve this issue with two additions to the CM and WIT:
concrete.wit
above.
With those two things in place, we could round-trip WIT templates (and their monomorphizations) to components and back in an unambiguous way, which I believe is necessary for any implementable subset of the design.
Good points, and sorry for the slow reply. I agree that we need to allow some way to represent wildcards in the CM to enable roundtripping wit->wat->wit. I think there's two related-but different cases to tease apart though:
In the case of 1, what we want to serialize is the wildcards, with no substitutions applied. For now, just using the illegal kebab-name * seems like a fine idea. And then I think this case roundtrips fine?
In the case of 2, I think it's not necessary for us to be able to reproduce the original Wit templates: in general, compiling a component targeting a world will lose parts of the original world (e.g., unused imports). Rather, the property I think we want is that, if I render the component's type as Wit, I get a world that "matches" the pattern of the original Wit templates. But that still does raise the question you're getting at of what to do when a single interface template's id is used multiple times in a single world.
Here's an idea. So let's say I start with this template:
// w.wit
interface i {
*: func()
}
default world w {
import in: self.i
export out: self.i
}
and then monomorphize in with ["a", "b"] and out with ["x", "y"], I should get this component:
(component
(import "in" "pkg:/w/w" (instance
(export "a" (func))
(export "b" (func))
))
(export "out" "pkg:/w/w" (instance
(export "x" (func))
(export "y" (func))
))
)
Now, if I naively tried to create a single out-of-line interface w { }, I'll have two conflicting definitions of its contents. But one thing I've been thinking for a while is that we should be able to explicitly stick ids onto interface definitions, anywhere they can occur, including inline interface definitions. So that would allow me to render this world:
default world w2 {
in: interface "pkg:/w/w" {
a: func()
b: func()
}
out: interface "pkg:/w/w" {
x: func()
y: func()
}
}
which intuitively "matches" the w template. Even ignoring Wit templates, nothing about component model validation forces all types with the same id to be type-compatible, so this inline form also gives us an escape hatch to render component types with incompatible interface types (rather than just failing and saying "read the wat"). So it seems like a good general Wit extension which we could then leverage for Wit templates which are like a case where we want to do this on purpose. WDYT?
@lukewagner & @dicej that looks great! I want to suggest a use case that we have but the current design doesn't cover. In our use case, the guest consumes CRUD operations from records in the WIT file, and the host dynamically learns the different records used and provides the CRUD operations based on that. It may seem like generics, but it's templates. For example, I will be able to call the CRUD operations when I implement the run func in the example interface.
// sdk.wit
interface crud {
create: func(obj: *)
read: func(id: string) -> *
update: func(id: string, obj: *)
delete: func(id: string) -> *
}
// my-component.wit
interface example {
record my-record {
field: string,
}
run: func(obj: my-record)
}
default world w {
export example: self.example
import crud: pkg.sdk.crud
}
For example, it will be similar to writing:
interface crud-my-record {
use self.example.{my-record}
create: func(obj: my-record)
read: func(id: string) -> my-record
update: func(id: string, obj: my-record)
delete: func(id: string) -> my-record
}
interface example {
record my-record {
field: string,
}
run: func(obj: my-record)
}
default world w {
export example: self.example
import crud-my-record: self.crud-my-record
}
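The equivalence between the templated crud interface and the hand-written crud-my-record interface can be sketched as a plain text substitution. This Python snippet is only an illustration: the `instantiate` helper and the `crud-<record>` naming scheme are assumptions, and a real implementation would operate on a parsed WIT document rather than on strings.

```python
# Body of the templated `crud` interface, with `*` standing for a type hole.
CRUD_TEMPLATE = """\
create: func(obj: *)
read: func(id: string) -> *
update: func(id: string, obj: *)
delete: func(id: string) -> *"""


def instantiate(template, type_name, source_interface):
    """Fill every `*` with `type_name` and import it from `source_interface`."""
    lines = [
        f"interface crud-{type_name} {{",
        f"  use self.{source_interface}.{{{type_name}}}",
    ]
    lines += ["  " + line.replace("*", type_name) for line in template.splitlines()]
    lines.append("}")
    return "\n".join(lines)


result = instantiate(CRUD_TEMPLATE, "my-record", "example")
print(result)
```

A blind `replace` is safe here only because the template contains no other asterisks.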
@omersadika I agree that's a great use case for templates. We've also talked about this sort of idea in previous WASI meetings, with @Kylebrown9 presenting a sketch of this in a DB UDF context. I think extending Wit templates to types would make sense as a second step after what's proposed above, since it's rather more involved, but I think we should definitely do it.
I went ahead and created draft PRs containing a minimum viable implementation: https://github.com/bytecodealliance/wasm-tools/pull/964 https://github.com/bytecodealliance/wit-bindgen/pull/541 https://github.com/bytecodealliance/wasmtime/pull/5934
I like @lukewagner 's idea for specifying ids for otherwise-anonymous interfaces. For now, though, our implementation only clones the wildcard interface as an anonymous interface prior to expanding the wildcard, which is enough to get an end-to-end test working.
Does this mean reading some information from an additional TOML file? I find this inconvenient.
Is there a solution that is compatible with inline packages (https://github.com/WebAssembly/component-model/issues/313), where one file contains all the necessary information?
No, the TOML file was just one option for specific use cases where it lines up with what you want to say. But there are lots of other producer toolchain options or WIT features that could be used to fill in the *'s (and we didn't get far enough along to begin to explore them all). But as a base case, you can always write a concrete (non-templated) world in WIT by hand that matches (is a subtype of) a templated world, filling in the *'s manually -- everything else would just be sugar for synthesizing this concrete world from something more intuitive (as other examples: the specifier strings of JS import statements or a Rust proc-macro).
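What "matches" could mean for a hand-written world can be approximated in a few lines of Python. This is a deliberately strict toy check, not the component model's subtyping rules: worlds are modeled as nested dictionaries, and the `matches` function is a hypothetical name.

```python
def matches(template, concrete):
    """Toy check that a concrete world fits a templated one.

    Both arguments map item names to {function name: signature}. A template
    interface whose only key is "*" accepts any function names, provided every
    concrete function has the wildcard's signature. Real subtyping is richer.
    """
    for item, template_funcs in template.items():
        funcs = concrete.get(item)
        if funcs is None:
            return False
        if "*" in template_funcs:
            if any(sig != template_funcs["*"] for sig in funcs.values()):
                return False
        elif funcs != template_funcs:
            return False
    return True


templated = {"config": {"*": "func() -> string"}}
by_hand = {"config": {name: "func() -> string" for name in ("a", "b", "c")}}
```

Any sugar (a TOML table, JS import specifiers, a Rust proc-macro) would then only need to produce something for which such a check succeeds.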
Motivation
Let's say I'd like to build a component that consumes 3 configuration values a, b and c (which in a 12-factor app I'd take as 3 environment variables). I could define a component with type:
However, this loses the fact that my component specifically wants a, b and c. If I'm using or deploying this component, I have to learn about these values from the docs or observe the behavior of get-config at runtime. If I make a mistake, the error will only show up at runtime.
If instead I give my component this type:
then there's a lot more declarative information that a host can leverage to provide a better developer experience. E.g., at deployment time, the host can check that there are indeed configuration values a, b and c available and give a deployment-time error if not. There are also new optimizations made available at runtime: a, b and c can be held in a dense array accessed by internally-computed static index, thereby avoiding the hash-table lookups at runtime (which can really matter when the target instantiation-time is in microseconds).
This example is in the domain of configuration, but you can also find analogous examples in:
(func (param "increment" u32)) import per metric)
wasi:http/outgoing-handler per upstream.
In all these cases, the workaround for supporting multiple instances of the same interface inevitably involves using a dynamic string parameter which loses the otherwise-declarative dependency information. (To be clear, some use cases do really need a runtime-dynamic string name; we're talking here about the cases where the string would otherwise be a magic constant in the code.)
So given that we'd like to write components with the above type, how do we capture all these varying types in Wit? Of course we can write the Wit for any single component; for example, a Wit world that supports the above component is:
The challenge is writing a single world that captures this component and all the other components, each with their own varying set of configuration values.
The proposal is to allow Wit to express not just a single interface or world, but a family of interfaces/worlds produced by substituting various parameters. Concretely, for the above I'd like to write (roughly; the exact syntax here is open for debate, of course):
Using this, it would be natural for WASI standards to use * in standardized interfaces such as:
which would then allow my-world to be rewritten as:
and leverage standard implementations of wasi:config/values.
Sketch
While in general I think we'll want to allow putting parameters everywhere in Wit (types, names, parameters, results, fields, cases, imports, exports, ...), as a first step, I'd like to keep things scoped to just the case I showed above where a * can show up in the name of the lone field of an interface. While this won't be easy (there are a number of interesting producer/consumer design questions to work out), I think it'll be much easier than parameters in types, and so a good starting point. For terminology, I've been calling the general feature "Wit templates", and I think this first milestone could be called "variable imports and exports", but open to hearing alternatives.
From a Wit grammar perspective, the addition is fairly tiny, defining a:
and then using variable-id in func-item and typedef-item. As an additional validation-time constraint, * would only be allowed inside an interface block as the only item. (Or we could capture this constraint in the grammar; but I think we'll want to loosen this constraint over time.)
To allow encoding Wit documents as .wasm binaries, we'll also need to extend Binary.md to support * in names. One thing to be clear about here is that a concrete component won't be allowed to have any *s in its imports/exports; only Wit documents. In a far-post-MVP future, one could imagine generalizing components with a staged-compilation model that did allow components to talk about * names, so while we don't need to design how that all would work, it would be good to pick an encoding of * that could be retconned into this far-post-MVP future if needed.
Significant design work will be needed per-language-toolchain around how to allow the developer to fill in the *s in the world they are targeting. Riffing on an idea from @dicej and @fibonacci1729, this could come from buildconfig, with each line adding a field. E.g., in the context of cargo component, I could write:
which would declare 3 imported configuration values a, b and c, looking ahead to a future milestone where non-string types could be allowed. More design work is probably necessary here to think through how to express all the not-so-simple cases. But the idea is that, from this language-specific buildconfig, tooling could derive a language-agnostic substitution which would then be applied to the target world to produce a "monomorphized" world with no *s that is fed into bindings generation, keeping the rest of the build pipeline working like normal.
This is just a sketch, and more work is needed to flesh out the design, but I thought it'd be useful to put up this much now since the interface design questions above are coming up in a number of places at the moment. @dicej and @fibonacci1729 have also done a lot of thinking about this, so I'd invite them to drop in whatever they're thinking too.
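Returning to the host-side benefits described under Motivation, the following Python sketch shows what a deployment-time check and a dense, statically indexed value table might look like once a world has been monomorphized. Nothing here reflects an actual host implementation; `link_config` and the sample values are hypothetical.

```python
def link_config(declared_names, available):
    """Check at deployment time that every declared value exists, then pack
    the values into a dense list ordered like the declared imports."""
    missing = [name for name in declared_names if name not in available]
    if missing:
        raise LookupError(f"missing configuration values: {missing}")
    return [available[name] for name in declared_names]


# Names taken from the monomorphized world's imports (assumed: a, b, c).
declared = ["a", "b", "c"]
slots = link_config(declared, {"a": "1", "b": "2", "c": "3", "unused": "4"})

# Each import is bound to a fixed index, so a call needs no hash lookup.
getters = [lambda i=i: slots[i] for i in range(len(declared))]
```

The hash lookups happen once, inside `link_config`; after that, each import resolves through its precomputed index.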