WebAssembly / component-model

Repository for design and specification of the Component Model

Sketch: allow a variable number of imports/exports of fixed type in Wit #172

Open lukewagner opened 1 year ago

lukewagner commented 1 year ago

Motivation

Let's say I'd like to build a component that consumes 3 configuration values a, b and c (which in a 12-factor app I'd take as 3 environment variables). I could define a component with type:

(component
  (import "get-config" (func (param "key" string) (result (option string))))
  ...
)

However, this loses the fact that my component specifically wants a, b and c. If I'm using or deploying this component, I have to learn about these values from the docs or observe the behavior of get-config at runtime. If I make a mistake, the error will only show up at runtime.

If instead I give my component this type:

(component
  (import "config" (instance
    (export "a" (func (result string)))
    (export "b" (func (result string)))
    (export "c" (func (result string)))
  ))
  ...
)

then there's a lot more declarative information that a host can leverage to provide a better developer experience. E.g., at deployment time, the host can check that there are indeed configuration values a, b and c available and give a deployment-time error if not. There are also new optimizations made available at runtime: a, b and c can be held in a dense array accessed by internally-computed static index, thereby avoiding the hash-table lookups at runtime (which can really matter when the target instantiation-time is in microseconds).
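As a rough illustration of the deployment-time check and the dense-array lookup described above, here is a minimal Python sketch. The names `link_config`, `required` and `available` are hypothetical and not part of any host API. It only shows the idea of resolving the declared names to fixed indices once, before instantiation.

```python
def link_config(required, available):
    """Resolve declared config names to slots in a dense list, once, at deploy time.

    `required` is the list of names the component's type declares (e.g. a, b, c).
    `available` is the host's key/value configuration. Both are hypothetical inputs.
    """
    missing = [name for name in required if name not in available]
    if missing:
        # Fail before instantiation rather than at the first runtime lookup.
        raise LookupError(f"missing configuration values: {', '.join(missing)}")
    # Dense storage: position i holds the value for required[i].
    values = [available[name] for name in required]
    # Each import becomes a closure over a fixed index, so calling it does no hashing.
    # The returned dict is only consulted while linking imports to the component.
    return {name: (lambda i=i: values[i]) for i, name in enumerate(required)}


imports = link_config(["a", "b", "c"], {"a": "1", "b": "2", "c": "3", "d": "4"})
print(imports["b"]())  # prints 2
```

With the single `get-config` import, neither the missing-name check nor the index assignment is possible, because the set of keys is not visible in the component's type.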

This example is in the domain of configuration, but you can also find analogous examples in:

In all these cases, the workaround for supporting multiple instances of the same interface inevitably involves using a dynamic string parameter which loses the otherwise-declarative dependency information. (To be clear, some use cases do really need a runtime-dynamic string name; we're talking here about the cases where the string would otherwise be a magic constant in the code.)

So given that we'd like to write components with the above type, how do we capture all these varying types in Wit? Of course we can write the Wit for any single component; for example, a Wit world that supports the above component is:

world my-world {
  import config: interface {
    a: func() -> string
    b: func() -> string
    c: func() -> string
  }
  ...
}

The challenge is writing a single world that captures this component and all the other components, each with their own varying set of configuration values.

The proposal is to allow Wit to express not just a single interface or world, but a family of interfaces/worlds produced by substituting various parameters. Concretely, for the above I'd like to write (roughly; the exact syntax here is open for debate, of course):

world my-world {
  import config: interface {
   *: func() -> string
  }
  ...
}

Using this, it would be natural for WASI standards to use * in standardized interfaces such as:

default interface "wasi:config/values" {
  *: func() -> string
}

which would then allow my-world to be rewritten as:

world my-world {
  import config: "wasi:config/values"
  ...
}

and leverage standard implementations of wasi:config/values.

Sketch

While in general I think we'll want to allow putting parameters everywhere in Wit (types, names, parameters, results, fields, cases, imports, exports, ...), as a first step, I'd like to keep things scoped to just the case I showed above where a * can show up in the name of the lone field of an interface. While this won't be easy (there are a number of interesting producer/consumer design questions to work out), I think it'll be much easier than parameters in types, and so a good starting point. For terminology, I've been calling the general feature "Wit templates", and I think this first milestone could be called "variable imports and exports", but open to hearing alternatives.

From a Wit grammar perspective, the addition is fairly tiny, defining a:

variable-id ::= id | '*'

and then using variable-id in func-item and typedef-item. As an additional validation-time constraint, * would only be allowed inside an interface block as the only item. (Or we could capture this constraint in the grammar; but I think we'll want to loosen this constraint over time.)
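The validation-time constraint can be sketched in a few lines of Python. This is only an illustration under the assumption that an interface block has already been parsed into a list of item names. `validate_interface` is a hypothetical helper, not a function in wasm-tools.

```python
def validate_interface(item_names):
    """Check the proposed first-milestone rule for one interface block.

    `item_names` lists the names of the interface's items, with "*" standing
    for a wildcard. Returns an error message, or None if the block is valid.
    """
    if "*" in item_names and len(item_names) != 1:
        return "'*' must be the only item in its interface"
    return None


print(validate_interface(["*"]))       # prints None
print(validate_interface(["*", "a"]))  # prints '*' must be the only item in its interface
```

Keeping the rule in a separate validation pass, rather than in the grammar, means it can be relaxed later without changing the parser.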

To allow encoding Wit documents as .wasm binaries, we'll also need to extend Binary.md to support * in names. One thing to be clear about here is that a concrete component won't be allowed to have any *s in its imports/exports; only Wit documents. In a far-post-MVP future, one could imagine generalizing components with a staged-compilation model that did allow components to talk about * names, so while we don't need to design how that all would work, it would be good to pick an encoding of * that could be retconned into this far-post-MVP future if needed.

Significant design work will be needed per-language-toolchain around how to allow the developer to fill in the *s in the world they are targeting. Riffing on an idea from @dicej and @fibonacci1729, this could come from buildconfig, with each line adding a field. E.g., in the context of cargo component, I could write:

[package.metadata.component.target]
path = "my-world.wit"

[package.metadata.component.parameters]
config.a = "string"
config.b = "string"
config.c = "string"

which would declare 3 imported configuration values a, b and c, looking ahead to a future milestone where non-string types could be allowed. More design work is probably necessary here to think through how to express all the not-so-simple cases. But the idea is that, from this language-specific buildconfig, tooling could derive a language-agnostic substitution which would then be applied to the target world to produce a "monomorphized" world with no *s that is fed into bindings generation, keeping the rest of the build pipeline working like normal.
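A minimal Python sketch of that substitution step follows. It assumes a deliberately simplified model: a world is a dict from import/export name to interface, and an interface is a dict from item name to a function type written as a string. `monomorphize` and this data model are hypothetical and do not reflect how any existing tool represents Wit.

```python
def monomorphize(world, substitution):
    """Return a copy of `world` with every "*" item replaced by concrete names.

    `substitution` maps an import/export name to the list of names that
    fill in the "*" of the interface bound to that name.
    """
    result = {}
    for item_name, interface in world.items():
        if "*" in interface:
            func_type = interface["*"]
            names = substitution[item_name]  # KeyError if no names were supplied
            result[item_name] = {name: func_type for name in names}
        else:
            result[item_name] = dict(interface)
    return result


my_world = {"config": {"*": "func() -> string"}}
concrete = monomorphize(my_world, {"config": ["a", "b", "c"]})
for name, func_type in concrete["config"].items():
    print(f"{name}: {func_type}")
```

The output world contains no `*`, so it can be handed to bindings generation unchanged.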

This is just a sketch, and more work is needed to flesh out the design, but I thought it'd be useful to put up this much now since the interface design questions above are coming up in a number of places at the moment. @dicej and @fibonacci1729 have also done a lot of thinking about this, so I'd invite them to drop in whatever they're thinking too.

Mossaka commented 1 year ago

Nice write-up! The cargo component example really makes sense and makes it easy for developers to express configs. I have a few questions regarding the later part of the write-up in this proposal:

  1. Is the scope of star imports restricted to functions without any parameters and single string return type, like *: func() -> string? If not, I'd be interested to understand what *: func(param1, param2, ...) -> (ret1, ret2) would mean for a configuration.
  2. The example in cargo component is really nice. I'd be interested to see how other languages would express this in their buildconfig, especially some dynamic languages like Python and JavaScript.
  3. Following up on question 2, I am not entirely sure about what a buildconfig even is. What is a buildconfig for Python or JavaScript? Could a buildconfig be implemented as a language-independent configuration file using YAML? Could it be a ConfigMap in Kubernetes?

fibonacci1729 commented 1 year ago

This looks great @lukewagner , thanks for capturing all of this!

+1 to not encoding the rule "* would only be allowed inside an interface block as the only item" directly into the grammar. Plenty of languages allow more expressiveness in their grammar while enforcing particular constraints during parsing.

I might not be making the connection here, but how is the RHS string type of the parameter in Cargo.toml used during the substitution, for instance when substituting in the above *: func()->string?

lukewagner commented 1 year ago

Ah, I think my Cargo.toml example was confusing. Because wasi:config/values as-written has a fixed string result type, no right-hand side is necessary. Since TOML doesn't allow an empty right-hand side, maybe I should have written this instead:

[package.metadata.component.target]
path = "my-world.wit"

[package.metadata.component.parameters]
config = [ "a", "b", "c" ]

What I wrote would make sense once type parameters were possible and wasi:config/values was generalized (to interface { *: func() -> _ }), so I sort of jumped straight there in my example, where I'm substituting string for the _.

I think that answers @fibonacci1729 and partially @Mossaka's question 1. Answering the rest:

  1. The * should be usable with any function type (the "string" in the Cargo.toml was a red herring).

  2. For JS, I think you'd write the analogous thing but in package.json. I can't speak to Python, though.

  3. Yes, great point! I think we could definitely have a language-agnostic TOML (or JSON or ...) file that was fed into wit-bindgen. Maybe this is even our starting point, so it sort of sets up a "reference" format, and then Cargo.toml and package.json are just developer conveniences that avoid one extra config file.
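To give a feel for what such a language-agnostic file might contain, here is a small Python sketch that loads and sanity-checks a JSON version of the substitutions. The file shape (`{"<import-or-export-name>": ["name", ...]}`) and the function `load_substitutions` are assumptions made for illustration. No such format has been specified, and wit-bindgen does not define this function.

```python
import json


def load_substitutions(text):
    """Parse a hypothetical JSON substitutions document.

    Assumed shape: {"<import-or-export-name>": ["name", ...], ...}
    """
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("top level must be an object")
    for key, names in data.items():
        if not isinstance(names, list) or not all(isinstance(n, str) for n in names):
            raise ValueError(f"{key}: expected a list of strings")
        if len(set(names)) != len(names):
            # Two functions with the same name in one interface would collide.
            raise ValueError(f"{key}: duplicate names")
    return data


subs = load_substitutions('{"config": ["a", "b", "c"]}')
print(subs["config"])  # prints ['a', 'b', 'c']
```

A Cargo.toml or package.json section could then be translated into this one shape before any language-independent tooling runs.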

dicej commented 1 year ago

FYI, @fibonacci1729 and I have this pretty much implemented; we'll demo it at tomorrow's Component Model meeting, then clean things up and start opening PRs. I'm sure @alexcrichton will have opinions :)

https://github.com/bytecodealliance/wasm-tools/compare/main...fermyon:wasm-tools:wit-templates
https://github.com/bytecodealliance/wit-bindgen/compare/main...dicej:wit-templates
https://github.com/bytecodealliance/wasmtime/compare/main...dicej:wasmtime:wit-templates

alexcrichton commented 1 year ago

I unfortunately won't be able to make the meeting tomorrow, but I look forward to digging into these bits on Monday!

dicej commented 1 year ago

First draft PR for feedback: https://github.com/bytecodealliance/wasmtime/pull/5925 I anticipate that one will generate the most discussion, since there are a variety of ways host bindings could be generated, and I picked the one that seemed most flexible to me. @Kylebrown9 FYI

dicej commented 1 year ago

@lukewagner Brian and I ran into what seems to be a pretty fundamental issue while implementing this.

Consider this world:

interface foo {
    *: func() -> u32
}

default world wildcards {
    import imports1: self.foo
    import imports2: self.foo
}

Now imagine we want to build a component that targets that world, using ["a", "b", "c"] to expand imports1's wildcard and ["x", "y", "z"] to expand imports2's wildcard. AFAICT there's currently no way to represent that component type, whether as WIT or in binary -- we can say foo has functions a, b, and c or we can say it has functions x, y, and z, but we can't say both at the same time.

Of course, we could split foo into two separate interfaces, but how would we name them, and how would you do a subtype comparison between the original WIT template and the concretized WIT?

dicej commented 1 year ago

I just realized my example above is rejected by wasm-tools anyway since you can't import the same interface more than once (I'd be curious to know the reason for that, BTW; EDIT: Brian pointed me to https://github.com/bytecodealliance/wit-bindgen/issues/529 for context). It does accept importing and exporting the same interface, though, so it's still a problem in that scenario.

Also, perhaps I was incorrect in claiming there's no way to represent the component type of the "concretized" world, given that wasm-tools component wit -t wildcards.wit currently duplicates interface types when generating a component type, so you could imagine a substitutions-aware version of wasm-tools doing something like this:

input wildcards.wit:

interface foo {
    *: func() -> u32
}

default world wildcards {
    import imports: self.foo
    export exports: self.foo
}

input substitutions.toml:

[wildcards]
imports = ["a", "b", "c"]
exports = ["x", "y", "z"]

output concrete.wat:

(component
  (type (;0;)
    (component
      (type (;0;)
        (instance
          (type (;0;) (func (result u32)))
          ;; TODO: this part won't validate -- still need proper representation for wildcards:
          ;; (export (;0;) "*" (func (type 0))) 
        )
      )
      (export (;0;) "foo" "pkg:/wildcards/foo" (instance (type 0)))
      (type (;1;)
        (component
          (type (;0;)
            (instance
              (type (;0;) (func (result u32)))
              (export (;0;) "a" (func (type 0)))
              (export (;0;) "b" (func (type 0)))
              (export (;0;) "c" (func (type 0)))
            )
          )
          (import "imports" "pkg:/wildcards/foo" (instance (type 0)))
          (type (;1;)
            (instance
              (type (;0;) (func (result u32)))
              (export (;0;) "x" (func (type 0)))
              (export (;0;) "y" (func (type 0)))
              (export (;0;) "z" (func (type 0)))
            )
          )
          (export (;0;) "exports" "pkg:/wildcards/foo" (instance (type 1)))
        )
      )
      (export (;0;) "wildcards" "pkg:/wildcards/wildcards" (component (type 1)))
    )
  )
  (export (;1;) "wildcards" "pkg:/wildcards" (type 0))
)

The above component validates just fine, but there's no way we could convert it back into WIT, since we've given an inconsistent definition of pkg:/wildcards/foo. So I guess you could argue this is just a WIT limitation, but even though the above component validates, I'd still consider it malformed given the inconsistency. In the end, I think we'd want to change both the component model and WIT if we want to support instantiating the same interface more than once in a given world.
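The inconsistency can be made concrete with a short Python sketch that groups instance types by interface id and reports ids bound to more than one distinct definition. The `(id, export names)` pairs below are transcribed by hand from concrete.wat above. `conflicting_ids` is a hypothetical helper and not an existing validation pass in wasm-tools.

```python
from collections import defaultdict


def conflicting_ids(instances):
    """Return the interface ids that are given more than one distinct definition.

    `instances` is a list of (interface_id, export_names) pairs.
    """
    definitions = defaultdict(set)
    for interface_id, export_names in instances:
        definitions[interface_id].add(frozenset(export_names))
    return sorted(i for i, defs in definitions.items() if len(defs) > 1)


instances = [
    ("pkg:/wildcards/foo", []),               # the template, wildcard export commented out
    ("pkg:/wildcards/foo", ["a", "b", "c"]),  # expanded for "imports"
    ("pkg:/wildcards/foo", ["x", "y", "z"]),  # expanded for "exports"
]
print(conflicting_ids(instances))  # prints ['pkg:/wildcards/foo']
```

A check of this kind only compares export names. A real one would also have to compare the types of the exports.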

So where does that leave us? I see two ways forward:

dicej commented 1 year ago

Assuming we do want to modify WIT and the CM to enable multiple instantiation of wildcard interfaces, I think it might be helpful to compare it to the existing "genericity" in WIT/CM: list, result, option, and tuple. Note that none of those things are types -- they're type constructors, i.e. they are "functions" that take a type (or more than one in the cases of result and tuple) and produce a type. In Haskell parlance, list has kind * -> *, while u8 has kind *.

Analogously, I would suggest that an interface containing a wildcard is not an interface at all -- it's an interface constructor, i.e. a "function" that takes a list of names and produces an interface. And just as you can't say type foo = list, I'd contend that you shouldn't be able to say import imports: self.foo if foo has a wildcard because you can't instantiate an interface constructor -- you need to monomorphize it first. And I think the key to allowing multiple instantiation of wildcard interfaces is to make an explicit distinction between interfaces and interface constructors in both WIT and the CM.

We can bikeshed the syntax, but for the moment let's imagine we used a similar syntax for interface constructors (and world constructors!) to what we use for type constructors:

// Polymorphic interface constructor:
interface foo<W...> {
    <W...>: func() -> u32 // expand W into zero or more functions of this type
}

// Monomorphic world: needs no substitutions
world monomorphic {
    import imports: self.foo<a, b, c>
    export exports: self.foo<x, y, z>
}

// Polymorphic world constructor; substitutions needed to monomorphize:
world polymorphic<W1..., W2...> {
    import imports: self.foo<W1>
    import imports: self.foo<W2>
}

The main virtue of this syntax is that you can see at a glance which things are worlds or interfaces (monomorphic) vs. which ones are world or interface constructors (polymorphic). And crucially for our purposes, we can invoke a given interface or world constructor multiple times without ambiguity.

To be clear, I'm not necessarily advocating for that specific syntax -- just that we make the distinction between interfaces/worlds and interface/world constructors clear and unambiguous in both WIT and the CM, so we can round-trip between them losslessly -- both before and after monomorphization. This will be even more important when we introduce type holes (which are really just type parameters to interface/world constructors).
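The "constructor" framing maps directly onto ordinary functions, which the following Python sketch tries to show. Interfaces are modeled as dicts from function name to a type string. This modeling, and the choice to give the world constructor one import and one export, are assumptions made for illustration only.

```python
def foo(*names):
    """Interface constructor: expands the wildcard into one function per name."""
    return {name: "func() -> u32" for name in names}


# Monomorphic world: both constructor calls are already applied.
monomorphic = {
    "imports": foo("a", "b", "c"),
    "exports": foo("x", "y", "z"),
}


def polymorphic(w1, w2):
    """World constructor: stays abstract until both name lists are supplied."""
    return {"imports": foo(*w1), "exports": foo(*w2)}


print(sorted(monomorphic["exports"]))  # prints ['x', 'y', 'z']
```

Because `foo` is called once per use, the two results are independent values, which is the property that a single shared definition of `foo` with a bare `*` cannot provide.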

lukewagner commented 1 year ago

Nice job digging into this. I think you're right that foo in your example isn't simply an interface. I'm not sure whether it's technically a "type constructor" or a row-polymorphic type, but in any case, we could say that it's an "interface template" that can be instantiated multiple times. I'll need to think some more about how this should look as a component type in Wat. However, I don't necessarily think we need to make the Wit format any more verbose than in the original example; it seems like these type parameters should be implicit (either always or at least by default).

Other than the commented-out bit about the "*" export, I think the above Wit is valid because the interface template isn't directly used by any subsequent types; is that right?

esoterra commented 1 year ago

I generally agree with the discussion here and that these interface templates are not themselves interfaces, but I don't know that that means they need a different syntax than what was initially proposed. I think trying to enumerate all of the parameters used in templates at the beginning of the declaration will become quite verbose.

If the main concern is going backwards to wit, I think the simplest answer (though this may break/test other assumptions/constraints) would be to generate a different concrete interface definition that references the original template for each unique parameterization.

In our current scope, the following 2-level example (and other k-level examples) will be possible. I'm not clear on how the explicit parameters syntax would work here without a lot of syntax. e.g.

interface foo {
    *: bar
}

interface bar {
   *: func() -> u32
}

default world wildcards {
    import imports: self.foo
    export exports: self.foo
}

I think we should treat nested interfaces seriously in this initial version, because they are prototypical of the kind of nesting that will arise from templated parameters, fields, etc.

For nested interface templates and other future kinds of nesting, I'm not sure how some of the presented TOML syntaxes will work, and frankly TOML isn't very good at arbitrary nesting given its constraints on e.g. inline tables. I'm interested in experimenting with what a nested template instantiation configuration might look like and considering using e.g. JSON or KDL.

As a random idea: if we want to distinguish between interfaces/worlds and their templates, what about labeling them with keywords interface* and world* in their definitions instead of interface and world?

fibonacci1729 commented 1 year ago

@Kylebrown9 AFAICT k-level support of interfaces is not supported anywhere in the CM today (except maybe the binary format?). It's not currently possible to parse/resolve or bindgen the WIT you mentioned above (see WIT.md). I'm wary of using this proposal to motivate nested interfaces (if that's what you are proposing) because I don't see the connection. However, if nested interfaces were actively planned, we would absolutely reflect that in the substitutions DSL. Having said that, there is still plenty of work to be done in planning and designing how substitutions integrate into language toolchains (e.g. package.json, Cargo.toml, etc.). The TOML used here is purely demonstrative so as to not distract from discussing the semantics of templated interfaces and how they are reflected in the CM.

If I am misunderstanding the k-level support piece above, can you link me to any resources discussing this that I may have missed? Thanks!

esoterra commented 1 year ago

That's my mistake, I didn't realize k-level interfaces were still purely hypothetical.

@lukewagner can you weigh in on the direction of interface nesting?

dicej commented 1 year ago

Other than the commented-out bit about the "*" export, I think the above Wit is valid because the interface template isn't directly used by any subsequent types; is that right?

Right -- wasm-tools component wit splits the interface into multiple instances when generating a component, which is useful in this case because we can use that opportunity to insert our wildcard substitutions. The result validates just fine (given the current scope of validation), but there's no way to convert it back to WIT since it has three completely different definitions of the foo interface:

I would contend that such a thing shouldn't validate since we're using the same interface URI to mean three separate things, which makes no sense given the semantics we've ascribed to these URIs.

I think we could resolve this issue with two additions to the CM and WIT:

With those two things in place, we could round-trip WIT templates (and their monomorphizations) to components and back in an unambiguous way, which I believe is necessary for any implementable subset of the design.

lukewagner commented 1 year ago

Good points, and sorry for the slow reply. I agree that we need to allow some way to represent wildcards in the CM to enable roundtripping wit->wat->wit. I think there are two related-but-different cases to tease apart though:

  1. The serialization of a wit package into wat which we later want to see as the original wit again.
  2. The compilation of a component targeting a monomorphized world where we later want to see the component's type as wit.

In the case of 1, what we want to serialize is the wildcards, with no substitutions applied. For now, just using the illegal kebab-name * seems like a fine idea. And then I think this case roundtrips fine?

In the case of 2, I think it's not necessary for us to be able to reproduce the original Wit templates: in general, compiling a component targeting a world will lose parts of the original world (e.g., unused imports). Rather, the property I think we want is that, if I render the component's type as Wit, I get a world that "matches" the pattern of the original Wit templates. But that still does raise the question you're getting at of what to do when a single interface template's id is used multiple times in a single world.

Here's an idea. So let's say I start with this template:

// w.wit
interface i {
  *: func()
}
default world w {
  import in: self.i
  export out: self.i
}

and then monomorphize in with ["a", "b"] and out with ["x","y"], I should get this component:

(component
  (import "in" "pkg:/w/w" (instance
    (export "a" (func))
    (export "b" (func))
  ))
  (export "out" "pkg:/w/w" (instance
    (export "x" (func))
    (export "y" (func))
  ))
)

Now, if I naively tried to create a single out-of-line interface w { }, I'll have two conflicting definitions of its contents. But one thing I've been thinking for a while is that we should be able to explicitly stick ids onto interface definitions, anywhere they can occur, including inline interface definitions. So that would allow me to render this world:

default world w2 {
  in: interface "pkg:/w/w" {
    a: func()
    b: func()
  }
  out: interface "pkg:/w/w" {
    x: func()
    y: func()
  }
}

which intuitively "matches" the w template. Even ignoring Wit templates, nothing about component model validation forces all types with the same id to be type-compatible, so this inline form also gives us an escape hatch to render component types with incompatible interface types (rather than just failing and saying "read the wat"). So it seems like a good general Wit extension which we could then leverage for Wit templates which are like a case where we want to do this on purpose. WDYT?
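One way to picture "matches" is the following Python sketch, which checks whether a concrete interface could have been produced by filling in a template's `*`. It is a simplification under stated assumptions: interfaces are dicts from item name to a function type string, and types are compared by string equality rather than by real component-model subtyping. `matches` is a hypothetical helper.

```python
def matches(template, concrete):
    """True if `concrete` could result from filling in `template`'s "*"."""
    if "*" in concrete:
        return False  # a concrete interface may not contain wildcards
    if "*" not in template:
        return template == concrete
    wildcard_type = template["*"]
    # Every concrete function must have exactly the wildcard's type.
    return all(func_type == wildcard_type for func_type in concrete.values())


i = {"*": "func()"}
w2 = {
    "in": {"a": "func()", "b": "func()"},
    "out": {"x": "func()", "y": "func()"},
}
print(all(matches(i, interface) for interface in w2.values()))  # prints True
```

Both inline interfaces in w2 match the same template even though their contents differ, which is the situation the explicit id on an inline interface is meant to express.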

omersadika commented 1 year ago

@lukewagner & @dicej that looks great! I want to suggest a use case that we have but the current design doesn't cover. In our use case, the guest consumes CRUD operations from records in the WIT file, and the host dynamically learns the different records used and provides the CRUD operations based on that. It may seem like generics, but it's templates. For example, I will be able to call the CRUD operations when I implement the run func in the example interface.

// sdk.wit
interface crud {
  create: func(obj: *)
  read: func(id: string) -> *
  update: func(id: string, obj: *)
  delete: func(id: string) -> *
}

// my-component.wit
interface example {
  record my-record {
    field: string,
  }
  run: func(obj: my-record)
}

default world w {
  export example: self.example
  import crud: pkg.sdk.crud
}

For example, it will be similar to writing:

interface crud-my-record {
  use self.example.{my-record}
  create: func(obj: my-record)
  read: func(id: string) -> my-record
  update: func(id: string, obj: my-record)
  delete: func(id: string) -> my-record
}

interface example {
  record my-record {
    field: string,
  }
  run: func(obj: my-record)
}

default world w {
  export example: self.example
  import crud-my-record: self.crud-my-record
}

lukewagner commented 1 year ago

@omersadika I agree that's a great use case for templates. We've also talked about this sort of idea in previous WASI meetings, with @Kylebrown9 presenting a sketch of this in a DB UDF context. I think extending Wit templates to types would make sense as a second step after what's proposed above, since it's rather more involved, but I think we should definitely do it.

dicej commented 1 year ago

I went ahead and created draft PRs containing a minimum viable implementation:
https://github.com/bytecodealliance/wasm-tools/pull/964
https://github.com/bytecodealliance/wit-bindgen/pull/541
https://github.com/bytecodealliance/wasmtime/pull/5934

I like @lukewagner 's idea for specifying ids for otherwise-anonymous interfaces. For now, though, our implementation only clones the wildcard interface as an anonymous interface prior to expanding the wildcard, which is enough to get an end-to-end test working.

oovm commented 1 month ago

Does this mean reading some information from an additional toml file? I find this inconvenient.

Is there a solution that is compatible with inline packages (https://github.com/WebAssembly/component-model/issues/313), where one file contains all the necessary information?

lukewagner commented 1 month ago

No, the toml file was just one option for specific use cases where it lines up with what you want to say. But there are lots of other producer toolchain options or WIT features that could be used to fill in the *'s (and we didn't get far enough along to begin to explore them all). But as a base case, you can always write a concrete (non-templated) world in WIT by hand that matches (is a subtype of) a templated world, filling in the *'s manually -- everything else would just be sugar for synthesizing this concrete world from something more intuitive (as other examples: the specifier strings of JS import statements or a Rust proc-macro).