bytecodealliance / cargo-component

A Cargo subcommand for creating WebAssembly components based on the component model proposal.
Apache License 2.0

[Design] Component Registry Dependencies #42

Closed peterhuene closed 1 year ago

peterhuene commented 1 year ago

Component Registry Dependencies

Overview

Currently cargo-component supports implementing a component by specifying individual imports and exports defined by local wit documents via Cargo.toml.

At the time cargo-component was originally implemented, it was imagined that a component registry might store individual interface definitions as packages, enabling registry dependencies to be specified in a Cargo.toml file as:

[package.metadata.component.imports] 
pkg1 = { version = "1.2.3", package = "org/pkg1" }

[package.metadata.component.exports] 
pkg2 = { version = "1.2.3", package = "org/pkg2" }

Therefore at most one interface could be defined in a wit document and stored in a component registry "interface" package.

As a consequence of this, a world stored in a registry would then need to explicitly reference each interface being imported and exported from their individual interface packages; this is certainly not ergonomic and would definitely contribute to an unnecessary proliferation of packages.

wit has since evolved to allow multiple interfaces and worlds to be defined in one or more wit documents. To facilitate this, wit now has syntax for using types from other documents.

With this more flexible approach to defining interfaces, cargo-component needs a new mechanism for specifying dependencies that are stored in a component registry.

This document proposes a design for specifying dependencies from a component registry for cargo-component.

Registry package types

Before discussing the proposed design, it might be useful to discuss some terminology surrounding the types of packages that might be stored in a component registry.

Previously, there was discussion around there being three types of packages in a component registry: an interface package, a world package, and a component package.

Both interface and world packages are simply WebAssembly components containing only type information; in terms of the component model proposal, the former describes an instance type and the latter describes a component type.

A component package is an implementation of a WebAssembly component; thus it contains type information and executable code.

With the introduction of the use syntax, this proposal suggests reducing the types of registry packages to two: a wit package and the aforementioned concept of a component package.

A wit package, much like the concepts of interface and world packages, contains only definitions of types, interfaces, and worlds. However, a wit package may store any number of definitions, including defining both interfaces and worlds in the same package.

Design overview

Note: cargo-component should support sourcing packages from multiple registries, potentially defaulting to a particular registry instance.

This design proposes that cargo-component reads only the version requirements for component registry dependencies from Cargo.toml.

An example Cargo.toml of a "greeter" component might look like:

[package.metadata.component.dependencies]
wasi = "webassembly/wasi:1.2.3"
formatter = "my-org/formatter:3.2.1"

The short form of a dependency is <name> = "<package-id>:<version>".

The general form is <name> = { version = "<version>", package = "<package-id>" } where name here maps to a package name used in component.wit (see below) and package-id is the qualified identifier of the package in a component registry.

Local wit documents may still be referenced by using a path key instead of the package key.

In this example, Cargo.toml is only specifying that version 1.2.3 of the webassembly/wasi package and version 3.2.1 of the my-org/formatter package be used.

Note that what is used from the packages is not specified in Cargo.toml; thus it doesn't describe the world of the component being built in any way.
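The two dependency forms described above could be normalized into one representation along these lines. This is only a sketch: the `Dependency` type, the function names, and the rule that a package id must contain a `/` are assumptions made for illustration, not part of cargo-component.

```rust
use std::path::PathBuf;

/// Hypothetical normalized form of one entry in
/// `[package.metadata.component.dependencies]`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Dependency {
    /// A package fetched from a component registry.
    Registry { package: String, version: String },
    /// A local wit document or directory.
    Path(PathBuf),
}

/// Parses the short form `"<package-id>:<version>"`.
pub fn parse_short_form(spec: &str) -> Result<Dependency, String> {
    // Split on the last ':' so that everything before it is the package id.
    let (package, version) = spec
        .rsplit_once(':')
        .ok_or_else(|| format!("expected `<package-id>:<version>`, got `{spec}`"))?;
    if package.is_empty() || version.is_empty() {
        return Err(format!("empty package id or version in `{spec}`"));
    }
    // Assumption: registry package ids are namespaced as `<namespace>/<name>`.
    if !package.contains('/') {
        return Err(format!("package id `{package}` is not namespaced"));
    }
    Ok(Dependency::Registry {
        package: package.to_string(),
        version: version.to_string(),
    })
}

/// Builds a dependency from the keys of the general (table) form.
/// Exactly one of `version` + `package` or `path` must be present.
pub fn from_table(
    version: Option<&str>,
    package: Option<&str>,
    path: Option<&str>,
) -> Result<Dependency, String> {
    match (version, package, path) {
        (Some(v), Some(p), None) => Ok(Dependency::Registry {
            package: p.to_string(),
            version: v.to_string(),
        }),
        (None, None, Some(p)) => Ok(Dependency::Path(PathBuf::from(p))),
        _ => Err("expected either `version` and `package`, or `path`".to_string()),
    }
}
```

With this shape, `parse_short_form("webassembly/wasi:1.2.3")` and `from_table(Some("1.2.3"), Some("webassembly/wasi"), None)` produce the same value, which is the equivalence the short form is meant to have.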

To specify the component's world, a wit file with a default name of component.wit is used:

world my-component {
    import wasi-fs: wasi.fs
    import formatter: formatter
    greet: func(name: string)
}

In this example, the component imports two interfaces: the fs interface from a package named wasi and the default interface from the formatter package. The former is used to print the greeting to the console and the latter is used to format the message based on whatever formatter implementation is supplied at runtime.

The component will directly export a function named greet that will ultimately print a greeting for the given name.

Because cargo-component resolves dependencies ahead-of-time, wit-parser only needs to be instructed where to locate the wasi and formatter packages to successfully parse component.wit.

Implementation

To implement this approach, cargo-component will parse Cargo.toml to figure out what component registry dependencies are required.

For consistent builds, it will also consult a lock file (design TBD) for specific versions and signatures of the dependencies to use.

cargo-component will contact one or more registries to download (or update) the package logs of the dependencies, verify the logs, and resolve the version requirements to specific versions to download and cache locally.

It will then parse ./component.wit (the path can be changed in Cargo.toml, if desired) and provide wit-parser with the paths to the cached package dependencies.

From this, the definition of a world will be derived and used to generate the bindings needed to build the component for commands like cargo component build.

Finally, the resulting component will encode unique urls for its imports and exports based on what packages were resolved (and from where) by cargo-component.
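The resolution step (version requirement plus lock file to a concrete version) might look roughly like the following. The lock file design is TBD in this proposal, so this sketch assumes it yields at most one previously selected version per dependency, and it assumes Cargo-style caret semantics for requirements; neither is settled here. `Version` and `resolve` are hypothetical names.

```rust
/// A parsed `major.minor.patch` version. Pre-release and build metadata
/// are ignored in this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Version {
    pub major: u64,
    pub minor: u64,
    pub patch: u64,
}

impl Version {
    /// Parses exactly three dot-separated numeric components.
    pub fn parse(s: &str) -> Option<Version> {
        let mut parts = s.trim().split('.');
        let major = parts.next()?.parse().ok()?;
        let minor = parts.next()?.parse().ok()?;
        let patch = parts.next()?.parse().ok()?;
        if parts.next().is_some() {
            return None;
        }
        Some(Version { major, minor, patch })
    }

    /// Caret-style compatibility with `req` (an assumption; the design does
    /// not specify how version requirements are interpreted).
    pub fn satisfies(&self, req: &Version) -> bool {
        if *self < *req {
            return false;
        }
        if req.major > 0 {
            self.major == req.major
        } else if req.minor > 0 {
            self.major == 0 && self.minor == req.minor
        } else {
            self.major == 0 && self.minor == 0 && self.patch == req.patch
        }
    }
}

/// Prefers the locked version when it is still available and still satisfies
/// the requirement; otherwise picks the highest available version that does.
pub fn resolve(
    req: &Version,
    available: &[Version],
    locked: Option<Version>,
) -> Option<Version> {
    if let Some(l) = locked {
        if l.satisfies(req) && available.contains(&l) {
            return Some(l);
        }
    }
    available.iter().copied().filter(|v| v.satisfies(req)).max()
}
```

Preferring the locked version is what keeps builds consistent; a stale lock entry falls back to a fresh resolution instead of failing. Log verification and signature checks are out of scope for this sketch.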

Benefits

A few benefits to this approach:

Tooling for other languages

In addition to Rust, this approach can also be adopted for other language-specific tooling.

For example, in JavaScript, tooling that wraps npm could specify dependency version requirements in package.json and use a component.wit file to specify the component type for generating bindings.

Even wit-bindgen CLI (or a tool wrapping it) could be made registry-aware by sourcing version requirements from a file (bindgen.toml?) and use a component.wit file to generate bindings for supported languages.

peterhuene commented 1 year ago

I've opened this issue to both document and discuss how best to integrate cargo-component with a component registry, which I've already begun a refactoring to implement.

I'm open to any feedback regarding this design, especially feedback that improves the DX of cargo-component for Rust developers implementing WebAssembly components (this is my primary concern).

After a little bike-shedding in this issue, I'll turn this into a PR for proper review.

ricochet commented 1 year ago

It might be worthwhile to update or refer to the glossary over in https://github.com/bytecodealliance/SIG-Registries/blob/main/glossary.md

peterhuene commented 1 year ago

Agreed! I think it'd be best to bike shed here a little before pushing a glossary update upstream (if necessary), but I will definitely do so.

ricochet commented 1 year ago

There's certainly plenty to dig in on. Nice write-up! The addition of a component type lines up extremely well with the proposed world and interface types.

Could wit-bindgen take a component and generate bindings only from that component? I think this would essentially require versions to be embedded in the component. If that were the case, would this change your design?

ricochet commented 1 year ago

Something we can do with components, but that very few other language ecosystems can do, is (potentially) supply multiple versions of a dependency. It might be worth sketching out how we think that will work here (but not necessarily implement it for the MVP).

peterhuene commented 1 year ago

> Could wit-bindgen take a component and generate bindings only from that component? I think this would essentially require versions to be embedded in the component. If that were the case, would this change your design?

I'm not entirely sure I understand; if you mean when implementing component B and you depend on component A, then yes, it should generate bindings directly from component A's type information for building component B.

A may have its imports and exports referencing other packages in the registry (or another one) via their url fields (these are versioned in some fashion; additionally, I believe Luke recently proposed specifying version information in the imports/exports themselves), so it should be possible for bindings generation to be transitive without having to specify dependencies of dependencies.

Does that make sense or did I completely misunderstand your question?

alexcrichton commented 1 year ago

One part I might bikeshed a bit is the usage of component as that's what a world is intended to be used for, and I think could work in this situation? For example I'm envisioning something like:

[package.metadata.component]
implements = "fastly.compute-at-edge"

[package.metadata.component.dependencies]
fastly = "0.1"
wasi = "1.0"

where the implements directive points to a world, in this case the package fastly would have a document compute-at-edge.wit which would have a default world something-or-other { ... } which would be used to implement this component. The fastly package would be consulted relative to dependencies. Similarly a local version could be done with:

[package.metadata.component]
implements = "my-package.my-world"

[package.metadata.component.dependencies]
fastly = "1.0"
my-package = { path = "wit" }

where you would then write:

// wit/my-world.wit

default world the-world-i-am-implementing {
    include fastly.compute-at-edge // hypothetical syntax, not yet in the WIT spec
    // ...
}

This would allow avoiding component.wit entirely if the world is defined on the registry (which many likely will be) and additionally allows doing your own local thing if you'd like too.
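As a rough illustration of how tooling might resolve the `implements` value relative to the dependencies table: the sketch below assumes the value always has the form `<dependency-name>.<document>` and that each dependency is kept as its raw TOML string. `WorldRef` and `resolve_implements` are hypothetical names, not existing cargo-component APIs.

```rust
use std::collections::BTreeMap;

/// Where the targeted world comes from (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub struct WorldRef<'a> {
    /// Name of the entry in the dependencies table, e.g. `fastly`.
    pub dependency: &'a str,
    /// The raw value of that entry (a version requirement or a path).
    pub source: &'a str,
    /// The wit document whose default world is used, e.g. `compute-at-edge`.
    pub document: &'a str,
}

/// Splits `implements` at the first '.' and looks the package name up in the
/// dependencies table.
pub fn resolve_implements<'a>(
    implements: &'a str,
    dependencies: &'a BTreeMap<String, String>,
) -> Result<WorldRef<'a>, String> {
    let (name, document) = implements
        .split_once('.')
        .ok_or_else(|| format!("expected `<package>.<document>`, got `{implements}`"))?;
    if name.is_empty() || document.is_empty() {
        return Err(format!("empty package or document in `{implements}`"));
    }
    let source = dependencies
        .get(name)
        .ok_or_else(|| format!("`{name}` is not listed in the dependencies table"))?;
    Ok(WorldRef {
        dependency: name,
        source: source.as_str(),
        document,
    })
}
```

Given the first example above, `resolve_implements("fastly.compute-at-edge", &deps)` would point at the `compute-at-edge` document of whatever the `fastly` dependency resolves to; a name missing from the table is reported as an error rather than guessed.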

peterhuene commented 1 year ago

I like that as it allows for pointing at someone else's world and using it as is without having to author a wit file at all.

cargo component new would probably still spit out a wit file to define the component's world to implement by default, but I imagine adding an --implements option to point it at a world from the registry.

And if they want to define a custom world based on other worlds / interfaces, they'd use the hybrid approach in your example.

I'll update the design shortly.

peterhuene commented 1 year ago

I'll add that we probably can't support the foo = "1.2.3" syntax for dependencies (à la crates.io) as it's intended for the packages to be namespaced in a component registry.

Perhaps a shorthand of <name> = "<package-id>:<version>" which is semantically equivalent to <name> = { version = "<version>", package = "<package-id>" }?

Edit: updated the design to use the shorthand form.

peterhuene commented 1 year ago

So I was thinking about this more when walking my dog. The problem with not having a wit file in some cases is that I would imagine that nearly all of the time users will, in fact, need one.

Say for example I'm authoring a component targeting a particular world (e.g. fastly.compute-at-edge). This is fine as it informs the bindings for what the component can import and what it must export and, if the component being implemented is self-contained and doesn't have any additional exports, that's all it would need.

But now I, as the component author, want to make use of another component (which hopefully in a thriving component ecosystem is commonplace). To do so, I now need to know to create a wit file, point Cargo.toml at a world within it, and not forget to include the original world being implemented before adding an import for the component dependency into the world.

That seems complicated (albeit perhaps automatable by cargo component add). In whatever design we approach, I definitely want to make depending on another component as simple as possible.

Perhaps, as a possible solution, if the world being implemented isn't from a local path, any dependency on a component package translates to an import merged in with the world being implemented? Is that too magical or no?

Another thing to consider is that a local wit definition will also be desired for describing additional exports (like the greet function in my example above) from the component being implemented.

I still think there is something to the clear demarcation where version requirements go in Cargo.toml and the component type is described by wit alone, but I'm not married to it.

alexcrichton commented 1 year ago

I think though component dependencies may work out differently? I think of a world as "I can't work unless you give me this and I give you that" whereas a component dependency is lower level where it's an internal implementation detail that possibly could be bundled within the component itself (or imported for a registry scenario). In that sense would component dependencies necessarily show up in WIT files?

peterhuene commented 1 year ago

I don't view component dependencies as internal implementation details when authoring a component at all.

From the perspective of the tooling that produces component packages (e.g. cargo-component), where all imports and exports (excepting the "default" export) of interfaces from the component are in terms of instances for maximal composability, there needs to be a way to describe more imports or fewer exports than what an implemented world requires.

It's not the domain of a tool like cargo-component to produce a component that is a subtype of a particular world. It should be possible to produce a component package that imports (even a subset of) interfaces from a world and from a component dependency with that dependency represented as an instance import in the authored component's type.

To me, it's the domain of a composition tool to resolve the abstract (instance) imports to either:

Similarly with exports, it should be possible for a composition tool to specify what gets exported from the resulting component, allowing a target world to be satisfied by an export coming from one internal instance and another export coming from a different internal instance.

Composition is where the fun stuff happens. That's the tool that would enforce that the resulting component adheres to a target world, if used, ensuring the resulting component type is a subtype of the world.

These composed components could, of course, also be published to a registry, expressing their dependencies on other components as component imports (thereby "locking" the composition to that particular implementation).

My understanding of this might be off, but I think we would want to produce component packages from the language tooling as abstract as possible while also expressing dependencies on other components directly (via instance imports).

peterhuene commented 1 year ago

To put this all another way, I want the language-specific component tooling to be able to express:

This is mostly why I omitted "worlds" from the design above as it felt like, to me, that this particular tooling isn't as concerned with worlds other than perhaps as a shorthand for explicitly importing and exporting the interfaces specified by a world, e.g.:

component {
  include wasi.command // in theory a world describing commands
  import foo: bar
  greet: func(string)
}

vs.

component {
  import wasi-fs: wasi.fs // imported by `wasi.command`
  ...
  execute: func() // in theory whatever exports for `wasi.command`
  import foo: bar
  greet: func(string)
}

In this example, the produced component isn't a subtype of the wasi.command world and therefore would need to be composed with something that erases the foo import before executing in such a world.

lukewagner commented 1 year ago

Great writeup and great discussion!

First, just as a naming bikeshed, could we call this unified (interface+world) package a "Wit" package (instead of a "definition" package)? My reasoning here is that "definition" is used pretty broadly in core wasm and the component model and in general refers to anything that can be inserted into an index space, thus covering types, functions, instances, etc. Even "components" and "modules" are definitions, so in a sense all packages are "definition" packages.

As a second bikeshed, to be consistent with the "targets"/"supports" terminology suggested here and here, perhaps we could say "targets" instead of "implements" in Cargo.toml? In theory it's unambiguous in the context of a component-producer toolchain that when we say "implement" we mean "target", but I was thinking it might be nice to just be consistent in the use of these two terms instead of "implements".

Lastly, I agree with Alex that we would ideally stick with world instead of introducing a new component concept given that, iiuc, the two concepts are structurally identical and would be resolved the same way. (But is that right, or is there some difference I'm missing?)

But I also agree with Peter that, when I am building a reusable (unlocked) component for publication and reuse, I want to create a component with the variety of kinds of imports that Peter lists and let some downstream consumer of this component figure out exact dependency versions, virtualizations, etc. when building the final (locked) component I want to execute. My impression of how this works is that, when I'm authoring a component, I start with a "base" world that I'm "targeting", and then I add extra dependencies (on both interfaces and component implementations) in my Cargo.toml that get joined (⊔) with my "base" world to produce a "derived" world that maps (as defined in component-model/#141) to the final component type of my (unlocked) component.

As a side thought on the interaction between worlds and dependencies: in a normal unlocked component, dependencies on other components appear as instance imports (as Peter said) and thus we're not yet fixing what their imports look like; figuring that out is the job of a downstream depsolver tool. An interesting possibility here is that while my component A targets world WA, my dependency B may target world WB which is not a subtype of WA and thus a trivial depsolve would end up targeting WA⊔WB. If I actually want to run the composite AB on world WA, I'll need to virtualize the stuff in WB that's not in WA, but that's also the job of further downstream (virtualization) tooling. But maybe I don't want to have to virtualize, so when building A initially, I want to ask the build tool to reject any dependencies that fall outside of WA (so I'm guaranteed the depsolve will produce a composite that runs in WA). Or, maybe I want to preemptively virtualize my dependency's imports (independent of the broader depsolved component DAG), so that I'm importing the dependency as a component (not instance) and creating an instance privately. I could imagine these both as advanced options added at some point.
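A very rough sketch of the WA⊔WB idea and the "reject dependencies that fall outside WA" option follows. It models a world purely as a set of import names, which is a big simplification: the real join is defined over component types, not names. The `World` type and the interface names used with it are hypothetical.

```rust
use std::collections::BTreeSet;

/// A world reduced to the names of the imports a host must supply
/// (a name-level simplification; exports and types are ignored).
#[derive(Debug, Clone, PartialEq, Eq, Default)]
pub struct World {
    pub imports: BTreeSet<String>,
}

impl World {
    pub fn new(imports: &[&str]) -> World {
        World {
            imports: imports.iter().map(|s| s.to_string()).collect(),
        }
    }

    /// Name-level join: everything either world needs from the host.
    pub fn join(&self, other: &World) -> World {
        World {
            imports: self.imports.union(&other.imports).cloned().collect(),
        }
    }

    /// Imports required by `dep` that this world does not provide. A build
    /// tool could reject the dependency when this list is non-empty, or hand
    /// the list to virtualization tooling.
    pub fn imports_outside(&self, dep: &World) -> Vec<String> {
        dep.imports.difference(&self.imports).cloned().collect()
    }
}
```

If WA imports `wasi.fs` and `wasi.clocks` while WB imports `wasi.fs` and `wasi.sockets`, the join needs all three, and `wa.imports_outside(&wb)` reports `wasi.sockets` as the part that would have to be virtualized or rejected.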

peterhuene commented 1 year ago

👍 on simply "wit" package, using "targets" for world terminology when authoring components, and also not using a wit syntax specifically for defining a component type when that's really what a world is.

However, it's still not clear to me that we would want to specify the targeted world in Cargo.toml, given it's likely users will want a wit description of the component's world anyway, both to define its own interface(s) and to explicitly specify how its dependencies are otherwise imported and exported from the component.

It seems like always having a wit file for the component (optionally pointed at by Cargo.toml, but otherwise defaulted to a particular path) would mean having multiple places where the world dependency could be specified: in a Cargo.toml and also as a (hypothetical) include in the wit.

I'd personally like to delegate the entire world definition to wit and let Cargo.toml just define the version requirements of the dependencies.

peterhuene commented 1 year ago

After some discussion with the Registry SIG, I think we'll move forward with the ability to describe the world of the component being authored in Cargo.toml, along with the ability to easily extract that information into a separate wit file referenced from Cargo.toml using the tooling (cargo component wit or some such); basically what Alex has proposed above, but perhaps with an additional mechanism for specifying additional imports and exports in TOML.

I think this will strike the right balance between good initial developer experience (i.e. doesn't make wit an initial learning hurdle to implementing a component that targets a well-defined world) and allowing those that are comfortable with wit to do more advanced descriptions of the components they're authoring.

peterhuene commented 1 year ago

Thanks everyone for feedback on this. I've put up PR #43 that I hope strikes the right balance between what belongs in Cargo.toml and when a wit document is needed.