WebAssembly / component-model

Repository for design and specification of the Component Model

What's the naming/conceptual convention around WIT packages, worlds, and interfaces? Are there parallels in other languages? #295

Closed jcbhmr closed 5 months ago

jcbhmr commented 5 months ago

Here's a concrete example: If I want to turn this very simple Rust package https://docs.rs/unicode-math-class/latest/unicode_math_class/ into a WebAssembly component using https://github.com/bytecodealliance/cargo-component, what should I call the package, world, and interface?

For instance in the example Rust package, there's these levels of hierarchy:

In my brain those map relatively cleanly onto the package/world/interface model like so:

WIT snippet of that

```wit
package jcbhmr:unicode-math-class;

world unicode-math-class {
  // Can't export enum without a wrapping interface! What do you call the interface tho?
  export x: interface {
    enum math-class { ... }
    // No consts; compromise with getter func.
    revision: func() -> u8;
    // No char type; compromise with string.
    class: func(c: string) -> math-class;
  }
}
```
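For the two compromises noted in the snippet above (a getter instead of a const, a string instead of a char), the Rust side might look roughly like this. It is only a sketch: `lookup`, the trimmed `MathClass` enum, the `REVISION` value, and the `Option` return type are stand-ins assumed for illustration, not the crate's real data or the bindings that cargo-component generates.

```rust
/// Stand-in for the crate's enum, trimmed to two variants (assumption).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MathClass {
    Opening,
    Closing,
}

/// Stand-in for the crate's `REVISION` const (placeholder value).
pub const REVISION: u8 = 1;

/// Stand-in for the crate's char-based lookup (placeholder table).
pub fn lookup(c: char) -> Option<MathClass> {
    match c {
        '(' => Some(MathClass::Opening),
        ')' => Some(MathClass::Closing),
        _ => None,
    }
}

/// "No consts": expose the const through a getter function.
pub fn revision() -> u8 {
    REVISION
}

/// "No char type": accept a string and require exactly one char in it.
pub fn class(c: &str) -> Option<MathClass> {
    let mut chars = c.chars();
    match (chars.next(), chars.next()) {
        (Some(ch), None) => lookup(ch),
        _ => None, // empty string or more than one char
    }
}
```

The string-based wrapper has to decide what an empty or multi-char input means; here it simply returns `None`.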

Is there a better way? 🤔

I'm looking for the convention. Or just some general "well here's what I do" ideas. I'm having decision paralysis and I'd like to get some other opinions besides my own biased one.


To add another example to my question: what about other languages? Another ezpz to-WASM language besides Rust is JavaScript https://github.com/bytecodealliance/jco#componentize

Here's my guess as to how something like https://tsdocs.dev/docs/string-width might be paralleled to WASM component WIT:

You've roughly got these hierarchical levels:

In my brain I think that could be modeled as:

WIT snippet of that

```wit
package jcbhmr:string-width;

// 'import {...} from "string-width"'
world string-width {
  use options.{options};
  // ugh. now the function is free of an interface but the record type
  // needs one. you might as well just stick them both in the interface?
  export default: func(string: string, options: option<options>);
  export options;
}

interface options {
  record options {
    ambiguous-is-narrow: bool,
    count-ansi-escape-codes: bool,
  }
}
```

Is there a better way? 🤔

no, the string-width/emojify export doesn't actually exist


As you might be able to tell, I'm a bit puzzled about when/how to use interfaces and how to name them idiomatically when they get in the way of flatter hierarchies. I'm trying to understand the concepts that package/world/interface are supposed to represent and apply them to existing programming concepts that I know. Thanks for reading my TED talk lol

p.s. has this already been discussed somewhere/somehow? it must've...

lukewagner commented 5 months ago

Good questions! So, when packaging up an implementation (like unicode-math-class or string-width in your examples), there's ultimately only one thing you need to name, which is the package name of the component implementation (just like in npm or cargo, where every package is an implementation package). Clients of this component implementation only need to know this one package name because the client tooling can extract from the component implementation all the necessary WIT information needed to generate client bindings (component binaries embed all the type information needed). You could sort of think of it as if the package name of a component implementation also implicitly names "the world targeted by this component".

This is all in contrast to the situation where we're defining standardized interfaces and worlds as part of WASI or as part of some platform vendor's public interface. In these scenarios, there is no component implementation (no executable code being published), and thus we want only the type information, and so we have this regular encoding of pure WIT information into a component binary (as named exported component types).

Now, with that distinction made, when you are authoring a component that you want to later publish, you may need to write some WIT to specify the world you're targeting. You don't always have to, of course: if you're targeting a standard or platform-defined world, you just name the published WIT world (in your language's appropriate build config) and you're done. Alternatively (and increasingly over time), your language toolchain may allow you to define your component's type using source-language types and the toolchain then infers the target world from those types (e.g., mapping a Rust str to a C-M string). But in general, you may of course want to explicitly author a target world in WIT to have full explicit control (or express something you can't otherwise). And in this scenario, when you're writing WIT, WIT currently makes you specify a name for the target world, but in fact this world name doesn't actually appear in the final binary -- it's forgotten as part of compilation. For this reason, I've been thinking that we should probably allow a different "mode" of WIT where you (1) don't have to declare a package name, (2) get to write a single unnamed world that is understood to be "the target world". You can't package this WIT but by definition you don't want to -- you just want to feed this WIT into your language toolchain to author a component implementation (that you do want to publish). However, until we do that (or something else to address this use case), I wouldn't care about the world name (and naming scheme) when authoring a component, since whatever world name you write is ultimately forgotten.

I hope that helps! Happy to discuss more. (Also CC @alexcrichton for other thoughts, since this is something we've talked about before.)

jcbhmr commented 5 months ago

And in this scenario, when you're writing WIT, WIT currently makes you specify a name for the target world, but in fact this world name doesn't actually appear in the final binary -- it's forgotten as part of compilation.

You're right! The world silly-world-name {} doesn't appear when generating things using jco or wasmtime-py's wasmtime.bindgen.

It does seem to appear when using Wasmtime's bindgen!() macro though so it's not completely invisible 🤷‍♀️ This might be because it's generating bindings from WIT directly instead of from the magic info that's embedded in the my-component.wasm file like jco and wasmtime-py do.

```rs
use wasmtime::component::{Component, Linker};
use wasmtime::{Config, Engine, Store};

pub mod bindings {
    use wasmtime::component::bindgen;
    bindgen!(in "../../wit");
}
pub use bindings::exports::typst_community::unicode_math_class::crate_ as wit;

pub const WORLD: &str = "unicode-math-class";

pub struct MyState {}

pub fn instantiate() -> (bindings::UnicodeMathClass, Store<MyState>) {
    let mut config = Config::new();
    config.wasm_component_model(true);
    let engine = Engine::new(&config).unwrap();
    let component = Component::from_file(
        &engine,
        format!("../../target/wasm32-unknown-unknown/debug/{WORLD}.wasm"),
    )
    .unwrap();
    let mut linker = Linker::new(&engine);
    let mut store = Store::new(&engine, MyState {});
    let (bindings, _) =
        bindings::UnicodeMathClass::instantiate(&mut store, &component, &linker).unwrap();
    // 👆 the PascalCased world name 😵
    (bindings, store)
}
```

https://github.com/jcbhmr/unicode-math-class.wasm/blob/96deb970f9ba5e80cd271d539929e60e1c599872/tests/rs/tests/common/mod.rs
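The identifiers in the bindings above look like plain case conversions of the kebab-case WIT names: world `unicode-math-class` becomes `UnicodeMathClass`, and interface `crate` shows up as `crate_`. Here is a rough sketch of that kind of mapping. It is not wasmtime's actual implementation, and the function names and the keyword list are assumptions made only for illustration.

```rust
/// Convert a kebab-case WIT name to PascalCase,
/// e.g. a world name used as a struct name.
pub fn kebab_to_pascal(name: &str) -> String {
    name.split('-')
        .filter(|part| !part.is_empty())
        .map(|part| {
            let mut chars = part.chars();
            match chars.next() {
                // Uppercase the first char, keep the rest unchanged.
                Some(first) => first.to_uppercase().chain(chars).collect::<String>(),
                None => String::new(),
            }
        })
        .collect()
}

/// Convert a kebab-case WIT name to a snake_case Rust identifier.
/// Names that collide with a Rust keyword get a trailing underscore
/// (assumed from the `crate_` path above; the list is illustrative).
pub fn kebab_to_snake_ident(name: &str) -> String {
    const KEYWORDS: [&str; 3] = ["crate", "self", "super"];
    let ident = name.replace('-', "_");
    if KEYWORDS.contains(&ident.as_str()) {
        format!("{ident}_")
    } else {
        ident
    }
}
```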

so you're right; the world name doesn't matter. the thing that multiple worlds or world names are good for in my brain is naming your build targets. point being: I think this actually does map relatively well particularly to the Rust "world <=> crate" analogy. If you have a dual lib & cli Rust package with two crates then that could be modeled well with two worlds: my-app-lib-world and my-app-bin-world.

concrete example: https://github.com/typst/hayagriva https://docs.rs/hayagriva/latest/hayagriva/ has two crates in the single package: a lib.rs and a main.rs

my idea of a wit binding:

package typst:hayagriva;

world hayagriva-silly-name-never-shown-bin-world {
  include wasi:cli/command;
}

world hayagriva-helloooooo-silly-world-library {
  // still don't know the naming convention for crate::* root things 🤷‍♀️
  export crate; //=> hayagriva::*
  export types; //=> hayagriva::types::*
  export io; //=> hayagriva::io::*
  export lang; //=> hayagriva::lang::*
}

then when you run make build or ./build.sh or whatever you'd expect those two worlds to map directly to two .wasm files...right? am i on the right track with that mental model of what a world is?

this is really a rant/thought-sneeze but i hope what im thinking about here makes sense and im understanding this stuff ok-ishly correctly.


alright with the world stuff out of the way...

For this reason, I've been thinking that we should probably allow a different "mode" of WIT where you (1) don't have to declare a package name, (2) get to write a single unnamed world that is understood to be "the target world". You can't package this WIT but by definition you don't want to -- you just want to feed this WIT into your language toolchain to author a component implementation (that you do want to publish).

🤷‍♂️ idc. U can leave it as-is as long as it's documented somewhere lol


the "what should i call the package" not answered directly but i think i get it ✅ so the "what should i call the world" got answered ✅ what about the interface?

if I take the unicode-math-class example again, it has just three ROOT exports: REVISION const, class() func, and MathClass enum. and since you can't put enums, variants, records, resources, etc. on the root world you MUST use a wrapper interface. what's the naming convention for interfaces?

basically, which one of these is best? or is there a better way im not thinking of?

```wit
package jcbhmr:unicode-math-class;

interface crate {
  revision: func() -> u16;
  class: func(c: string) -> math-class;
  enum math-class {
    // ...
  }
}

world w {
  export crate;
}
```

unicode_math_class::class => `jcbhmr:unicode-math-class/crate#class`
unicode_math_class::MathClass => `jcbhmr:unicode-math-class/crate#math-class`

```wit
package jcbhmr:unicode-math-class;

interface types {
  enum math-class {
    // ...
  }
}

world w {
  use types.{math-class};
  export types;
  export revision: func() -> u16;
  export class: func(c: string) -> math-class;
}
```

unicode_math_class::class => `jcbhmr:unicode-math-class#class`
unicode_math_class::MathClass => `jcbhmr:unicode-math-class/types#math-class`

```wit
package jcbhmr:unicode-math-class;

interface unicode-math-class {
  revision: func() -> u16;
  class: func(c: string) -> math-class;
  enum math-class {
    // ...
  }
}

world w {
  export unicode-math-class;
}
```

unicode_math_class::class => `jcbhmr:unicode-math-class/unicode-math-class#class`
unicode_math_class::MathClass => `jcbhmr:unicode-math-class/unicode-math-class#math-class`

there's already the jcbhmr:unicode-math-class/ prefix from the package name that will be present on any exports.
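To make the naming trade-off concrete, here is a small sketch that builds the `package/interface#item` strings used in the mappings above. The notation is the one used in this post, and `export_path` is a hypothetical helper, not part of any WIT tooling.

```rust
/// Build an export path in the `package/interface#item` notation used above.
/// Pass `None` for items exported directly from the world.
pub fn export_path(package: &str, interface: Option<&str>, item: &str) -> String {
    match interface {
        Some(iface) => format!("{package}/{iface}#{item}"),
        None => format!("{package}#{item}"),
    }
}
```

With the interface named after the package, the name appears twice in every path; with a direct world export, only the package prefix remains.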

as another example with a few submodules, take https://docs.rs/hayagriva/latest/hayagriva/

which of these is best? or is there a better way im not thinking of?

```wit
package jcbhmr:hayagriva;

interface crate {
  // hayagriva::*
  record bibliography-driver {}
  enum entry {}
  record rendered {}
  resource specific-locator {}
  // etc.
}

interface lang {
  // hayagriva::lang::*
  record case-folder {}
  enum title-case {}
  resource sentence-case {}
  // etc.
}

interface types {} // hayagriva::types::*

world w {
  export crate;
  export lang;
  export types;
}
```

hayagriva::entry => `jcbhmr:hayagriva/crate#entry`
hayagriva::lang::CaseFolder => `jcbhmr:hayagriva/lang#case-folder`

```wit
package jcbhmr:hayagriva;

interface hayagriva {
  // hayagriva::*
  record bibliography-driver {}
  enum entry {}
  record rendered {}
  resource specific-locator {}
  // etc.

  // hayagriva::lang::*
  record case-folder {}
  enum title-case {}
  resource sentence-case {}
  // etc.

  // hayagriva::types::*
  // etc.
}

world w {
  export hayagriva;
}
```

hayagriva::entry => `jcbhmr:hayagriva/hayagriva#entry`
hayagriva::lang::CaseFolder => `jcbhmr:hayagriva/hayagriva#case-folder`

```wit
package jcbhmr:hayagriva;

interface hayagriva {
  // hayagriva::*
  record bibliography-driver {}
  enum entry {}
  record rendered {}
  resource specific-locator {}
  // etc.
}

interface hayagriva-lang {
  // hayagriva::lang::*
  record case-folder {}
  enum title-case {}
  resource sentence-case {}
  // etc.
}

interface hayagriva-types {
  // hayagriva::types::*
  // etc.
}

world w {
  export hayagriva;
  export hayagriva-lang;
  export hayagriva-types;
}
```

hayagriva::entry => `jcbhmr:hayagriva/hayagriva#entry`
hayagriva::lang::CaseFolder => `jcbhmr:hayagriva/hayagriva-lang#case-folder`

to clarify my question im essentially asking: what concept in Rust or JavaScript does an interface map to?

right now im sorta thinking it's similar to a specific module in rust (hayagriva::types is one module, hayagriva::io is another, hayagriva is the root module, etc.) but i don't know how well that applies to javascript.

like in my https://tsdocs.dev/docs/string-width example i guess i could see an interface as each string-width/* sub export? like string-width is one interface and then string-width/some-subpath-export is another interface?

but then what do I call that ./ root interface default? the name of the package? the Options type record is going to show up as jcbhmr:string-width/some-interface-name#options. ideally it'd be on the root directly like you can with functions but alas. 🤷‍♂️

Which one of these looks better? another more different one that's not here?

```wit
package jcbhmr:string-width;

interface string-width {
  default: func(s: string, options: option<options>) -> u32;
  record options {
    ambiguous-is-narrow: bool,
    count-ansi-escape-codes: bool,
  }
}

world w {
  export string-width;
}
```

default export => `jcbhmr:string-width/string-width#default`
Options => `jcbhmr:string-width/string-width#options`

```wit
package jcbhmr:string-width;

interface types {
  record options {
    ambiguous-is-narrow: bool,
    count-ansi-escape-codes: bool,
  }
}

world w {
  export types;
  use types.{options};
  export default: func(s: string, options: option<options>) -> u32;
}
```

default export => `jcbhmr:string-width#default`
Options => `jcbhmr:string-width/types#options`

```wit
package jcbhmr:string-width;

interface root {
  default: func(s: string, options: option<options>) -> u32;
  record options {
    ambiguous-is-narrow: bool,
    count-ansi-escape-codes: bool,
  }
}

world w {
  export root;
}
```

default export => `jcbhmr:string-width/root#default`
Options => `jcbhmr:string-width/root#options`

sorry this is getting quite opinion-y and stylistic-y. im just kinda feeling around to see other peoples opinions on things cause up till now it's just been me arguing with myself about how to organize WIT stuffs and that's not constructive. (that's why the question was about conventions too lol)

lukewagner commented 5 months ago

You're right! The world silly-world-name {} doesn't appear when generating things using jco or wasmtime-py's wasmtime.bindgen.

Yes, I suppose the world name can show up as an impl detail in the bindings, although probably it shouldn't (even today). If we added the "unnamed world" feature for non-package WIT as I mentioned earlier, then that would effectively force the bindings generator to use a different scheme that didn't care about the world name.

if I take the unicode-math-class example again [...]

I think your top-right example (the one that exports revision and class directly from the world and keeps only math-class in interface types) is on the right track, i.e., having the component export everything directly without wrapping it in an interface. Components are also able to directly export types, so it is just a temporary WIT limitation that you're not able to define and export the math-class enum directly in the world. We should probably fix that to avoid this question.

In the meantime, you can also write:

world w {
  export types: interface {
    enum math-class { ... }
  }
}

as another example with a few submodules, take https://docs.rs/hayagriva/latest/hayagriva/ [...]

Here also I'd just export everything directly or, if you want to group things, use the export foo: interface { ... } syntax I mentioned above.

to clarify my question im essentially asking: what concept in Rust or JavaScript does an interface map to?

In JS, I'd say an interface corresponds to either a module namespace object or, when they are nested, a plain old JS object (that groups together and namespaces a collection of exports together). In Rust, yeah, I think a module is the closest correspondence. Really, an interface is just about namespacing a collection of related definitions so that they can use the short names they want to without worrying about conflicting with unrelated interfaces. It's up to the bindings generator to use what the language gives it to implement that namespacing.
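As a minimal Rust sketch of that namespacing idea: the two modules below each define the short name `Error` without clashing, much as two WIT interfaces could each define their own `error` type. The module names borrow from the hayagriva example, and the contents are made up for illustration.

```rust
pub mod lang {
    /// Short name `Error`, scoped to the `lang` module.
    #[derive(Debug, PartialEq, Eq)]
    pub struct Error(pub &'static str);

    /// Hypothetical function: accept any non-empty language tag.
    pub fn parse(tag: &str) -> Result<&str, Error> {
        if tag.is_empty() {
            Err(Error("empty language tag"))
        } else {
            Ok(tag)
        }
    }
}

pub mod io {
    /// The same short name `Error`, scoped to the `io` module.
    #[derive(Debug, PartialEq, Eq)]
    pub struct Error(pub &'static str);

    /// Hypothetical function: accept any non-empty path.
    pub fn read(path: &str) -> Result<&str, Error> {
        if path.is_empty() {
            Err(Error("empty path"))
        } else {
            Ok(path)
        }
    }
}
```

Callers tell the two apart by path (`lang::Error` vs `io::Error`), which is the role the interface name plays in an export path.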

Hope that helps!

jcbhmr commented 5 months ago

I think your top-right example is on the right track, i.e., having the component export everything directly without wrapping it in an interface.

🙏 omg THANK YOU for providing another opinion. Looking back (hindsight 20/20) i can see that this idea of an interface types {} is used a bunch in WASI-related stuff! i had coders-block; I had myself convinced "all the funcs also need to be in an idiomatic interface"

and that's just a cursory look lol

Components are also able to directly export types, so it is just a temporary WIT limitation that you're not able to define and export the math-class enum directly in the world. We should probably fix that to avoid this question.

I like this idea very much. I checked out Wasmer's custom .wai format recently https://wasmerio.github.io/wasmer-pack/user-docs/concepts/wai/index.html and it seems to encourage flat file->type hierarchies instead of package->world->interface->type 🤷‍♂️

As a beginner WAI initially seemed easier to understand but now that I know what a package x:y@z; does and how a world the-world {} works and what an interface the-interface {} does I prefer WIT to WAI (WIT is FAR more flexible than WAI; WAI is isolated to single file-- no sharing imports/exports!)

The weird restriction (I assume due to some practical technical reason) that you can export functions directly but not enum or record or resource etc. was a bit "well ok does that mean the 'happy path' is to put everything in an interface? should i put everything in an interface? is an interface the convention?" <- that was me last week lol

Here also I'd just export everything directly or, if you want to group things, use the export foo: interface { ... } syntax I mentioned above.

👍

wasi-related: the WASI-defined stuff is very interface-heavy! should i be using more interfaces than just interface types {}? ex: should i emulate this "pull the error type into a separate interface" idea? https://github.com/WebAssembly/wasi-keyvalue/blob/main/wit/error.wit vs having it in https://github.com/WebAssembly/wasi-keyvalue/blob/main/wit/types.wit ?

this is a "hey i noticed this. whats the reason?" question which is tangential to this discussion. i found the WASI stuff when gathering the links for the above WASI-related link list lol

Really, an interface is just about namespacing a collection of related definitions so that they can use the short names they want to without worrying about conflicting with unrelated interfaces.

Follow up question: is there a "re-export" mechanism to combine interfaces? like take large-interface-part1.* and -part2.* and merge them into large-interface to hide the internal splitting? sorta like how rust can do pub use part1::* or js can do export * from "./part2.js"? probably not given that each interface item has a unique url-like prefix from its containing interface. now that im typing this i think i just answered my own question lol
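For reference, the Rust side of that comparison looks roughly like this. It is only a sketch of the `pub use part1::*` facade pattern, with made-up module and function names; nothing here is WIT.

```rust
// Internal split: callers never name these modules directly.
mod part1 {
    pub fn open() -> &'static str {
        "opened"
    }
}

mod part2 {
    pub fn close() -> &'static str {
        "closed"
    }
}

/// Public facade that merges both parts under one name.
pub mod large_interface {
    pub use super::part1::*;
    pub use super::part2::*;
}
```

Callers only ever write `large_interface::open()` and `large_interface::close()`, so the internal split stays hidden.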

In JS, I'd say an interface corresponds to either a module namespace object or, when they are nested, a plain old JS object (that groups together and namespaces a collection of exports together).

To clarify, you mean something like TypeScript which has a import { server } from "typescript" sub-namespace?


Or did you mean something like a https://npm.im/semver which has subpath exports?


im not sure how to interpret "module namespace object" given that the import * as moduleNamespaceObject from "the-module" vs import * as moduleNamespaceObject from "the-module/sub-module-here" are like two levels(?) of hierarchy? are subpaths like that a separate world the-subpath {}? are they interfaces? sorry this is a real tangent. you answered the core of my question and clarified the Rust <=> WIT module intuition that I had. its ok if you dont want to continue debating stylistic opinions lol

Hope that helps!

I thank you again for pointing me in the right direction and for being someone to bounce ideas off of. 😊 ❤️

jcbhmr commented 5 months ago

as a "completed" hello-world-like rust -> wasm component, how does this look https://github.com/jcbhmr/unicode-math-class.wasm ? is it idiomatic with the recommendations here?

sidenote: why does doing the interface types {} and then use types.{...} create some weird phantom imports?

https://github.com/jcbhmr/unicode-math-class.wasm/actions/runs/7631660252

they dont seem to generate any actual "i need this from the host" imports when using jco but they still show up in the wasm-tools component wit output as well as the wit-bindgen for markdown docs: https://jcbhmr.me/unicode-math-class.wasm/unicode-math-class.html is that supposed to happen? is that a glitch? do those imports not matter?

lukewagner commented 5 months ago

wasi-related: the WASI-defined stuff is very interface-heavy! should i be using more interfaces than just interface types {}?

When you're defining a component that's not intending to implement a standardized or separately-published interface, you don't need to define interfaces just to export functions from your component's world. Separately-defined interfaces are valuable when you want to give them a name that stands apart from any particular implementation (which is what WASI is all about, which is why WASI does it all the time).

Follow up question: is there a "re-export" mechanism to combine interfaces?

Not yet, but there is an include statement in WIT that lets you include one world from another (which is mostly like a copy-paste, leaving behind nothing about the name of the source world in the destination world), and so it might make sense to allow include to work likewise in interfaces too. The omission is mostly just that we haven't had a pressing use case for it yet.

To clarify, you mean something like TypeScript which has a import { server } from "typescript" sub-namespace? [...] Or did you mean something like a https://npm.im/semver which has subpath exports?

The former, iiuc.

as a "completed" hello-world-like rust -> wasm component, how does this look https://github.com/jcbhmr/unicode-math-class.wasm ? is it idiomatic with the recommendations here?

That looks good. Again, it'd be nice if we'd let you define the enum directly in the world, but without that support, what you wrote looks reasonable.

sidenote: why does doing the interface types {} and then use types.{...} create some weird phantom imports?

That does seem like a bug; I would think that the use would bind to the exported types interface. Maybe file a wit-bindgen bug and CC me to see if I'm misunderstanding?

jcbhmr commented 5 months ago

posted more detailed issue https://github.com/bytecodealliance/wit-bindgen/issues/822

jcbhmr commented 5 months ago

you've pretty much addressed all of my questions! thank you again! ❤️ im happy to close this now since you answered my "wait, what should go in an interface/world/package and what should they each be called?" question 🙏 thanks again!

if possible or practical i think it'd be a good idea to summarize and extract a few of the "conceptual revelations" or whatever you wanna call them that occurred here and put them in some kind of document either in this repo or https://github.com/bytecodealliance/component-docs or in cargo-component's docs or whatever

ex: