WebAssembly / component-model

Repository for design and specification of the Component Model

Wit text format comments encoded in the binary Wasm binary #213

Status: Open. calvinrp opened this issue 1 year ago.

calvinrp commented 1 year ago

This has been discussed before; I'm formally creating an issue for it.

Currently, comments and docs in the Wit text format are lost when compiling to the binary Wasm component format. This is especially a problem when publishing to the Warg registry, which is designed to accept only Wasm binary files.

A likely implementation would involve some custom section convention.

lukewagner commented 1 year ago

Thanks for filing this; I agree this is something we should fix, and that it's probably a custom section thing. Some initial thoughts on what we might want from a custom section format:

lann commented 1 year ago

I can take a shot at this. Would we expect this specification to be part of this repo?

peterhuene commented 1 year ago

I don't believe we currently document the binary encoding of WIT, but my hope, at least, is that it will be soon (and that it will look more like any other component that simply exports the relevant types); I would assume we would then want to document the comment custom section here too.

lann commented 1 year ago

> Since it's mostly text anyways, perhaps the custom section contents should be human-readable/editable text (unlike, e.g., the name section) so that, e.g., the "exploded" representation of a component has the documentation in a simple text file that can be easily edited before reimploding.

Two thoughts on how this could be accomplished, both of which require defining some unique encoding of the "path" to each item.

I'm not sure either of these is all that great, but I'd be happy to hear feedback / other ideas.

lann commented 1 year ago

> It would be useful to have a well-defined validation predicate that we can regularly check (e.g., on registry publication) to avoid the drift I expect we'd get otherwise.

You mean checking that a package is "sufficiently documented" a la Rust's #![deny(missing_docs)]? (whatever that might precisely mean here)

lukewagner commented 1 year ago

To the "does this belong in this repo/spec" question: I think so, probably as a new .md in design/mvp in the short term (covering both the documentation section and the encoding of Wit into C-M types), and as an appendix (like in the core spec) in the official document. Thanks for offering to help, Lann!

As for your first question: I expect we want just one custom section, so yeah, the JSON approach probably makes sense. Half-baked idea: instead of putting the <path> in the key, what if the nesting structure were mirrored in the JSON object's nesting structure? Imports and exports would sit at the top level, the importname/exportname would be the next-level key, and further nesting would be determined by the externdesc (exports of an instance type, parameter/result names of a func type, etc.). This would have the nice effect of "factoring out" the common prefix, and might also make the section somewhat readable.
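Luke's nesting idea can be sketched in a few lines of Python, assuming a hypothetical flat layout in which each doc string is keyed by its full path. The sketch folds those paths into nested objects so that shared prefixes such as exports/greeter appear only once. The item names, the tuple representation of paths, and the "docs" key are illustrative assumptions, not the schema used by the prototype.

```python
import json

# Hypothetical flat form: each documented item is keyed by its full path.
# Paths are tuples of name components here because the thread never settled
# on a string syntax for <path>; all item names are made up.
FLAT_DOCS = {
    ("exports", "greeter", "greet"): "Returns a greeting.",
    ("exports", "greeter", "greet", "params", "name"): "Who to greet.",
    ("imports", "logger", "log"): "Writes one log line.",
}


def nest(flat: dict) -> dict:
    """Fold path-keyed docs into nested objects so shared prefixes appear once.

    Each node keeps its own text under the (hypothetical) key "docs"; this
    sketch assumes no item is itself named "docs".
    """
    root: dict = {}
    for path, text in flat.items():
        node = root
        for part in path:
            # Reuse the node for a prefix that was already seen.
            node = node.setdefault(part, {})
        node["docs"] = text
    return root


print(json.dumps(nest(FLAT_DOCS), indent=2))
```

With this layout, greet's own docs and its parameter docs sit side by side under one key, at the cost of reserving one key name for the text itself.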

> You mean checking that a package is "sufficiently documented" a la Rust's #![deny(missing_docs)]? (whatever that might precisely mean here)

Oh, no, more like: given a component .wasm, validate that, if it contains a documentation section, it's well-formed (e.g., valid JSON and each referenced name exists).
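A minimal sketch of that well-formedness check, assuming a nested JSON layout along the lines Luke sketched: each object node may carry a "docs" string, and every other key names a child item. Extracting the set of existing item paths from the component binary is out of scope here; `validate_docs_section` and `known_paths` are hypothetical names.

```python
import json


def _fmt(path: tuple) -> str:
    """Render a path for error messages."""
    return "/".join(path) or "<root>"


def validate_docs_section(payload: bytes, known_paths: set) -> list:
    """Return a list of problems found in a docs custom-section payload.

    Assumptions (hypothetical, not a settled format): the payload is UTF-8
    JSON, a node's own text lives under "docs", and every other key names a
    child item whose full path must appear in `known_paths`.
    """
    try:
        doc = json.loads(payload.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError) as err:
        return [f"not valid UTF-8 JSON: {err}"]

    problems = []

    def walk(node, path):
        if not isinstance(node, dict):
            problems.append(f"{_fmt(path)}: expected an object")
            return
        for key, value in node.items():
            if key == "docs":
                if not isinstance(value, str):
                    problems.append(f"{_fmt(path)}: docs must be a string")
                continue
            child = path + (key,)
            if child not in known_paths:
                # Report the dangling name once; don't descend into it.
                problems.append(f"{_fmt(child)}: no such item")
                continue
            walk(value, child)

    walk(doc, ())
    return problems
```

An empty list means the section is well-formed under these assumptions; a registry could reject a publication whenever the list is non-empty.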

lann commented 1 year ago

I have some prototype code for this now: https://github.com/bytecodealliance/wasm-tools/pull/1169

The JSON schema ended up being shaped somewhat by the wit_parser::Resolve internals. As a result, I think it matches the WIT structure more closely than the equivalent binary encoding does.

lukewagner commented 1 year ago

Nice! At first glance, it looks really good. Initially I wasn't sure about documenting Wit-level concepts in the schema, but on second thought it does seem like the right level of abstraction. What's interesting is that the well-formedness predicate of the docs section will end up depending on the encoding scheme of Wit into component types, which is something we need to specify precisely in any case.

Incidentally, we were just talking with @peterhuene about where the package statement in Wit should go so that it can be round-tripped, and it seemed like it might belong in a docs section; if so, that would be an additional top-level key in the schema.

alexcrichton commented 1 year ago

In the near term I think JSON is OK, but in the long term I'm not sure it makes sense. Luke said above:

> perhaps the custom section contents should be human-readable/editable text

but I'd call that into question, in the sense that I'm not sure what this would be used for. You can't, for example, open a *.wasm binary in a text editor and change the contents of the section, because at the very least there's a header at the beginning of the proposed custom section indicating how large the section is. Beyond that, I'm not sure what the workflow for editing the comments in a component would look like; the closest I can imagine is exploding the wasm binary back into a WIT package, editing some bits, and then re-imploding it into a wasm binary. In that case the encoding format doesn't matter, since only the text contents are being updated.
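To make the size-header point concrete, here is a minimal Python sketch of how the core Wasm binary format frames any custom section: a section id of 0, the byte size of the contents as an unsigned LEB128 integer, then the section name (itself length-prefixed) followed by the payload. The section name "docs" and the JSON payloads are hypothetical placeholders; the thread does not settle the actual name.

```python
def uleb128(n: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128, as Wasm binaries do."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def custom_section(name: str, payload: bytes) -> bytes:
    """Frame a custom section: id 0, LEB128 size, LEB128-prefixed name, payload."""
    name_bytes = name.encode("utf-8")
    contents = uleb128(len(name_bytes)) + name_bytes + payload
    return bytes([0]) + uleb128(len(contents)) + contents


# Editing the doc text changes the declared size in the header.
before = custom_section("docs", b'{"docs": "Hi"}')
after = custom_section("docs", b'{"docs": "Hello"}')
print(before[1], after[1])  # 19 22
```

An in-place text edit would leave that size byte stale, and once the contents exceed 127 bytes the size field itself grows to two bytes, so even the header's length can change.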

One point against JSON, I think, is that in the long run it doesn't really provide much benefit over a section defined in the manner of the name section. Even with JSON we'd still have to document a schema, which feels similar to the work needed to define a binary format. I also feel that a custom format would avoid the need to shoehorn everything into JSON whether it fits there or not.

To clarify again, though: I think JSON is fine for now, but I do think we'll want to keep the door open to future updates.

oovm commented 6 months ago

I'd like to know what documentation comments look like in the wat format, and where they should be written.

My tool generates wat with $id names in debug mode and generates wasm in release mode.

oovm commented 6 months ago

Another question is whether to save markdown or compiled html format.

Benefits of using html

lann commented 6 months ago

> Another question is whether to save markdown or compiled html format.

One of the goals of this feature is to be able to transform the binary encoding back into something equivalent to the input WIT text, which strongly suggests that comments should be preserved (nearly) verbatim.