WebAssembly / component-model

Repository for design and specification of the Component Model

Explicit field offsets in records? #194

Open titzer opened 1 year ago

titzer commented 1 year ago

In some situations, we might want to describe record types from existing binary formats or low-level software layers that already have fixed layouts that cannot be changed. Sometimes such records have alignment padding or unused space between fields. A natural way to express this is to have (byte) offsets on fields, rather than relying on a particular packing behavior.

For example, I've found this particularly useful for describing file formats and kernel interfaces in a related project.

Is this something that could be added to the WIT record definition, e.g. as an optional field offset?
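To make the request concrete, here is a minimal sketch of a record whose fields sit at explicit byte offsets, decoded and encoded with Python's standard `struct` module. The layout table, the field names (`version`, `length`, `timestamp`), and the helpers `read_record` and `write_record` are hypothetical and chosen only for illustration; none of this is WIT syntax or an existing proposal.

```python
import struct

# Hypothetical layout: field name -> (byte offset, struct format code).
# Bytes 2..3 and 8..15 are unused space dictated by the (imaginary) format.
HEADER_LAYOUT = {
    "version": (0, "<H"),     # u16 at offset 0
    "length": (4, "<I"),      # u32 at offset 4
    "timestamp": (16, "<Q"),  # u64 at offset 16
}
HEADER_SIZE = 24  # total record size in bytes, including the gaps


def read_record(buf, layout):
    """Decode each field at its declared offset, ignoring any gaps."""
    return {
        name: struct.unpack_from(fmt, buf, offset)[0]
        for name, (offset, fmt) in layout.items()
    }


def write_record(values, layout, size):
    """Encode each field at its declared offset; gaps stay zero-filled."""
    buf = bytearray(size)
    for name, (offset, fmt) in layout.items():
        struct.pack_into(fmt, buf, offset, values[name])
    return bytes(buf)
```

The point of the sketch is that the offsets, not the declaration order or a packing rule, determine where each field lives, so the same table can describe a layout that no default packing would produce.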

lukewagner commented 1 year ago

Currently, the scope of Wit (and the component model types it is built on) is to describe abstract values interchanged between different guest or host languages. There we specifically want to abstract over the binary layout/representation, leaving these choices as encapsulated implementation details of canon lift or canon lower. Describing preexisting byte layouts of files or syscalls seems like a rather different problem, with different design constraints that may ultimately suggest a different approach altogether (e.g., if it were a goal to faithfully describe existing C ABI layouts). So I'd be reluctant to add just a little bit of memory layout metadata without understanding the whole problem we're trying to solve.

That being said, I can imagine some scenarios where today we'd describe file/socket payloads via list<u8> or stream<u8>, but if we could overlay better type information on top of or in place of that u8, we could get much nicer automatic bindgen (that otherwise the user would have to write manually). So maybe there is something useful we can do in this more-focused context, but I'm not sure if that means simply adding offsets to existing records or perhaps introducing new type constructor(s) oriented towards overlaying linear byte sequences.
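As a rough illustration of "overlaying better type information" on a byte payload, the sketch below treats a `list<u8>`-style buffer as a sequence of fixed-size records. The `Sample` record, its fields, and the `overlay` helper are assumptions made up for this example; it only shows the kind of code a user would otherwise write by hand, not what generated bindings would look like.

```python
import struct
from typing import Iterator, NamedTuple


class Sample(NamedTuple):
    channel: int  # u16 at offset 0
    value: int    # u32 at offset 4 (bytes 2..3 are padding)


# Little-endian u16, two explicit pad bytes, then u32: 8 bytes per record.
SAMPLE = struct.Struct("<HxxI")


def overlay(payload: bytes) -> Iterator[Sample]:
    """View a raw byte payload as a sequence of Sample records.

    Raises struct.error if the payload length is not a multiple of the
    record size.
    """
    for channel, value in SAMPLE.iter_unpack(payload):
        yield Sample(channel, value)
```

For instance, `list(overlay(SAMPLE.pack(1, 10) + SAMPLE.pack(2, 20)))` yields two `Sample` values instead of sixteen untyped bytes.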

titzer commented 1 year ago

Ok, I can see how explicit field offsets are effectively part of an ABI, and thus part of a lowering of abstract Wit records. What would be the right way to nail down a specific (linear memory) lowering to conform to a layout dictated by existing software or hardware?

titzer commented 1 year ago

FWIW, Virgil has an embedded DSL for specifying memory layouts: https://github.com/titzer/virgil/blob/master/doc/tutorial/Layouts.md

lukewagner commented 1 year ago

Hmm, thinking about this question again, maybe you're right that the best place to put this sort of layout information is indeed on the record. It wouldn't be part of the abstract value being copied between components (which is where I was initially hung up); rather, the layout information would feed into the Canonical ABI lift/lower algorithms and override the default layout rules. I guess this isn't technically different from what we do for existing specializations like flags, which are semantically just bool-valued records but, because you said "flags", the ABI changes to bitfields.
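A small sketch of that idea, under stated assumptions: `default_offsets` models an in-order layout where each field is aligned to its own size (a simplification in the spirit of the default rules, not the Canonical ABI definition itself), `layout` accepts a hypothetical `explicit` offset table that overrides it, and `pack_flags` shows bool-valued fields collapsing into bits. All function names and the override mechanism are invented for illustration.

```python
# Sizes for a few primitive types; here alignment equals size.
PRIMITIVE_SIZE = {"u8": 1, "u16": 2, "u32": 4, "u64": 8}


def align_to(offset, alignment):
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


def default_offsets(fields):
    """Lay out fields in declaration order, aligning each to its size.

    Returns (offsets, total_size), with the size rounded up to the
    largest field alignment.
    """
    offsets, cursor, max_align = {}, 0, 1
    for name, ty in fields:
        size = PRIMITIVE_SIZE[ty]
        cursor = align_to(cursor, size)
        offsets[name] = cursor
        cursor += size
        max_align = max(max_align, size)
    return offsets, align_to(cursor, max_align)


def layout(fields, explicit=None):
    """Use explicit offsets when given (hypothetical override), else the default."""
    if explicit is None:
        return default_offsets(fields)
    end = max(explicit[name] + PRIMITIVE_SIZE[ty] for name, ty in fields)
    return dict(explicit), end


def pack_flags(names, set_flags):
    """Pack bool-valued fields into one integer: bit i is the i-th flag."""
    return sum(1 << i for i, name in enumerate(names) if name in set_flags)
```

With fields `a: u8`, `b: u32`, `c: u16`, the default rule places them at offsets 0, 4, and 8, while an explicit table such as `{"a": 0, "b": 8, "c": 16}` would be taken as given. The abstract record value is the same in both cases; only the lowering differs.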