bytecodealliance / WASI-Virt

Virtual implementations of WASI APIs
Apache License 2.0

Feature request: command virtualization by argv[0] #39

Open whitequark opened 7 months ago

whitequark commented 7 months ago

WASI-Virt allows me to "bake" resource/data files into an executable, which is very handy for YoWASP. But there is an issue: in a few cases, several small, related executables depend on the exact same set of very large data files (tens to hundreds of MB). Obviously I do not want to duplicate them.

This calls for a virtualization mode where several commands are combined into one multi-call binary and dispatched based on argv[0] (or its suffix, to handle the case where the executable name is prefixed with a platform name), as Busybox famously does.
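A minimal sketch of Busybox-style dispatch, written in Python for readability rather than as WASI-Virt code. The applet names `alpha`/`beta`, the `resolve` helper, and the rule that a platform prefix is joined to the command name with `-` are hypothetical assumptions, not anything WASI-Virt provides.

```python
import os
from typing import Callable, Dict, List

Command = Callable[[List[str]], int]

# Hypothetical applets bundled into one multi-call binary; names are illustrative only.
def cmd_alpha(args: List[str]) -> int:
    print("alpha", *args)
    return 0

def cmd_beta(args: List[str]) -> int:
    print("beta", *args)
    return 0

COMMANDS: Dict[str, Command] = {"alpha": cmd_alpha, "beta": cmd_beta}

def resolve(argv0: str) -> Command:
    """Select an applet from the basename of argv[0]."""
    name = os.path.basename(argv0)
    # Exact match, as when the binary is invoked through a link named after the applet.
    if name in COMMANDS:
        return COMMANDS[name]
    # Suffix match for prefixed names such as "platform-alpha" (the "-" separator is an
    # assumption); longer applet names are tried first so the most specific match wins.
    for cmd in sorted(COMMANDS, key=len, reverse=True):
        if name.endswith("-" + cmd):
            return COMMANDS[cmd]
    raise LookupError(f"unknown command: {name!r}")

def main(argv: List[str]) -> int:
    # Dispatch on argv[0] and pass the remaining arguments through unchanged.
    return resolve(argv[0])(argv[1:])

# An entry point would call: sys.exit(main(sys.argv))
```

Because every applet lives in the same binary, any baked-in data is stored once no matter which name the binary is invoked under.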

guybedford commented 7 months ago

In theory it should be possible to do data section sharing with component model imports, where a separate binary could just have the data core module. I'm not sure exactly how to do it, or if there are low-level limitations, but that might be an interesting approach to explore.

whitequark commented 7 months ago

I think this will happen automatically if I'm using WASI filesystem virtualization combined with argv[0] virtualization, since all of the large data will be in the virtualization stub and the individual executables will dispatch to it when they load it. Am I missing something?

guybedford commented 7 months ago

This would be low-level binary optimization stuff, but I wonder if there's a way for the virtualization stub that has the data segment to import it, in a way that can be shared with the other binaries that have that same data segment.

@lukewagner may be able to provide some further feedback here, as to whether this is possible or how it might also be achieved.

whitequark commented 7 months ago

Oh, I see; so the idea would be to share data at the file level (e.g. address the .wasm files by content) rather than to put everything into one component?
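To make the file-level idea concrete, here is a minimal Python sketch of content addressing: each blob (for example a .wasm module holding only data) is keyed by its SHA-256 digest, so components that reference byte-identical data modules resolve to one stored copy. `ContentStore` is a hypothetical helper, not part of WASI-Virt or any component tooling.

```python
import hashlib
from typing import Dict

class ContentStore:
    """Hypothetical content-addressed store: identical blobs are kept once, keyed by SHA-256."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, blob: bytes) -> str:
        digest = hashlib.sha256(blob).hexdigest()
        # setdefault keeps the existing copy if this exact content was stored before.
        self._blobs.setdefault(digest, bytes(blob))
        return digest

    def get(self, digest: str) -> bytes:
        return self._blobs[digest]

    def __len__(self) -> int:
        return len(self._blobs)
```

Storing the same data module on behalf of two different components returns the same key both times, and the store still holds a single copy.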

It is kind of desirable for me to put several binaries into one component anyway as a part of my ongoing effort to do as much stuff as possible within Wasm itself and not in wrappers around it (ideally I want an entire toolchain as a single .wasm file, including internal binaries spawned via popen()).

guybedford commented 7 months ago

Because of component model composability, anything that works for multiple component files can always be composed into a single component binary file.

So I think the solution space is still the same?

That is, even with a single binary, you don't want that single file to contain multiple copies of the same data internally, so you need an approach for sharing it.

Having WASI Virt support multiple command virtualizations at the same time is a really interesting way of approaching it too, though; I would need to think through the various usability cases in more detail. The argv[0] splitting feels quite specific, whereas virtualization isn't currently confined to commands.

whitequark commented 7 months ago

Oh yeah, I do want argv[0] virtualization personally for my specific use case, but I am by no means insisting that it should be limited to that. It could be a fairly generic composition mechanism.

lukewagner commented 7 months ago

Core modules and components can't import data segments directly (although we should really consider adding this!), but we can fairly easily approximate this ability by wrapping the shared data segment with a thin core module that imports a linear memory and exports functions that call memory.init for the given data segment on the imported memory. Such a core module can then be imported by any number of components (noting that core modules are immutable and thus fine for import by components). I think this technique would allow you to factor out shared static assets from multiple components.
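The wrapper-module technique can be modeled in a few lines of Python. This is a simulation for illustration, not Wasm: `DataModule`, `DataInstance`, and `LinearMemory` are hypothetical stand-ins for an immutable core module that owns one passive data segment, an instance of it linked to an importer's memory, and that importer's linear memory. The bounds checks follow the core `memory.init` rule (trap if either the source or destination range is out of bounds); `data.drop` is not modeled.

```python
from dataclasses import dataclass

WASM_PAGE = 65536  # bytes per Wasm linear-memory page

class LinearMemory:
    """Stand-in for one component's own linear memory."""
    def __init__(self, pages: int) -> None:
        self.data = bytearray(pages * WASM_PAGE)

@dataclass(frozen=True)
class DataModule:
    """Immutable module holding the shared segment; safe for any number of importers."""
    segment: bytes

    def instantiate(self, memory: LinearMemory) -> "DataInstance":
        # Each importer links the shared module against its own memory.
        return DataInstance(self, memory)

class DataInstance:
    def __init__(self, module: DataModule, memory: LinearMemory) -> None:
        self._module = module
        self._memory = memory

    def init(self, dest: int, offset: int, length: int) -> None:
        """Model of an exported function wrapping memory.init on the imported memory."""
        seg, mem = self._module.segment, self._memory.data
        if min(dest, offset, length) < 0 or offset + length > len(seg) or dest + length > len(mem):
            raise RuntimeError("memory.init: out of bounds")  # a real module would trap
        mem[dest:dest + length] = seg[offset:offset + length]
```

The segment bytes exist once, in the shared module; each importing component still receives its own copy in its linear memory when it calls `init`, so the saving is in binary size and storage rather than in runtime memory.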