bytecodealliance / javy

JS to WebAssembly toolchain
Apache License 2.0

Avenues for rich host functions #570

Closed ggoodman closed 8 months ago

ggoodman commented 8 months ago

What is your question?

First, a caveat: I'm new to Rust and WASM and risk asking some truly naive questions.

Hi team, huge fan of your work and the javy project. I really love this idea of a portable WASM binary in which the full power of ES2020 can be unleashed.

I've read through your thoughts on complex data types and have understood some of it. My understanding is that because of number-based I/O over the WASM boundary, passing and receiving complex arguments to a javy program is a BYO exercise. In my exploration of the ecosystem, I've seen projects like wasm-bindgen and jco that seem to offer better ergonomics for crossing the WASM <--> host boundary. My use-case is very much like Shopify's use-case except that our host runtimes would be either JavaScript or Go-based.

I'm trying to understand how those other projects might relate to something like javy and where to start.

Some of what I'm trying to accomplish is:

AFAICT from public docs, the Shopify use-case uses a subset of WASI for passing in bytes over stdin and collecting bytes over stdout. Were there trade-offs that pushed you in this direction over coupling to host functions?
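The stdin/stdout contract described above can be sketched roughly as follows. This is a minimal illustration, not Shopify's or Javy's implementation: the `SyncIO` interface, `readAll`, and `runJsonProgram` are hypothetical names, and the interface is only modelled on the general shape of synchronous calls such as `Javy.IO.readSync` / `Javy.IO.writeSync`, so treat the exact signatures as an assumption.

```typescript
// Hypothetical minimal synchronous I/O surface. The shape mirrors calls like
// Javy.IO.readSync(fd, buffer) / Javy.IO.writeSync(fd, buffer), but it is an
// assumption made for this sketch, not an official type definition.
interface SyncIO {
  readSync(fd: number, buffer: Uint8Array): number;
  writeSync(fd: number, buffer: Uint8Array): number;
}

// Read every byte available on a file descriptor, chunk by chunk.
function readAll(io: SyncIO, fd: number): Uint8Array {
  const chunks: Uint8Array[] = [];
  let total = 0;
  for (;;) {
    const chunk = new Uint8Array(1024);
    const bytesRead = io.readSync(fd, chunk);
    if (bytesRead === 0) break; // zero bytes read is treated as end of input
    chunks.push(chunk.subarray(0, bytesRead));
    total += bytesRead;
  }
  const out = new Uint8Array(total);
  let offset = 0;
  chunks.forEach((c) => {
    out.set(c, offset);
    offset += c.length;
  });
  return out;
}

// Parse JSON from stdin (fd 0), run the handler, write JSON to stdout (fd 1).
function runJsonProgram(io: SyncIO, handler: (input: unknown) => unknown): void {
  const input: unknown = JSON.parse(new TextDecoder().decode(readAll(io, 0)));
  const output = new TextEncoder().encode(JSON.stringify(handler(input)));
  io.writeSync(1, output);
}
```

Inside a Javy program, the runtime's own I/O object would presumably be passed as `io`; elsewhere, any object with the same two methods works, which also makes the handler easy to test without a Wasm runtime.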

nickwesselman commented 8 months ago

Hi @ggoodman -- others from Shopify may jump in here on the relationship to other projects, but some of the reasons we chose JSON over I/O streams include:

ggoodman commented 8 months ago

Hi @nickwesselman thanks a ton for getting back to me.

> We use GraphQL queries for the wasm author to define the input for the module, which makes the API dynamic. It's a potentially very large API surface and we only want to provide the module with the data it needs.

That idea of using a GraphQL query as a declarative signal of required inputs is fascinating! What an interesting idea. Thanks for sharing.

> WIT wasn't a thing when we were first designing our wasm-based extensibility, so there weren't great options for exchanging rich data structures over host calls.

Do you get the sense that your design might be different if revisited today, with the current component model momentum?

> We wanted an extensibility API contract that could easily be implemented across many languages. WIT is improving in many toolchains but reading/writing JSON over stdio is about as simple as it gets.

I was playing around with Javy a bit and struggled to get things working with the JS event loop. I tried experimenting with a model where stdin passed in length-prefixed messages over time. Any Promises / microtasks seemed to cause the WASM program and runtime to exit. I think (Rust noob) I enabled the event loop feature when I built javy-cli locally, so I feel like I'm not understanding the different moving parts quite yet.

Have you done any experimentation with async javascript workloads?
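For reference, the length-prefixed framing mentioned above could look something like the sketch below. The thread does not specify a wire format, so the 4-byte big-endian length prefix and the names `encodeFrame` / `decodeFrames` are assumptions chosen for illustration.

```typescript
// Prefix a payload with its length as a 4-byte big-endian unsigned integer.
// The prefix width and byte order are assumptions made for this sketch.
function encodeFrame(payload: Uint8Array): Uint8Array {
  const frame = new Uint8Array(4 + payload.length);
  new DataView(frame.buffer).setUint32(0, payload.length, false);
  frame.set(payload, 4);
  return frame;
}

// Split a buffer into complete frames. Bytes belonging to an incomplete
// trailing frame are returned in `rest` so the caller can prepend them to
// the next chunk read from the stream.
function decodeFrames(buffer: Uint8Array): { frames: Uint8Array[]; rest: Uint8Array } {
  const frames: Uint8Array[] = [];
  const view = new DataView(buffer.buffer, buffer.byteOffset, buffer.byteLength);
  let offset = 0;
  while (buffer.length - offset >= 4) {
    const len = view.getUint32(offset, false);
    if (buffer.length - offset - 4 < len) break; // payload not fully received yet
    frames.push(buffer.subarray(offset + 4, offset + 4 + len));
    offset += 4 + len;
  }
  return { frames, rest: buffer.subarray(offset) };
}
```

Keeping the leftover bytes in `rest` is what lets messages arrive "over time": each new chunk from stdin is appended to the previous `rest` before decoding again.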

nickwesselman commented 8 months ago

> That idea of using a GraphQL query as a declarative signal of required inputs is fascinating! What an interesting idea. Thanks for sharing.

No problem, you can read more about our approach here.

> Do you get the sense that your design might be different if revisited today, with the current component model momentum?

I think it's too early for us to say. We are certainly looking at the component model and how practical it is for our execution model, and there is definitely a performance cost to JSON serialization and deserialization.

> Have you done any experimentation with async javascript workloads?

Our execution is entirely sync and short-lived, so we do not enable the event loop. One of the actual maintainers from our team may have more to say here though, I'm just a PM. 😆

jeffcharles commented 8 months ago

👋 I'm back from vacation 😄

> My understanding is that because of number-based I/O over the WASM boundary, passing and receiving complex arguments to a javy program is a BYO exercise.

Yes. Javy uses Core Wasm which only supports integers and floats for arguments and return values.

> In my exploration of the ecosystem, I've seen projects like wasm-bindgen and jco that seem to offer better ergonomics for crossing the WASM <--> host boundary.

Yes they absolutely do offer better ergonomics. However, the tradeoff is they depend on the runtime environment supporting the Wasm Component Model as opposed to Core Wasm. Older versions of wasm-bindgen did not require the Component Model but modern versions do.

> Were there trade-offs that pushed you in this direction over coupling to host functions?

The big tradeoff is that, when using Core Wasm rather than the Component Model, it's much, much easier for people using Javy (or other languages) to read from stdin and write to stdout/stderr than to import and invoke a host function, because host function arguments and return values in Core Wasm can only be numbers. The tradeoff would be different using the Component Model but the Component Model wasn't available when we made this decision.

So to send a byte array (and by extension, a string representation) back and forth between the host and the Wasm instance, the Wasm module will need custom code to convert the byte array to a pointer and a length (or another pointer to represent the end of the array) and the host function will need to perform a similar conversion. And to receive input from a host function, the Wasm module would likely need an additional function to get the size of the input so it can allocate the correct amount of memory, or it would need to export a function that the host can use to allocate memory of a certain size. Using WASI stdio streams or the Component Model is much more straightforward.
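To make the pointer-and-length dance concrete, here is a small simulation of it. It is only a sketch: a plain `Uint8Array` stands in for the instance's linear memory, and `guestAlloc`, `guestReverse`, and `hostCallReverse` are hypothetical names rather than anything Javy exports. In a real setup the two `guest*` functions would be Wasm exports and the host would write into the module's exported memory.

```typescript
// Stand-in for a Wasm instance's linear memory (assumption: a real host
// would use the memory exported by the instantiated module instead).
const linearMemory = new Uint8Array(64 * 1024);

// Hypothetical guest export: a bump allocator that hands out offsets
// ("pointers") into linear memory. Offset 0 is left unused.
let nextFreeOffset = 8;
function guestAlloc(size: number): number {
  if (nextFreeOffset + size > linearMemory.length) {
    throw new Error("out of linear memory");
  }
  const ptr = nextFreeOffset;
  nextFreeOffset += size;
  return ptr;
}

// Hypothetical guest export. Like any Core Wasm function it only takes and
// returns numbers: (ptr, len) in, pointer to the result out. The result has
// the same length as the input, so no separate "result length" call is
// needed in this particular example.
function guestReverse(ptr: number, len: number): number {
  const outPtr = guestAlloc(len);
  const reversed = linearMemory.slice(ptr, ptr + len).reverse();
  linearMemory.set(reversed, outPtr);
  return outPtr;
}

// Host side: encode the string, ask the guest for memory, copy the bytes in,
// call the function with numbers only, then copy the result bytes back out.
// Reversal is byte-wise, so this is only meaningful for ASCII input.
function hostCallReverse(input: string): string {
  const bytes = new TextEncoder().encode(input);
  const ptr = guestAlloc(bytes.length);
  linearMemory.set(bytes, ptr);
  const outPtr = guestReverse(ptr, bytes.length);
  return new TextDecoder().decode(linearMemory.subarray(outPtr, outPtr + bytes.length));
}
```

Even in this toy version, one logical call ("reverse this string") needs an allocation export, two copies, and an agreed convention for the result length, which is the bookkeeping that reading stdin and writing stdout avoids.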

> Any Promises / micro tasks seemed to cause the WASM program and runtime to exit

By default, Javy does not run the event loop and will trap if there are events remaining on the event queue after the JS has executed.
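A small illustration of why even a trivially resolved Promise counts as pending work: its continuation is queued as a microtask and has not run by the time the synchronous code finishes. The snippet below demonstrates general JavaScript semantics and runs in any engine; `runSynchronously` is a hypothetical name, and the snippet does not reproduce Javy's trap itself.

```typescript
// Returns a snapshot of what has happened by the end of synchronous execution.
function runSynchronously(): string[] {
  const order: string[] = [];

  // The continuation is queued as a microtask. It does not run until the
  // current synchronous code has finished and something drains the queue.
  Promise.resolve().then(() => {
    order.push("microtask");
  });

  order.push("sync");

  // At this point the microtask is still pending, so the snapshot contains
  // only the synchronous entry.
  return order.slice();
}
```

A runtime that stops once the synchronous code has finished, without running an event loop, never reaches the `"microtask"` step. That is the situation described above in which work is still left on the queue after the JS has executed.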

> I think (rust noob) I enabled the event loop feature when I built javy-cli locally so I feel like I'm not quite understanding the different moving parts quite yet.

The javy-core crate is where the experimental_event_loop feature needs to be enabled. We have a feature with the same name on the javy-cli crate, but it only changes which integration tests are run; it doesn't actually enable the event loop.

Something like:

$ cargo build -p javy-core --target=wasm32-wasi --features=experimental_event_loop -r
$ cargo build -p javy-cli -r

should build the CLI with event loop support enabled.

I hope this helps!

jeffcharles commented 8 months ago

I'm going to close this issue for now. Feel free to continue the conversation or re-open if you feel that something was not addressed.

ggoodman commented 8 months ago

Hi thanks again for getting back to me. I just got back from the break today and will be poking around in the space over the next little while (or more, hopefully!). Our use-cases appear to be remarkably similar but the level of expressiveness we want / need to give to customers seems to be a bit different.

As a first level of exploration, I'm imagining a world in which we can transparently take untrusted JavaScript and determine if it can be fully understood through static analysis. Our extensibility use-case has some domain-specific APIs we expose in JavaScript that we fully understand. My intuition is that there will be a meaningful subset of untrusted code that only leverages basic JavaScript and our own proprietary APIs. Anything that can be fully understood in this way can then be translated / transpiled / simplified to logic that could be run using Javy's lighter-weight isolation approach.

That gives us a pathway towards on-boarding increasingly complex JavaScript workloads that require an event loop (or at least a microtask queue) and later controlled I/O. Any opportunities to shift untrusted code to lighter-weight and more strongly-isolated runtimes would be a major boon.

Are there any more interactive venues you recommend to chat?

jeffcharles commented 8 months ago

> Are there any more interactive venues you recommend to chat?

I'm on the Bytecode Alliance Zulip and keep an eye on the Javy channel.