bazelbuild / bazel

a fast, scalable, multi-language and extensible build system
https://bazel.build
Apache License 2.0

Support for executing WASM modules in repository_ctx #22483

Open jmillikin opened 1 month ago

jmillikin commented 1 month ago

Description of the feature request:

I would like to be able to execute a WASM module from a repository_rule implementation function, as an alternative to native binaries.

The API would look something like this:

repository_ctx.execute_wasm(path, input, timeout=600, entry_point=None)
  path: `string`; or `Label`; or `path`; required
      Path to a `.wasm` module to execute.
  input: `string`; required
      Input to provide to the module (not interpreted by Bazel).
  timeout: `int`; default is `600`
      Maximum duration of the command in seconds.
  entry_point: `string`; or `None`; default is `None`
      If set, invoke the named export instead of the default entry point.

  return: `wasm_exec_result`
      field `output`: `string`
          Output from the module (not interpreted by Bazel).
      field `return_code`: `int`
          Return code from the module's entry point (0-255), or 256 if
          execution was terminated by a timeout.

The repository rule implementation function is responsible for assembling an input string and parsing the output string according to its own needs -- for example, it might use JSON + the json module for structured input/output. The WASM module itself has no access to repository_ctx functionality.
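
To make the string-in/string-out contract concrete, here is a minimal sketch in Python (standing in for Starlark, where json.encode / json.decode would play the role of json.dumps / json.loads). The names fake_module and run_rule and the "srcs" / "sizes" fields are hypothetical; a real module would be compiled to .wasm and invoked through execute_wasm.

```python
import json


def fake_module(input_str):
    """Hypothetical stand-in for a .wasm module: string in, (string, code) out."""
    request = json.loads(input_str)
    # Illustrative logic only: report the length of each source file.
    sizes = {path: len(content) for path, content in request["srcs"].items()}
    return json.dumps({"sizes": sizes}, sort_keys=True), 0


def run_rule(srcs):
    """Assumed flow of a repository rule around the proposed execute_wasm call."""
    # The rule, not Bazel, decides the wire format of the input string.
    input_str = json.dumps({"srcs": srcs}, sort_keys=True)
    output_str, return_code = fake_module(input_str)
    if return_code != 0:
        raise RuntimeError("module failed with code %d" % return_code)
    # The rule also owns the parsing of the output string.
    return json.loads(output_str)
```

Bazel would only ever see two opaque strings here; the JSON schema is a private agreement between the rule and its module.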

From the WASM side, the API looks like this:

func example_entry_point(
    input_ptr: *uint8,
    input_len: uint32,
    output_ptr: **uint8,
    output_len: *uint32,
) -> uint8 /* return_code */
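
As a rough illustration of how a host could marshal one call through this signature, the Python sketch below simulates WASM linear memory with a bytearray. The fixed addresses, the upper-casing module body, and the idea that the module chooses where its output lives are all assumptions made for the example; this is not the proof-of-concept's actual code.

```python
import struct

# Stand-in for one 64 KiB page of WASM linear memory.
MEMORY = bytearray(64 * 1024)


def example_entry_point(input_ptr, input_len, output_ptr, output_len):
    """Hypothetical module body: upper-cases the input bytes."""
    data = bytes(MEMORY[input_ptr:input_ptr + input_len]).upper()
    out_addr = 0x2000  # assumption: the module picks the output location
    MEMORY[out_addr:out_addr + len(data)] = data
    # Write results through the out-parameters (wasm32: 32-bit little-endian).
    struct.pack_into("<I", MEMORY, output_ptr, out_addr)
    struct.pack_into("<I", MEMORY, output_len, len(data))
    return 0  # return_code, must fit in a uint8


def host_call(input_bytes):
    """Assumed host-side marshalling for a single call."""
    input_ptr, output_ptr, output_len = 0x1000, 0x10, 0x14
    # Copy the input into linear memory, then call with pointer + length.
    MEMORY[input_ptr:input_ptr + len(input_bytes)] = input_bytes
    code = example_entry_point(input_ptr, len(input_bytes), output_ptr, output_len)
    # Read back where the module put its output and how long it is.
    (addr,) = struct.unpack_from("<I", MEMORY, output_ptr)
    (size,) = struct.unpack_from("<I", MEMORY, output_len)
    return bytes(MEMORY[addr:addr + size]), code
```

The two out-parameters are what let the module return a variable-length buffer while the function's own return value stays a plain status code.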

Which category does this issue belong to?

External Dependency

What underlying problem are you trying to solve with this feature?

Generating BUILD and .bzl files within a repository rule improves the user experience when adapting third-party code to build with Bazel. When the generation logic is too complex to write in Starlark, rulesets often rely on a helper binary for language-specific logic (e.g. enumerating imports).

Embedding a WASM interpreter into Bazel and letting it execute WASM modules as part of repository rules can enable a different approach, where pre-compiled .wasm modules ship with the rules and can be executed on any platform that can run Bazel itself.

A very small API (string input, string output) keeps Bazel's own maintenance burden to a minimum, with no need to worry about things like WASI or WASM <-> JVM FFI.

Which operating system are you running Bazel on?

Linux (x86-64), macOS (aarch64)

What is the output of bazel info release?

release 7.1.1

jmillikin commented 5 days ago

I've built a proof-of-concept for this feature using Chicory, a WASM runtime written in Java. It isn't large (maybe ~500 LoC excluding tests), and it's able to run modules written in Go (via TinyGo) and Rust. Chicory itself is small (~161 kB) and has no additional dependencies.

Would the Bazel devs be interested in reviewing an implementation PR?

Demo repository rule:

def _wasm_demo(ctx):
    # Would be obtained from ctx.download_and_extract() normally
    ctx.file("example/src.go", "package main\n")

    # Inputs are assembled by Bazel, using path.readdir() / ctx.read(), etc
    srcs = {"example/src.go": ctx.read("example/src.go")}

    result = ctx.execute_wasm(
        path = ctx.attr._demo_wasm,
        function = "wasm_repo_demo",
        input = json.encode({"srcs": srcs}),
    )

    # Output contains instructions to the rule as to which changes
    # to apply, for example writing/patching BUILD files.
    output = json.decode(result.output)
    print("output: %s" % (repr(output),))
    ctx.file("BUILD.bazel", "")
    ctx.file("WORKSPACE.bazel", "")

wasm_demo = repository_rule(
    _wasm_demo,
    attrs = {
        "_demo_wasm": attr.label(
            default = "@//:demo.wasm",
            allow_single_file = True,
        )
    }
)

fmeum commented 5 days ago

Could this be realized by a third-party Starlark library that either 1) downloads the Chicory JAR and executes it with Bazel's embedded JDK (slight hack) or 2) downloads a Graal Native Image of Chicory and then runs it? That would make it possible to update Chicory separately from Bazel releases.

jmillikin commented 4 days ago

> 2) downloads a Graal Native Image of Chicory and then runs it?

The advantage of Chicory is that it runs in a JVM. If the interpreter is to be distributed as a native executable then WAMR would be a better approach. And downloading a native WASM interpreter wouldn't provide much benefit over simply downloading natively-compiled versions of a repository generator tool, with similar downsides with regards to portability.

> 1) downloads the Chicory JAR and executes it with Bazel's embedded JDK

I'm not sure how this would work -- is there a way to access Bazel's Java runtime from within a repository rule? My understanding is that it gets bundled into the bazel executable and unpacked to a temporary directory, and I don't see a way to locate that directory from a repository_ctx or module_ctx.

Using rules_java~~toolchains~remotejdk* to run a .jar would work, but downloading a JRE is the same as downloading a WASM interpreter.

> That would make it possible to update Chicory separately from Bazel releases.

Is that an important goal? According to Chicory's roadmap to v1.0, the functionality that Bazel would use (an interpreter that implements the WASM v1.0 spec) is complete.

The functionality yet to be implemented is less necessary for the Bazel use case:

fmeum commented 2 days ago

> And downloading a native WASM interpreter wouldn't provide much benefit over simply downloading natively-compiled versions of a repository generator tool, with similar downsides with regards to portability.

Yes, this is certainly less convenient, but I wonder how much: The interpreter would only need to be downloaded once and could then be used by arbitrarily many repo rules. OS/arch detection in repo rules isn't great, but it's ultimately using the same source of truth as Bazel itself (JVM system properties) and so shouldn't introduce additional portability concerns.
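
For a sense of what that detection involves, here is a small Python sketch (standing in for Starlark) that maps JVM-style os.name / os.arch strings to a download key. The function name platform_key, the key format, and the alias table are hypothetical and would need checking against the values Bazel actually reports.

```python
def platform_key(os_name, os_arch):
    """Map JVM-style os.name / os.arch strings to a key (hypothetical scheme)."""
    name = os_name.lower()
    if name.startswith("mac os"):
        os_part = "macos"
    elif name.startswith("windows"):
        os_part = "windows"
    elif name.startswith("linux"):
        os_part = "linux"
    else:
        raise ValueError("unsupported OS: " + os_name)

    # Different JVMs and platforms report the same CPU under different names.
    arch_aliases = {
        "amd64": "x86_64",
        "x86_64": "x86_64",
        "aarch64": "aarch64",
        "arm64": "aarch64",
    }
    if os_arch not in arch_aliases:
        raise ValueError("unsupported architecture: " + os_arch)
    return os_part + "-" + arch_aliases[os_arch]
```

A ruleset would use such a key to pick one interpreter download per platform, which is the per-platform bookkeeping that a built-in interpreter would avoid.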

Thanks to Cosmopolitan, we could potentially even use a single binary across all platforms: https://github.com/wasm3/wasm3

The main advantage of a solution outside Bazel is that rulesets could immediately adopt it rather than waiting until, say, 7.3.0 is their minimum supported version of Bazel.

jmillikin commented 2 days ago

I think if someone wanted to write a WASM interpreter binary for use by the Bazel rule ecosystem, they would have done so already. And the existence of native WASM support within Bazel would not prevent someone from doing so, should they feel inspired.