emscripten-core / emsdk

Emscripten SDK
http://emscripten.org

[Bazel] Dynamically generate Emscripten cache #1401

Closed allsey87 closed 6 days ago

allsey87 commented 3 weeks ago

I am investigating a way to solve the problem of the frozen cache with wasm64, LTO, and PIC once and for all. The idea is to pass both the settings and the ports that you want built to emscripten_deps, which will generate the appropriate genrules and pull the generated files into the correct file groups so that everything is made available once during toolchain setup.

To make this work, I would change emscripten_deps to take two additional arguments as follows, with defaults set up for backwards compatibility:

emscripten_deps(
   emscripten_version = "latest",
   features = ["wasm32"], # lto, lto-thin, pic, wasm64
   targets = ["crtbegin", ...]
)

The idea would be to then loop over the targets, creating a genrule for each and calling embuilder to make the asset available in the cache. The string of all genrules would then be injected into BUILD_FILE_CONTENT_TEMPLATE. I started work on a rough prototype but before taking it further, please let me know if there are any reservations that would prevent this getting merged.
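A minimal sketch of the loop described above, written in plain Python without f-strings so that it stays close to Starlark. The helper names, the `embuilder_` rule-name prefix, the `{genrules}` placeholder, and the `outs` naming are all hypothetical; the real `BUILD_FILE_CONTENT_TEMPLATE` is not shown in this thread. Of the flags in `FEATURE_FLAGS`, only `--pic` appears in the thread; the others are assumptions that should be checked against `embuilder --help`.

```python
# Hypothetical template for a single genrule. A real rule would also need the
# config-file setup and the tool dependencies used in the prototype.
GENRULE_TEMPLATE = """
genrule(
    name = "embuilder_{target}",
    tools = [":emscripten/embuilder.py"],
    cmd = "$(location :emscripten/embuilder.py) {flags} build {target}",
    outs = ["{target}.o"],  # assumption: real output names vary per target
)
"""

# Assumed mapping from feature names to embuilder flags (only --pic is
# confirmed by the thread).
FEATURE_FLAGS = {
    "wasm32": [],
    "wasm64": ["--wasm64"],
    "lto": ["--lto"],
    "lto-thin": ["--lto=thin"],
    "pic": ["--pic"],
}

def genrule_for_target(target, features):
    """Render one genrule string for a single embuilder target."""
    flags = []
    for feature in features:
        flags.extend(FEATURE_FLAGS[feature])
    return GENRULE_TEMPLATE.format(target = target, flags = " ".join(flags))

def genrules_for_targets(targets, features):
    """Concatenate the genrules for every requested target."""
    return "".join([genrule_for_target(t, features) for t in targets])

# Stand-in for the real template, which is not shown in the thread.
BUILD_FILE_CONTENT_TEMPLATE = "# ...existing BUILD content...\n{genrules}"

build_file_content = BUILD_FILE_CONTENT_TEMPLATE.format(
    genrules = genrules_for_targets(["crtbegin"], ["wasm32", "pic"]),
)
```

Keeping the per-target rendering in its own function makes it straightforward to test the generated text without running Bazel.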

At the moment, my rough prototype sits outside @emscripten_bin_linux//, but the idea is to move this code inside it so that I can tweak what is available in the cache.

genrule(
    name = "test-embuilder",
    tools = [
        "@emscripten_bin_linux//:emscripten_config_upstream",
        "@emscripten_bin_linux//:emscripten/embuilder.py",
        "@emscripten_bin_linux//:compiler_files",
        "@emscripten_bin_linux//:linker_files",
        "@emscripten_bin_linux//:ar_files",
    ],
    cmd = """
echo "import os" >> embuilder_config
echo "CACHE = '$$(realpath $(RULEDIR))'" >> embuilder_config
echo "BINARYEN_ROOT = '$$(realpath $$(dirname $(location @emscripten_bin_linux//:emscripten_config_upstream)))'" >> embuilder_config
echo "LLVM_ROOT = os.path.join(BINARYEN_ROOT, 'bin')" >> embuilder_config
echo "EMSCRIPTEN_ROOT = os.path.join(BINARYEN_ROOT, 'emscripten')" >> embuilder_config

$(location @emscripten_bin_linux//:emscripten/embuilder.py) \
    --em-config embuilder_config \
    --pic \
    build crtbegin
""",
    # This rule crashes here at the moment since crtbegin.o is not created in RULEDIR and embuilder seems to be ignoring
    # the CACHE environment variable
    outs = ["crtbegin.o"]
)

sbc100 commented 3 weeks ago

One alternative long-term solution might be to build all the system libraries from source via bazel rules. I know @walkingeyerobot has been thinking about this for a while now, but I'm not sure how close it is.

allsey87 commented 3 weeks ago

I am not necessarily against that approach but that would take a bit of time, right? From (thin) LTO, PIC, and 64-bit, we have 16 (?) possible combinations multiplied by around 40 different ports/libraries? That's 640 jobs based on my estimates...
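The estimate above depends on how the options are counted. The sketch below enumerates two readings: the four options as independent on/off switches, which gives the 16 in the comment, and, as an assumption, LTO as a single three-way choice, since full and thin LTO are alternatives. The figure of 40 libraries is the rough number from the comment, not a measured count.

```python
import itertools

# Reading 1: lto, lto-thin, pic and wasm64 as four independent on/off switches.
independent = list(itertools.product([False, True], repeat=4))

# Reading 2 (assumption): LTO is one three-way choice, combined with
# PIC on/off and wasm32 vs. wasm64.
lto_modes = ["none", "lto", "lto-thin"]
exclusive = list(itertools.product(lto_modes, [False, True], ["wasm32", "wasm64"]))

LIBRARY_COUNT = 40  # "around 40" in the comment above

print(len(independent), len(independent) * LIBRARY_COUNT)  # 16 640
print(len(exclusive), len(exclusive) * LIBRARY_COUNT)      # 12 480
```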

sbc100 commented 3 weeks ago

> I am not necessarily against that approach but that would take a bit of time, right? From (thin) LTO, PIC, and 64-bit, we have 16 (?) possible combinations multiplied by around 40 different ports/libraries? That's 640 jobs based on my estimates...

The idea is that those system libraries would be just like any other sources in your project, so only the precise config that you need for a given build would be produced. There would be no need to enumerate them or build all of them ever.

walkingeyerobot commented 3 weeks ago

I am fine with this change in general. I'm too far from getting runtimes on demand to work with bazel, and if you have the bandwidth and motivation to contribute this, I'm happy to accept it.

However, I don't think we can make use of bash in genrules. The bazel toolchain works on linux, mac, and windows, and while the bash that you have will work great on linux, bash on mac has some odd quirks and on windows it simply doesn't exist. If you can write this in something more portable (i.e. python) then I think that'll be fine.

Also, this is complex enough that I would ask for a smoketest to be written for CI for all three supported platforms.

allsey87 commented 3 weeks ago

> I am fine with this change in general. I'm too far from getting runtimes on demand to work with bazel, and if you have the bandwidth and motivation to contribute this, I'm happy to accept it.
>
> However, I don't think we can make use of bash in genrules. The bazel toolchain works on linux, mac, and windows, and while the bash that you have will work great on linux, bash on mac has some odd quirks and on windows it simply doesn't exist. If you can write this in something more portable (i.e. python) then I think that'll be fine.
>
> Also, this is complex enough that I would ask for a smoketest to be written for CI for all three supported platforms.

I think it can be easily wrapped into a Python script; after all, it is mostly just running embuilder.py anyway.
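A rough sketch of what such a wrapper might look like, mirroring the echo lines and the embuilder invocation from the genrule prototype. The function names and parameters are hypothetical, and only the flags that appear in the prototype (`--em-config`, `--pic`, `build`) are used. Whether embuilder actually picks up `CACHE` from the config file was still unresolved in the prototype, so this is a starting point rather than a working solution.

```python
import os
import subprocess
import sys

def render_config(cache_dir, binaryen_root):
    """Return config-file text equivalent to the echo lines in the genrule."""
    return "\n".join([
        "import os",
        # repr() quotes the path and escapes backslashes, which matters on Windows.
        "CACHE = {!r}".format(cache_dir),
        "BINARYEN_ROOT = {!r}".format(binaryen_root),
        "LLVM_ROOT = os.path.join(BINARYEN_ROOT, 'bin')",
        "EMSCRIPTEN_ROOT = os.path.join(BINARYEN_ROOT, 'emscripten')",
        "",
    ])

def embuilder_command(embuilder_py, config_path, targets, extra_flags=()):
    """Build the argument list; no shell is involved, so no bash is required."""
    return [
        sys.executable, embuilder_py,
        "--em-config", config_path,
        *extra_flags,
        "build", *targets,
    ]

def run_embuilder(embuilder_py, config_path, cache_dir, binaryen_root,
                  targets, extra_flags=()):
    """Write the config file, then invoke embuilder and fail loudly on error."""
    config_text = render_config(
        os.path.realpath(cache_dir),
        os.path.realpath(binaryen_root),
    )
    with open(config_path, "w") as config_file:
        config_file.write(config_text)
    subprocess.run(
        embuilder_command(embuilder_py, config_path, targets, extra_flags),
        check=True,
    )
```

A call equivalent to the prototype would be `run_embuilder(embuilder_py, "embuilder_config", rule_dir, binaryen_root, ["crtbegin"], ["--pic"])`, where `embuilder_py`, `rule_dir`, and `binaryen_root` stand for the paths that Bazel would supply.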