emscripten-core / emsdk

Emscripten SDK
http://emscripten.org

[Bazel] Dynamically generate the Emscripten cache #1402

Closed allsey87 closed 3 weeks ago

allsey87 commented 3 weeks ago

This is a draft PR to share my progress on this feature. The idea is to use embuilder to generate the Emscripten cache dynamically by specifying upfront which libraries you want and in which configuration (e.g., wasm64, LTO, PIC).

For now I would really appreciate any input on these remaining problems:

  1. Output names: Some libraries generate .a files and some generate .o files. I suspect that only the C runtime libraries emit .o files, so could I tell Bazel that any library whose name begins with crt produces a .o output?

  2. Overlaying the cache: I added my output to link_files, but the Emscripten linker seems to ignore it. When I compile a program, Emscripten considers the cache to be external/emscripten_bin_linux/emscripten/cache/, while the cache assets from my genrule are actually stored at execroot/_main/bazel-out/k8-opt-exec-ST-13d3ddad9198/bin/external/emscripten_bin_linux/emscripten/cache/. I am not sure what the best way forward is here.

  3. Portability: I haven't found a way to get rid of dirname and realpath for the moment. It seems almost impossible to provide a rule in Bazel with a path, which makes it very difficult to set the values of BINARYEN_ROOT etc. for embuilder. The best solution I can think of for now is to leave the Bash-isms in place and use cmd_bash and cmd_bat (provided by genrule) to set the environment variables.
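The naming rule floated in item 1 can be sketched as a small helper. This only illustrates the crt-prefix heuristic, which is a suspicion and not confirmed embuilder behaviour; the function name cache_output_name is hypothetical. It is written in Python and should port to Starlark with little change.

```python
def cache_output_name(library):
    """Guess the file name embuilder emits for a library name.

    Assumption (unconfirmed): only the C runtime objects, whose names
    start with "crt", are emitted as .o files; everything else is
    emitted as a .a archive.
    """
    extension = ".o" if library.startswith("crt") else ".a"
    return library + extension
```

With the two library names used later in this thread, cache_output_name("crtbegin") gives crtbegin.o and cache_output_name("libprintf_long_double-debug") gives libprintf_long_double-debug.a.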

Related issues:

allsey87 commented 3 weeks ago

@walkingeyerobot let me know if you think this is going in a reasonable direction or if you would do something differently

sbc100 commented 3 weeks ago

Is there some shared, writable location that we can use as the one true location for the emscripten cache? Somewhere that is both readable and writable when the build rules run?

allsey87 commented 3 weeks ago

"Somewhere that is both readable and writable when the build rules run?"

You can always set the following in .bazelrc

build --action_env=EM_CACHE=/tmp/emscripten_cache --action_env=EM_FROZEN_CACHE=0

This seems to work, well, sort of... The cache is built fine but it is breaking my builds for the moment. When using:

allsey87 commented 3 weeks ago

I thought I had a really nice alternative solution to what I have done so far. The idea would have been to move cache generation from inside emscripten_bin_XXX to emsdk after the toolchain has been set up and then during builds override EM_CACHE on a per target basis.

For example, you set up a secondary cache in your WORKSPACE with something like:

emsdk_emscripten_cache(
    name = "pic_cache",
    mode = ["--pic"],
    libraries = ["crtbegin", "libprintf_long_double-debug"],
)

and then for each target that needs to use this cache, you could set EM_CACHE=//:pic_cache via env. The problem is that the meaning of env seems to vary a lot in Bazel. For rules_foreign_cc rules such as configure_make, this works since the entries in env are set during the build. However, for the built-in cc_binary, env appears to be set only at runtime.
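To make the proposed emsdk_emscripten_cache attributes more concrete, here is a rough sketch of how mode and libraries might be turned into an embuilder invocation. The helper name embuilder_command and the argument ordering are assumptions for illustration, not what the PR implements.

```python
def embuilder_command(mode, libraries, embuilder="embuilder"):
    """Build the argv list for one cache configuration.

    Assumption: mode flags such as "--pic" are passed straight through
    to embuilder, followed by the "build" operation and the names of
    the libraries to generate.
    """
    return [embuilder] + list(mode) + ["build"] + list(libraries)


# Values taken from the pic_cache example above.
command = embuilder_command(
    mode=["--pic"],
    libraries=["crtbegin", "libprintf_long_double-debug"],
)
```

A rule implementation would still need to point EM_CACHE at its own output directory before running this command, which is where the path problems described in this thread come in.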

In addition to setting EM_CACHE globally (as per the previous comment), I am also experimenting with using select in the toolchain to swap out emscripten_config for an alternative, generated configuration file that sets CACHE to the secondary cache if and only if one is declared. At the moment, I am stuck on an issue with the paths, which I asked about in the Bazel user group.
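The generated configuration file could be produced along these lines. This is a minimal sketch, assuming the configuration file is plain Python-syntax assignments (as Emscripten config files are) and that appending a CACHE line is enough to redirect the cache; render_config and its parameters are hypothetical names.

```python
def render_config(base_config, cache_dir=None):
    """Return config text, appending CACHE only when a cache is declared.

    base_config is the text of the default configuration file;
    cache_dir is the path of the secondary cache, or None if no
    secondary cache was declared.
    """
    lines = [base_config.rstrip("\n")]
    if cache_dir is not None:
        # repr() quotes the path so the line is a valid Python assignment.
        lines.append("CACHE = " + repr(cache_dir))
    return "\n".join(lines) + "\n"
```

When no secondary cache is declared, the default configuration passes through unchanged, so the toolchain could select between the two files without affecting existing builds.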

allsey87 commented 3 weeks ago

Closed in favour of #1405