emscripten-core / emscripten

Emscripten: An LLVM-to-WebAssembly Compiler

Missed juicy code size optimization with -sEXIT_RUNTIME=0 #17872

Open juj opened 2 years ago

juj commented 2 years ago

Example code: a.cpp

#include <vector>
#include <emscripten.h>

std::vector<double> d;

int main()
{
    // just some random stuff to prevent dead code elimination
    for(int i = (int)emscripten_get_now(); i >= 0; --i)
        d.push_back(i);
    return d[d.size()/2];
}

Build with em++ a.cpp -o a.js -Oz -g3 -sEXIT_RUNTIME=0.

The generated code will emit the following global initializer for the std::vector instance in wasm:

 (func $__cxx_global_var_init
  (drop
   (call $std::__2::vector<double\2c\20std::__2::allocator<double>>::vector\28\29
    (i32.const 1684)
   )
  )
  (drop
   (call $__cxa_atexit
    (i32.const 1)
    (i32.const 0)
    (i32.const 1024)
   )
  )
 )

Note the __cxa_atexit call. This should not be present in the generated code, since the code is being built with -sEXIT_RUNTIME=0. This atexit directive is registering a destructor function pointer, which is present in the function pointer table as:

(elem (i32.const 1) $__cxx_global_array_dtor

which is

 (func $__cxx_global_array_dtor (param $0 i32)
  (drop
   (call $std::__2::vector<double\2c\20std::__2::allocator<double>>::~vector\28\29
    (i32.const 1684)
   )
  )
 )

This dtor is likewise never called, since EXIT_RUNTIME=0.
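For readers who prefer source over wasm text, here is a rough C++ model of the initializer and destructor above. It is only a sketch: `d_storage`, `d_ptr`, `dtor_runs`, `noop_cxa_atexit`, `global_array_dtor` and `global_var_init` are hypothetical names made up for illustration, and the stub merely assumes that the real `__cxa_atexit` ignores its arguments when the runtime never exits.

```cpp
#include <cassert>
#include <new>
#include <vector>

using Vec = std::vector<double>;

// Raw storage standing in for the global `d` (address 1684 in the dump).
alignas(Vec) unsigned char d_storage[sizeof(Vec)];
Vec* d_ptr = nullptr;

int dtor_runs = 0;  // illustration only: counts destructor invocations

// Counterpart of $__cxx_global_array_dtor: ignores its parameter and
// destroys the object that lives at the fixed location.
void global_array_dtor(void*) {
  d_ptr->~Vec();
  ++dtor_runs;
}

// Hypothetical stand-in for a no-op atexit stub: it does nothing with the
// pointer it receives and reports success.
int noop_cxa_atexit(void (*)(void*), void*, void*) { return 0; }

// Counterpart of $__cxx_global_var_init: construct, then "register" the dtor.
Vec* global_var_init() {
  d_ptr = new (d_storage) Vec();
  // Taking &global_array_dtor is what keeps the destructor (and the vector
  // destructor it calls) reachable, even though the stub never calls it.
  int rc = noop_cxa_atexit(&global_array_dtor, nullptr, nullptr);
  assert(rc == 0);
  (void)rc;
  return d_ptr;
}
```

Calling `global_var_init()` leaves `dtor_runs` at 0: nothing ever invokes `global_array_dtor`, yet its address has been taken, which is the pattern that pins it in the function pointer table.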

I see that in the Unity codebase and other codebases, quite a lot of these global dtors and __cxa_atexit directives pile up even when the codebase is built with -sEXIT_RUNTIME=0.

Would there be a good way to get rid of those when the runtime will never exit?

kripken commented 2 years ago

With EXIT_RUNTIME=0 the atexit calls turn into noops, so what I think happens here is the functions whose address was passed in as a parameter remain alive. That is, atexit(&foo) does nothing, but just taking the address of foo keeps it alive. LTO should be able to handle this, but unfortunately, libnoexit, where we define the atexit stubs, has this:

https://github.com/emscripten-core/emscripten/blob/4e9ba1435d45dd64ca7bb815cdb181fddcdfd743/tools/system_libs.py#L835-L839

So it does not participate in LTO. I think perhaps we can optimize that, however. If we had JS stubs for them, that would handle the case where no atexit exists before LTO and LTO creates one.

A larger factor in that testcase, however, is exceptions. Building with -fno-exceptions shrinks the table by quite a lot, at least with -flto. That might be worth looking into.

juj commented 2 years ago

With EXIT_RUNTIME=0 the atexit calls turn into noops

Yeah, the __cxa_atexit implementation itself is a no-op. Trying with -flto I do see that the result is the same, i.e. the __cxa_atexit calls are still there, and the dtors are pinned in the function pointer table, since they did have their addresses taken.

An alternative solution might be to solve this at the root, i.e. have a WebAssembly EXIT_RUNTIME=0 specific flag propagate directly to instruct Clang/LLVM to never emit any __cxa_atexit calls. That might be the cleanest solution? It would turn -sEXIT_RUNTIME=0 from a link-only flag into a compile+link flag, but I would be happy with that as well.

CC @dschuff @tlively

juj commented 2 years ago

(also with -fno-exceptions -flto combo the same result)

tlively commented 2 years ago

-fno-c++-static-destructors seems to do the trick. Automatically adding that at compile time when EXIT_RUNTIME=0 makes sense to me.
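A small sketch of what this could look like on the test case from the top of the thread. The build line in the comment just appends the flag to the original command. The `[[clang::no_destroy]]` attribute is not mentioned in this thread; it is Clang's per-variable counterpart of the flag, and the `NO_DESTROY` macro and `middle_of_countdown` helper are hypothetical names used only for this illustration.

```cpp
// Whole-program variant, assuming the original command line:
//   em++ a.cpp -o a.js -Oz -g3 -sEXIT_RUNTIME=0 -fno-c++-static-destructors
#include <vector>

// Per-variable variant: use the attribute where the compiler knows it,
// otherwise expand to nothing so the code still builds elsewhere.
#if defined(__has_cpp_attribute)
#  if __has_cpp_attribute(clang::no_destroy)
#    define NO_DESTROY [[clang::no_destroy]]
#  endif
#endif
#ifndef NO_DESTROY
#  define NO_DESTROY
#endif

// With the attribute in effect, no exit-time destructor is registered for d.
NO_DESTROY std::vector<double> d;

// Same shape as main() in a.cpp, but with the start value passed in.
double middle_of_countdown(int start) {
  d.clear();
  for (int i = start; i >= 0; --i)
    d.push_back(i);
  return d[d.size() / 2];
}
```

The attribute is handy when only a few globals matter or when some destructors must still run; the flag covers every static and thread-local object in the translation unit at once.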

juj commented 2 years ago

Ohh wow of course there already was a flag, that's perfect! I can confirm it does work.

Yeah, it would be good to enable that with EXIT_RUNTIME=0.

kripken commented 2 years ago

Good find @tlively !

As for implementing this, a minor issue is EXIT_RUNTIME is a linker flag atm, while that's a compile-time flag. Not sure offhand how best to do this, given that: we could either make it a compile-time flag too, or we could look if there's a way to do this at link time somehow.

sbc100 commented 2 years ago

Is there some way we can make this work at LTO time? Then we keep EXIT_RUNTIME as a link time flag and say that it requires LTO to get the full effect of it?

tlively commented 2 years ago

Is there some way we can make this work at LTO time? Then we keep EXIT_RUNTIME as a link time flag and say that it requires LTO to get the full effect of it?

Probably not, since it's a frontend flag for clang, unfortunately. By the time you get LLVM IR that contains the destructors, it's too late for that flag to do anything.

I tried looking for documentation for EXIT_RUNTIME so we could mention -fno-c++-static-destructors alongside it, but I didn't find any good documentation locations for it. It appears that this is one of those settings primarily documented in settings.js.