Optimizing output .js size: using createExportWrapper with no assertions

rth commented 3 years ago

While looking at produced .js files for Pyodide (pyodide.asm.js, 1.78 MB total, 293kB compressed) a lot of the size seems to be due to this function definition for each symbol, https://github.com/emscripten-core/emscripten/blob/e77adf8a3a79e79b666dce6d7ad0d6d3b9e94b20/emscripten.py#L671-L674 in part because the mangled name can be quite long and it's repeated 4 times there.

Would it make sense to use an equivalent of createExportWrapper instead as done a few lines above? https://github.com/emscripten-core/emscripten/blob/e77adf8a3a79e79b666dce6d7ad0d6d3b9e94b20/emscripten.py#L659-L661

or even

 /** @type {function(...*):?} */ 
 var %(mangled)s = createExportWrapper("%(name)s", Module); 
 ''' % {'mangled': mangled, 'name': name})

where createExportWrapper is currently defined only when assertions = true. I'm not saying to use createExportWrapper exactly but an equivalent simpler function with no assertions. Unless there is a reason the current solution is preferable? Thanks!

kripken commented 3 years ago

I think you're right. That would be a good approach.

The var x = part is hard to avoid (at least without eval...), but all the rest can be unified.

There may be a slight downside in that more code is run during startup, but I'm pretty sure it would not matter in practice, and the smaller download might more than offset it.

The only other possible downside I can think of is that stack traces would look slightly less clear. But that part of the stack trace should not really matter.

sbc100 commented 3 years ago

The only other possible downside I can think of is that stack traces would look slightly less clear. But that part of the stack trace should not really matter.

Also we are talking about release builds here which likely already have less useful stack traces. When ASSERTIONS is enabled we already use a wrapper so this won't effect debug builds I think?

kripken commented 3 years ago

@sbc100 Good point, yes. Stack traces should be fine then.

rth commented 3 years ago

Thanks for the feedback! I'll make a PR then.

rth commented 3 years ago

I did some refactoring in https://github.com/emscripten-core/emscripten/compare/main...rth:use-export-wrapper, assuming that it's OK to use globalThis instead of var for defining globals. The issue is that it turns out this has very little impact on the size in the case of pyodide.asm.js. Other things that I could try are,

moving the for loop over symbol names from make_export_wrappers (in emscripten.py) inside createExportWrapper (in src/preamble.js). So that,
```
createExportWrapper('a', '_a');
createExportWrapper('b', '_b');
// repeated 10k times in my case
```
becomes
```
createExportWrapper({a: '_a', b: '_b', ...})
```
though that would make stack traces harder to read indeed.
Use as default value '_' + name for mangled value in createExportWrapper to avoid repeating the name, so that
```
createExportWrapper('a', '_a');
```
becomes
```
createExportWrapper('a');
```
(or createExportWrapper({a: undefined}) in the vectorized case). This would add more compute at initialization time and maybe add slightly more complexity.

Overall I wonder if this change is worth it. In the end, the issue in my case is that there are symbol names like,

    createExportWrapper(
      "_ZNSt3__216allocator_traitsINS_9allocatorINS_17basic_string_viewIcNS_11char_traitsIcEEEEEEE10__max_sizeENS_17integral_constantIbLb1EEERKS6_",
      "__ZNSt3__216allocator_traitsINS_9allocatorINS_17basic_string_viewIcNS_11char_traitsIcEEEEEEE10__max_sizeENS_17integral_constantIbLb1EEERKS6_"
    );

and maybe looking into why they need to be exported and whether there is a way to shorten the name would make more sense.

I can open the above linked PR if you find it useful though.

sbc100 commented 3 years ago

@rth, thanks for investigating this.

Are you aware that createExportWrapper is only used when assertions are enabled? In release builds this functions not included and never used. So optimizing its usage, while maybe good for readability, is not really going to save any size in production builds.

sbc100 commented 3 years ago

Regarding the exports of things like _ZNSt3__216allocator_traitsINS_9allocatorINS_17basic_string_viewIcNS_11char_traitsIcEEEEEEE10__max_sizeENS_17integral_constantIbLb1EEERKS6... do you know why this is bring exported in your case? Are you building with -sMAIN_MODULE=1 (in which case I recommend switching to MAIN_MODULE=2) or maybe -sLINKABLE?

sbc100 commented 3 years ago

Oh wait.. I think I remember your use case and you are building with MAIN_MODULE=1 so that you can support runtime loading of side modules that are not included at linktime? (this is the only valid reason for using MAIN_MODULE=1 these days). Sadly, unless you know exactly which symbols your potential side modules will be needing then you will need to be conservative and export everything :(

rth commented 3 years ago

Are you aware that createExportWrapper is only used when assertions are enabled?

Currently yes, but following the above discussion I experimented with using it also for builds without assertions in https://github.com/emscripten-core/emscripten/pull/14797 Which didn't help size wise for my use case though.

and you are building with MAIN_MODULE=1 so that you can support runtime loading of side modules that are not included at linktime?

Yes, we are indeed. Thanks for your comments!

So overall the usefulness of this is debatable. I would be happy to close this and the linked PR if you don't find it useful.

hoodmane commented 3 years ago

Did upgrading emscripten break my sed script? Or are you trying to avoid having to use it? We have been removing the C++ mangled names by postprocessing with sed -i -E 's/var __Z[^;]*;//g' build/pyodide.asm.js after linking is done.

rth commented 3 years ago

Yes, I saw that. I don't think that code changed in emscripten. I was just looking into the final produced output, maybe the regexp didn't catch some of it. In any case, it would be nice to reduce the size in the generated js in emscripten file before appling our custom filtering.

hoodmane commented 3 years ago

It still seems to be removing all of enormous mangled names like _ZNSt3__216allocator_traitsINS......, but it looks like I should try also getting rid of _orig$_Z...

emscripten-core / emscripten

Optimizing output .js size: using createExportWrapper with no assertions #14695