Open rth opened 3 years ago
I think you're right. That would be a good approach.
The var x =
part is hard to avoid (at least without eval...), but all the rest can be unified.
There may be a slight downside in that more code is run during startup, but I'm pretty sure it would not matter in practice, and the smaller download might more than offset it.
The only other possible downside I can think of is that stack traces would look slightly less clear. But that part of the stack trace should not really matter.
The only other possible downside I can think of is that stack traces would look slightly less clear. But that part of the stack trace should not really matter.
Also we are talking about release builds here which likely already have less useful stack traces. When ASSERTIONS is enabled we already use a wrapper so this won't effect debug builds I think?
@sbc100 Good point, yes. Stack traces should be fine then.
Thanks for the feedback! I'll make a PR then.
I did some refactoring in https://github.com/emscripten-core/emscripten/compare/main...rth:use-export-wrapper, assuming that it's OK to use globalThis
instead of var
for defining globals. The issue is that it turns out this has very little impact on the size in the case of pyodide.asm.js
. Other things that I could try are,
make_export_wrappers
(in emscripten.py) inside createExportWrapper
(in src/preamble.js). So that,
createExportWrapper('a', '_a');
createExportWrapper('b', '_b');
// repeated 10k times in my case
becomes
createExportWrapper({a: '_a', b: '_b', ...})
though that would make stack traces harder to read indeed.
'_' + name
for mangled value in createExportWrapper
to avoid repeating the name, so that
createExportWrapper('a', '_a');
becomes
createExportWrapper('a');
(or createExportWrapper({a: undefined})
in the vectorized case). This would add more compute at initialization time and maybe add slightly more complexity.
Overall I wonder if this change is worth it. In the end, the issue in my case is that there are symbol names like,
createExportWrapper(
"_ZNSt3__216allocator_traitsINS_9allocatorINS_17basic_string_viewIcNS_11char_traitsIcEEEEEEE10__max_sizeENS_17integral_constantIbLb1EEERKS6_",
"__ZNSt3__216allocator_traitsINS_9allocatorINS_17basic_string_viewIcNS_11char_traitsIcEEEEEEE10__max_sizeENS_17integral_constantIbLb1EEERKS6_"
);
and maybe looking into why they need to be exported and whether there is a way to shorten the name would make more sense.
I can open the above linked PR if you find it useful though.
@rth, thanks for investigating this.
Are you aware that createExportWrapper
is only used when assertions are enabled? In release builds this functions not included and never used. So optimizing its usage, while maybe good for readability, is not really going to save any size in production builds.
Regarding the exports of things like _ZNSt3__216allocator_traitsINS_9allocatorINS_17basic_string_viewIcNS_11char_traitsIcEEEEEEE10__max_sizeENS_17integral_constantIbLb1EEERKS6
... do you know why this is bring exported in your case? Are you building with -sMAIN_MODULE=1
(in which case I recommend switching to MAIN_MODULE=2
) or maybe -sLINKABLE
?
Oh wait.. I think I remember your use case and you are building with MAIN_MODULE=1
so that you can support runtime loading of side modules that are not included at linktime? (this is the only valid reason for using MAIN_MODULE=1
these days). Sadly, unless you know exactly which symbols your potential side modules will be needing then you will need to be conservative and export everything :(
Are you aware that createExportWrapper is only used when assertions are enabled?
Currently yes, but following the above discussion I experimented with using it also for builds without assertions in https://github.com/emscripten-core/emscripten/pull/14797 Which didn't help size wise for my use case though.
and you are building with MAIN_MODULE=1 so that you can support runtime loading of side modules that are not included at linktime?
Yes, we are indeed. Thanks for your comments!
So overall the usefulness of this is debatable. I would be happy to close this and the linked PR if you don't find it useful.
Did upgrading emscripten break my sed script? Or are you trying to avoid having to use it? We have been removing the C++ mangled names by postprocessing with sed -i -E 's/var __Z[^;]*;//g' build/pyodide.asm.js
after linking is done.
Yes, I saw that. I don't think that code changed in emscripten. I was just looking into the final produced output, maybe the regexp didn't catch some of it. In any case, it would be nice to reduce the size in the generated js in emscripten file before appling our custom filtering.
It still seems to be removing all of enormous mangled names like _ZNSt3__216allocator_traitsINS......
, but it looks like I should try also getting rid of _orig$_Z...
While looking at produced .js files for Pyodide (pyodide.asm.js, 1.78 MB total, 293kB compressed) a lot of the size seems to be due to this function definition for each symbol, https://github.com/emscripten-core/emscripten/blob/e77adf8a3a79e79b666dce6d7ad0d6d3b9e94b20/emscripten.py#L671-L674 in part because the mangled name can be quite long and it's repeated 4 times there.
Would it make sense to use an equivalent of
createExportWrapper
instead as done a few lines above? https://github.com/emscripten-core/emscripten/blob/e77adf8a3a79e79b666dce6d7ad0d6d3b9e94b20/emscripten.py#L659-L661or even
where
createExportWrapper
is currently defined only when assertions = true. I'm not saying to usecreateExportWrapper
exactly but an equivalent simpler function with no assertions. Unless there is a reason the current solution is preferable? Thanks!