emscripten-core / emscripten

Emscripten: An LLVM-to-WebAssembly Compiler
Other
25.64k stars 3.29k forks source link

missingGlobal() and missingLibrarySymbol() pollute symbols into the global JS scope in MODULARIZE builds and prevent unloading Emscripten content #22588

Open juj opened 11 hours ago

juj commented 11 hours ago

At Unity we support a feature to unload the game engine from a web page. This enables sites to hot-swap between multiple Unity builds on a single web page.

To enable seamless integration with site content, this unloading support has been requested to be possible without needing to use iframes, and we build it on top of Emscripten's -sMODULARIZE feature.

It looks like since our last Emscripten update, this feature has regressed in debug builds, due to the use of the globalThis object in runtime_debug.js.

The functions missingGlobal() and missingLibrarySymbol() both call to Object.defineProperty(globalThis, ...);, which has the effect of leaking symbol definitions into the global scope of the JS context, even when the code is being built with -sMODULARIZE=1 where symbol definitions would be expected to be local to the MODULARIZEd function.

The defined functions contain closures that hold on to the loaded Wasm page.

Then even if one deletes all the traces of the loaded Module, these polluted global symbols have function scopes that keep the page alive.

Here is an example:

shell_unload.html

<html><body>
<textarea rows="8"></textarea>
<input type="button" value="Unload" onclick="unload();">
<script>
  var script = document.createElement('script');
  script.src = "a.js"; // would like to be {{{ TARGET_JS_NAME }}} but that doesn't work in default runtime.
  script.onload = () => {
    var code = SpecialUnityExportedModuleName;
    delete window.SpecialUnityExportedModuleName;
    code({ print: function(...args) { document.querySelector('textarea').value += `${args.join(' ')}\n`; } });
  }
  document.body.appendChild(script);
  function unload() { // Unload button unloads the Module from the page
    script.remove();
    delete window.script;
    document.querySelector('textarea').disabled = document.querySelector('input').disabled = 'true';
    document.querySelector('input').value = 'Unloaded';
  }
  // Must exist for technical purposes even if not used: {{{ SCRIPT }}}
</script></body></html>
emcc test\hello_world.c -o a.html -sINITIAL_MEMORY=2GB -sMODULARIZE -sEXPORT_NAME=SpecialUnityExportedModuleName --shell-file shell_unload.html

Then run the page, click on Unload and take a Chrome heap snapshot:

image

This snapshot shows that the Wasm program is still in memory:

image

and it is being held in memory by functions missingGlobal() and missingLibrarySymbol().

It is noted that this appears only in debug builds, although in our case, we provide developers debug tooling especially in debug builds so that they would be able to examine how this plays out. So we would prefer for this symbol pollution to not occur in debug builds either.

I am not sure however what would be a good way to rearchitect unexportedRuntimeSymbol() and missingGlobal() to fix the leakage, except to have the compiler python script statically list out the missing runtime symbols. Maybe that is what it'd take, if no better ideas?

sbc100 commented 9 hours ago

Yup, I agree this seems wrong.

Its a shame we can't attached these symbols to the "current context", since JS has no way to access that.

We could consider simply dropping these features (at least in MODULARIZE mode), or we could emit these functions as regular symbols in the namespace.