emscripten-core / emscripten

Emscripten: An LLVM-to-WebAssembly Compiler
Other
25.67k stars 3.29k forks source link

webpack/bundler Fix: Update worker JS file import URL to be compatible with bundlers #22165

Closed Twinklebear closed 3 months ago

Twinklebear commented 3 months ago

This PR closes #22140 by correcting the URL used to import the worker JS in library_pthread.js::allocateUnusedWorker. Previously the URL would be:

worker = new Worker(new URL(import.meta.url))

which webpack translates to thefile:/// path, causing the import to fail b/c the file:// path is not valid to access (webpack docs import meta). For example, from my own app that I hit this issue on:

wgpu_app.js:1050 Uncaught (in promise) DOMException: Failed to construct 'Worker': Script at 'file:///Users/will/repos/webgpu-cpp-wasm/web/src/cpp/wgpu_app.js' cannot be accessed from origin 'http://localhost:8080'.
    at Object.allocateUnusedWorker (http://localhost:8080/6009193530ba34a86bac.js:4590:14)
    at Object.initMainThread (http://localhost:8080/6009193530ba34a86bac.js:4487:15)
    at Object.init (http://localhost:8080/6009193530ba34a86bac.js:4481:15)
    at http://localhost:8080/6009193530ba34a86bac.js:12723:9
    at http://localhost:8080/6009193530ba34a86bac.js:982:81

The correct URL for bundlers is:

worker = new Worker(new URL('{{{ TARGET_JS_NAME }}}', import.meta.url));

Which webpack will understand and properly translate to the bundled path. Webpack will also automatically generate a new entry point for the worker file when bundling (docs).

I've updated the webpack ES6 test added in https://github.com/emscripten-core/emscripten/pull/22142 to exercise this code path by building w/ pthreads and adding -sPTHREAD_POOL_SIZE=1 to cause an unallocated worker to be created. Let me know if it's preferred to add a new separate webpack + es6 + threads test instead of just adding the threading flags on to the existing webpack es6 test.

Without the fix to the worker import URL added in this PR the test would fail with the error message below after enabling threads and starting one. After the change to the worker import URL, this test passes again.

test_webpack_es6 (test_browser.browser) ... Opening in existing browser session.
asset main.js 101 KiB [emitted] (name: main)
asset 505f5fe022ebe2b256f9.wasm 38.7 KiB [emitted] [immutable] [from: src/hello.wasm] (auxiliary name: main)
runtime modules 2.22 KiB 6 modules
cacheable modules 94.9 KiB (javascript) 38.7 KiB (asset)
  ./src/index.mjs 404 bytes [built] [code generated]
  ./src/hello.mjs 94.4 KiB [built] [code generated]
  ./src/hello.wasm 42 bytes (javascript) 38.7 KiB (asset) [built] [code generated]
webpack 5.92.1 compiled successfully in 96 ms
[test error (see below), automatically retrying]
Expected to find '/report_result?exit:0
' in '/report_result?exception:Failed to construct 'Worker': Script at 'file:///Users/will/repos/emscripten/out/test/webpack/src/hello.mjs' cannot be accessed from origin 'http://localhost:8888'. / Error: Failed to construct 'Worker': Script at 'file:///Users/will/repos/emscripten/out/test/webpack/src/hello.mjs' cannot be accessed from origin 'http://localhost:8888'.    at Object.allocateUnusedWorker (http://localhost:8888/webpack/dist/main.js:1625:18)    at Object.initMainThread (http://localhost:888[..]
', diff:

--- expected
+++ actual
@@ -1 +1 @@
-/report_result?exit:0
+/report_result?exception:Failed to construct 'Worker': Script at 'file:///Users/will/repos/emscripten/out/test/webpack/src/hello.mjs' cannot be accessed from origin 'http://localhost:8888'. / Error: Failed to construct 'Worker': Script at 'file:///Users/will/repos/emscripten/out/test/webpack/src/hello.mjs' cannot be accessed from origin 'http://localhost:8888'.    at Object.allocateUnusedWorker (http://localhost:8888/webpack/dist/main.js:1625:18)    at Object.initMainThread (http://localhost:88[..]

FAIL
[Browser harness server terminated]

After updating library_pthread.js::allocateUnusedWorker() this test passes again:

test_webpack_es6 (test_browser.browser) ... Opening in existing browser session.
asset main.js 101 KiB [emitted] (name: main)
asset src_hello_mjs.js 101 KiB [emitted]
asset 505f5fe022ebe2b256f9.wasm 38.7 KiB [emitted] [immutable] [from: src/hello.wasm] (auxiliary name: main)
runtime modules 4.59 KiB 14 modules
cacheable modules 94.9 KiB (javascript) 38.7 KiB (asset)
  ./src/index.mjs 404 bytes [built] [code generated]
  ./src/hello.mjs 94.5 KiB [built] [code generated]
  ./src/hello.wasm 42 bytes (javascript) 38.7 KiB (asset) [built] [code generated]

WARNING in [entry] [initial]
Circular dependency between chunks with runtime (main, src_hello_mjs)
This prevents using hashes of each other and should be avoided.

webpack 5.92.1 compiled with 1 warning in 91 ms
ok
[Browser harness server terminated]

----------------------------------------------------------------------
Ran 1 test in 0.996s

OK

I tried adding -pthread -sPTHREAD_POOL_SIZE=1 to the non-es6 webpack test path as well but this fails with an error about the document not being defined. So I've left the non-es6 test alone, and just the es6 path will test threads support. I tried just adding some if (document) here but still got the error about the document not being defined, so I wasn't sure quite what to change here to make this run with threads (or if that was even needed).

Expected to find '/report_result?exit:0
' in '/report_result?exception:Uncaught ReferenceError: document is not defined / undefined
', diff:

--- expected
+++ actual
@@ -1 +1 @@
-/report_result?exit:0
+/report_result?exception:Uncaught ReferenceError: document is not defined / undefined

FAIL
[Browser harness server terminated]
sbc100 commented 3 months ago

Can you update the title so that its clear this is webpack/bundler fix. To be clear there no issue with existing syntax other than bundlers seem to be confused by it.

Also, could you add a comment in the code saying that this is why need to use the longer form?

Twinklebear commented 3 months ago

Sounds good, I've updated the title and added a note explaining why we use this worse form of creating the file URL

andreamancuso commented 3 months ago

I'm also impacted by this - looking forward to this PR getting merged.

RReverser commented 2 months ago

do you think we can/should file this a bug on the webpack side.

I don't think so, that's a specific pattern recognised by all bundlers.

I'm also surprised, I was pretty sure we already emitted a coorect form back when I added it and even had tests to ensure it's this specific pattern that gets emitted in the output? Did that change at some point?

sbc100 commented 2 months ago

Yes, I think it changed with the removal of worker.js file in #21701

sbc100 commented 2 months ago

Yes, I think it changed with the removal of worker.js file in #21701

What happened in this change is that we went from creating a worker using a filename that was different to the current file (i.e. worker.js) to creating a worker using the exact same filename... so it made sense to me that we could transition to using the shorter/simpler form of new Worker that just uses import.meta.url. Its really shame we cannot use that form since the explicit name is more fragile.

RReverser commented 2 months ago

Ah I see. Yeah I get the distaste for an explicit name, but to be fair that's already the case for other assets like Wasm itself, so doesn't seem too problematic on its own. One way or another you need to do find-replace when renaming those files.

fs-eire commented 2 months ago

This change breaks my code which try to put everything into a single file.

Please consider this example:

// main.js. Will build into bundle `my-output.js`
// Emscripten generates `a.wasm` and `a.js`.

import wasmFactory from './a.js';

export const myFunc = async (param) => {
  const instance = await wasmFactory();
  instance.func(param);
};

Without this change, it is able to build the code above into a single file my-output.js, which will create PThread worker threads (import.meta.url will resolve to the path to my-output.js). However now this is impossible because a hard-coded file name is written in a.js.

I understand the problem that may affect webpack users, but this is a feature regression to me if I upgrade from 3.1.62 to 3.1.63. It would be great if a setting is added to allow reverting back to 3.1.62 behavior.

@sbc100

fs-eire commented 2 months ago

Thanks so much for working on this!

I do think its a bit of a shame that we have to make the code strictly worse (IMHO) because the bundlers can't handle the shorter form. But I it is an important use case.

@RReverser do you think we can/should file this a bug on the webpack side.

I agree with the "its a bit of a shame that we have to make the code strictly worse" part. For whatever reason, the current behavior on import.meta of webpack breaks JavaScript code which tries to evaluate script URL at runtime when using ESM. Some webpack users are aware of this issue and there is a workaround to disable the webpack's default behavior of rewriting import.meta.url by adding the following to webpack.config.js:

module : {parser : {javascript : {importMeta : false}}},

It was a very elegant solution to use new Worker(new URL(import.meta.url) because technically this allows an ESModule to be loaded correctly by both static import statement and dynamic import() when made into bundle.

RReverser commented 2 months ago

which will create PThread worker threads (import.meta.url will resolve to the path to my-output.js).

Hm I'd say you shouldn't be relying on import.meta.url resulting in pointing to any file except your own. That was UB to begin with.

marcosc90 commented 2 months ago

This change broke vite builds with

error during build:
[commonjs--resolver] Circular worker imports detected. Vite does not support it

see: https://github.com/vitejs/vite/pull/16103

fs-eire commented 2 months ago

which will create PThread worker threads (import.meta.url will resolve to the path to my-output.js).

Hm I'd say you shouldn't be relying on import.meta.url resulting in pointing to any file except your own. That was UB to begin with.

I expect import.meta.url to be the runtime URL of the current script, which is described in https://html.spec.whatwg.org/multipage/webappapis.html#hostgetimportmetaproperties. The example (my-output.js) is supposed to be generated by a build process that well defined in the project. I don't understand why you think there is any undefined behavior.

fs-eire commented 2 months ago

There are other places in the generated JS code which uses import.meta.url. For example, the following line:

  var _scriptName = import.meta.url;

This will be rewritten into a static file://-like string by webpack by default, so functions that uses _scriptName will not work correctly (locateFile()).

The real problem is, Webpack's default behavior does not allow runtime evaluation of import.meta.url. However, Emscripten generated JS may need the capability to get the current script URL at runtime for some feature.

Twinklebear commented 2 months ago

This change broke vite builds with

error during build:
[commonjs--resolver] Circular worker imports detected. Vite does not support it

see: vitejs/vite#16103

While the build is broken here, I wonder if it's actually correct for it to be broken and that maybe Vite needs to support the main JS file referencing itself to make a Worker? Because Emscripten's new Worker call is a circular reference even before this change. The original URL:

worker = new Worker(new URL(import.meta.url))

and

worker = new Worker(new URL('{{{ TARGET_JS_NAME }}}', import.meta.url));

Technically do refer to the same file, so in both cases it's a circular reference. It's just that the bundler doesn't recognize/check the first case and lets it through. Vite's docs do mention the latter case that this PR added: https://vitejs.dev/guide/assets#new-url-url-import-meta-url . That PR mentions Vite supporting self referencing worker files, maybe it needs to extend support to allow a non-worker file to reference itself to create a worker?

sbc100 commented 2 months ago

This change breaks my code which try to put everything into a single file.

Please consider this example:

// main.js. Will build into bundle `my-output.js`
// Emscripten generates `a.wasm` and `a.js`.

import wasmFactory from './a.js';

export const myFunc = async (param) => {
  const instance = await wasmFactory();
  instance.func(param);
};

Without this change, it is able to build the code above into a single file my-output.js, which will create PThread worker threads (import.meta.url will resolve to the path to my-output.js). However now this is impossible because a hard-coded file name is written in a.js.

I understand the problem that may affect webpack users, but this is a feature regression to me if I upgrade from 3.1.62 to 3.1.63. It would be great if a setting is added to allow reverting back to 3.1.62 behavior.

@sbc100

I'm not sure that use case you describe will work since wouldn't that result in my-output.js being loaded on each pthread instead of just the a.js code (which is what is required).

e.g. if you added console.log to your main.js that should not get run on each worker startup right? But I think if you using mport.meta.url (a.k.a my-output.js) as your pthread startup file then it would be run on each thread, no?

dwayneparton commented 2 months ago

This change broke vite builds with

error during build:
[commonjs--resolver] Circular worker imports detected. Vite does not support it

see: vitejs/vite#16103

@brianpmaher and I have been trying to find a work around to support both vite and webpack for an internal library. We were able to bypass the circular deps error by replacing:

worker = new Worker(new URL('{{{ TARGET_JS_NAME }}}', import.meta.url));

with

var _workerUrl=new URL('{{{ TARGET_JS_NAME }}}', import.meta.url);
worker=new Worker(_workerUrl, workerOptions);

Maybe not the best solution but thought I'd share, hopefully that helps someone.

sbc100 commented 2 months ago

@brianpmaher and I have been trying to find a work around to support both vite and webpack for an internal library. We were able to bypass the circular deps error by replacing:

Just to be clear, you are not actually removing the circularity here? (I think that is not actually possible). This somehow suppresses the warning but making the circularity obscure the algorithms checking for it?

dwayneparton commented 2 months ago

@sbc100 Correct. It's not removing it, more exploiting a hole in the algorithm vite uses to check.

A couple more details: Here's what we were seeing

// does not work with vite but does with webpack
worker = new Worker(new URL('{{{ TARGET_JS_NAME }}}', import.meta.url));

// works with vite but not webpack
worker = new Worker(new URL(import.meta.url));

// works with vite and webpack
var _workerUrl=new URL('{{{ TARGET_JS_NAME }}}', import.meta.url);
worker=new Worker(_workerUrl, workerOptions);
jamsinclair commented 1 month ago

@dwayneparton Which version of vite are you using? From versions v5.2.2 simple self-referential circular dependencies can be handled when using module workers (https://github.com/vitejs/vite/pull/16103).

The following config may need to be added to your vite.config.js file.

worker: {
    format: "es"
}

I think you would also need to update emscripten to use the flag -s EXPORT_ES6=1 so module workers are used (please correct me if I'm wrong here).

However, I have noticed in recent versions of emscripten the switch to storing workerOptions in a variable has broken Vite builds. I aim to create an issue and reproduction soon 😅 (Edit: the issue is #22394)

dwayneparton commented 1 month ago

@jamsinclair We're are using vite v5.3.1 with the worker.format = "es".

I think you would also need to update emscripten to use the flag -s EXPORT_ES6=1 so module workers are used (please correct me if I'm wrong here).

We also have the EXPORT_ES6 flag but maybe there is more there. @brianpmaher

The main issue we had was getting it to work in both Vite and Webpack w/o any treatment to the worker url.