emscripten-core / emscripten

Emscripten: An LLVM-to-WebAssembly Compiler
Other
25.88k stars 3.32k forks source link

Cross Origin pthread worker #21937

Open msqr1 opened 6 months ago

msqr1 commented 6 months ago

I love the new inline worker change, but now we can't never use a 3rd-party CDN to serve those files again, because Worker construction requires the file to be in the same origin. Previously, we can just host the very small worker file on our server, while serving the heavy main file through a CDN, but now we can't anymore. Is there any way we could address this problem?

sbc100 commented 6 months ago

Can you explain why this doesn't work? Why cant you serve the main JS file from the CDN?

We do have setting called mainScriptUrlOrBlob which controls where the main JS file finds itself. I'm curious why this would be needed though since I would expect that loading JS file from the same location from which the main thread already loaded it would make the most sense and result in the best caching behaviour. In fact, shouldn't the browser already have the main JS file cached, resulting in no extra JS fetches per-thread?

msqr1 commented 6 months ago

I know it is cached, but it is still from cross origin. The thing that matter is the origin, not cached or not.

sbc100 commented 6 months ago

Sorry I didn't read the title there.

I'm not too familiar with cross-origin limitations. If you, or anyone else, has any ideas on how to make this work better that would be most welcome.

sbc100 commented 6 months ago

So, to be clear, the security policy disassllows new Worker with a cross origin script but then allows the new worker to call importScripts from other origins?

sbc100 commented 6 months ago

Trying to understand the issue, the only mention I see about same-origin restrictions in workers looks like it related to subworkers: https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers_API/Using_web_workers. Are you using subworkers? Or is this some kind of additional non-default policy that you have setup?

sbc100 commented 6 months ago

Hmm ... I wonder if we could do something line new Worker(blobUrl("importScripts(...)" to work around this?

Maybe you could even try this out by adding something like this to your pre-js file:

Module['mainScriptUrlOrBlob'] = `importScripts(${document.currentScript})`;

?

sbc100 commented 6 months ago

I tried this locally and the following --pre-js seemed to work:

if (!ENVIRONMENT_IS_PTHREAD) {
  Module['mainScriptUrlOrBlob'] = new Blob([`importScripts('${document.currentScript.src}');`])
}
msqr1 commented 6 months ago

Did you try to serve that file through a cdn?

sbc100 commented 6 months ago

Nope, I'm just going by the fact that importScripts from your CDN worked for you in the past. So this is way to effectively revert to the prior behaviour.

msqr1 commented 6 months ago

Yes, importscript in the worker seems to work fine. We should integrate this fix because I'm sure many people will use a cdn

sbc100 commented 6 months ago

I'm confused by https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers_API/Using_web_workers though. It seems to suggest that only sub-workers have this requirement by default? So by default, using a CDN like this should work?

sbc100 commented 6 months ago

Can you confirm that the --pre-js fix works for you your use case?

msqr1 commented 6 months ago

I know it's weird, but see this: https://stackoverflow.com/questions/58098143/why-are-cross-origin-workers-blocked-and-why-is-the-workaround-ok

Basically the default mode for worker URL fetching is same-origin, so a cross origin URL will fail. Importscript default is just a normal fetch, so it works

sbc100 commented 6 months ago

So it sounds like maybe MDN is just wrong on this a not up-to-date with the spec, since it only mentions the same-origin requirement for sub-workers.

sbc100 commented 6 months ago

Spec: https://html.spec.whatwg.org/multipage/webappapis.html#fetch-a-classic-worker-script

msqr1 commented 6 months ago

Yeah it does, probably should report this to them.

msqr1 commented 6 months ago

Can you confirm that the --pre-js fix works for you your use case?

It doesn't, I'm using modularize, so in the module callback document.currentScript is null. I used the _scriptName global variable that keeps a reference to document.currentScript.src instead. It works. Normally, without modularize, it is available, but with that on, it won't.

sbc100 commented 6 months ago

Can you share the full fix you came up with here?

msqr1 commented 6 months ago

I took yours and replace document.currentScript.src with _scriptName for modularize mode, not sure about non-modularization

 if (!ENVIRONMENT_IS_PTHREAD) {
   Module['mainScriptUrlOrBlob'] = new Blob([`importScripts('${_scriptName}');`])
 }
sbc100 commented 6 months ago

Does that work? Looking at the code it seems that _scriptName is set only after the --pre-js code is included.

sbc100 commented 6 months ago

(We could potentially move the _scriptName assignment earlier).

msqr1 commented 6 months ago

Wait, I thought the _scriptName is declared before the iife is returned to be used in the function?

sbc100 commented 6 months ago

Wait, I thought the _scriptName is declared before the iife is returned to be used in the function?

I think MODULARIZE mode differs in this respect and injects _scriptName very early on.

msqr1 commented 6 months ago

PR?

sbc100 commented 6 months ago

I'd like to better understand the problem, and decide if we should do this default or make it a new option. I also wonder if there is any downside to doing in the case when things are hosted on the origin. For example, does it make debugging hard / more opaque when the URL passed to new Workers is a blob that then called importScripts vs just a URL?

For now, is it OK if you use your workaround?

msqr1 commented 6 months ago

I'm happy with my fix, but I think this should be the default if we decide to go with it. People wont expect this bug when they switch to a new version that keeps them from using a CDN. Also: image We can move capturing it to before the pre.js if possible

msqr1 commented 6 months ago

To be fair MDN kinda said it here: https://developer.mozilla.org/en-US/docs/Web/API/Worker/Worker that the URL should be same origin as the host

One more thing, if we decide to go with this, don't forget to set Blob's MIME type to text/javascript, otherwise it won't work on Firefox: https://developer.mozilla.org/en-US/docs/Web/API/Worker/Worker#:~:text=Strict%20MIME%20type%20checks%20for%20worker%20scripts

sbc100 commented 6 months ago

@msqr1 I'm curious, are there some CDNs (e.g couldflare and fastly type things) that don't require you to do cross-origin stuff? i.e. they take over your whole domain, so to speak? I could be misunderstanding but I wonder how popular those two different types of solution are?

msqr1 commented 6 months ago

There is no such CDN service that I know of. But I think Website hosting service does exactly what you say, they have servers everywhere that basically acts as CDN. Those servers serve your main page, your assets and everything else you have. You can basically treat it as your own domain while still utilizing their "CDN" to serve files fast. I don't think it would make sense for a CDN to take over your domain, because what CDN does is it helps you minimize distance by utilizing their numerous servers available around the globe (because you only have one, a small, or a few server).

msqr1 commented 6 months ago

I also wonder if there is any downside to doing in the case when things are hosted on the origin.

@sbc100, CSP needs to be set for worker-src or default-src to blob for the workaround to spawn the worker. But at this point, why don't we just spawn the whole worker content straight from the blob already? Why importScript if we're using blob anyway?

In my opinion, if we choose to do this, I would make a new setting like CROSS_ORIGIN_PTHREAD that uses the new approach of spawning from the blob.

sbc100 commented 6 months ago

I also wonder if there is any downside to doing in the case when things are hosted on the origin.

@sbc100, CSP needs to be set for worker-src or default-src to blob for the workaround to spawn the worker. But at this point, why don't we just spawn the whole worker content straight from the blob already? Why importScript if we're using blob anyway?

In my opinion, if we choose to do this, I would make a new setting like CROSS_ORIGIN_PTHREAD that uses the new approach of spawning from the blob.

It seems reasonable to add a new option for this yes. Feel tree to send a patch. BTW, I would call the option PTHREAD_CROSSORIGIN so that is groups with the other PTHREAD-prefixed settings.

I'm also curious to see what proportion of folks are affected by this.

sbc100 commented 6 months ago

If the workaround requires worker-src or default-src already then why not use the importScript method? One advantage to this is debug-ability since the worker then knows the source of its source, as it were. Also, I can imagine that the importScripts methods is much for friendly to the browser cache, and maybe even the v8 code case, and could be faster for that reason.

msqr1 commented 6 months ago

You're right, I forgot about the cache

DylanShang commented 1 month ago

I have encountered the same problem, how is the progress now? I think it would be better if there is a PTHREAD_CROSS_ORIGIN option.

DylanShang commented 1 month ago

Another thing is when I set ENVIRONMENT='web,webview,worker', Module['mainScriptUrlOrBlob'] will not appear in the script, it will appear until I add node enviroment. Maybe we need to think about this case.

msqr1 commented 1 month ago

I think you can try adding 'mainScriptUrlOrBlob' onto INCOMING_JS_MODULE_API. But I agree with you. We need an actual fix instead of a workaround.