emscripten-core / emscripten

Emscripten: An LLVM-to-WebAssembly Compiler
Other
25.7k stars 3.3k forks source link

Replace ENVIRONMENT_* logic with Web default + polyfills? #12203

Open kripken opened 4 years ago

kripken commented 4 years ago

Right now we support various "environments": web, node, other JS shells, etc. We need to use different code paths for loading a file, for example - fetch on the web versus node's fs package, and so forth. We have a layer of indirection for this, where in shell.js the environment is detected and hooks like readBinary, readAsync are set up.

This has two current downsides. First, that while you can specify that only a single target environment is supported, the detection + indirection layer may still end up increasing code size. Second, we get requests for more environments to be supported now and then, like electron variants or deno, and they require adding an entire new "environment".

Instead, what if we made the following change:

Benefits:

curiousdannii commented 4 years ago

Great idea. But a few thoughts.

Traditionally polyfills have aimed to completely implement the relevant standard by implementing it as a global variable or by modifying a built in object. So a polyfill for fetch would need to completely implement the full spec (because if we write to global objects other code may try to use it) . I think that would be very inoptimal. We only care about a small subset of the full specification. So rather than talking in terms of polyfills I think it would be better to think of a small Emscripten Environment API. Such an API like that would only have the things Emscripten itself uses. So it wouldn't need to handle all the options of fetch, but just the few options we need.

Can I suggest that each env file pushes a function to an array, and then at the appropriate time the shell code will run the functions in series. Each function should check if a component in the API has already been supplied, and if not it checks to see if it can supply it. The function should be an async function - this will allow us to fix #11792.

My last thought is whether the default should be for modern browsers, or whether it should be for a more minimalistic and agnostic environment that might support the core ES standard but not much else. For example for many people Math.random might be adequate randomness, and the extra code for using the Crypto API might not be worth it (probably not a good example as I don't think it actually uses too much code.) On the other hand, as you said, file size is mostly only a concern for web, not other environments.

I'm not at my dev system now, but I'll take a look later to try to put together a list of functions this environment API would need.

sbc100 commented 4 years ago

I agree we your point that we don't want a full polyfil. However, I think there are good reasons not to invent a new wrapper API but instead to subset the Web API.

This has the effect that no polyfil at all is required for our main target platform, and also helps keep code size minimal when targeting the web where we care a lot about code size.

kripken commented 4 years ago

I agree "polyfill" is probably the wrong word, good point @curiousdannii . Yes, we just want a a small subset of fetch etc. in this case. Just enough to implement the usage we actually make of fetch.

But I do think the API we use should be the actual fetch. That does make it a little more awkward to write the implementations for non-Web environments, as they need to work to implement a subset of fetch as opposed to a new clean API. But the code size benefits are worth it I think as @sbc100 said.

I don't think it would be that hard to implement these APIs. But a prototype is necessary to be sure.

I didn't understand

The function should be an async function - this will allow us to fix #11792.

I'd guess we need the entire module to be async - why is there a specific issue for the ENVIRONMENT logic?

curiousdannii commented 4 years ago

However, I think there are good reasons not to invent a new wrapper API but instead to subset the Web API.

@sbc100 Not sure what you mean by this - just to keep the relevant part of the API the same? That's reasonable, especially as most recent APIs are simpler than the older ones (fetch vs XMLHttpRequest).

But I do think the API we use should be the actual fetch.

@kripken Well in this case the fetch API is the simplest an API could possibly be, you pass a URL and nothing else to a function! I was just suggesting that if any of the other browser APIs we use is more complex, we could consider abstracting it in this new system.

An example where I think making our own API would make sense is that rather than mocking performance.now, we could have one function that returns the timestamp, and set the resolution and whether or not its monotonic as properties of that function. Everything we care about is then neatly encapsulated.

I'd guess we need the entire module to be async - why is there a specific issue for the ENVIRONMENT logic?

We don't need to start making the whole of Emscripten's output async, but if we can put this environment stuff into an async addRunDependency or the like, then it can solve our Node ES modules issues. Those issues being that the import function is async.

curiousdannii commented 4 years ago

Okay, all the things currently gated behind ENVIRONMENT checks (except for PTHREAD):

kripken commented 4 years ago

@curiousdannii

Thanks for the writeup!

Overall I think that looks good. I think most work could be for fetch and XMLHttpRequest. Otherwise I am a little unsure about node-only things like NODE: handles system calls with child_process - without a Web API to fill in, we may just want to create a new API for those. But I think for most using a Web API would be feasible and best.

deterministic.js - NODE: overwrites process.hrtime?? Very weird.

The DETERMINISTIC flag does that, yeah - it removes all timing (Date.now, etc.) so every run of the program returns the same result. Can be useful for debugging, and probably nothing else.

We don't need to start making the whole of Emscripten's output async, but if we can put this environment stuff into an async addRunDependency or the like, then it can solve our Node ES modules issues. Those issues being that the import function is async.

I think we can do that in the approach suggested here. Specifically something like

// impl of some API
var internalImplDetail; // filled in from an async import
function impl() {
  internalImplDetail(); // assumes it was filled in
}

addRunDependency("node.js async imports");

// Just a sketch, I have no idea what this looks like in reality!
waitForNodeAsyncImport().then((imports) => {
  internalImplDetail = imports.someDetail; // fill it in now
  removeRunDependency("node.js async imports");
});
curiousdannii commented 4 years ago

Would that be the contents of env/node.js? Would that be okay to just copy into the source, it wouldn't need to be wrapped? If the run dependency system will wait for the dependency to be finished I guess it would.

kripken commented 4 years ago

@curiousdannii yes, exactly, something like that would go in env/node.js.

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because there has been no activity in the past year. It will be closed automatically if no further activity occurs in the next 30 days. Feel free to re-open at any time if this issue is still relevant.

curiousdannii commented 3 years ago

Bump

kripken commented 1 year ago

This approach may eventually be very simple to implement, as the environments keep converging:

In that case maybe we'd just write all code for the web, and have a setting "also support node" which would add sync fs stuff etc. for Node/Deno/Bun.

Sec-ant commented 9 months ago

Bun has evolved into 1.0, a gentle bump.