vadimkantorov opened 4 years ago
Btw some time ago TensorFlow.js did exactly that - it produced and served binary files in chunks - now I understand that this was probably done to force caching
Indeed. Short of chunking the data so that each chunk is less than ~50MB, there isn't much emscripten can do about it. It's likely possible to raise that limit in the configuration of a particular browser install (at least it is in Firefox), but that won't apply to all other users. At the same time I understand why browsers have such a limit by default, so asking them to increase it is likely not going to be accepted.
So I think this can likely be closed. There is a specific issue on chunking large files with file_packager.py in https://github.com/emscripten-core/emscripten/issues/12342
This problem should at least be highlighted in the docs
For wasm / data files, it may at least be worth filing an issue on the Chromium bug tracker, since it's a legitimate use case worth discussing (even if the decision is negative)
A docs PR sounds good (maybe an FAQ entry)?
I have the same problem with WASM binaries generated by the Go compiler. In particular, my current client binary is ~14 MB, but it's transferred gzipped as ~2.5 MB, which is not that big after all. However, Chrome keeps reloading it on every page load, no matter what I do. It caches everything else, including big JPEG, PNG, and SVG files. I have set both the max-age and immutable (which Chrome doesn't support anyway) Cache-Control attributes. It's apparently a bug, but in my experience reporting it would just be a waste of time. I've been watching this issue for quite some time already.
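For illustration, this is roughly the setup I mean (a minimal sketch, not my actual server; the file name, port, and pre-gzipped `main.wasm.gz` are made up for the example):

```js
// Sketch: serve a pre-compressed wasm file with long-lived caching headers.
const http = require('http');
const fs = require('fs');

http.createServer((req, res) => {
  if (req.url === '/main.wasm') {
    // A real server would check Accept-Encoding before sending gzip.
    res.writeHead(200, {
      'Content-Type': 'application/wasm',
      'Content-Encoding': 'gzip',
      // max-age should let the browser reuse the cached copy for a year;
      // "immutable" is honored by Firefox but currently ignored by Chrome.
      'Cache-Control': 'public, max-age=31536000, immutable',
    });
    fs.createReadStream('main.wasm.gz').pipe(res);
  } else {
    res.writeHead(404);
    res.end();
  }
}).listen(8080);
```

Even with headers like these, Chrome re-downloads the wasm on every load, while smaller assets served the same way are cached fine.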
This issue has been automatically marked as stale because there has been no activity in the past year. It will be closed automatically if no further activity occurs in the next 30 days. Feel free to re-open at any time if this issue is still relevant.
A small update: this issue persists when using a local self-signed certificate over HTTP/2. Wasm files do seem to get cached to disk (but not in memory), as expected (?), when using a certificate obtained from a proper CA.
@vadimkantorov did you find a solution?
The only viable solution is to implement chunking in some way for file_packager.py (feature request here: https://github.com/emscripten-core/emscripten/issues/12342; until it's done automatically, you'd need to do it by crafting the file lists manually, I guess). Files under 45-50 MB are cached okay by Chrome, but always make sure to test.
For super-large wasm files, I guess one will have to implement chunking and load the module manually as well. There might be some hiccups if one wants to use streaming compilation, but modern JavaScript may allow implementing some kind of streamable ArrayBuffer (e.g. via a ReadableStream), along the lines of the sketch below.
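A rough sketch of what I have in mind, assuming the wasm has already been pre-split server-side into chunks (the `.000`, `.001`, ... naming and the chunk count are made up): fetch the chunks one by one, feed them through a ReadableStream wrapped in a Response, and hand that to WebAssembly.instantiateStreaming so compilation can start before the last chunk arrives.

```js
// Sketch: stream pre-split wasm chunks into instantiateStreaming.
// Chunks are fetched sequentially here to keep memory low; they could
// also be prefetched in parallel and enqueued in order.
async function instantiateChunkedWasm(baseUrl, numChunks, imports) {
  const stream = new ReadableStream({
    async start(controller) {
      for (let i = 0; i < numChunks; i++) {
        const resp = await fetch(`${baseUrl}.${String(i).padStart(3, '0')}`);
        controller.enqueue(new Uint8Array(await resp.arrayBuffer()));
      }
      controller.close();
    },
  });
  // Wrap the stream in a Response so instantiateStreaming accepts it;
  // the MIME type must be application/wasm.
  const response = new Response(stream, {
    headers: { 'Content-Type': 'application/wasm' },
  });
  return WebAssembly.instantiateStreaming(response, imports);
}
```

Each chunk stays well under the per-entry cache limit, so on a repeat visit they should come from the HTTP cache while compilation still streams.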
One thing that I noticed is that Chrome's decision to cache or not can be influenced by the compression algorithm; e.g., a 40 MB text file won't be cached, but if you transmit it gzipped down to 200 KB it will be cached.
I have a primitive HTTP server that sets ETag for all files, including html/js/data/wasm. However, Chrome seems unwilling to cache any file larger than a few dozen megabytes. This leads to reloading large data files, which is slow-ish even on localhost.
My question on SO about this: https://stackoverflow.com/questions/63891436/chrome-refuses-to-cache-binary-data-files, code:
One way forward may be to force file_packager to generate chunks of 10 MB and have them load in parallel (same for large wasm files). Maybe related: https://stackoverflow.com/questions/60646737/how-to-cache-large-fie-in-chrome
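Until file_packager can do this itself, the loading side could look roughly like this (a sketch only; the `.partN` chunk naming and chunk count are hypothetical): fetch all chunks in parallel and stitch them back into a single ArrayBuffer before handing it to the generated loader.

```js
// Sketch: fetch pre-split chunks of a packaged .data file in parallel
// and reassemble them into one ArrayBuffer.
async function fetchChunkedData(baseUrl, numChunks) {
  const buffers = await Promise.all(
    Array.from({ length: numChunks }, (_, i) =>
      fetch(`${baseUrl}.part${i}`).then(r => r.arrayBuffer())
    )
  );
  const total = buffers.reduce((n, b) => n + b.byteLength, 0);
  const out = new Uint8Array(total);
  let offset = 0;
  for (const b of buffers) {
    out.set(new Uint8Array(b), offset);
    offset += b.byteLength;
  }
  return out.buffer;
}
```

If I remember correctly, the generated loader consults Module.getPreloadedPackage(name, size) when it is defined, so the reassembled buffer could be returned from there instead of letting the loader fetch one monolithic file.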
Related: https://github.com/emscripten-core/emscripten/issues/4711. Maybe worth providing an option for IndexedDB-based caching of the wasm in the meantime...
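For the interim IndexedDB route, a minimal sketch (the database and store names are made up, and no versioning/invalidation is handled): cache the raw wasm bytes keyed by URL and compile from the cached copy on later loads, sidestepping the HTTP cache size limit entirely.

```js
// Sketch: cache raw wasm bytes in IndexedDB and compile from the cached copy.
function openDb() {
  return new Promise((resolve, reject) => {
    const req = indexedDB.open('wasm-cache', 1);
    req.onupgradeneeded = () => req.result.createObjectStore('modules');
    req.onsuccess = () => resolve(req.result);
    req.onerror = () => reject(req.error);
  });
}

async function loadWasmCached(url, imports) {
  const db = await openDb();
  const cached = await new Promise((resolve) => {
    const req = db.transaction('modules').objectStore('modules').get(url);
    req.onsuccess = () => resolve(req.result);
    req.onerror = () => resolve(undefined);
  });
  const bytes = cached || await (await fetch(url)).arrayBuffer();
  if (!cached) {
    const tx = db.transaction('modules', 'readwrite');
    tx.objectStore('modules').put(bytes, url);
  }
  return WebAssembly.instantiate(bytes, imports);
}
```

In a real setup one would also want to version the key (e.g. with a build hash) so stale copies can be invalidated.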
The problem with .wasm is also very acute, since it happens on every module reload. Would you have any advice on dumping/reloading the heap or other state, so that I could try to reset the module (at least softly)?