Y2Z / monolith

⬛️ CLI tool for saving complete web pages as a single HTML file
https://crates.io/crates/monolith
Creative Commons Zero v1.0 Universal

JS module imports not captured #363

Open FreeMasen opened 6 months ago

FreeMasen commented 6 months ago

The following JS fails once the page is saved, because the import triggers an additional network request:

import * as mod from "./module.js"

with the following error when loaded

Cross-Origin Request Blocked: The Same Origin Policy disallows reading the remote resource at file:///Users/rfm/module.js. (Reason: CORS request not http).

snshn commented 6 months ago

Yeah, that's the problem with lazy-loaded files. Processing JS and resolving dynamically loaded files only happens when the browser opens the page, and that's the main cause of this problem. You could try using Chrome for these types of sites, as described in one of the sections of the README. You could also try using the -b option to point the browser at the location of module.js; that could work.
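
To illustrate why the saved copy ends up asking for file:///Users/rfm/module.js, here is a minimal sketch of how a relative module specifier resolves against a base URL. The page.html file name and the https://example.com/app/ base are placeholders that do not come from the report, and resolveSpecifier is a hypothetical helper. It only demonstrates standard URL resolution, not how Monolith's -b option is implemented.

```javascript
// Resolve a module specifier relative to a base URL, as a browser does
// for path-like specifiers such as "./module.js".
function resolveSpecifier(specifier, baseUrl) {
  return new URL(specifier, baseUrl).href;
}

// Saved page opened from disk. The directory comes from the error message;
// the file name page.html is an assumption.
const fromDisk = resolveSpecifier("./module.js", "file:///Users/rfm/page.html");

// Same specifier with an HTTP(S) base. The host and path are placeholders.
const fromWeb = resolveSpecifier("./module.js", "https://example.com/app/");

console.log(fromDisk); // file:///Users/rfm/module.js
console.log(fromWeb);  // https://example.com/app/module.js
```

With a file:// base the import points at a local file that was never saved, which matches the URL in the error above. With an HTTP(S) base the same specifier points back at the original server.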

FreeMasen commented 6 months ago

Would you be willing to add some level of JS parsing to this project? I actually maintain a JS parser that could be used here, and I would be interested in integrating it if you would be willing to accept a PR like that.
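
As a rough idea of what "some level of JS parsing" would need to find, the sketch below collects static import specifiers and resolves them against a base URL. It uses a regular expression rather than a real parser, so it can misfire on imports that appear inside strings or comments, and it ignores dynamic import() calls and export ... from statements. The names STATIC_IMPORT and findStaticImports and the example.com URLs are hypothetical. None of this is taken from Monolith or from the parser mentioned above.

```javascript
// Matches static `import` statements and captures the quoted specifier.
// Parentheses are excluded so that dynamic `import("...")` calls are skipped.
const STATIC_IMPORT = /\bimport\s*(?:[^"'()]*?\bfrom\s*)?["']([^"']+)["']/g;

// Path-like specifiers ("./x", "../x", "/x") or absolute URLs ("https://...").
const PATH_LIKE = /^(\.{0,2}\/|[a-z][a-z0-9+.-]*:)/i;

function findStaticImports(source, baseUrl) {
  const found = [];
  for (const match of source.matchAll(STATIC_IMPORT)) {
    const specifier = match[1];
    // Bare specifiers such as "lodash" need an import map or a bundler,
    // so this sketch leaves them alone.
    if (!PATH_LIKE.test(specifier)) continue;
    found.push({ specifier, url: new URL(specifier, baseUrl).href });
  }
  return found;
}

const sample = `
import * as mod from "./module.js";
import { helper } from '../lib/helper.js';
import "lodash";
const lazy = import("./lazy.js");
`;

const imports = findStaticImports(sample, "https://example.com/app/index.html");
for (const item of imports) {
  console.log(item.specifier, "->", item.url);
}
// ./module.js -> https://example.com/app/module.js
// ../lib/helper.js -> https://example.com/lib/helper.js
```

A tool that embeds assets could fetch each resolved URL and inline the module. A proper parser would be needed to do that reliably, which is the point of the proposal.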

snshn commented 5 months ago

It's hard to tell right now. I need to finish some urgent things first (like MHTML support, crawling/recursion, an async asset retrieval mechanism for speed, CORS...). Browsers already have JS engines in them, and I hoped they'd be more interested in implementing what Monolith does, rendering it obsolete. But it's been about 5 years, and here we are, still unable to easily save a page as one file using just the browser. Once JS support is in, I'll end up having to add WebAssembly, then something else, and something else, essentially reinventing the wheel just because browsers don't do the thing they should be able to do.

I want Monolith to do awesome things. As much as I hate the fact that JS on 99% of sites today is non-unobtrusive, reality is usually not the way we want it to be, so for a tool that just saves pages as one file, some kind of JS support is pretty much a must-have. I'd say it should be on the road map, perhaps as an optional switch (possibly even leveraging locally installed browsers to do that), but I can't promise anything short-term.