Y2Z / monolith

⬛️ CLI tool for saving complete web pages as a single HTML file
https://crates.io/crates/monolith
Creative Commons Zero v1.0 Universal

Additionally fetch dynamic content #378

Open asim-shrestha opened 5 months ago

asim-shrestha commented 5 months ago

We have some workloads that require saving pages whose content depends on click actions. These click actions open a modal, fetch data in the background, and then populate the modal with the fetched data. I'm assuming this is far out of scope, but I'm wondering whether such a case could be handled.

Also, great tool!

snshn commented 5 months ago

Hi Asim,

that sounds like automation, but I think it could be done... what comes to mind is using Chrome as described in the README, with a plugin that enables custom JS actions (open the modal so that the page's JS populates it with HTML). I think there's a way to set a delay on how soon Chrome returns the DOM in headless mode.

Alternatively, you can code something in Puppeteer, feed the retrieved HTML via pipes into monolith, and do it that way.
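
A minimal sketch of that second approach, using Playwright's Python API in place of Puppeteer (a swap made here for illustration, not something prescribed above). It assumes Playwright and monolith are installed and that monolith reads HTML from stdin when given `-` as its target, with `-b` setting the base URL and `-o` the output file. The function names and the selectors passed on the command line are hypothetical.

```python
import subprocess
import sys


def render_after_click(url, click_selector, wait_selector):
    """Load the page, click an element, wait for the modal content, return the DOM as HTML."""
    # Imported lazily so the helpers below work without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url)
        page.click(click_selector)             # e.g. the button that opens the modal
        page.wait_for_selector(wait_selector)  # e.g. an element inside the populated modal
        html = page.content()                  # serialized DOM after the state change
        browser.close()
    return html


def monolith_command(base_url, output_path):
    """Build a monolith invocation that reads HTML from stdin ('-')."""
    return ["monolith", "-", "-b", base_url, "-o", output_path]


def pipe_html(html, command):
    """Send the HTML to the command's stdin and return its exit code."""
    result = subprocess.run(command, input=html.encode("utf-8"), check=True)
    return result.returncode


if __name__ == "__main__" and len(sys.argv) == 5:
    url, click_selector, wait_selector, output_path = sys.argv[1:5]
    html = render_after_click(url, click_selector, wait_selector)
    pipe_html(html, monolith_command(url, output_path))
```

Run as a script with four arguments (URL, selector to click, selector to wait for, output path), this should leave a single HTML file that reflects the page state after the click, provided the assets the modal references are still reachable from the base URL.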

asim-shrestha commented 5 months ago

> that sounds like automation

Yeah, we have workloads where being able to run something like Playwright automations and save the result statically would be a lifesaver. We have some tests/evals we want to run continually, but network delays and other web issues make this impossibly difficult or time-consuming.

> with a plugin that enables custom JS actions

Sorry, does this already exist? (Apologies, I'm super new to this repo.)

Also, the output in this case would again be a single HTML file, correct? But it would now also capture the state changes that follow click actions?

And I imagine something similar would also work for additional pages?

snshn commented 5 months ago

I know there's https://www.tampermonkey.net/ and some other similar extensions. If you install that in Chrome, add some custom user scripts to it, and then use Chrome in headless mode (as described in monolith's README.md), then you could possibly get the level of automation you're looking for, all saved as one .html file.
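
The headless Chrome part of that pipeline could be sketched roughly as follows. It assumes a `chromium` binary on the PATH, uses Chrome's `--dump-dom` and `--virtual-time-budget` flags to delay the DOM dump, and assumes monolith reads HTML from stdin when given `-`. The function names and the 9000 ms default are hypothetical, and the sketch does not cover installing the extension or whether user scripts run in headless mode.

```python
import subprocess


def chrome_dump_dom_command(url, budget_ms=9000, chrome_binary="chromium"):
    """Build a headless Chrome command that prints the DOM after a time budget."""
    return [
        chrome_binary,
        "--headless",
        f"--virtual-time-budget={budget_ms}",  # lets page scripts run before the dump
        "--dump-dom",
        url,
    ]


def save_with_monolith(url, output_path, budget_ms=9000):
    """Dump the rendered DOM with Chrome and pipe it into monolith."""
    chrome = subprocess.run(
        chrome_dump_dom_command(url, budget_ms),
        capture_output=True,
        check=True,
    )
    subprocess.run(
        ["monolith", "-", "-b", url, "-o", output_path],
        input=chrome.stdout,  # raw bytes of the dumped DOM
        check=True,
    )
```

Calling `save_with_monolith("https://example.com/", "page.html")` would then write the bundled page to `page.html`, assuming both binaries are available.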