Y2Z / monolith

⬛️ CLI tool for saving complete web pages as a single HTML file
https://crates.io/crates/monolith
Creative Commons Zero v1.0 Universal

White page glitch. #297

Open dillfrescott opened 2 years ago

dillfrescott commented 2 years ago

Downloading any webpage from https://anilist.co yields a plain white screen.

snshn commented 2 years ago

Hi Cross!

Thank you for reporting the issue! I do see some things I could improve in monolith to make it save a bit more data from that page. Sadly, that particular site is 99% JavaScript, and it tries to redirect to /404 unless the page's JS is executed within that domain. So even after I release optimizations/fixes that make it possible to at least render something saved from there, it'll only show an empty layout of the page for a split second, which will then disappear.

You might have better luck saving that page using SingleFile, or saving it as file+folder, and then using monolith on that, to create an HTML file with some images/text embedded into it.

dillfrescott commented 2 years ago

Ah, gotcha. Thank you for the info! <3

snshn commented 2 years ago

Let's see...

chromium --headless --disable-gpu --dump-dom https://anilist.co/ | monolith - -b https://anilist.co/ -o anilist.html

seems to work, only the opacity: 0 doesn't get changed on cover images, some JS issue.

chromium --headless --disable-gpu --dump-dom https://anilist.co/anime/104578/Shingeki-no-Kyojin-3-Part-2/ | monolith - -b https://anilist.co/anime/104578/Shingeki-no-Kyojin-3-Part-2/ -o shingeki-3-part-2.html

seems to be working with minor glitches; you may want to play with various monolith flags. I'll see what I can improve in the program itself so it works better when receiving input from stdin. This should also work with chrome or Chrome/Chromium on macOS; I don't remember what the executable is called there, sorry.
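For repeated use, the two commands above could be wrapped in a small shell function. This is only a sketch: the function name save_rendered and the CHROMIUM_BIN / MONOLITH_BIN variables are made up for illustration, and the flags are just the ones quoted in this thread. Overriding CHROMIUM_BIN is one way to deal with the executable having a different name on another system.

```shell
#!/bin/sh
# Sketch: pre-render a JS-heavy page with headless Chromium, then pipe the
# resulting DOM into monolith, which reads it from stdin ("-").
# save_rendered, CHROMIUM_BIN and MONOLITH_BIN are hypothetical names, not
# part of monolith or Chromium.

CHROMIUM_BIN="${CHROMIUM_BIN:-chromium}"
MONOLITH_BIN="${MONOLITH_BIN:-monolith}"

save_rendered() {
    url="$1"
    out="$2"
    if [ -z "$url" ] || [ -z "$out" ]; then
        echo "usage: save_rendered URL OUTPUT.html" >&2
        return 2
    fi
    # The same URL is passed twice: once for Chromium to load, and once as
    # monolith's base URL (-b) so relative asset links can be resolved.
    "$CHROMIUM_BIN" --headless --disable-gpu --dump-dom "$url" \
        | "$MONOLITH_BIN" - -b "$url" -o "$out"
}
```

After sourcing the file, save_rendered https://anilist.co/ anilist.html would run the same pipeline as the first command above.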

nitincodery commented 1 year ago

Downloading any webpage from https://anilist.co yields a plain white screen.

They are saved pretty well by SingleFile.

snshn commented 1 year ago

SingleFile executes JS code and allows the page to take a couple hundred milliseconds to pull data via RESTful APIs, in case the page is a frontend-heavy web app.

Basically, SingleFile is more like a headless browser, while Monolith is a scraper that never executes JS. That is why it needs something else to pre-render the page (execute JS, which often pulls JSON data that builds HTML).