gildas-lormeau / single-file-cli

CLI tool for saving a faithful copy of a complete web page in a single HTML file (based on SingleFile)
GNU Affero General Public License v3.0
602 stars 63 forks

Following up loading really long web page issue... (#85) #86

Open sanjeevApple opened 4 months ago

sanjeevApple commented 4 months ago

Couple of other problems after the fix in version 2.0.36

1) There seems to be another issue: the fix works for Chrome but not for the Deno browser. That is not a big deal, but I wanted to let you know.

So this works:

./single-file https://www.apple.com/macbook-air macbook-air.html --browser-executable-path="/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" --browser-height=50000 --browser-wait-until="load" --load-deferred-images-keep-zoom-level=true

but this hangs when I remove the browser executable path and use Deno

./single-file https://www.apple.com/macbook-air macbook-air.html --browser-height=50000 --browser-wait-until="load" --load-deferred-images-keep-zoom-level=true

2) The original problem is back; for some reason the fix didn't help. The airpods-max page, which has a scroll height of roughly 30000 pixels, hangs with a browser height of 30000.

./single-file https://www.apple.com/airpods-max airpods-max.html --browser-executable-path="/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" --browser-height=30000 --browser-wait-until="load" --load-deferred-images-keep-zoom-level=true

hangs..

3) I also observed that if you open the airpods-max page in a browser, physically scroll to the bottom, and use the plugin to save the file, there is no problem at all, and it is very fast (under 5 seconds), I guess because the page is already in the browser. But when you use the single-file CLI, there are a couple of problems: one is the hang, and the second is the speed, which is a minute or so compared to 5 seconds. I am wondering what is so different between the browser extension and the CLI, even though both use the same Chrome browser to render the web page and then save it. The only other difference would be physical scrolling vs --browser-height=50000.

Thanks Cheers...

gildas-lormeau commented 4 months ago

I cannot reproduce your issue.

There isn't any browser in Deno. When you run single-file without the --browser-executable-path=... option, the chosen browser is the first found in the list below. https://github.com/gildas-lormeau/single-file-cli/blob/d8d48fae3a0f80e339cf487fd6b4c8cf875437c6/lib/constants.js#L105-L117

Is Chromium installed in /Applications/Chromium.app/Contents/MacOS/Chromium on your machine? If so, what is its version number?
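The lookup described above (take the first browser found in a list of known paths) can be sketched roughly as follows. This is only an illustration, not the code in lib/constants.js: the two paths are the ones mentioned in this thread, their order is an assumption, and `findFirstBrowser` and the injected `exists` predicate are hypothetical names.

```javascript
// Hypothetical candidate list for illustration only; the real list and its
// order are defined in lib/constants.js and may differ.
const CANDIDATE_PATHS = [
  "/Applications/Chromium.app/Contents/MacOS/Chromium",
  "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome",
];

// Return the first path for which `exists(path)` is true, or undefined if
// none match. `exists` is injected so the sketch does not depend on any
// particular runtime's file-system API.
function findFirstBrowser(candidates, exists) {
  return candidates.find((path) => exists(path));
}

// Usage: simulate a machine where only Google Chrome is installed.
const installed = new Set([
  "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome",
]);
const chosen = findFirstBrowser(CANDIDATE_PATHS, (path) => installed.has(path));
console.log(chosen);
```

With a lookup like this, a machine without Chromium would fall through to the next entry, which is why knowing which browsers are installed matters when --browser-executable-path is omitted.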

If you try to pass the height of the screen with the --browser-height=... option then you don't need to use --load-deferred-images-keep-zoom-level=true. The latter option is useful when you don't want to set the browser height.

sanjeevApple commented 4 months ago

I have Google Chrome installed; the version is 124.0.6367.119 (Official Build) (arm64). So I am not quite sure why it hung when --browser-executable-path was not specified. It should have picked up Google Chrome, since I don't have Chromium installed. I can try again. Maybe the hang was for a different reason.

I just tried this:

./single-file https://www.apple.com/airpods-max airpods-max.html --browser-executable-path="/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" --browser-height=30000 --browser-wait-until="load"

and it just hangs... Is this working for you? I do have the latest build with version 2.0.36

./single-file --version
2.0.36

sanjeevApple commented 4 months ago

Wondering if there is something in your local environment that is not reflected in the build, maybe some settings or default options (we are only overriding the browser height and the browser wait-until value), so that it still hangs when run directly from the latest release downloaded from the git...

gildas-lormeau commented 4 months ago

I confirm that the command you mentioned works fine for me (on an Apple M2 with 16GB of RAM). Maybe you should remove --browser-wait-until="load" or replace it with --browser-wait-until="networkAlmostIdle".

sanjeevApple commented 4 months ago

In both cases with either --browser-wait-until property removed or changed to networkAlmostIdle, it is still stuck. When I Ctrl-C out of it, this is what I see...

sanjeevApple commented 4 months ago

Uncaught (in promise) NotFound: No such file or directory (os error 2): remove '/var/folders/g5/4hp_ppfs47g8gmd1w6xp0_cm0000gn/T/2aad9786888d493'
    await remove(profilePath, { recursive: true });
    ^
    at async Object.remove (ext:deno_fs/30_fs.js:260:3)
    at async closeBrowser (file:///Users/sanjeev/Downloads/single-file-cli/lib/browser.js:120:3)
    at async closeBrowserAndExit (file:///Users/sanjeev/Downloads/single-file-cli/single-file-launcher.js:122:2)
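The trace above shows the cleanup step failing because the temporary profile directory was already gone when `remove` was called. A minimal sketch of a cleanup that tolerates a missing directory is shown below. It is not the project's actual code or fix: `removeIfExists` is a hypothetical helper, and the `remove` function is injected so the sketch runs outside Deno too. It relies only on the error being named "NotFound", as in the trace.

```javascript
// Remove a directory, treating "already gone" as a non-error.
// `remove` is an injected async function taking (path, options); in Deno
// this could be Deno.remove (an assumption about how one might wire it up).
async function removeIfExists(remove, path) {
  try {
    await remove(path, { recursive: true });
    return true; // the directory existed and was removed
  } catch (error) {
    // A missing path surfaces as an error named "NotFound" (see the trace).
    if (error && error.name === "NotFound") {
      return false; // nothing to remove
    }
    throw error; // any other failure is still reported
  }
}
```

In Deno, a call might look like `await removeIfExists(Deno.remove, profilePath)`. Note that this would only silence the error printed after Ctrl-C; it says nothing about why the page capture itself hangs.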

sanjeevApple commented 4 months ago

The URL apple.com works fine with different browser heights: 20K, 30K and 50K.

The URL apple.com/airpods-max works with no height set, but lots of images are missing, so that is not usable. At height 20K it takes 6 minutes and is still missing a lot of images. At height 35K or 50K it just hangs. This is without the browser wait delay option.

I tried it on another laptop, Apple M2 Pro, same behavior, it hangs at 35K, 50K height.