gildas-lormeau / single-file-cli

CLI tool for saving a faithful copy of a complete web page in a single HTML file (based on SingleFile)
GNU Affero General Public License v3.0

Set Chrome profile for puppeteer when using CLI #38

Closed beefdrifter closed 3 months ago

beefdrifter commented 3 years ago

Is your feature request related to a problem? Please describe.
When SingleFile CLI saves a page with puppeteer, the page is saved without any of the familiar extensions your personal Chrome profile uses (for example, ad blockers), because puppeteer runs on a clean profile by default.

Describe the solution you'd like
Would it be possible to add an argument that points to a Chrome profile folder for the CLI? It would allow the page to be saved the way you would normally see it when you're using Chrome, with extensions and all. Thanks!

gildas-lormeau commented 3 years ago

You can do this with the --browser-args option, which accepts a JSON string of the parameters to pass to Chrome. For example, to load the profile named "Default", you can pass the following to SingleFile CLI:

--browser-args "[\"--profile-directory=Default\"]"

Note that Chrome won't load extensions in headless mode. It means you'll also have to launch SingleFile with --browser-headless false.

beefdrifter commented 3 years ago

Thanks for the pointer! I'm having some trouble getting it to work, though. I added the JSON string you provided and changed --browser-headless to false, but I couldn't get it to load the right profile.

I've checked that my Chrome profile is indeed Default, so I'm assuming that I'm pointing to the wrong directory, but I couldn't get the directory syntax right.

I've included the args.js below, any ideas?

Thanks

[Image: 2021_0131_0211]

gildas-lormeau commented 3 years ago

Actually that was not the correct parameter. I found the answer on StackOverflow. The parameter you should use is --user-data-dir and you should set it to the parent folder of the profile. You should also replace each backslash (to separate folders) with 4 backslashes.

"browser-args": "[\"--user-data-dir=C:\\\\Users\\\\beefdrifter\\\\AppData\\\\Local\\\\Google\\\\Chrome\\\\User Data\"]"
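The four backslashes appear to come from two layers of escaping: one for the JSON array that --browser-args expects, and one for the quoted string that holds that array in the config file. A minimal Node.js sketch of letting JSON.stringify produce both layers instead of counting backslashes by hand follows. The helper name toBrowserArgs is hypothetical and not part of SingleFile CLI; the path is the one from this thread.

```javascript
// Hypothetical helper, not part of SingleFile CLI: wraps a Chrome user data
// directory in the JSON array format that --browser-args expects.
function toBrowserArgs(userDataDir) {
  return JSON.stringify([`--user-data-dir=${userDataDir}`]);
}

// String.raw keeps the single backslashes of an ordinary Windows path.
const userDataDir = String.raw`C:\Users\beefdrifter\AppData\Local\Google\Chrome\User Data`;

// Layer 1: JSON array. Each path separator becomes 2 backslashes.
const browserArgs = toBrowserArgs(userDataDir);

// Layer 2: the same value written as a quoted string, as in a config file.
// Each path separator becomes 4 backslashes and the inner quotes are escaped.
const configLiteral = JSON.stringify(browserArgs);

console.log(browserArgs);
console.log(configLiteral);
```

Parsing the result twice with JSON.parse gives back the original path, which is a quick way to check a hand-written value.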

beefdrifter commented 3 years ago

It works. Thank you! However, it seems to only work when Chrome is not already launched. If a previous CLI process has already opened Chrome and has not yet finished (so Chrome is still open), it will instead print this and abort:

Failed to launch the browser process!
[0202/013930.146:ERROR:chrome_main_delegate.cc(679)] Web security may only be disabled if '--user-data-dir' is also specified with a non-default value.
[1048:31728:0202/013930.228:ERROR:cache_util_win.cc(20)] Unable to move the cache: [unreadable localized message] (0x5)
[1048:31728:0202/013930.229:ERROR:cache_util.cc(144)] Unable to move cache folder C:\Users\beefdrifter\AppData\Local\Google\Chrome\User Data\ShaderCache\GPUCache to C:\Users\beefdrifter\AppData\Local\Google\Chrome\User Data\ShaderCache\old_GPUCache_000
[1048:31728:0202/013930.229:ERROR:disk_cache.cc(184)] Unable to create cache
[1048:31728:0202/013930.229:ERROR:shader_disk_cache.cc(606)] Shader Cache Creation failed: -2

Any idea what might cause this? Also, every time it launches Chrome, it opens a blank tab alongside the page that it's saving. This causes empty tabs to build up after a few rounds of saving unless they are closed manually.

gildas-lormeau commented 3 years ago

The conflict is indeed a known issue when using puppeteer. I guess the easiest way to circumvent it is to copy the profile folder and use the copy for SingleFile. I think I need more details about the empty tab issue. Are you capturing a list of URLs?

beefdrifter commented 3 years ago

"browser-args": "[\"--user-data-dir=C:\\\\Users\\\\beefdrifter\\\\Puppeteer\"]",

I tried copying the Default profile folder to this directory, but it doesn't seem to work. The profile is not used, and Chrome instead creates another set of User Data in the Puppeteer folder. If I copy all the files from C:\Users\beefdrifter\AppData\Local\Google\Chrome\User Data to the Puppeteer folder, the profile works again, but I run into the same issue of not being able to run multiple instances of the CLI at the same time.

Are you capturing a list of URLs?

No, just a single URL every time. What happens is more like this:

  1. run single-file http://url1.com
  2. puppeteer opens a Chrome window with 1 tab, displaying url1
  3. page is saved, Chrome window closes
  4. run single-file http://url2.com
  5. puppeteer opens a Chrome window with 2 tabs, the first is empty, the second is url2
  6. and so on: each run adds a new tab

gildas-lormeau commented 3 years ago

Regarding the profile folder, maybe you should copy the parent folder too.
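A minimal Node.js sketch of that idea, assuming the whole "User Data" folder (the parent of the profile folder) is copied into a fresh temporary directory for each run. The helper names copyUserData and buildCliArgs are hypothetical; only the options already mentioned in this thread are used. It has not been tested against the concurrency problem reported here, and copying may fail if Chrome is holding files in the source folder open.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical helper: copies a whole Chrome "User Data" folder into a new,
// uniquely named temporary directory and returns the path of the copy.
function copyUserData(sourceDir) {
  const target = fs.mkdtempSync(path.join(os.tmpdir(), "single-file-profile-"));
  fs.cpSync(sourceDir, target, { recursive: true }); // needs Node.js 16.7 or later
  return target;
}

// Hypothetical helper: builds the argument list for the single-file command,
// pointing Chrome at the given user data directory with headless mode off.
function buildCliArgs(url, output, userDataDir) {
  return [
    url,
    output,
    "--browser-headless", "false",
    "--browser-args", JSON.stringify([`--user-data-dir=${userDataDir}`]),
  ];
}
```

Passing the arguments as an array to a process launcher, rather than through a shell, avoids a further layer of quote and backslash escaping.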

Regarding the blank page, I'll try to see what's wrong but I guess it's probably a bug in puppeteer related to the fact that you use the same profile.

beefdrifter commented 3 years ago

I tried copying the profile's parent folder over too, which allows the profile to work again, but then I am still unable to run multiple instances of the CLI at the same time.

Ah ok, I hope this isn't an inherent limitation of puppeteer :O

scruel commented 2 years ago

@gildas-lormeau What about Firefox? I tried the following command, but it doesn't work:

single-file https://www.wikipedia.org "wikipedia.html" --back-end=webdriver-gecko --browser-headless=false --browser-args "[\"-P C:\Users\scruel\AppData\Roaming\Mozilla\Firefox\Profiles\test.default-release\"]"

It still opened with the default profile.

gildas-lormeau commented 2 years ago

@scruel see answer here: https://github.com/gildas-lormeau/SingleFile/issues/809#issuecomment-968155970

gildas-lormeau commented 3 months ago

This issue is now obsolete.