abiyani / automate-save-page-as

Automate browser's "Save Page As" operation
Apache License 2.0

How to implement Firefox Save As Complete Website? #25

Closed bhuether closed 5 years ago

bhuether commented 5 years ago

Hi, I am using Firefox Save As on a Mac, and I was wondering how I would edit the script to implement Save As Complete Website?

thanks! Brian

abiyani commented 5 years ago

What do you mean by "Save as complete website"? Do you mean saving all webpages within a website? This tool is not meant for that task; it just saves a single web page. You will have to crawl all the links and call this tool repeatedly on them yourself.

An example of such a tool is available in another repo of mine: https://github.com/abiyani/orkut-community-downloader - which is a tool I created to save all pages on my Orkut profile (and internally uses automate-save-page-as).
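The "crawl all the links and call this tool repeatedly" approach could be sketched roughly as below. This is a minimal illustration, not code from this repo: `extract_links` and `build_save_commands` are hypothetical helpers, the script path `./save_page_as` is assumed, and the `--browser` and `--destination` options are assumed to match the script's usage text, so check them against your copy before relying on them.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            absolute = urljoin(self.base_url, href)
            # Keep first-seen order and skip duplicates.
            if absolute not in self.links:
                self.links.append(absolute)


def extract_links(html, base_url):
    """Return absolute URLs for all links found in the given HTML."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links


def build_save_commands(links, out_dir, script="./save_page_as", browser="firefox"):
    """Build one command line per link (script path and flags are assumptions)."""
    commands = []
    for index, link in enumerate(links, start=1):
        destination = f"{out_dir}/page_{index:03d}.html"
        commands.append([script, link, "--browser", browser, "--destination", destination])
    return commands


if __name__ == "__main__":
    toc_html = '<ul><li><a href="/ch1">One</a></li><li><a href="ch2.html">Two</a></li></ul>'
    links = extract_links(toc_html, "https://example.com/toc/")
    for command in build_save_commands(links, "/tmp/saved"):
        # Print only; to actually save, pass each list to subprocess.run(command, check=True).
        print(" ".join(command))
```

The commands are built as argument lists rather than shell strings so that URLs containing spaces or `&` do not need quoting. Running them one at a time matters, since the tool drives a single browser window through simulated key presses.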

bhuether commented 5 years ago

Yeah, I am trying to write a script in PHP that uses curl to log in to a website and then fetch a page containing a table of contents. Each link there leads to a page that I then want to download using Firefox Save As Website Complete.

I am on a Mac, and I think the X11 setup is proving too difficult for me to overcome. I can get the script to run and open Firefox, but after that it doesn't see a display, even though I am using XQuartz on the Mac.

So now I am looking into Firefox automation scripts, but so far nothing is recording the save as actions.

Thanks, Brian

abiyani commented 5 years ago

Yes, I have never tested the script on Mac, and I don't expect it to work either (because it is intrinsically tied to X).

Btw, this tool was written before Chrome headless was available, so you should look into that for automating the task instead: https://developers.google.com/web/updates/2017/04/headless-chrome
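As a rough idea of what the headless route looks like, the linked article describes a `--dump-dom` flag that prints the rendered page's DOM to stdout. The sketch below wraps that in Python; the binary name `google-chrome` is an assumption (it differs per platform and install), and both helper functions are hypothetical names for illustration.

```python
import subprocess


def build_dump_dom_command(url, chrome_binary="google-chrome"):
    """Build the headless Chrome command line (binary name is an assumption)."""
    return [chrome_binary, "--headless", "--disable-gpu", "--dump-dom", url]


def dump_dom(url, chrome_binary="google-chrome"):
    """Run headless Chrome and return the serialized DOM as text."""
    result = subprocess.run(
        build_dump_dom_command(url, chrome_binary),
        capture_output=True,
        text=True,
        check=True,  # raise if Chrome exits with an error
    )
    return result.stdout
```

Note that this yields only the serialized HTML, not the images, stylesheets, or media that a browser's "Save Page As" dialog writes alongside the page.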

bhuether commented 5 years ago

Hi,

Funny you mention it. I just started looking into headless browsers today. But only Firefox Save As gets me all the needed resources. I need to figure out how to do Save As Website Complete using a headless Firefox browser, but I haven't yet seen how to do it. So this is quite the adventure in trying to figure things out! Thanks, Brian


bhuether commented 5 years ago

Not sure about Chrome headless, but I don't see anything in the documentation for Firefox headless about a Save As method. Strangely, only Firefox Save As extracts videos from HTML5 elements on the site I am saving. Chrome and various extensions, no matter what settings, don't get the video.