csaftoiu / yahoo-groups-backup

A python script to backup the contents of private Yahoo! groups.
The Unlicense

Remove Selenium for private groups #28

Open andrewferguson opened 8 years ago

andrewferguson commented 8 years ago

Selenium is not required for private groups; cookies can be used instead. How to do this:
1.) Log in to the Yahoo account that is subscribed to the private group in a web browser.
2.) Get a valid Netscape-format cookies.txt file from the browser. (In Firefox, the Export Cookies extension lets you do this through Tools > Export Cookies... once it is installed.)
3.) Try out the following Python script (Python 2), changing the group name in the API URL to a private group that your Yahoo account is subscribed to:

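```python
# Python 2
import cookielib
import urllib2

# Load the Netscape-format cookies exported from the browser
cj = cookielib.MozillaCookieJar()
cj.load("cookies.txt")

# Build an opener that sends those cookies with every request
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
r = opener.open("https://groups.yahoo.com/api/v1/groups/HudsonValleyEcycle/messages/1/raw")

# Save the raw JSON response to disk
with open("data.json", "wb") as f:
    f.write(r.read())
```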

(This presumes that the cookie file you downloaded is called 'cookies.txt' and is in the same directory as the Python script.)
4.) Bonus: if you want, remove all the cookies in the cookie file except the two from the domain ".yahoo.com" called T and Y. Those are the two that hold the login info.
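For anyone on Python 3, where `cookielib` and `urllib2` became `http.cookiejar` and `urllib.request`, the steps above might look roughly like this sketch. It also applies step 4 in code rather than by editing the file by hand, keeping only the `.yahoo.com` T and Y cookies. The helper names (`load_login_cookies`, `raw_message_url`, `fetch_raw_message`) are hypothetical, and the URL pattern is generalized from the single example URL in the script, so treating other group names and message IDs the same way is an assumption.

```python
import http.cookiejar
import urllib.request

# Step 4: the two .yahoo.com cookies that hold the login info
LOGIN_COOKIE_NAMES = {"T", "Y"}


def load_login_cookies(path="cookies.txt"):
    """Load a Netscape-format cookies.txt, keeping only the .yahoo.com T and Y cookies."""
    exported = http.cookiejar.MozillaCookieJar()
    exported.load(path)  # by default this skips expired and session-only cookies
    jar = http.cookiejar.CookieJar()
    for cookie in exported:
        if cookie.domain == ".yahoo.com" and cookie.name in LOGIN_COOKIE_NAMES:
            jar.set_cookie(cookie)
    return jar


def raw_message_url(group, msg_id):
    # Assumed pattern, generalized from the one example URL in the script
    return f"https://groups.yahoo.com/api/v1/groups/{group}/messages/{msg_id}/raw"


def fetch_raw_message(jar, group, msg_id, out_path="data.json"):
    """Download one message's raw JSON, sending the cookies in jar."""
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    with opener.open(raw_message_url(group, msg_id)) as resp, open(out_path, "wb") as f:
        f.write(resp.read())
```

Usage would be along the lines of `fetch_raw_message(load_login_cookies(), "HudsonValleyEcycle", 1)`, with the group swapped for one your account is subscribed to.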

csaftoiu commented 8 years ago

Hmm, interesting. Initially I tried using mechanize, which handles cookies like this automatically, but I couldn't get it to log in successfully; probably the login page relied on too much JavaScript for it. It didn't appear to work with PhantomJS either, which is interesting.

If you can find a way to get that cookie data without using Selenium, e.g. with mechanize or some other alternative, I'll gladly merge that PR. I'll leave the issue open for now until I eventually look into this. But since it currently works for me, that's not likely to be soon, hehe.
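What mechanize does "automatically" is roughly what the Python standard library's cookie processor does: cookies from `Set-Cookie` response headers are stored in a jar and sent back on later requests to the same domain. This is a stdlib analogue for illustration, not mechanize's own API, and `make_cookie_session` is a hypothetical name. Like mechanize, it executes no JavaScript, so it would not get past a script-driven login page either.

```python
import http.cookiejar
import urllib.request


def make_cookie_session():
    """Return an opener and the CookieJar it shares.

    Cookies set by any response are stored in the jar, and matching
    cookies are attached to later requests with no manual copying.
    """
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar
```

After `opener.open(...)` hits a page that sets cookies, every later `opener.open(...)` call on that domain carries them, which is the step that had to be done by hand with the exported cookies.txt.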

andrewferguson commented 8 years ago

The purpose of my filing that issue was really just to bring to your attention the fact that Selenium can be omitted if desired; there's no need to change anything (sadly, GitHub does not support comments on a project, at least not that I know of).

I'm afraid the only way I know of getting the cookie data is to manually copy it from the browser.


csaftoiu commented 8 years ago

Ahh okay, understood. Thanks, I appreciate the info! Fair point about GitHub not supporting comments, also.

I'll still leave the issue open pending more investigation, since I would indeed prefer not to have to use Selenium and open a Firefox window... we shall see.

andrewferguson commented 8 years ago

Doesn't Selenium also support the headless HtmlUnit driver? It may render pages slightly differently from Firefox, but I use Selenium in one of my projects and plan to migrate to HtmlUnit once I've got the code worked out.


csaftoiu commented 8 years ago

Hmm! Interesting. Thanks to @bitstein, there's an option to pick which webdriver to use. I tried it briefly with PhantomJS and it didn't seem to work; maybe HtmlUnit will succeed where it failed, or I may have been doing something silly when I tried PhantomJS.
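A possible middle ground, assuming the Selenium-based login stays: whichever webdriver performs the login, its cookies could be written out once in the same cookies.txt format the cookie-based script reads, and the message downloads done with plain HTTP requests. This sketch assumes the list-of-dicts shape returned by Selenium's `driver.get_cookies()` (keys `name`, `value`, `domain`, and optionally `path`, `secure`, `expiry`). `selenium_cookies_to_jar` and `save_cookies_txt` are hypothetical helpers, and the `httpOnly` flag is ignored.

```python
import http.cookiejar


def selenium_cookies_to_jar(cookies):
    """Build a MozillaCookieJar from dicts shaped like Selenium's get_cookies() output."""
    jar = http.cookiejar.MozillaCookieJar()
    for c in cookies:
        domain = c["domain"]
        expires = c.get("expiry")  # absent for session cookies
        jar.set_cookie(http.cookiejar.Cookie(
            version=0, name=c["name"], value=c["value"],
            port=None, port_specified=False,
            domain=domain,
            domain_specified=domain.startswith("."),
            domain_initial_dot=domain.startswith("."),
            path=c.get("path", "/"), path_specified=True,
            secure=c.get("secure", False),
            expires=expires,
            discard=expires is None,  # no expiry means a session cookie
            comment=None, comment_url=None,
            rest={},
        ))
    return jar


def save_cookies_txt(cookies, path="cookies.txt"):
    """Write the cookies in Netscape cookies.txt format, keeping session cookies."""
    jar = selenium_cookies_to_jar(cookies)
    jar.save(path, ignore_discard=True)
```

Calling `save_cookies_txt(driver.get_cookies(), "cookies.txt")` right after login would let the browser window be closed before any messages are fetched.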