Closed dossy closed 5 years ago
In the "About" tab, "Group Settings", is it a Public or Restricted group? I'm able to get this script to download from Public groups. I get the same Time Out message when trying to copy a Restricted group.
@cas206 public group.
Group Settings
- This is a public group.
- Attachments are not permitted.
- Members cannot hide email address.
- Listed in Yahoo Groups directory.
- Membership does not require approval.
- Messages from new members require approval.
- All members can post messages.
@cas206 oh, interesting - the group info says "This is a public group." but when I actually go into the group's settings, I see:
The difference between "Public" and "Custom" with this group's settings is "Non Members can post messages" is unchecked in our custom settings. Could this really be the reason why it's failing?
I'm going to temporarily set the group type to actual "Public" and see if the archiver succeeds.
Ignore my comment. I had two Public groups work and 3 Restricted fail with Time out error. However, the fourth one I attempted was Restricted and it's downloading.
@cas206 I set the group to plain "Public" and get the timeout, still.
How old are the groups you're archiving? The one I'm working on was founded Dec 31, 1999. I wonder if this has something to do with it ...
I added some debugging pprints, and the URL it's timing out on is: https://groups.yahoo.com/api/v1/groups/extremeprogramming/messages?count=160472
I'm guessing that's so far back in the past that it's trying to access data that's no longer available ...
I'm also getting this error. The group is very old, so it may also be running into old content that's no longer available. @dossy I'm not sure if there's a solution to this.
I implemented pagination in the script, fetching 1,000 messages at a time ... the script is now running. If this works to pull all 160,472 messages out of my group, I'll submit a PR.
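For anyone following along, here is a minimal sketch of the idea of fetching in pages of 1,000 rather than requesting everything in one call. It is not the code from the fork: the names page_ranges, fetch_all and fetch_page are hypothetical, and it assumes message IDs run from 1 up to the group's message count, which may not hold for every group.

```python
def page_ranges(total, page_size=1000):
    """Yield (first_id, last_id) pairs covering 1..total in chunks of page_size.

    Assumption: message IDs are numbered from 1 to total.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    first = 1
    while first <= total:
        last = min(first + page_size - 1, total)
        yield first, last
        first = last + 1


def fetch_all(total, fetch_page, page_size=1000):
    """Collect messages page by page.

    fetch_page is a hypothetical callable taking (first_id, last_id) and
    returning a list of messages; the real API parameters are not shown here.
    """
    messages = []
    for first, last in page_ranges(total, page_size):
        messages.extend(fetch_page(first, last))
    return messages
```

With 160,472 messages and a page size of 1,000, this makes 161 smaller requests instead of one very large one.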
Ones that don't work are 1998 (46309 messages), 1999 (88664 messages). Ones that worked are 2000 (26351 messages), 2001 (17564 messages).
Another that works though is 1999 (9280 messages).
@cas206 I pushed up my add-pagination branch to my fork, if you want to give it a try on the larger groups.
https://github.com/dossy/yahoo-group-archiver/tree/add-pagination
Working now. Good work.
@dossy Excellent work. Mine is now working on all 19,000+ posts. This... may take a while. Thank you!
Tried your fork (thank you!) and I was able to download ~12000 messages. Afterwards it gave this error:
Traceback (most recent call last):
  File "./yahoo.py", line 200, in <module>
    archive_email(yga, reattach=(not args.no_reattach), save=(not args.no_save))
  File "./yahoo.py", line 44, in archive_email
    raw_json = yga.messages(id, 'raw')
  File "/YahooMigration2Google/yahoo-group-archiver/yahoogroupsapi.py", line 74, in get_json
    r = self.s.get(uri, params=opts, allow_redirects=False, timeout=10)
  File "/usr/lib/python2.7/dist-packages/requests/sessions.py", line 533, in get
    return self.request('GET', url, **kwargs)
  File "/usr/lib/python2.7/dist-packages/requests/sessions.py", line 520, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/lib/python2.7/dist-packages/requests/sessions.py", line 630, in send
    r = adapter.send(request, **kwargs)
  File "/usr/lib/python2.7/dist-packages/requests/adapters.py", line 521, in send
    raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='groups.yahoo.com', port=443): Read timed out. (read timeout=10)
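The traceback above ends in a read timeout after 10 seconds on a single message. One possible way to make a long run survive an occasional slow response is to retry the call a few times before giving up. This is only a sketch, written for Python 3 rather than the Python 2.7 shown in the traceback: get_with_retries and its parameters are hypothetical and not part of the archiver. With requests, the retry_on tuple would presumably contain requests.exceptions.ReadTimeout.

```python
import time


def get_with_retries(fetch, retries=3, delay=1.0,
                     retry_on=(TimeoutError,), sleep=time.sleep):
    """Call fetch() and retry on timeout, doubling the wait each time.

    fetch is any zero-argument callable (hypothetical); the last error is
    re-raised if every attempt fails.
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_error = None
    for attempt in range(retries):
        try:
            return fetch()
        except retry_on as exc:
            last_error = exc
            # Wait before the next attempt, but not after the final one.
            if attempt < retries - 1:
                sleep(delay * (2 ** attempt))
    raise last_error
```

Retrying will not help if the server can never return a given message, which is the case the later comments in this thread deal with.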
@cmilanf Try pulling from my branch again, I just added commit dossy@576e23ec that adds skipping of existing files, and setting arbitrary --start and --stop message IDs for fetching specific ranges. I use the --start to skip over messages that Y! can't fetch.
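For readers who want a picture of how such options could work, here is a rough sketch of a --start/--stop range combined with skipping files that already exist. It is not the code from the commit: parse_range, ids_to_fetch, the out_dir layout and the "<id>.eml" file naming are all assumptions made for illustration.

```python
import argparse
import os


def parse_range(argv):
    """Parse --start and --stop message IDs (both inclusive)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--start", type=int, default=1)
    parser.add_argument("--stop", type=int, required=True)
    return parser.parse_args(argv)


def ids_to_fetch(start, stop, out_dir, suffix=".eml", exists=os.path.exists):
    """Yield message IDs in start..stop that have no saved file yet.

    Assumption: each message is saved as <out_dir>/<id><suffix>.
    """
    for msg_id in range(start, stop + 1):
        path = os.path.join(out_dir, "%d%s" % (msg_id, suffix))
        if exists(path):
            continue  # already downloaded on a previous run
        yield msg_id
```

Re-running after a crash then only requests the IDs that are still missing, and --start can be moved past a message the server refuses to return.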
Great! After pulling your last commit I was able to continue fetching right where the error left off. Now continuing :)
@IgnoredAmbience @dossy could you please pull (request) the large group support back to the main repo?
I get an error with a large group (200k+ messages) as below. I tried dossy's fork, but it outputs the email as .eml files, when I was looking for JSON output to convert to other formats.
Traceback (most recent call last):
File "./yahoo.py", line 634, in
It is my intent to fix this tomorrow, ran out of time today I'm afraid.
After getting through the login issues thanks to #2's guidance to use the new (undocumented) -ct and -cy parameters, the script now dies with the following error:
With a normal logged-in browser session, I can access the JSON API endpoint and get a response.
https://groups.yahoo.com/api/v1/groups/extremeprogramming/messages
But, the script fails with the error above.
I tried extending the timeout to 120 seconds, in case the request was just taking longer than expected, but it eventually times out, still.
Suggestions?