IgnoredAmbience / yahoo-group-archiver

Scrapes and archives a Yahoo Group's email archives, photo galleries and file contents using the non-public API
MIT License
93 stars, 46 forks

Only 20 attachments are downloaded #110

Closed: diggservo closed this issue 4 years ago

diggservo commented 4 years ago

Hi,

It appears that only 20 attachment folders are downloaded when I try to download attachments from a private group as follows:

$ ./yahoo.py -ct '' -cy '' '' --attachments

The group contains 120 000+ messages, so I was hoping the script could save all attachments. Is there a way to get them all?

Thanks

ugcheleuce commented 4 years ago

Yahoogroups does not save all attachments in the "Attachments" section. To see how many attachments are in the "Attachments" section, visit https://groups.yahoo.com/neo/groups/GROUPNAME/attachments .

diggservo commented 4 years ago

> Yahoogroups does not save all attachments in the "Attachments" section. To see how many attachments are in the "Attachments" section, visit https://groups.yahoo.com/neo/groups/GROUPNAME/attachments .

But I checked the Attachments section beforehand: there is a huge number of attachments (more than 100 000 for sure), yet the script only fetches 20. Only 20 attachment folders exist after the script finishes, and only 20 entries exist in the file "attachments\allattachmentinfo.json".

I'm not good with Python. Could you advise what needs to be changed in the script to make it fetch all attachments, as it does for email messages?

I see that for email messages the script prefetches all metadata before downloading, into files like "email/message_metadata_xx.json", but for attachments the logic is different and it fetches only the first 20. How can I make the script prefetch all attachment metadata (not just 20) before downloading, so it can download thousands of them?

lennier1 commented 4 years ago

By default, the attachment API only gives data on 20 attachments, but you can ask for more with the count parameter. I added a fix to my pull request. I was able to get 148 attachments from groupmanagersforum with it, but more would probably work. Let me know if there's a publicly joinable group you want me to test.
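
As a rough illustration of the default-versus-explicit count behaviour described above, here is a minimal Python sketch. It does not call the real Yahoo API and is not the code from the pull request: `fake_attachments_api` is a hypothetical stand-in, and only the default of 20 and the name of the `count` parameter come from this thread.

```python
DEFAULT_COUNT = 20  # default number of entries, as reported in this thread


def fake_attachments_api(all_items, count=None):
    """Hypothetical stand-in for the server: return at most `count`
    entries, falling back to the default of 20 when `count` is omitted."""
    limit = DEFAULT_COUNT if count is None else count
    return all_items[:limit]


# Pretend the group has 148 attachment entries.
items = [{"attachmentId": i} for i in range(148)]

default_page = fake_attachments_api(items)                     # no count given
full_listing = fake_attachments_api(items, count=len(items))   # explicit count

print(len(default_page), len(full_listing))  # prints "20 148"
```

A caller that never passes `count` sees only the first 20 entries, which matches the symptom reported above; passing a large enough `count` returns the whole listing. Whether the real service capped `count` at some maximum is not stated in this thread.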

diggservo commented 4 years ago

> By default, the attachment API only give data on 20 attachments, but you can ask for more with the count parameter. I added a fix to my pull request. I was able to get 148 attachments from groupmanagersforum with it, but more would probably work. Let me know if there's a publicly joinable group you want me to test.

The fix works fine, thank you!

IgnoredAmbience commented 4 years ago

This has been merged into master.