IgnoredAmbience / yahoo-group-archiver

Scrapes and archives a Yahoo group's email archives, photo galleries, and file contents using the non-public API
MIT License

Same message downloaded in infinite loop #75

Closed by logological 4 years ago

logological commented 4 years ago

I'm experiencing a problem archiving one group. The group contains only a few hundred messages, but the script claims to start downloading at message 331 and then increments its message counter indefinitely:

/yahoo.py -ct "$(cat cookie_t)" -cy "$(cat cookie_y)" classstruggle
2019-10-27 19:10:19.428 CET INFO archive_message_metadata Archiving message metadata...
2019-10-27 19:10:20.286 CET INFO archive_message_metadata Archived message metadata records (331 of 359)
2019-10-27 19:10:20.442 CET INFO archive_message_metadata Archived message metadata records (332 of 359)
2019-10-27 19:10:20.602 CET INFO archive_message_metadata Archived message metadata records (333 of 359)
2019-10-27 19:10:20.769 CET INFO archive_message_metadata Archived message metadata records (334 of 359)
2019-10-27 19:10:20.950 CET INFO archive_message_metadata Archived message metadata records (335 of 359)
2019-10-27 19:10:21.112 CET INFO archive_message_metadata Archived message metadata records (336 of 359)
2019-10-27 19:10:21.294 CET INFO archive_message_metadata Archived message metadata records (337 of 359)
2019-10-27 19:10:21.457 CET INFO archive_message_metadata Archived message metadata records (338 of 359)
2019-10-27 19:10:21.673 CET INFO archive_message_metadata Archived message metadata records (339 of 359)
2019-10-27 19:10:21.824 CET INFO archive_message_metadata Archived message metadata records (340 of 359)
2019-10-27 19:10:22.057 CET INFO archive_message_metadata Archived message metadata records (341 of 359)
2019-10-27 19:10:22.212 CET INFO archive_message_metadata Archived message metadata records (342 of 359)
2019-10-27 19:10:22.382 CET INFO archive_message_metadata Archived message metadata records (343 of 359)
2019-10-27 19:10:22.568 CET INFO archive_message_metadata Archived message metadata records (344 of 359)
2019-10-27 19:10:23.063 CET INFO archive_message_metadata Archived message metadata records (345 of 359)
2019-10-27 19:10:23.239 CET INFO archive_message_metadata Archived message metadata records (346 of 359)
2019-10-27 19:10:23.402 CET INFO archive_message_metadata Archived message metadata records (347 of 359)
2019-10-27 19:10:23.558 CET INFO archive_message_metadata Archived message metadata records (348 of 359)
2019-10-27 19:10:23.719 CET INFO archive_message_metadata Archived message metadata records (349 of 359)
2019-10-27 19:10:23.875 CET INFO archive_message_metadata Archived message metadata records (350 of 359)
2019-10-27 19:10:24.049 CET INFO archive_message_metadata Archived message metadata records (351 of 359)
2019-10-27 19:10:24.212 CET INFO archive_message_metadata Archived message metadata records (352 of 359)
2019-10-27 19:10:24.368 CET INFO archive_message_metadata Archived message metadata records (353 of 359)
2019-10-27 19:10:24.542 CET INFO archive_message_metadata Archived message metadata records (354 of 359)
2019-10-27 19:10:24.722 CET INFO archive_message_metadata Archived message metadata records (355 of 359)
2019-10-27 19:10:24.892 CET INFO archive_message_metadata Archived message metadata records (356 of 359)
2019-10-27 19:10:25.073 CET INFO archive_message_metadata Archived message metadata records (357 of 359)
2019-10-27 19:10:25.228 CET INFO archive_message_metadata Archived message metadata records (358 of 359)
2019-10-27 19:10:25.394 CET INFO archive_message_metadata Archived message metadata records (359 of 359)
2019-10-27 19:10:25.550 CET INFO archive_message_metadata Archived message metadata records (360 of 359)
2019-10-27 19:10:25.711 CET INFO archive_message_metadata Archived message metadata records (361 of 359)
2019-10-27 19:10:25.868 CET INFO archive_message_metadata Archived message metadata records (362 of 359)

The script continues like this until I kill it. Examining the email subdirectory of the output folder shows a very large message_metadata_0.json that looks like it contains most of the group's messages, but all the files numbered message_metadata_1.json and above contain the same single message.

The group is restricted but not private, so I suspect anyone can run the archiving script against it and see whether they get the same error. I'm also experiencing the problem with one or two other groups.
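The symptom described above (a counter that runs past the reported total, and every metadata page after the first containing the same single message) is consistent with a paging loop that never notices it has stopped making progress. The following is only a minimal sketch of a guard against that, not the script's actual code and not the change made in #73. The names `iter_message_metadata` and `fetch_page`, the cursor scheme, and the `messageId` field are all assumptions made for illustration.

```python
from typing import Callable, Dict, Iterator, List, Optional

# Hypothetical record shape: each metadata record carries a "messageId" key.
Message = Dict[str, int]


def iter_message_metadata(
    fetch_page: Callable[[Optional[int]], List[Message]],
    total: int,
) -> Iterator[Message]:
    """Yield each metadata record once; stop when a page brings nothing new.

    fetch_page is a hypothetical callable that takes a cursor (None for the
    first page) and returns a list of records. total is the message count
    reported by the server, which may not match what the pages contain.
    """
    seen = set()
    cursor: Optional[int] = None
    while len(seen) < total:
        page = fetch_page(cursor)
        progressed = False
        for message in page:
            if message["messageId"] in seen:
                continue  # already archived; do not count it again
            seen.add(message["messageId"])
            progressed = True
            yield message
        if not progressed:
            # The server returned only records we already have (or nothing),
            # so asking again would loop forever.
            break
        # Assumed cursor scheme: continue below the lowest ID on this page.
        cursor = min(m["messageId"] for m in page) - 1
```

With a guard like this, a server that keeps returning the same message ends the loop after one redundant request, and the progress counter can never exceed the number of distinct records actually seen.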

IgnoredAmbience commented 4 years ago

Should be fixed when #73 is merged

logological commented 4 years ago

Thanks; I'll test once the merge goes through and close this issue if it's no longer reproducible.

IgnoredAmbience commented 4 years ago

Also, thanks for giving a test case; I'm using it to test the PR now :)

IgnoredAmbience commented 4 years ago

Merged

logological commented 4 years ago

Works for me now; thanks!