JustAnotherArchivist / snscrape

A social networking service scraper in Python
GNU General Public License v3.0
4.31k stars 698 forks

Add retries for flaky Twitter errors #953

Closed: novucs closed this 1 year ago

novucs commented 1 year ago

Errors can often be reproduced with the following user: jmrousseau

The profile is broken even on Twitter itself if you scroll down their page: https://twitter.com/jmrousseau


novucs commented 1 year ago

Caveat: This change works for most users on Twitter, but there are still some nasty profiles out there that Twitter really struggles with.

Here's a sample of profiles that still don't work, even with these changes:
https://twitter.com/GetVid_/with_replies
https://twitter.com/MakeItACover/with_replies
https://twitter.com/_screenshoter/with_replies

They all appear to be reasonably popular reply bots, which I imagine is what's causing Twitter so much stress.

JustAnotherArchivist commented 1 year ago

Good point, and polishing this has been on my todo list for a release anyway. However, snscrape already has machinery for retrying requests, so this should be implemented using that instead.
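The general pattern of reusing one retry loop, rather than adding an ad-hoc loop per call site, can be sketched as follows. This is a minimal illustration, not snscrape's actual code: `FakeResponse`, `request_with_retries`, and `status_check` are hypothetical names, and the assumption that the check callback returns an `(ok, message)` pair is made only for this sketch.

```python
import time
from typing import Callable, Optional, Tuple


class FakeResponse:
    """Hypothetical stand-in for an HTTP response, for illustration only."""

    def __init__(self, status_code: int, text: str = "") -> None:
        self.status_code = status_code
        self.text = text


# Assumed callback shape: returns (ok, error_message).
CheckCallback = Callable[[FakeResponse], Tuple[bool, Optional[str]]]


def request_with_retries(
    send: Callable[[], FakeResponse],
    check: CheckCallback,
    max_retries: int = 3,
    base_delay: float = 1.0,
) -> FakeResponse:
    """Call send() until check() accepts the response or retries run out."""
    last_error = None
    for attempt in range(max_retries + 1):
        response = send()
        ok, message = check(response)
        if ok:
            return response
        last_error = message
        if attempt < max_retries:
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            time.sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"giving up after {max_retries + 1} attempts: {last_error}")


def status_check(response: FakeResponse) -> Tuple[bool, Optional[str]]:
    """Hypothetical check: accept only HTTP 200."""
    if response.status_code == 200:
        return True, None
    return False, f"non-200 status code: {response.status_code}"
```

For example, a flaky endpoint that fails once and then succeeds can be simulated with `responses = iter([FakeResponse(503), FakeResponse(200, "ok")])` and `request_with_retries(lambda: next(responses), status_check, base_delay=0)`. Because the retry policy lives in one place, a new failure mode only needs a new check, not a new loop.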

novucs commented 1 year ago

Thanks! I've refactored this to reuse the existing retry logic now.

JustAnotherArchivist commented 1 year ago

That would solve the immediate case of profile timelines, but Twitter can (and does) return errors everywhere, so it needs to be handled in _check_api_response instead. I've made the necessary changes locally and am currently testing them.
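The point about handling errors centrally can be illustrated with a single check function that every API response passes through, so any endpoint, not just profile timelines, gets retried when Twitter returns an error. This is a hypothetical sketch, not the real _check_api_response: the name `check_api_response` and its signature are made up, and the assumption that transient failures appear as a top-level `"errors"` list (possibly alongside HTTP 200) is labeled as such.

```python
import json
from typing import Optional, Tuple


def check_api_response(status_code: int, body: str) -> Tuple[bool, Optional[str]]:
    """Hypothetical central check, applied to every API response.

    Returns (ok, message); a False result signals the caller's retry loop
    to try the request again.
    """
    if status_code != 200:
        return False, f"non-200 status code {status_code}"
    try:
        payload = json.loads(body)
    except json.JSONDecodeError as exc:
        return False, f"invalid JSON: {exc}"
    # Assumption: errors are reported as a top-level "errors" list,
    # even when the HTTP status is 200.
    if isinstance(payload, dict) and payload.get("errors"):
        messages = "; ".join(
            str(e.get("message", e)) if isinstance(e, dict) else str(e)
            for e in payload["errors"]
        )
        return False, f"API returned errors: {messages}"
    return True, None
```

Keeping this logic in one function means that a timeline, search, or thread request all share the same definition of "flaky", rather than each call site growing its own special case.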