JustAnotherArchivist / snscrape

A social networking service scraper in Python
GNU General Public License v3.0

Reddit timing out #149

Closed: santoshbs closed this issue 3 years ago

santoshbs commented 3 years ago

Since yesterday, snscrape has been timing out on Reddit scrapes, even with increased retries:

`snscrape --jsonl --verbose --retries 20 reddit-subreddit --before 16020869889 technology > subreddit-technology.json`

`snscrape.base Error retrieving https://api.pushshift.io/reddit/search/submission/?subreddit=technology&size=1000&before=16020869889&sort=desc: ReadTimeout(ReadTimeoutError("HTTPSConnectionPool(host='api.pushshift.io', port=443): Read timed out. (read timeout=10)")), retrying`

JustAnotherArchivist commented 3 years ago

Yeah, Pushshift (which snscrape uses for Reddit because Reddit's own search is horrible) has been having issues lately: https://old.reddit.com/r/pushshift/comments/jm8yyt/aggregations_have_been_temporarily_disabled_to/

Unfortunately, nothing can be done about this.
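
For anyone who wants to check whether the timeouts come from Pushshift itself rather than from snscrape, one option is to request the same URL from the error message directly, with a longer read timeout and a pause between attempts. The following is only a rough sketch using the Python standard library. The helper names `build_url` and `fetch_with_backoff` are made up for illustration and are not part of snscrape, and the 60-second timeout and doubling delay are arbitrary assumptions, not values snscrape uses.

```python
import time
import urllib.parse
import urllib.request

# Endpoint taken from the error message in this issue.
PUSHSHIFT_URL = "https://api.pushshift.io/reddit/search/submission/"


def build_url(subreddit, before, size=1000):
    """Rebuild the query URL shown in the error message (hypothetical helper)."""
    query = urllib.parse.urlencode(
        {"subreddit": subreddit, "size": size, "before": before, "sort": "desc"}
    )
    return f"{PUSHSHIFT_URL}?{query}"


def fetch_with_backoff(url, opener=urllib.request.urlopen, retries=3,
                       timeout=60, base_delay=1.0, sleep=time.sleep):
    """Request `url` up to `retries` times, doubling the wait after each failure.

    `opener` and `sleep` are injectable so the retry logic can be exercised
    without touching the network (assumption made for this sketch).
    """
    last_error = None
    for attempt in range(retries):
        try:
            with opener(url, timeout=timeout) as response:
                return response.read()
        except OSError as exc:  # covers URLError and socket timeouts
            last_error = exc
            if attempt < retries - 1:
                sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"giving up after {retries} attempts: {last_error!r}")
```

Calling `fetch_with_backoff(build_url("technology", 16020869889))` would then issue the same request as the command above. If that also times out with a generous timeout, the problem is most likely on the Pushshift side and no amount of `--retries` will help.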