JustAnotherArchivist / snscrape

A social networking service scraper in Python
GNU General Public License v3.0

ERROR snscrape.modules.twitter Content type of ... is not JSON #2

Closed: ivan closed this issue 5 years ago

ivan commented 5 years ago

While running snscrape twitter-user jmestepa, I observed:

2018-09-28 10:58:51.418  ERROR  snscrape.modules.twitter  Content type of https://twitter.com/i/search/timeline?f=tweets&vertical=default&lang=en&q=from%3Ajmestepa&include_available_features=1&include_entities=1&reset_error_state=false&src=typd&max_position=TWEET-329031970229207042-1045516283784171520 is not JSON
2018-09-28 11:02:36.135  ERROR  snscrape.modules.twitter  Content type of https://twitter.com/i/search/timeline?f=tweets&vertical=default&lang=en&q=from%3Ajmestepa&include_available_features=1&include_entities=1&reset_error_state=false&src=typd&max_position=TWEET-18781362769-1045516283784171520 is not JSON

That looks like a transient error on a bad response that snscrape should probably re-fetch, if it doesn't already.

From the error message, I assume the worst case of "it gave up after that error", but if that is not the case, maybe that could be made more obvious?

Edit: on other accounts too:

2018-09-28 11:09:36.836  ERROR  snscrape.modules.twitter  Content type of https://twitter.com/i/search/timeline?f=tweets&vertical=default&lang=en&q=from%3Arepublicoftogo&include_available_features=1&include_entities=1&reset_error_state=false&src=typd&max_position=TWEET-1020278804193972224-1045610783923744768 is not JSON

JustAnotherArchivist commented 5 years ago

These errors are retried up to three times (or a different number if you set --retry, alias --retries). I usually run snscrape with -v for more verbose output, which makes it obvious that the request is retried.

But I agree that it's confusing when you don't enable verbose output. This log message should probably be a warning with ", retrying" appended, and an error without that suffix only when the failure persists on the last retry.
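
To make the proposed logging behaviour concrete, here is a minimal sketch. It is not snscrape's actual code: the function name, the logger name, and the `fetch` callable (assumed to return a `(content_type, body)` tuple) are all hypothetical. It also assumes that "retried up to three times" means one initial attempt followed by up to three retries.

```python
import logging

logger = logging.getLogger('retry_sketch')  # hypothetical logger name


def fetch_json_with_retries(fetch, url, retries=3):
    """Call fetch(url) until it returns a JSON response.

    `fetch` is a hypothetical callable returning (content_type, body).
    One initial attempt is made, followed by up to `retries` retries.
    Returns the body on success, or None if every attempt failed.
    """
    attempts = retries + 1
    for attempt in range(1, attempts + 1):
        content_type, body = fetch(url)
        if content_type.startswith('application/json'):
            return body
        if attempt < attempts:
            # More attempts remain: a warning that says so explicitly.
            logger.warning('Content type of %s is not JSON, retrying', url)
        else:
            # Final attempt failed: only now is it reported as an error.
            logger.error('Content type of %s is not JSON', url)
    return None
```

With this shape, a run without -v would show warnings ending in ", retrying" for transient failures, and an ERROR line would only appear once the retries are exhausted.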