JustAnotherArchivist / snscrape

A social networking service scraper in Python
GNU General Public License v3.0
4.38k stars · 702 forks

JSONDecodeError while using twitter scrapers #602

Closed mpeter50 closed 1 year ago

mpeter50 commented 1 year ago

For the last few days I have gotten the following error every time I try to scrape a tweet (with the twitter-tweet scraper) or a user's tweets (with the twitter-user scraper):

CRITICAL  snscrape._cli  Dumped stack and locals to /tmp/snscrape_locals_hvthsp6z
Traceback (most recent call last):
  File "/home/username/.local/bin/snscrape", line 8, in <module>
    sys.exit(main())
  File "/home/username/.local/lib/python3.9/site-packages/snscrape/_cli.py", line 308, in main
    for i, item in enumerate(scraper.get_items(), start = 1):
  File "/home/username/.local/lib/python3.9/site-packages/snscrape/modules/twitter.py", line 886, in get_items
    obj = self._get_api_data(f'https://twitter.com/i/api/2/timeline/conversation/{self._tweetId}.json', params)
  File "/home/username/.local/lib/python3.9/site-packages/snscrape/modules/twitter.py", line 338, in _get_api_data
    self._ensure_guest_token()
  File "/home/username/.local/lib/python3.9/site-packages/snscrape/modules/twitter.py", line 299, in _ensure_guest_token
    if self._guestTokenManager.token is None:
  File "/home/username/.local/lib/python3.9/site-packages/snscrape/modules/twitter.py", line 251, in token
    self._read()
  File "/home/username/.local/lib/python3.9/site-packages/snscrape/modules/twitter.py", line 238, in _read
    o = json.load(fp)
  File "/usr/local/lib/python3.9/json/__init__.py", line 293, in load
    return loads(fp.read(),
  File "/usr/local/lib/python3.9/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/usr/local/lib/python3.9/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/local/lib/python3.9/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
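
This exact message ("Expecting value: line 1 column 1 (char 0)") is what json.load raises when handed empty input, which is consistent with the cached file on disk being empty or truncated (an assumption; the dump would confirm it). A minimal reproduction in isolation:

```python
import io
import json

# json.load() fails at line 1 column 1 when the stream contains no data,
# e.g. a zero-byte cache file. (Assumption: the guest-token cache file
# that snscrape reads in _read() ended up empty/truncated on disk.)
empty_file = io.StringIO("")
try:
    json.load(empty_file)
except json.JSONDecodeError as e:
    print(e)  # Expecting value: line 1 column 1 (char 0)
```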

This was still working fine about four days ago, but since then it hasn't. I've searched the issues and found some about the Instagram and Facebook scrapers, where it was suggested that the user may have been rate-limited. But at the same time, I'm able to view the profiles and tweets I'm trying to scrape on twitter.com, even without logging in, so I'm not sure that's the case here.

Before submitting this issue, I checked whether it works on another machine (on the same network) where it had also worked previously, and it still works there.

TheTechRobo commented 1 year ago

Could you provide the dump file?

mpeter50 commented 1 year ago

Sure! Here it is: https://gist.github.com/mpeter50/fa6109e32cc87b80df2a07d4ddc0013c

The dump file (along with the above exception) was created while trying to scrape the list of tweets of a popular Twitter account.

TheTechRobo commented 1 year ago

What is the output of snscrape --version? This looks like a guest token disk issue to me, but the line numbers don't match up with the development version. Are you on the release version?

TheTechRobo commented 1 year ago

If it is a guest token disk failure, you're on an older version, as error handling for that was added 6 months ago (https://github.com/JustAnotherArchivist/snscrape/blame/46a603053cfbc0ce3c54d43d7e1ac2427fa82b4d/snscrape/modules/twitter.py#L569).

mpeter50 commented 1 year ago

The command says I have this version: snscrape 0.4.3.20220106

mpeter50 commented 1 year ago

According to the tags, this seems to be the latest one. I installed it with regular pip3 install snscrape.

Though what you just said made me want to ask this: is it supported to run multiple instances of snscrape at the same time?

JustAnotherArchivist commented 1 year ago

Yeah, this fix is currently only in the development version. I intend to make a new release soon. As a quick fix, you can delete the ~/.cache/snscrape/cli-twitter-guest-token.json file. Closing as a duplicate of #494.
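
The fix in the development version can be sketched roughly as follows: treat a cache file that fails to parse the same as a missing one, instead of letting the JSONDecodeError propagate. This is an illustrative sketch, not the actual snscrape code; the path handling and function name are hypothetical.

```python
import json
import os

# Rough sketch (not the real snscrape implementation) of tolerating a
# corrupted guest-token cache: on a decode or read failure, fall back to
# "no token stored" so the scraper fetches a fresh token instead of crashing.
def read_cached_token(path):
    if not os.path.exists(path):
        return None
    try:
        with open(path) as fp:
            return json.load(fp).get("token")
    except (json.JSONDecodeError, OSError):
        return None  # empty/corrupted cache: behave as if nothing was saved
```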

Yes, running multiple instances is fine, although the current release version also has a crash bug you might occasionally hit (#414). In that case, just rerun the scrapes that crash that way.