JustAnotherArchivist / snscrape

A social networking service scraper in Python
GNU General Public License v3.0

Crash on scraping an Instagram profile: `invalid JSON (JSONDecodeError('Expecting value: line 1 column 1 (char 0)'))` #204

Closed andreacorradi closed 3 years ago

andreacorradi commented 3 years ago

Running the command:

snscrape --jsonl --progress --max-results 10000 instagram-user username > output.jsonl

I get this error:

2021-03-24 11:37:48.243  ERROR  snscrape.base  Error retrieving https://www.instagram.com/graphql/query/?query_hash=f2405b236d85e8296cf30347c9f08c2a&variables=%7B%22id%22:%22597620684%22,%22first%22:50,%22after%22:%22QVFCR3FuN0FKcWttdks5QkZnbnZWZzlmNm5ucGVFN1lDSS0tV2V4M3dudGNBTHlraVRLRi1fVHR1cUFEM2M1NHI5b0doUHFvYnloMVl0bTVLWUxjLUQ1cw==%22%7D: invalid JSON (JSONDecodeError('Expecting value: line 1 column 1 (char 0)'))
2021-03-24 11:37:48.244  CRITICAL  snscrape.base  4 requests to https://www.instagram.com/graphql/query/?query_hash=f2405b236d85e8296cf30347c9f08c2a&variables=%7B%22id%22:%22597620684%22,%22first%22:50,%22after%22:%22QVFCR3FuN0FKcWttdks5QkZnbnZWZzlmNm5ucGVFN1lDSS0tV2V4M3dudGNBTHlraVRLRi1fVHR1cUFEM2M1NHI5b0doUHFvYnloMVl0bTVLWUxjLUQ1cw==%22%7D failed, giving up.
2021-03-24 11:37:48.331  CRITICAL  snscrape._cli  Dumped stack and locals to /var/folders/4m/ph_320xj4gg9drp9wk435_4h0000gq/T/snscrape_locals_1_pudvzr
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.9/bin/snscrape", line 33, in <module>
    sys.exit(load_entry_point('snscrape==0.3.5.dev95+g5cd3b7d', 'console_scripts', 'snscrape')())
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/snscrape/_cli.py", line 270, in main
    for i, item in enumerate(scraper.get_items(), start = 1):
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/snscrape/modules/instagram.py", line 161, in get_items
    r = self._get(f'https://www.instagram.com/graphql/query/?query_hash={self._queryHash}&variables={variables}', headers = headers, responseOkCallback = self._check_json_callback)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/snscrape/base.py", line 196, in _get
    return self._request('GET', *args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/snscrape/base.py", line 192, in _request
    raise ScraperException(msg)
snscrape.base.ScraperException: 4 requests to https://www.instagram.com/graphql/query/?query_hash=f2405b236d85e8296cf30347c9f08c2a&variables=%7B%22id%22:%22597620684%22,%22first%22:50,%22after%22:%22QVFCR3FuN0FKcWttdks5QkZnbnZWZzlmNm5ucGVFN1lDSS0tV2V4M3dudGNBTHlraVRLRi1fVHR1cUFEM2M1NHI5b0doUHFvYnloMVl0bTVLWUxjLUQ1cw==%22%7D failed, giving up.

Until yesterday, everything worked very well with the same command. I tried it on new profiles as well as on profiles I had already scraped successfully, but the result is the same.

JustAnotherArchivist commented 3 years ago

Scraping that account works fine for me. It might be yet another one of Instagram's rate limiting or anti-scraping measures, or they had some temporary issues. Maybe the dump file (third line of the errors) contains more details.
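
For context, `Expecting value: line 1 column 1 (char 0)` is the message Python's `json` module produces when the body is empty or does not start with JSON at all, for example an HTML page served in place of the API response. Whether that is what Instagram returned here is an assumption; the log does not show the response body. A minimal illustration using only the standard library (this is not snscrape code, and `describe_json_failure` is a hypothetical helper name):

```python
import json


def describe_json_failure(body: str) -> str:
    """Return the JSONDecodeError message for body, or 'ok' if it parses."""
    try:
        json.loads(body)
    except json.JSONDecodeError as e:
        return str(e)
    return 'ok'


# An empty body and an HTML body fail at the very first character;
# a well-formed JSON body parses without error.
empty_msg = describe_json_failure('')
html_msg = describe_json_failure('<!DOCTYPE html><html></html>')
valid_msg = describe_json_failure('{"data": {}}')

print(empty_msg)
print(html_msg)
print(valid_msg)
```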

andreacorradi commented 3 years ago

Thank you for the answer. Yesterday it wasn't working at all; today it downloads a few hundred posts and then stops. I'll keep testing.

JustAnotherArchivist commented 3 years ago

Closing this as it's not reproducible and most likely caused by a rate-limiting measure or a server-side issue. snscrape already catches the invalid response and retries, so there's really nothing more it can do. If you find out more or discover a workaround, feel free to reopen, of course.
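
The catch-and-retry behaviour described above can be sketched roughly as follows. This is not snscrape's actual implementation: `ScrapeError`, `is_valid_json`, and `get_with_retries` are hypothetical names, the fetch function is injected so the sketch runs without network access, and the exponential backoff between attempts is an assumption. Only the limit of four attempts is taken from the log above.

```python
import json
import time


class ScrapeError(Exception):
    """Hypothetical error type, standing in for a scraper's own exception."""


def is_valid_json(body: str) -> bool:
    """Response check: accept the body only if it parses as JSON."""
    try:
        json.loads(body)
    except json.JSONDecodeError:
        return False
    return True


def get_with_retries(fetch, url, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url) until the body is valid JSON or attempts run out.

    fetch is any callable returning the response body as a string.
    The delay doubles after each failed attempt (an assumed policy).
    """
    for attempt in range(attempts):
        body = fetch(url)
        if is_valid_json(body):
            return json.loads(body)
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)
    raise ScrapeError(f'{attempts} requests to {url} failed, giving up.')


# Demo with a fake fetcher: two invalid bodies, then a valid one.
responses = iter(['', '<html></html>', '{"ok": true}'])
result = get_with_retries(
    lambda url: next(responses),
    'https://example.invalid/',
    sleep=lambda seconds: None,  # skip real waiting in the demo
)
print(result)
```

If every attempt returns an invalid body, the sketch raises after the last one, which mirrors the "failed, giving up" outcome in the log. Once the server keeps answering with non-JSON, a client-side retry loop cannot recover on its own.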