JustAnotherArchivist / snscrape

A social networking service scraper in Python
GNU General Public License v3.0
4.5k stars 713 forks

Twitter 'latest' search fails with `non-200 status code (401)` #834

Closed ExtremeSRL closed 1 year ago

ExtremeSRL commented 1 year ago

Describe the bug

Twitter search stopped working.

File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\python311\Scripts\snscrape.exe\__main__.py", line 7, in <module>
  File "C:\Python311\Lib\site-packages\snscrape\_cli.py", line 320, in main
    for i, item in enumerate(scraper.get_items(), start = 1):
  File "C:\Python311\Lib\site-packages\snscrape\modules\twitter.py", line 1659, in get_items
    for obj in self._iter_api_data('https://api.twitter.com/2/search/adaptive.json', _TwitterAPIType.V2, params, paginationParams, cursor = self._cursor):
  File "C:\Python311\Lib\site-packages\snscrape\modules\twitter.py", line 761, in _iter_api_data
    obj = self._get_api_data(endpoint, apiType, reqParams)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\site-packages\snscrape\modules\twitter.py", line 727, in _get_api_data
    r = self._get(endpoint, params = params, headers = self._apiHeaders, responseOkCallback = self._check_api_response)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\site-packages\snscrape\base.py", line 251, in _get
    return self._request('GET', *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\site-packages\snscrape\base.py", line 247, in _request
    raise ScraperException(msg)
snscrape.base.ScraperException: 4 requests to https://api.twitter.com/2/search/adaptive.json?include_profile_interstitial_type=1&include_blocking=1&include_blocked_by=1&include_followed_by=1&include_want_retweets=1&include_mute_edge=1&include_can_dm=1&include_can_media_tag=1&include_ext_has_nft_avatar=1&include_ext_is_blue_verified=1&include_ext_verified_type=1&skip_status=1&cards_platform=Web-12&include_cards=1&include_ext_alt_text=true&include_ext_limited_action_results=false&include_quote_count=true&include_reply_count=1&tweet_mode=extended&include_ext_collab_control=true&include_ext_views=true&include_entities=true&include_user_entities=true&include_ext_media_color=true&include_ext_media_availability=true&include_ext_sensitive_media_warning=true&include_ext_trusted_friends_metadata=true&send_error_codes=true&simple_quoted_tweet=true&q=lang%3Ait&tweet_search_mode=live&count=20&query_source=spelling_expansion_revert_click&pc=1&spelling_corrections=1&include_ext_edit_control=true&ext=mediaStats%2ChighlightedLabel%2ChasNftAvatar%2CvoiceInfo%2Cenrichments%2CsuperFollowMetadata%2CunmentionInfo%2CeditControl%2Ccollab_control%2Cvibe failed, giving up.

How to reproduce

Run the twitter-search scraper.

Expected behaviour

Retrieve Twitter posts.

Screenshots and recordings

Same traceback as under "Describe the bug".

Operating system

Windows 10

Python version: output of python3 --version

3.7.5

snscrape version: output of snscrape --version

0.6.1.20230315.dev2+gedac5f3

Scraper

twitter-search

How are you using snscrape?

CLI (snscrape ... as a command, e.g. in a terminal)

Backtrace

No response

Log output

Same traceback as under "Describe the bug".

Dump of locals

Same traceback as under "Describe the bug".

Additional context

none

NicoBerlo commented 1 year ago

Actually, the error returned by Twitter is: {"errors":[{"message":"Bad Authentication data","code":215}]}

Yomguithereal commented 1 year ago

Related to medialab/minet#682

Hesko123 commented 1 year ago

Related to medialab/minet#682

I don't think it is related to the 'latest' tab, as that's nothing new. Snscrape was already aware of that.

JustAnotherArchivist commented 1 year ago

Actually, the error returned by Twitter is: {"errors":[{"message":"Bad Authentication data","code":215}]}

No, it isn't. Why do people keep opening the API URLs in browsers and expecting it to work despite lacking the relevant authentication headers?

Yes, looks like Twitter removed the 'latest' search again. Unless they reverse that, it's unlikely that there is a fix for this. Cf. #634 for the previous occurrence of this a couple months ago.

JustAnotherArchivist commented 1 year ago

For the record, the error returned by Twitter is:

{"errors":[{"code":32,"message":"Could not authenticate you."}]}
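The two error bodies quoted in this thread differ only in their numeric code, so a script can tell them apart by parsing the JSON. The following is a minimal sketch, not part of snscrape: `error_codes` and `KNOWN_CODES` are hypothetical names, and the table contains only the two codes and messages that appear in the comments above.

```python
import json
from typing import List

# Codes and messages exactly as quoted in this thread; nothing else is assumed.
KNOWN_CODES = {
    32: "Could not authenticate you.",
    215: "Bad Authentication data",
}


def error_codes(body: str) -> List[int]:
    """Return the numeric error codes found in a Twitter-style JSON error body."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return []  # not JSON at all, e.g. an HTML error page
    if not isinstance(payload, dict):
        return []
    return [
        entry["code"]
        for entry in payload.get("errors", [])
        if isinstance(entry, dict) and "code" in entry
    ]
```

Calling `error_codes` on the body above yields `[32]`, while the body obtained by opening the URL in a browser without authentication headers yields `[215]`.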

Mr-Freewan commented 1 year ago

Same error in TwitterUserScraper, not only in search. Moreover, there was a short period when it worked, but after about 10 minutes it broke again.

JustAnotherArchivist commented 1 year ago

It is only the latest search, but twitter-user, twitter-hashtag, and a few more scrapers are simple wrappers around the search, so yes, they're also affected.

Mr-Freewan commented 1 year ago

I noticed that it works, but it is very unstable. It gives an error (non-200 (401)) on 2 out of 3 requests, but works fine on the third.

Mr-Freewan commented 1 year ago

It's working again =)

JustAnotherArchivist commented 1 year ago

Haven't seen any further interruptions, but I'll keep the issue open and pinned for now in case it returns.

JustAnotherArchivist commented 1 year ago

No more issues since Saturday. :-)

ExtremeSRL commented 1 year ago

Twitter is changing its business plans these days, and I guess the API along with them. Let's stay tuned, because I'm afraid there will be more problems. In the meantime, thanks as always for your great work!

0xTechnician commented 1 year ago

The problem just came back! 401 on search by query

projectno3 commented 1 year ago

I have issues too, but for me it alternates between working and not working (as if my internet connection was unstable, but that is not the case).

rmnhg commented 1 year ago

I only have this problem with twitter-user. twitter-search runs fine (if I don't use any parameter like from:USERNAME)

JustAnotherArchivist commented 1 year ago

@rmnhg No, it happens with both. twitter-user is a very thin wrapper around twitter-search anyway; wouldn't make any sense if they didn't behave the same (unless they were restricting specifically from:X queries, which isn't the case). You probably just got lucky on your twitter-search runs and unlucky on the twitter-user ones.

Mr-Freewan commented 1 year ago

It works, but it is very unstable. Apparently Twitter is doing some work on its servers again.

codilau commented 1 year ago

What I see is that unauthenticated searches fail even in the browser. "Your account may not be allowed to perform this action. Please refresh the page and try again."

kooperalan commented 1 year ago

I set a delay of 1 minute between each tweet and it works.

Josias-TopicWorx commented 1 year ago

The issue persists for me: only every other request returns data. As @kooperalan said, adding a delay seems to work. For me, a 10-second delay has completely removed the problem.
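The delay workaround reported in the last two comments could be sketched as a retry loop with a fixed pause. This is only an illustration under stated assumptions, not snscrape's own code: `fetch_with_delay`, its parameters, and the `fetch` callable are hypothetical names, and the 10-second default merely echoes the figure mentioned above. Another commenter in this thread reports that a delay did not help, so treat it as a possible mitigation at best.

```python
import time


def fetch_with_delay(fetch, attempts=4, delay=10.0, sleep=time.sleep):
    """Call fetch() up to `attempts` times, pausing `delay` seconds between tries.

    `fetch` is any zero-argument callable that raises on failure (hypothetical;
    in real use it would wrap one scraper request). `sleep` is injectable so the
    loop can be exercised without actually waiting.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except Exception as exc:  # in real use, catch the scraper's own exception type
            last_error = exc
            if attempt < attempts:
                sleep(delay)  # pause before the next try, not after the last one
    raise RuntimeError(f"{attempts} attempts failed, giving up") from last_error
```

Because the pause only happens between failed attempts, a request that succeeds on the first try costs no extra time.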

JustAnotherArchivist commented 1 year ago

I can't reproduce it anymore since a few minutes ago. All my test searches seem to succeed now.

Edit: Never mind, it still happens.

Mr-Freewan commented 1 year ago

It looks like the problem will persist until the work on the servers is completed

p.s. delay doesn't work for me (=

Isid28 commented 1 year ago

For how long? I need the data for my thesis.

0xTechnician commented 1 year ago

It looks like the problem will persist until the work on the servers is completed

p.s. delay doesn't work for me (=

What are you referring to by "work on the servers"? Was there any communication about it? Seems unlikely since it's a private API.

Hesko123 commented 1 year ago

Hope it ends well. AntoinePaix, please do something to save us ! 🤣

Hesko123 commented 1 year ago

It works again for me ! 🚀

EDIT: My bad, still facing it....

EDIT2 : works again, it seems it's completely random.

zorrobiwan commented 1 year ago

For me too (through a lambda function in eu-west-3)

AntoinePaix commented 1 year ago

I have the same error with my own scraper and a completely different implementation (I use http). The 'top' tab works well but not the 'latest' tab.

But last night the advanced search was working fine with 'latest'...

Hesko123 commented 1 year ago

I have the same error with my own scraper and a completely different implementation (I use http). The 'top' tab works well but not the 'latest' tab.

But last night the advanced search was working fine with 'latest'...

What's the advantage of using an http implementation?

AntoinePaix commented 1 year ago

Ooops, I meant httpx. It's a python client with nice features such as request/response hooks, http2 and async capabilities.

Hesko123 commented 1 year ago

Ooops, I meant httpx. It's a python client with nice features such as request/response hooks, http2 and async capabilities.

Oh, async for multithreading? Response hooks for tweet responses?

AntoinePaix commented 1 year ago

@Hesko123 async like if you want to run multiple scrapers inside one thread.

Twitter's problem with the 'latest' search is really episodic. I just ran my personal scraper several times, the first 2 failed but the third passed without issue.
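To illustrate "multiple scrapers inside one thread", here is a minimal asyncio sketch that uses only the standard library. The `scrape` coroutine is a hypothetical stand-in that sleeps instead of making a request; in an httpx-based scraper, the awaited call would be an async HTTP request instead.

```python
import asyncio


async def scrape(name, pages, pause=0.01):
    """Stand-in for one scraper: 'fetches' `pages` pages, yielding control at each wait."""
    results = []
    for page in range(pages):
        await asyncio.sleep(pause)  # placeholder for an awaited network call
        results.append(f"{name}-page{page}")
    return results


async def run_all():
    # gather() runs both coroutines on the same event loop, i.e. in one thread,
    # and returns their results in the order the coroutines were passed in.
    return await asyncio.gather(scrape("top", 2), scrape("latest", 3))
```

Running `asyncio.run(run_all())` interleaves the two stand-in scrapers: while one is waiting, the other makes progress, without any additional threads.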

Hesko123 commented 1 year ago

httpx

Oh yeah, so at some point it works even if it fails at first.

AntoinePaix commented 1 year ago

@Hesko123 The response hook system of httpx is designed to call a function just before the request is sent or just after you receive a response.

Hesko123 commented 1 year ago

@Hesko123 The response hook system of httpx is designed to call a function just before the request is sent or just after you receive a response.

Thanks, boss! I saw that you came from jvc too ;)

AntoinePaix commented 1 year ago

@Hesko123 It's a small world ^^

Hesko123 commented 1 year ago

@Hesko123 It's a small world ^^

Tbh, I have been pretty scared since Elon Musk acquired Twitter. You told me that this failure is occasional, but why is it happening, and do we have a workaround? It seems to be on Twitter's side, and we can't do anything about that.

Btw do you have a discord ?

mc0ps commented 1 year ago

@Hesko123 async like if you want to run multiple scrapers inside one thread.

Twitter's problem with the 'latest' search is really episodic. I just ran my personal scraper several times, the first 2 failed but the third passed without issue.

It seems almost random (just worked 1/5 times for me). I'm wondering if it has something to do with the user-agent, because I noticed that it's set randomly.

EDIT: maybe not, I just tried setting the user-agent to one of the ones that worked, and seems to fail repeatedly anyway

AntoinePaix commented 1 year ago

It's quite weird, but when I copy the request made to the adaptive.json API as curl and remove the cookies, I get the authentication error about one time in three.

But if I put the cookies back with only the "guest_id" cookie I have the impression that I no longer have the authentication problem...

AntoinePaix commented 1 year ago

The guest_id cookie is set when you do a request to the frontend endpoint of advanced search API like : https://twitter.com/search?q=ukraine&src=typed_query&f=live

dengkefeng commented 1 year ago

I see that snscrape calls Twitter through the API endpoint "https://api.twitter.com/2/search/adaptive.json", so will it be affected by Twitter's new policy with its very small free rate limit? Is there a plan to fix this, such as supporting scraping Twitter via the web page (e.g. https://twitter.com/search?q=from%3Aelonmusk&src=typed_query&f=live)?

JustAnotherArchivist commented 1 year ago

@dengkefeng #695

JustAnotherArchivist commented 1 year ago

@AntoinePaix Negative, I'm also seeing failures with the guest_id cookie set.

dengkefeng commented 1 year ago

@dengkefeng #695

Got it, thanks @JustAnotherArchivist very much. So how do we resolve the main issue in this thread? Just wait for twitter to come back? Thanks!

AntoinePaix commented 1 year ago

@JustAnotherArchivist Ah, darn. In the end that's rather good news: it means that it is not necessarily a problem related to a new authentication method.

laurent-IA commented 1 year ago

The search function is no longer accessible if you are not logged in, so I suppose it is the end for snscrape.

JustAnotherArchivist commented 1 year ago

The error is now `blocked (403)`, and all requests seem to be affected. No indication of what's happening on the web interface though, just 'Please refresh the page and try again', so it may well be unintentional.

pablorm296 commented 1 year ago

Maybe earlier today we were witnessing a canary deploy that directed x% of the traffic to this new version where the search feature and the guest token are no longer available 😞

Mr-Freewan commented 1 year ago

@JustAnotherArchivist, can you scrape tweets from a profile page? (by a direct link to this profile)

MrCabss69 commented 1 year ago

https://developer.twitter.com/en/products/twitter-api

$100/month for a 10k-tweet read cap, and they limit almost all guest searches by now... LOL