JustAnotherArchivist / snscrape

A social networking service scraper in Python
GNU General Public License v3.0

I am getting a 404 block #1003

Closed ihabpalamino closed 1 year ago

ihabpalamino commented 1 year ago

Describe the bug

Error retrieving https://twitter.com/i/api/graphql/7jT5GT59P8IFjgxwqnEdQw/SearchTimeline?variables=%7B%22rawQuery%22%3A%22%28from%3ANone%29%22%2C%22count%22%3A20%2C%22product%22%3A%22Latest%22%2C%22withDownvotePerspective%22%3Afalse%2C%22withReactionsMetadata%22%3Afalse%2C%22withReactionsPerspective%22%3Afalse%7D&features=%7B%22rweb_lists_timeline_redesign_enabled%22%3Afalse%2C%22blue_business_profile_image_shape_enabled%22%3Afalse%2C%22responsive_web_graphql_exclude_directive_enabled%22%3Atrue%2C%22verified_phone_label_enabled%22%3Afalse%2C%22creator_subscriptions_tweet_preview_api_enabled%22%3Afalse%2C%22responsive_web_graphql_timeline_navigation_enabled%22%3Atrue%2C%22responsive_web_graphql_skip_user_profile_image_extensions_enabled%22%3Afalse%2C%22tweetypie_unmention_optimization_enabled%22%3Atrue%2C%22vibe_api_enabled%22%3Atrue%2C%22responsive_web_edit_tweet_api_enabled%22%3Atrue%2C%22graphql_is_translatable_rweb_tweet_is_translatable_enabled%22%3Atrue%2C%22view_counts_everywhere_api_enabled%22%3Atrue%2C%22longform_notetweets_consumption_enabled%22%3Atrue%2C%22tweet_awards_web_tipping_enabled%22%3Afalse%2C%22freedom_of_speech_not_reach_fetch_enabled%22%3Afalse%2C%22standardized_nudges_misinfo%22%3Atrue%2C%22tweet_with_visibility_results_prefer_gql_limited_actions_policy_enabled%22%3Afalse%2C%22interactive_text_enabled%22%3Atrue%2C%22responsive_web_text_conversations_enabled%22%3Afalse%2C%22longform_notetweets_rich_text_read_enabled%22%3Afalse%2C%22longform_notetweets_inline_media_enabled%22%3Afalse%2C%22responsive_web_enhance_cards_enabled%22%3Afalse%2C%22responsive_web_twitter_blue_verified_badge_is_enabled%22%3Atrue%7D: blocked (404)
4 requests to https://twitter.com/i/api/graphql/7jT5GT59P8IFjgxwqnEdQw/SearchTimeline?variables=%7B%22rawQuery%22%3A%22%28from%3ANone%29%22%2C%22count%22%3A20%2C%22product%22%3A%22Latest%22%2C%22withDownvotePerspective%22%3Afalse%2C%22withReactionsMetadata%22%3Afalse%2C%22withReactionsPerspective%22%3Afalse%7D&features=%7B%22rweb_lists_timeline_redesign_enabled%22%3Afalse%2C%22blue_business_profile_image_shape_enabled%22%3Afalse%2C%22responsive_web_graphql_exclude_directive_enabled%22%3Atrue%2C%22verified_phone_label_enabled%22%3Afalse%2C%22creator_subscriptions_tweet_preview_api_enabled%22%3Afalse%2C%22responsive_web_graphql_timeline_navigation_enabled%22%3Atrue%2C%22responsive_web_graphql_skip_user_profile_image_extensions_enabled%22%3Afalse%2C%22tweetypie_unmention_optimization_enabled%22%3Atrue%2C%22vibe_api_enabled%22%3Atrue%2C%22responsive_web_edit_tweet_api_enabled%22%3Atrue%2C%22graphql_is_translatable_rweb_tweet_is_translatable_enabled%22%3Atrue%2C%22view_counts_everywhere_api_enabled%22%3Atrue%2C%22longform_notetweets_consumption_enabled%22%3Atrue%2C%22tweet_awards_web_tipping_enabled%22%3Afalse%2C%22freedom_of_speech_not_reach_fetch_enabled%22%3Afalse%2C%22standardized_nudges_misinfo%22%3Atrue%2C%22tweet_with_visibility_results_prefer_gql_limited_actions_policy_enabled%22%3Afalse%2C%22interactive_text_enabled%22%3Atrue%2C%22responsive_web_text_conversations_enabled%22%3Afalse%2C%22longform_notetweets_rich_text_read_enabled%22%3Afalse%2C%22longform_notetweets_inline_media_enabled%22%3Afalse%2C%22responsive_web_enhance_cards_enabled%22%3Afalse%2C%22responsive_web_twitter_blue_verified_badge_is_enabled%22%3Atrue%7D failed, giving up.
Errors: blocked (404), blocked (404), blocked (404), blocked (404)
127.0.0.1 - - [03/Jul/2023 13:05:51] "POST /scrape-tweets2 HTTP/1.1" 500 -
Traceback (most recent call last):
  File "C:\Users\HP Probook\PycharmProjects\firstproject\venv\lib\site-packages\flask\app.py", line 2213, in __call__
    return self.wsgi_app(environ, start_response)
  File "C:\Users\HP Probook\PycharmProjects\firstproject\venv\lib\site-packages\flask\app.py", line 2193, in wsgi_app
    response = self.handle_exception(e)
  File "C:\Users\HP Probook\PycharmProjects\firstproject\venv\lib\site-packages\flask\app.py", line 2190, in wsgi_app
    response = self.full_dispatch_request()
  File "C:\Users\HP Probook\PycharmProjects\firstproject\venv\lib\site-packages\flask\app.py", line 1486, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "C:\Users\HP Probook\PycharmProjects\firstproject\venv\lib\site-packages\flask\app.py", line 1484, in full_dispatch_request
    rv = self.dispatch_request()
  File "C:\Users\HP Probook\PycharmProjects\firstproject\venv\lib\site-packages\flask\app.py", line 1469, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(*view_args)
  File "C:\Users\HP Probook\PycharmProjects\firstproject\TweetsSraper.py", line 29, in scrape_tweets
    for i, tweet in enumerate(scraper.get_items()):
  File "C:\Users\HP Probook\PycharmProjects\firstproject\venv\lib\site-packages\snscrape\modules\twitter.py", line 1699, in get_items
    for obj in self._iter_api_data('https://twitter.com/i/api/graphql/7jT5GT59P8IFjgxwqnEdQw/SearchTimeline', _TwitterAPIType.GRAPHQL, params, paginationParams, cursor = self._cursor, instructionsPath = ['data', 'search_by_raw_query', 'search_timeline', 'timeline', 'instructions']):
  File "C:\Users\HP Probook\PycharmProjects\firstproject\venv\lib\site-packages\snscrape\modules\twitter.py", line 867, in _iter_api_data
    obj = self._get_api_data(endpoint, apiType, reqParams, instructionsPath = instructionsPath)
  File "C:\Users\HP Probook\PycharmProjects\firstproject\venv\lib\site-packages\snscrape\modules\twitter.py", line 838, in _get_api_data
    r = self._get(endpoint, params = params, headers = self._apiHeaders, responseOkCallback = functools.partial(self._check_api_response, apiType = apiType, instructionsPath = instructionsPath))
  File "C:\Users\HP Probook\PycharmProjects\firstproject\venv\lib\site-packages\snscrape\base.py", line 272, in _get
    return self._request('GET', args, **kwargs)
  File "C:\Users\HP Probook\PycharmProjects\firstproject\venv\lib\site-packages\snscrape\base.py", line 268, in _request
    raise ScraperException(msg)
snscrape.base.ScraperException: 4 requests to https://twitter.com/i/api/graphql/7jT5GT59P8IFjgxwqnEdQw/SearchTimeline?variables=%7B%22rawQuery%22%3A%22%28from%3ANone%29%22%2C%22count%22%3A20%2C%22product%22%3A%22Latest%22%2C%22withDownvotePerspective%22%3Afalse%2C%22withReactionsMetadata%22%3Afalse%2C%22withReactionsPerspective%22%3Afalse%7D&features=%7B%22rweb_lists_timeline_redesign_enabled%22%3Afalse%2C%22blue_business_profile_image_shape_enabled%22%3Afalse%2C%22responsive_web_graphql_exclude_directive_enabled%22%3Atrue%2C%22verified_phone_label_enabled%22%3Afalse%2C%22creator_subscriptions_tweet_preview_api_enabled%22%3Afalse%2C%22responsive_web_graphql_timeline_navigation_enabled%22%3Atrue%2C%22responsive_web_graphql_skip_user_profile_image_extensions_enabled%22%3Afalse%2C%22tweetypie_unmention_optimization_enabled%22%3Atrue%2C%22vibe_api_enabled%22%3Atrue%2C%22responsive_web_edit_tweet_api_enabled%22%3Atrue%2C%22graphql_is_translatable_rweb_tweet_is_translatable_enabled%22%3Atrue%2C%22view_counts_everywhere_api_enabled%22%3Atrue%2C%22longform_notetweets_consumption_enabled%22%3Atrue%2C%22tweet_awards_web_tipping_enabled%22%3Afalse%2C%22freedom_of_speech_not_reach_fetch_enabled%22%3Afalse%2C%22standardized_nudges_misinfo%22%3Atrue%2C%22tweet_with_visibility_results_prefer_gql_limited_actions_policy_enabled%22%3Afalse%2C%22interactive_text_enabled%22%3Atrue%2C%22responsive_web_text_conversations_enabled%22%3Afalse%2C%22longform_notetweets_rich_text_read_enabled%22%3Afalse%2C%22longform_notetweets_inline_media_enabled%22%3Afalse%2C%22responsive_web_enhance_cards_enabled%22%3Afalse%2C%22responsive_web_twitter_blue_verified_badge_is_enabled%22%3Atrue%7D failed, giving up.
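
The query parameters in the failing URL are percent-encoded JSON, so decoding them shows what snscrape actually sent. The following is a minimal sketch using only the standard library; `decode_graphql_params` is a hypothetical helper name, and the URL is shortened to the `variables` parameter copied from the log above.

```python
import json
from urllib.parse import parse_qs, urlsplit


def decode_graphql_params(url):
    """Return the JSON-decoded `variables` and `features` query parameters, if present."""
    query = parse_qs(urlsplit(url).query)  # parse_qs also undoes the percent-encoding
    return {
        key: json.loads(values[0])
        for key, values in query.items()
        if key in ("variables", "features")
    }


# Shortened copy of the URL from the error message (only the `variables` parameter).
url = (
    "https://twitter.com/i/api/graphql/7jT5GT59P8IFjgxwqnEdQw/SearchTimeline"
    "?variables=%7B%22rawQuery%22%3A%22%28from%3ANone%29%22%2C%22count%22%3A20"
    "%2C%22product%22%3A%22Latest%22%2C%22withDownvotePerspective%22%3Afalse"
    "%2C%22withReactionsMetadata%22%3Afalse%2C%22withReactionsPerspective%22%3Afalse%7D"
)

params = decode_graphql_params(url)
print(params["variables"]["rawQuery"])  # prints: (from:None)
```

The decoded `rawQuery` is `(from:None)`, which suggests the `username` form field was missing from the POST request. That probably does not explain the 404 block on its own, since others in this thread report the same block, but it is worth ruling out.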

How to reproduce

It was working fine, but it no longer works.

Expected behaviour

my code

from datetime import datetime
import json
import re

from flask import Flask, request, jsonify
import pandas as pd
import snscrape.modules.twitter as sntwitter

app = Flask(__name__)


@app.route('/scrape-tweets2', methods=['POST'])
def scrape_tweets():
    Username = request.form.get('username')
    SINCE = request.form.get('since')
    UNTIL = request.form.get('until')
    PLATFORM_NAME = request.form.get('plateform')

    # Restrict the search to a date range only when both bounds are given
    if SINCE and UNTIL:
        since_date = datetime.strptime(SINCE, "%Y-%m-%d")
        until_date = datetime.strptime(UNTIL, "%Y-%m-%d")
        date_range = f" since:{since_date.strftime('%Y-%m-%d')} until:{until_date.strftime('%Y-%m-%d')}"
    else:
        date_range = ""

    scraper = sntwitter.TwitterSearchScraper(f"(from:{Username}){date_range}")
    tweets = []
    for i, tweet in enumerate(scraper.get_items()):
        # tweet.media holds medium objects rather than strings, so check the type
        if tweet.media is not None and any(isinstance(medium, sntwitter.Video) for medium in tweet.media):
            view_count = tweet.viewCount
        else:
            view_count = "Not a video tweet"

        data = {
            "id_post": tweet.id,
            "Date": tweet.date.strftime("%Y-%m-%d"),
            "Heure": tweet.date.strftime("%H:%M:%S"),
            "content": tweet.content,
            "username": tweet.user.username,
            "likecount": tweet.likeCount,
            "shares": tweet.retweetCount,
            "comments": tweet.replyCount,
            "platformname": PLATFORM_NAME,
            "postUrl": tweet.url
        }
        tweets.append(data)
        if i > 800:
            break

    tweet_df = pd.DataFrame(tweets, columns=["id_post", "Date", "Heure", "content", "username", "likecount", "shares",
                                             "comments", "platformname", "postUrl"])
    tweet_df.to_csv('tweeter.csv', sep=";", encoding='utf-8', index=False)

    tweet_json = tweet_df.to_json(orient='records', indent=4, force_ascii=False)

    # Strip control characters before parsing the JSON again
    clean_insta_json = re.sub(r"[\x00-\x1F\x7F-\x9F]", "", tweet_json)
    response = jsonify(json.loads(clean_insta_json))
    response.headers['Content-Type'] = 'application/json'
    return response


if __name__ == '__main__':
    app.run(debug=True)
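
The request URL in the error output contains `from%3ANone`, so the code above appears to have built its query from a missing `username` field. One way to catch that before any request is made is to build the query in a small function that rejects an empty username. This is only a sketch: `build_search_query` and `example_user` are hypothetical names, and it does not address the 404 block itself.

```python
from datetime import datetime


def build_search_query(username, since=None, until=None):
    """Build a '(from:USER) since:... until:...' search string, rejecting a missing username."""
    if not username or not username.strip():
        # request.form.get('username') returns None when the field is absent
        raise ValueError("username is required")
    query = f"(from:{username.strip()})"
    if since and until:
        # Parsing and re-formatting validates the YYYY-MM-DD format
        since_date = datetime.strptime(since, "%Y-%m-%d")
        until_date = datetime.strptime(until, "%Y-%m-%d")
        query += f" since:{since_date.strftime('%Y-%m-%d')} until:{until_date.strftime('%Y-%m-%d')}"
    return query


print(build_search_query("example_user", "2023-06-01", "2023-07-01"))
# prints: (from:example_user) since:2023-06-01 until:2023-07-01
```

In the Flask route, the `ValueError` could be caught and turned into a 400 response instead of letting the scraper run with `from:None`.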

Screenshots and recordings

No response

Operating system

Windows 11

Python version: output of python3 --version

3.9.13

snscrape version: output of snscrape --version

snscrape-0.6.2.20230321.dev39+gc3b216c
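
When snscrape is used as a module rather than from the command line, the installed version can also be read from Python. A minimal sketch with the standard library's `importlib.metadata` (Python 3.8+); `installed_version` is a hypothetical helper name.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


print(installed_version("snscrape"))
```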

Scraper

TwitterSearchScraper

How are you using snscrape?

Module (import snscrape.modules.something in Python code)

Backtrace

No response

Log output

No response

Dump of locals

No response

Additional context

No response

Elsayed91 commented 1 year ago

Same here; my code was working fine for the last 2 weeks. I have been using the development version

pip3 install --upgrade git+https://github.com/JustAnotherArchivist/snscrape.git

but as of today, nothing I change gets me past the 404 block.

@op take a look at https://github.com/JustAnotherArchivist/snscrape/issues/996

germain-cyber commented 1 year ago

Same here: one day I was running simple queries, and the next I was getting blocked. You can use the Twitter API to work around this problem, but you will reach a limit on the number of tweets you can scrape.

My thought is that the owner of Twitter stopped unlimited queries so that AI cannot improve off his social network. (Many articles I have read say that.)

mrzeynalli commented 1 year ago

Same here. Has anybody managed to come up with a solution?

sfkaplan commented 1 year ago

Had the same issue. It seems snscrape is not working on Twitter anymore.

AgungPambudi commented 1 year ago

Had the same issue. It seems snscrape is not working on Twitter anymore.

same here
