Closed: JustAnotherArchivist closed this issue 1 year ago
This is strange... 'top=True' is returning results on my desktop, but isn't working on my Raspberry Pi. Not sure what's going on lol
Interesting. I tested from several machines all over the globe, and it failed in the same way everywhere.
Weirdly enough, running the request including all cookies, headers, and params from the browser through Python's requests.get returns a 404 with a "Sorry, that page does not exist" message, but running it through Node.js axios yields a 200 with the expected response. I don't know what discontinuities there are between the two, but that seems to be a limiting factor.
@brihuang99 What snscrape version do you have on your desktop?
What about just using the Twitter search bar with since and until commands? Okay, you won't get it in a chronological way, but at least you get the last 1d or 7d of tweets.
@Ramizworking This issue is about snscrape, not Twitter's web interface. I'm aware that they didn't remove the 'top' search, but snscrape can't emulate it currently.
I'm running snscrape 0.4.3.20220106. It's the same as what I have on my Raspberry Pi, not sure why only my desktop is working
IT'S BACK
But it says "your account is not able to perform this action"
@brihuang99 What are your Python and OpenSSL versions?
Python 3.10.6 OpenSSL 3.0.2 15 Mar 2022 (Library: OpenSSL 3.0.2 15 Mar 2022)
Unfortunately, I refreshed the page and yeah, they removed it again. Oh okay! I thought it was due to their recent fcking update. I don't know if you've seen it, but they changed "Home" and "Latest" to "For you" and "Following".
I'm seeing the /2/search/adaptive.json path get removed at Twitter? I don't see it in the documentation anywhere. Seeing a 404 failure on that request here:
Retrieved https://api.twitter.com/2/search/adaptive.json?include_profile_interstitial_type=1&include_blocking=1&include_blocked_by=1&include_followed_by=1&include_want_retweets=1&include_mute_edge=1&include_can_dm=1&include_can_media_tag=1&skip_status=1&cards_platform=Web-12&include_cards=1&include_ext_alt_text=true&include_quote_count=true&include_reply_count=1&tweet_mode=extended&include_entities=true&include_user_entities=true&include_ext_media_color=true&include_ext_media_availability=true&send_error_codes=true&simple_quoted_tweets=true&q=from%3AAdyen&tweet_search_mode=live&count=100&query_source=spelling_expansion_revert_click&pc=1&spelling_corrections=1&ext=mediaStats%2ChighlightedLabel: 404
@crscheid Yes, the 404 is what this issue's about, although you're not using the 'top' order.
I believe I added top=True after the query field in my call with the same result. I'll check again when I get in front of the code again. Either way, looks like the issue is well known here. I'll follow the issue for any updates. Thanks!
I realize the issue is regarding the twitter-search module, but if people are having issues pulling tweets from a user's account, I've been able to use the CLI tool to pull back tweets via the 'twitter-profile' submodule. Like many others, the search module fails for me, so any kind of text search does not work and returns the same errors as everyone else. However,
snscrape --jsonl --progress "twitter-profile" "some_user_account"
pulls the tweets from an account in reverse chronological order. I get a return of 3243 tweets. Looking at the web, though, that user has about 3900 tweets. I'm aware that the v2 API returns 3200 tweets from a user's profile, so I'm not sure if I added any value here. The particular account goes back to 2015, and the earliest tweets I got were from 2017.
I usually use snscrape to just grab the IDs and then use the tweet IDs with the API to get more detailed info. For instance, 'impression_count' is returned in the public metrics now, so that is returned where applicable.
Def a bummer that the text search isn't working, but for people trying to get tweets from a user account, this seems to be working atm.
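The scrape-IDs-then-hydrate workflow described above can be sketched in a few lines. The 100-ID batch size matches the documented v2 tweets lookup limit; the function name and stand-in IDs below are illustrative, not snscrape's API:

```python
# Sketch of the two-step flow: collect tweet IDs (e.g. via snscrape's
# twitter-profile scraper), then hydrate them through the official v2 API.
# The v2 tweets lookup (GET https://api.twitter.com/2/tweets) accepts up
# to 100 ids per request, so scraped IDs are split into batches first.

def batch_ids(ids, size=100):
    """Split a list of tweet IDs into API-sized batches."""
    return [ids[i:i + size] for i in range(0, len(ids), size)]

scraped_ids = [str(1613640500153815040 + n) for n in range(250)]  # stand-ins
batches = batch_ids(scraped_ids)
# Each batch becomes one request: ?ids=<comma-joined>&tweet.fields=public_metrics
print(len(batches), len(batches[0]), len(batches[-1]))  # 3 100 50
```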
How can I run this in Python or a Jupyter notebook?
I don't use snscrape, but I encountered the same problem and I think I've figured out the issue: Twitter's API dropped support for HTTP/1.1. You need to use HTTP/2 or up, but requests (which snscrape uses) doesn't support it. You can switch to httpx instead, which supports HTTP/2 with a requests-like interface.
In my code, I just had to change import requests to import httpx and requests.get to httpx.get. After that, my API calls started working again.
@laymonage They definitely still accept HTTP/1.1. Instead, it has to do with TLS fingerprinting. Cf. @AntoinePaix's comment in #648. httpx probably uses different ciphers by default.
Is there any update on #634, or my issue request? What happened?
It works for me (with top=True). Linux, Windows. Machines in different parts of the world.
@Mr-Freewan Which OpenSSL version?
Old =) 1.1.1f
I was also testing with an old 1.1.1 earlier without success. It likely depends on the exact cipher selection, which has changed several times even within the 1.1.1 series, I think.
I have an implementation of @AntoinePaix's suggestion that works on my system, but I want to test different combinations of Python and OpenSSL to make sure it's reliable before committing. Soon™...
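The committed patch itself isn't shown in this thread, but the general shape of a cipher override in requests (which snscrape uses) is a transport adapter that supplies a custom SSLContext. This is a sketch of that pattern; the cipher string is a placeholder, not the one snscrape ended up using:

```python
import ssl

import requests
from requests.adapters import HTTPAdapter


class CipherAdapter(HTTPAdapter):
    """Transport adapter that pins a custom OpenSSL cipher string."""

    def __init__(self, ciphers, **kwargs):
        self._ciphers = ciphers  # must be set before super().__init__ runs
        super().__init__(**kwargs)

    def init_poolmanager(self, *args, **kwargs):
        ctx = ssl.create_default_context()
        ctx.set_ciphers(self._ciphers)  # placeholder cipher selection
        kwargs["ssl_context"] = ctx
        return super().init_poolmanager(*args, **kwargs)


session = requests.Session()
session.mount("https://", CipherAdapter("ECDHE+AESGCM:ECDHE+CHACHA20"))
# session.get(...) now negotiates TLS with the pinned cipher list,
# which changes the ClientHello fingerprint the server sees.
```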
waiting for this =)
Me too, is there any update on #650?
The OpenSSL testing is more complicated than I expected. I've made decent progress, but it'll have to wait a bit longer. The good news is that this can all be reused for the test suite in the future.
@JustAnotherArchivist, We are all counting on you. And thanks for all the support you have provided everyone here. Kudos!!
Hi, because Twitter removed the "Top" tab, some request params/settings became invalid. Removing the following settings worked for me (in twitter.py):
super().init(baseUrl = 'https://twitter.com/search?src=typed_query&' + urllib.parse.urlencode({'f': 'live', 'lang': 'en', 'q': query, 'sr
What do you think @JustAnotherArchivist ?
@msamancioglu, where should I specify these parameters in the following query?
for i,tweet in enumerate(sntwitter.TwitterSearchScraper('COVID since:2021-05-01 until:2021-05-31').get_items()):
@Mr-Freewan How did it work for you?
Did you get a solution now?
Any way we can help / give back? I'll contact my local Congress person even!
@SomKen Good luck with getting them to understand what 'OpenSSL' or 'test suite' even mean. ;-)
Boss, hope you're doing well! How are the tests going so far?
Have a nice day
I'm also waiting for a solution. Thanks!!
Any solution yet? Good afternoon, Boss
Hey guys, I haven't made any changes to my code in the last two days. I even retried adding the top=True argument, which still does not work, but I don't understand why snscrape is otherwise working now. Did @JustAnotherArchivist make the latest changes, or did Elon Musk (Twitter CEO) roll back their decision? I haven't tried it on my other machine yet.
But the raw JSON data looks like never before; I can't format it in https://jsongrid.com/json-formatter for readability, it throws a "bad string" error.
But awesome work on the whole progress!
Here's the code:
import snscrape.modules.twitter as sntwitter
import datetime
import json

date_start = str(datetime.datetime.today().date() - datetime.timedelta(days=1))
hashtags = "solana"
scraper = sntwitter.TwitterSearchScraper(f"#{hashtags} since:{date_start}").get_items()
for i_scraping, tweet_data in enumerate(scraper):
    tweet = json.loads(tweet_data.json())
    print(tweet)
    if i_scraping >= 5:
        break
{'_type': 'snscrape.modules.twitter.Tweet', 'url': 'https://twitter.com/NFTTalkShow/status/1613640500153815040', 'date': '2023-01-12T20:54:09+00:00', 'rawContent': 'What are the top nft marketplaces and what blockchains do they cater to?\n\n#Web3 #cryptocurrency #nft #nftart #NFTCommunity #ethereum #solana #tezos #4u #opensea #rarible #magiceden https://t.co/wfddUS3Lmb', 'renderedContent': 'What are the top nft marketplaces and what blockchains do they cater to?\n\n#Web3 #cryptocurrency #nft #nftart #NFTCommunity #ethereum #solana #tezos #4u #opensea #rarible #magiceden https://t.co/wfddUS3Lmb', 'id': 1613640500153815040, 'user': {'_type': 'snscrape.modules.twitter.User', 'username': 'NFTTalkShow', 'id': 1428513977634463746, 'displayname': 'NFT Talk Show 🎙️Podcast', 'rawDescription': 'Ranked Top 5 Podcasts for Web3, NFTs and Crypto. Available on Apple, Spotify & more. Mint our community token - https://t.co/ewIvM4aQC7', 'renderedDescription': 'Ranked Top 5 Podcasts for Web3, NFTs and Crypto. Available on Apple, Spotify & more. 
Mint our community token - app.manifold.xyz/c/nfttalkshow', 'descriptionLinks': [{'_type': 'snscrape.modules.twitter.TextLink', 'text': 'app.manifold.xyz/c/nfttalkshow', 'url': 'https://app.manifold.xyz/c/nfttalkshow', 'tcourl': 'https://t.co/ewIvM4aQC7', 'indices': [112, 135]}], 'verified': False, 'created': '2021-08-20T00:27:36+00:00', 'followersCount': 438, 'friendsCount': 47, 'statusesCount': 542, 'favouritesCount': 165, 'listedCount': 7, 'mediaCount': 36, 'location': 'Metaverse', 'protected': False, 'link': {'_type': 'snscrape.modules.twitter.TextLink', 'text': 'nfttalkshow.com', 'url': 'http://nfttalkshow.com', 'tcourl': 'https://t.co/0C0CwxaG4a', 'indices': [0, 23]}, 'profileImageUrl': 'https://pbs.twimg.com/profile_images/1609408253460647937/eeBueATk_normal.jpg', 'profileBannerUrl': 'https://pbs.twimg.com/profile_banners/1428513977634463746/1672548154', 'label': None, 'url': 'https://twitter.com/NFTTalkShow'}, 'replyCount': 0, 'retweetCount': 0, 'likeCount': 0, 'quoteCount': 0, 'conversationId': 1613640500153815040, 'lang': 'en', 'source': '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>', 'sourceUrl': 'http://twitter.com/download/iphone', 'sourceLabel': 'Twitter for iPhone', 'links': None, 'media': [{'_type': 'snscrape.modules.twitter.Video', 'thumbnailUrl': 'https://pbs.twimg.com/ext_tw_video_thumb/1613640338580844545/pu/img/QrEklq5Xx6Zn4V78.jpg', 'variants': [{'_type': 'snscrape.modules.twitter.VideoVariant', 'contentType': 'video/mp4', 'url': 'https://video.twimg.com/ext_tw_video/1613640338580844545/pu/vid/480x852/LHyklatMYB91E_g-.mp4?tag=12', 'bitrate': 950000}, {'_type': 'snscrape.modules.twitter.VideoVariant', 'contentType': 'video/mp4', 'url': 'https://video.twimg.com/ext_tw_video/1613640338580844545/pu/vid/320x568/scm1d3eJ-crkPppU.mp4?tag=12', 'bitrate': 632000}, {'_type': 'snscrape.modules.twitter.VideoVariant', 'contentType': 'video/mp4', 'url': 
'https://video.twimg.com/ext_tw_video/1613640338580844545/pu/vid/720x1280/3vMWEXgMttsBa9dV.mp4?tag=12', 'bitrate': 2176000}, {'_type': 'snscrape.modules.twitter.VideoVariant', 'contentType': 'application/x-mpegURL', 'url': 'https://video.twimg.com/ext_tw_video/1613640338580844545/pu/pl/X7HLC2ovv3Lq73Bx.m3u8?tag=12&container=fmp4', 'bitrate': None}], 'duration': 46.433, 'views': 0}], 'retweetedTweet': None, 'quotedTweet': None, 'inReplyToTweetId': None, 'inReplyToUser': None, 'mentionedUsers': None, 'coordinates': None, 'place': None, 'hashtags': ['Web3', 'cryptocurrency', 'nft', 'nftart', 'NFTCommunity', 'ethereum', 'solana', 'tezos', '4u', 'opensea', 'rarible', 'magiceden'], 'cashtags': None, 'card': None}
https://twitter.com/somken/status/1613658821163094017/photo/1
Pretty sure Twitter fixed something... I haven't touched anything
Still not working from my side ser !
Can't reproduce that. Without the TLS cipher override, it still fails the same way as before. If you are talking about the actual official API, snscrape doesn't use that.
Yeah, not really related to the actual problem. On our side, how's it going, boss?
I'm using "3.0.2-0ubuntu1.7" in an Ubuntu 22.04 container. I'm also using an old version from pip, "snscrape==0.4.3.20220106", but everything is back to normal for me. I'm seeing the same amount of tweets coming in as I did before.
Still not sure why I had a huge spike in tweets coming in before the API died, but I haven't looked into the returned tweets yet.
Same here. Copying the full request's headers, cookies, params and running through httpx gives me a normal response with a guest session.
Right, so it's the OpenSSL version/cipher config thing already mentioned early on here, most likely.
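One quick way to check which OpenSSL build and cipher list your environment uses is the stdlib ssl module; the default cipher list is what the server-side fingerprinting keys on, and it varies by OpenSSL release. The cipher string below is purely illustrative, not the snscrape fix:

```python
import ssl

# The default cipher list depends on the linked OpenSSL build, which is
# why identical Python code behaves differently across machines here.
ctx = ssl.create_default_context()
default_names = [c["name"] for c in ctx.get_ciphers()]
print(ssl.OPENSSL_VERSION)  # e.g. "OpenSSL 3.0.2 15 Mar 2022"
print(len(default_names))

# Narrowing the selection changes the ClientHello the server sees.
ctx.set_ciphers("ECDHE+AESGCM")  # illustrative selection only
narrowed = [c["name"] for c in ctx.get_ciphers()]
print(narrowed[:2])
```

Note that TLS 1.3 suites (names starting with TLS_) remain enabled regardless of set_ciphers; that call only controls the TLS 1.2-and-below list.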
I use Google Colab to scrape Twitter and it works on my end. @brihuang99 also commented: "I'm running snscrape 0.4.3.20220106. It's the same as what I have on my Raspberry Pi, not sure why only my desktop is working." Maybe it has something to do with the Python version?
I'm using these versions and it currently works on my end: Python 3.8.16, snscrape 0.4.3.20220106
Weird - this still does not work for me:
root@c39d9e04385c:/# pip freeze
beautifulsoup4==4.11.1
certifi==2022.12.7
charset-normalizer==2.1.1
DateTime==4.7
dbus-python==1.2.18
idna==3.4
lxml==4.9.2
pika==1.3.0
PyGObject==3.42.1
PySocks==1.7.1
pytz==2022.7
PyYAML==6.0
requests==2.28.1
snscrape==0.4.3.20220106
soupsieve==2.3.2.post1
timedelta==2020.12.3
tzdata==2022.7
urllib3==1.26.13
zope.interface==5.5.2
root@c39d9e04385c:/# python3 --version
Python 3.10.6
If this helps.
Testing has finally finished. This was a pain to get running. Looks like there aren't many people testing Python against different versions of OpenSSL...
Without extra measures, it's all about the OpenSSL version; the Python version does not matter, and I expect none of the Python packages do either.
- Works: OpenSSL 3.0.7 and 1.1.0l
- Does not work: OpenSSL 1.1.1q and 1.0.2u
My patch works with all tested Python (3.8.16, 3.9.16, 3.10.9, and 3.11.1) and OpenSSL (above) versions, excluding incompatible combinations of course (e.g. Python 3.10+ requires OpenSSL 1.1.1+).
Committing shortly.
Still not working
Since sometime today, 'top' searches fail entirely. The API endpoint returns 404.