JustAnotherArchivist opened this issue 1 year ago (status: Open)
So sad :-( My research project strongly depends on this lib; I pay tribute to your effort in maintaining it.
Twitter disabled their public website today (2023-06-30) and now requires users to log in; Twitter used to be public before this date. Would it be possible to automate the login as well by providing a username and password to snscrape, i.e. first calling a GraphQL API to log in to Twitter and then simulating a logged-in session?
I do not think the developer will do this, as he said that auth would never be added as a feature: see #270. Let's see what our great developer's solution is; I hope it won't take long.
Before using this library, I had started doing manual scraping myself using Puppeteer, and I had automated the sign-in part (even through 2FA). The issue is that if you sign in frequently within a short period of time, you get blocked by Twitter and cannot sign in again for a certain amount of time. So I'm not sure what the ideal setup would be in this case...
If this comment is off-topic, please consider deleting it. Uh, to be clear: it was about Twitter failing in this regard, not you, by the way.
Please consider deleting my prior off-topic comment.
Don't nuke this one as off-topic: A Twitter employee says it's temporary:
https://twitter.com/AqueelMiq/status/1674843555486134272 "this is a temporary restriction, we will re-enable logged out twitter access in the near future"
Elon talked about it too 💀 https://twitter.com/elonmusk/status/1674942336583757825
Can I use my personal OAuth key with snscrape for Twitter?
Musk referred to EXTREME scraping, indicating that scrapers may no longer be functional after the changes. Let's see how it plays out.
Can I edit the twitter.py module with my own bearer key or even an OAuth login key (locally, on my computer where snscrape is installed), since that would change my local snscrape module? Thanks.
Hello,
This may or may not help. Here's a route to access Tweets without logging in (contains further iframe to platform.twitter.com): https://cdn.embedly.com/widgets/media.html?type=text%2Fhtml&key=a19fcc184b9711e1b4764040d3dc5c07&schema=twitter&url=https://twitter.com/elonmusk/status/1674865731136020505
Would combining this with a pre-existing list of Tweets allow data scraping to continue? Alternatively, users could build the tweet list using Google search, e.g. for Tesla tweets: "site:twitter.com/tesla/status", or via another cached list (e.g. the Wayback Machine - https://web.archive.org/web/*/https://twitter.com/tesla/status*)
If I'm off the mark, I apologise but thought I'd pass this on, on the off chance it may help at least as a temporary measure.
Just a note to @JustAnotherArchivist - thank you for the hard work you have put into this library - it is very much appreciated
Ben
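As a rough sketch of the Wayback Machine route suggested above: the CDX API can list archived `/status/` URLs for an account, which could then be fed into a per-tweet fetcher. The CDX parameters used here (`url`, `output`, `fl`, `collapse`, `limit`) are standard, but treat this as an illustration, not an snscrape feature:

```python
import re
import requests

STATUS_RE = re.compile(r"twitter\.com/[^/]+/status/(\d+)")

def extract_status_ids(urls):
    """Pull unique numeric status IDs out of a list of tweet URLs."""
    seen, ids = set(), []
    for url in urls:
        match = STATUS_RE.search(url)
        if match and match.group(1) not in seen:
            seen.add(match.group(1))
            ids.append(match.group(1))
    return ids

def archived_tweet_urls(screen_name, limit=50):
    """Query the Wayback Machine CDX API for archived /status/ URLs
    of one account. Requires network access."""
    resp = requests.get(
        "https://web.archive.org/cdx/search/cdx",
        params={
            "url": f"twitter.com/{screen_name}/status*",
            "output": "json",
            "fl": "original",
            "collapse": "urlkey",
            "limit": str(limit),
        },
        timeout=60,
    )
    resp.raise_for_status()
    rows = resp.json()
    return [row[0] for row in rows[1:]]  # first row is the column header
```

This only recovers tweets the Wayback Machine has already captured, so coverage will be partial at best.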
URL: https://cdn.syndication.twimg.com/tweet-result

Code:

```python
import requests

url = "https://cdn.syndication.twimg.com/tweet-result"
querystring = {"id": "1652193613223436289", "lang": "en"}

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/114.0",
    "Accept": "*/*",
    "Accept-Language": "en-US,en;q=0.5",
    "Accept-Encoding": "gzip, deflate, br",
    "Origin": "https://platform.twitter.com",
    "Connection": "keep-alive",
    "Referer": "https://platform.twitter.com/",
    "Sec-Fetch-Dest": "empty",
    "Sec-Fetch-Mode": "cors",
    "Sec-Fetch-Site": "cross-site",
    "Pragma": "no-cache",
    "Cache-Control": "no-cache",
    "TE": "trailers",
}

response = requests.get(url, headers=headers, params=querystring)
print(response.text)
```

Generated by Insomnia
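If the request above succeeds, the endpoint returns a JSON document for the single tweet. A minimal helper for pulling a few common fields out of it; note that the key names used here (`id_str`, `user.screen_name`, `text`, `favorite_count`) are assumptions based on observed responses from this endpoint, not a documented schema:

```python
def summarize_tweet(tweet):
    """Reduce a tweet-result JSON payload (already parsed into a dict)
    to a few common fields.
    NOTE: the key names ("id_str", "user", "screen_name", "text",
    "favorite_count") are assumptions based on observed responses,
    not a documented schema."""
    return {
        "id": tweet.get("id_str"),
        "author": tweet.get("user", {}).get("screen_name"),
        "text": tweet.get("text"),
        "likes": tweet.get("favorite_count"),
    }
```

Usage would be something like `summarize_tweet(response.json())` after the request above.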
This seems to be working; the problems might be rate limits and stability, so more tests are needed.
It also does not let you see all the accounts followed by a user; would there be a solution for that?
https://twitter.com/elonmusk/status/1675187969420828672
😂
@elonmusk To address extreme levels of data scraping & system manipulation, we’ve applied the following temporary limits:
- Verified accounts are limited to reading 6000 posts/day
- Unverified accounts to 600 posts/day
- New unverified accounts to 300/day
My IP was banned even though I was using a proxy that changes the IP dynamically. What options do we have now?
@JustAnotherArchivist Will the scrapers be working anytime soon? Also, I want to thank you for your hard work on these scrapers.
Scraping seems to be still possible, check this:
While cool, it's using API v1, and you can't get long tweets.
This hasn't worked for a long time.
What about using Selenium first to log in, and after that using sntwitter to get tweets? The question is how to link the Selenium session with sntwitter.
Hi guys, I'm new to GitHub and coding, but maybe this is helpful: https://twitter.com/iam4x/status/1675194767854956546?s=20
lol, this seemed to be working... never mind. It was fun for a few minutes, but it messes up the rest of the features, so no after all.
The beauty of snscrape is that it doesn't require authentication. If we're going to have to start using login/auth and tools like Selenium, then it should be spun off into another project, not snscrape. Also, using any form of auth gives Twitter another way to ban mass collection, which is the use case for many users of snscrape.
Hi! :)) It works great! Is there perhaps any way to scrape repost and comment data as well? I need to map tweet spread for my master's thesis, but what companies are doing lately with their APIs (like Twitter or Reddit) is terrible...
You are describing my situation; I need the comments for the same purpose. Please let me know if you find a solution; my submission is in September.
So you would rather have it completely stop working for all other use cases as well?
@IrtzaShahan #270
It would be great if snscrape added a new function like TwitterProfileScraperSyn that grabs tweet data from the still publicly available syndication profile feeds. The syndication feed shows 20 tweets, which is good for many applications.
Great!
Is there any other param I can put in the querystring besides the tweet id? I want to get tweets for specific users but can't find which params to use.
Yes (for Twitter), and I explained why, and so has JustAnotherArchivist: #issuecomment-1616774736 / #270
May I please ask how we can get a specific user's tweets from a start time to an end time now? I'm really in a hurry and currently have no clue...
And this endpoint seems to have no parameter for a screen name? Do we have other URLs? https://cdn.syndication.twimg.com/tweet-result
Thank you all for your help, and great praise to the author @JustAnotherArchivist
Broken by Musk
I hope a solution will be found soon. I really need this lib; it's for my final studies project, and otherwise I could fail...
Does anyone know if someone's working on an snscrape fork that implements login/auth for Twitter?
Really appreciate your work, JustAnotherArchivist; thank you for all you do. Hoping Elon pulls back some of the restrictions and we can have snscrape working as before! Best wishes
@pleblira this library uses the SNScrape classes for User and Tweet and supports auth https://github.com/vladkens/twscrape
What's the URL to see the user profile? Sorry if it's a dumb question, but I could not find any reference on the net.
@nbrahmani You can try
https://syndication.twitter.com/srv/timeline-profile/screen-name/[username]
For example: https://syndication.twitter.com/srv/timeline-profile/screen-name/elonmusk
The user info is stored inside the `<script id="__NEXT_DATA__">` tag. The tag is server-side rendered, so you can use `requests` with `BeautifulSoup` (assuming Python) to extract the data you need. You can get the user profile and up to 20 of that user's most recent tweets.
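The approach above can be sketched like this (a regex is used instead of `BeautifulSoup` to keep it dependency-light; swap in `BeautifulSoup` if you prefer). The JSON path is taken from the `__NEXT_DATA__` output posted in this thread, and since the endpoint has reportedly stopped returning entries, treat this as a sketch of the technique rather than a working scraper:

```python
import json
import re

import requests

# Matches the server-rendered Next.js state blob in the page HTML.
NEXT_DATA_RE = re.compile(
    r'<script id="__NEXT_DATA__"[^>]*>(.*?)</script>', re.DOTALL
)

def extract_next_data(html):
    """Parse the __NEXT_DATA__ JSON out of raw page HTML."""
    match = NEXT_DATA_RE.search(html)
    if match is None:
        raise ValueError("__NEXT_DATA__ script tag not found")
    return json.loads(match.group(1))

def fetch_profile_timeline(screen_name):
    """Fetch up to 20 recent tweets for a user from the syndication page.
    Requires network access; the endpoint has reportedly gone dead, so
    the "entries" list may come back empty."""
    url = f"https://syndication.twitter.com/srv/timeline-profile/screen-name/{screen_name}"
    resp = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=30)
    resp.raise_for_status()
    data = extract_next_data(resp.text)
    # Path observed in the JSON output posted later in this thread.
    return data["props"]["pageProps"]["timeline"]["entries"]
```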
Unfortunately, this endpoint has been dead for the last 16 or 17 hours.
@nickchen120235 I tried this, but I need the Twitter Blue status of a user, and this does not return that.
There's a boolean `is_blue_verified` or something similar in the `user` key, IIRC. Maybe that's what you need?
As far as I can see, it does not have that boolean. I get the following output:

```json
{"props":{"pageProps":{"contextProvider":{"features":{},"scribeData":{"client_version":null,"dnt":false,"widget_id":"embed-0","widget_origin":"","widget_frame":"","widget_partner":"","widget_site_screen_name":"","widget_site_user_id":"","widget_creator_screen_name":"","widget_creator_user_id":"","widget_iframe_version":"bb06567:1687853948269","widget_data_source":"screen-name:elonmusk","session_id":""},"messengerContext":{"embedId":"embed-0"},"hasResults":true,"lang":"en","theme":"light"},"lang":"en","maxHeight":null,"showHeader":true,"hideBorder":false,"hideFooter":false,"hideScrollBar":false,"transparent":false,"timeline":{"entries":[]},"headerProps":{"screenName":"elonmusk"}},"__N_SSP":true},"page":"/timeline-profile/screen-name/[screenName]","query":{"screenName":"elonmusk"},"buildId":"vn5fUacsNpP-nIkFRlFf6","assetPrefix":"https://platform.twitter.com","isFallback":false,"gssp":true,"customServer":true}
```
@nbrahmani Sorry for the confusion 😓
As I mentioned earlier, this endpoint is dead, so it's no longer returning the correct response.
If it were working, the info you need would be in the `user` key of one of the `entries`.
Hello guys, hello @JustAnotherArchivist, any update on this issue?
Can we get the IDs of the posts generated by a specific profile? If a single embedded tweet works, a for-loop through all the IDs would work in the interim. Thank you!
What is this code for? @prasunshrestha
As of now, it seems to be possible to view public tweets without logging in. The Wayback Machine can save tweet pages again.
Current snscrape scraping methods still return `404`, so it's likely that the API endpoints or something else changed.
Can't confirm anything more than that for now.
Yes, it is a different endpoint which only returns the single requested tweet, no replies or the replied-to tweet.
Is it already implemented? If yes, which version should I update to?
With the exception of `twitter-trends`, all Twitter scrapes have been failing since sometime in the past hour. This is likely connected to Twitter as a whole getting locked behind a login wall earlier today. There is no known workaround at this time, and it's not known whether this will be fixable.