Closed by helloqwerasdf 4 years ago
I don't know what you mean. Are you talking about retweets?
@JustAnotherArchivist yes, I am talking about retweets~sorry for my poor English~
No problem.
Retweets are very tricky to scrape due to how Twitter works. The search does not return them at all by default, and if you enable them, you only get retweets from the past 7 days (#8). The alternative is the profile page, which includes retweets but only returns about 3200 tweets. So the best you can do is use the twitter-profile scraper to at least discover all retweets among the user's 3200 most recent tweets. (twitter-profile was broken earlier, but I just fixed that in 8cf81e9b, so make sure you update before trying this.)
It is impossible (to my knowledge) to discover retweets of a target user's tweets. So if you have some user and want to find all retweets referencing a tweet of that user, that won't work.
@JustAnotherArchivist thanks a lot! Due to your advice, the problem is solved!
Hello, I do not want to create a duplicate, so I'll try to make sure I understood correctly: there is currently no way to scrape the retweets and replies of a specific user with snscrape? (I tried 'from:user include:nativeretweets' using the dev version of the package.) It is strange though, since GOT3 was able to do it before the last Twitter update completely obliterated it... :/
The two options for retweets are:
snscrape twitter-search 'from:username include:nativeretweets' - This only works for retweets from the past 7 days (and only returns normal tweets further back).
snscrape twitter-profile username - This only returns the ~3200 most recent tweets, including retweets among those (which may go back further than 7 days).
I am not aware of any way to get retweets beyond these two methods. GetOldTweets3 seems to have used the (old design) web search just like snscrape does, so it should have had the same 7-day limitation.
Replies are normal tweets and extracted with the standard twitter-user scraper or the equivalent twitter-search from:username. twitter-profile also returns them but with the same limitation as above.
Thanks a lot! I understand better now! One last thing: what could be the reason for the following error? snscrape: error: unrecognized arguments: include:nativeretweets' 'from:(username I am scraping)'
import os
import pandas as pd

tweet_count = 100
username = "XXX"
os.system("snscrape --jsonl --max-results {} twitter-search 'from:username include:nativeretweets' 'from:{}'> user-tweets.json".format(tweet_count, username))
tweets_df1 = pd.read_json('user-tweets.json', lines=True)
tweets_df1.to_csv('user-tweets.csv', sep=',', index=False)
You're passing too many arguments. twitter-search just takes one argument, the query, but you're passing two. The error also indicates that argument splitting doesn't work the way you think it does. I haven't used os.system in a very long time though; the proper way is subprocess, e.g.
import subprocess
with open('user-tweets.json', 'wb') as fp:
    # Arguments are passed as a list, so the whole query stays a single argument.
    subprocess.run(['snscrape', '--jsonl', '--max-results', str(tweet_count), 'twitter-search', f'from:{username} include:nativeretweets'], stdout=fp)
Or just use capture_output and then feed the output directly into Pandas instead of going through a file (not sure if that's possible).
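A minimal sketch of the capture_output idea, assuming snscrape is on the PATH and that --jsonl prints one JSON object per line. The helper names run_jsonl and scrape_user are hypothetical, not part of snscrape. Only the standard library is used here; the resulting list of dicts could then be handed to pandas, for example via pandas.DataFrame(records).

```python
import json
import subprocess


def run_jsonl(cmd):
    """Run a command and parse its stdout as JSON Lines (one object per line)."""
    # capture_output=True collects stdout and stderr in memory instead of a file.
    result = subprocess.run(cmd, capture_output=True, check=True)
    lines = result.stdout.decode('utf-8').splitlines()
    # Skip blank lines so a trailing newline does not break json.loads.
    return [json.loads(line) for line in lines if line.strip()]


def scrape_user(username, tweet_count):
    """Hypothetical wrapper around the snscrape call shown above."""
    return run_jsonl([
        'snscrape', '--jsonl', '--max-results', str(tweet_count),
        'twitter-search', f'from:{username} include:nativeretweets',
    ])
```

Because check=True is set, a non-zero exit status from the command raises subprocess.CalledProcessError instead of silently producing an empty result.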
Thank you very much for your help. I do not really understand it still, but it worked like a charm :)
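For anyone else puzzled by the argument-splitting point, here is a small sketch that only inspects the command string from the question and does not run anything. shlex.split approximates how a POSIX shell splits it. The plain whitespace split is an assumption about what a shell that does not treat single quotes as quoting (possibly the Windows command prompt) would do, which would be consistent with the error message reported above.

```python
import shlex

# The command from the question, after .format() and without the redirection.
command = (
    "snscrape --jsonl --max-results 100 twitter-search "
    "'from:username include:nativeretweets' 'from:XXX'"
)

# POSIX-style splitting: quotes group words, so twitter-search gets TWO queries.
posix_args = shlex.split(command)
query_args = posix_args[posix_args.index('twitter-search') + 1:]
print(query_args)  # ['from:username include:nativeretweets', 'from:XXX']

# Assumed behaviour of a shell that ignores single quotes: split on whitespace,
# leaving stray quote characters attached to the pieces.
naive_args = command.split()
print(naive_args[-3:])  # ["'from:username", "include:nativeretweets'", "'from:XXX'"]
```

Either way, more than one argument follows twitter-search. Passing a list to subprocess.run avoids the splitting step entirely.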
So is it possible to scrape the number of times a tweet has been liked and retweeted, and the content of the retweet?
Hi, would you mind sharing how to use twitter-profile to scrape 3,200 tweets using the Python wrapper?
@xmainguyen I assume you're asking about using the profile scraper from a Python script (instead of the CLI).
import snscrape.modules.twitter

for tweet in snscrape.modules.twitter.TwitterProfileScraper('username').get_items():
    # Do something with the tweet object, e.g.
    print(tweet.url)
I can scrape retweets now using tweet.retweetedTweet.content and it works just fine. However I didn't test it on a large number of tweets.
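A rough sketch of that approach, using the retweetedTweet and content attributes mentioned above. The helper iter_retweets is hypothetical, and it assumes that tweets which are not retweets have retweetedTweet missing or set to None. The sample objects are stand-ins so the sketch runs without network access.

```python
from types import SimpleNamespace


def iter_retweets(tweets):
    """Yield (retweet, original) pairs for the items that are retweets."""
    for tweet in tweets:
        # Assumption: non-retweets have retweetedTweet missing or set to None.
        original = getattr(tweet, 'retweetedTweet', None)
        if original is not None:
            yield tweet, original


# Stand-in objects; with snscrape, pass
# TwitterProfileScraper('username').get_items() instead.
sample = [
    SimpleNamespace(url='https://example.invalid/1', retweetedTweet=None),
    SimpleNamespace(
        url='https://example.invalid/2',
        retweetedTweet=SimpleNamespace(content='original text'),
    ),
]

for retweet, original in iter_retweets(sample):
    print(retweet.url, original.content)
```

Since the helper is a generator, it processes items as the scraper yields them, so it should not need to hold all ~3200 tweets in memory at once.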
Is there a way to find a user's liked tweets? I guess liked tweets are the same as the favourite tweets mentioned by a previous user.
Hello! I find that the result of "snscrape.modules.twitter.TwitterUserScraper(username='textfiles')" only contains the user's own tweets, but I need the push from the user as well. How can I get it?