JustAnotherArchivist / snscrape

A social networking service scraper in Python
GNU General Public License v3.0

How to scrape the retweets of a twitter user? #83

Closed helloqwerasdf closed 4 years ago

helloqwerasdf commented 4 years ago

Hello! I find the result of "snscrape.modules.twitter.TwitterUserScraper(username='textfiles')" only contains the tweets of the user self, but I need the push from the user as well. How can I get it?

JustAnotherArchivist commented 4 years ago

I don't know what you mean. Are you talking about retweets?

helloqwerasdf commented 4 years ago

@JustAnotherArchivist yes, I am talking about retweets~sorry for my poor English~

JustAnotherArchivist commented 4 years ago

No problem.

Retweets are very tricky to scrape due to how Twitter works. The search does not return them at all by default, and you only get retweets from the past 7 days if you enable it (#8). The alternative is the profile page, which includes retweets but it only returns about 3200 tweets. So the best you can do is use the twitter-profile scraper to at least discover all retweets among the user's 3200 most recent tweets. (twitter-profile was broken earlier, but I just fixed that in 8cf81e9b, so make sure you update before trying this.)

It is impossible (to my knowledge) to discover retweets of a target user's tweets. So if you have some user and want to find all retweets referencing a tweet of that user, that won't work.

helloqwerasdf commented 4 years ago

@JustAnotherArchivist thanks a lot! Thanks to your advice, the problem is solved!

DV777 commented 3 years ago

Hello, I do not want to create a duplicate, so I'll try to make sure I understood correctly. Is there currently no way to scrape the retweets and replies of a specific user with snscrape? (I tried 'from:user include:nativeretweets' with the dev version of the package.) It is strange though, since GOT3 was able to do it before the last Twitter update completely obliterated it... :/

JustAnotherArchivist commented 3 years ago

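The two options for retweets are:

  • snscrape twitter-search 'from:username include:nativeretweets' – This only works for retweets from the past 7 days (and only returns normal tweets further back).
  • snscrape twitter-profile username – This only returns the ~3200 most recent tweets, including retweets among those (which may go back further than 7 days).
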
I am not aware of any way to get retweets beyond these two methods. GetOldTweets3 seems to have used the (old design) web search just like snscrape does, so it should have had the same 7-day limitation.

Replies are normal tweets and extracted with the standard twitter-user scraper or the equivalent twitter-search from:username. twitter-profile also returns them but with the same limitation as above.

DV777 commented 3 years ago

Thanks a lot! I understand better now. One last thing: what could be the reason for the following error? snscrape: error: unrecognized arguments: include:nativeretweets' 'from:(username I am scraping)'

tweet_count = 100
username = "XXX"
os.system("snscrape --jsonl --max-results {} twitter-search 'from:username include:nativeretweets' 'from:{}'> user-tweets.json".format(tweet_count, username))
tweets_df1 = pd.read_json('user-tweets.json', lines=True)
tweets_df1.to_csv('user-tweets.csv', sep=',', index=False)

JustAnotherArchivist commented 3 years ago

You're passing too many arguments. twitter-search just takes one argument, the query, but you're passing two. The error also indicates that argument splitting doesn't work the way you think it does. I haven't used os.system in a very long time though; the proper way is subprocess, e.g.

import subprocess

with open('user-tweets.json', 'wb') as fp:
    subprocess.run(['snscrape', '--jsonl', '--max-results', str(tweet_count), 'twitter-search', f'from:{username} include:nativeretweets'], stdout=fp)

Or just using capture_output and then directly feeding the output into Pandas instead of going through a file (not sure if that's possible).
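
A minimal sketch of that capture_output idea, assuming snscrape is on the PATH and that --jsonl prints one JSON object per line. The helper names build_search_command and run_jsonl are hypothetical, introduced only for illustration. The sketch parses the output with the standard json module rather than Pandas, so it does not settle whether Pandas can read the stream directly.

```python
import json
import subprocess


def build_search_command(username, max_results):
    """Build the snscrape CLI call from this thread as an argument list."""
    return [
        'snscrape', '--jsonl', '--max-results', str(max_results),
        'twitter-search', f'from:{username} include:nativeretweets',
    ]


def run_jsonl(command):
    """Run a command and parse its stdout as one JSON object per line."""
    result = subprocess.run(
        command, capture_output=True, encoding='utf-8', check=True
    )
    return [
        json.loads(line)
        for line in result.stdout.splitlines()
        if line.strip()
    ]


# Usage (needs snscrape installed and network access):
# records = run_jsonl(build_search_command('textfiles', 100))
```

The result is a list of dicts, one per tweet. If Pandas is installed, passing that list to pandas.DataFrame should give a table without the intermediate file.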

DV777 commented 3 years ago

Thank you very much for your help. I do not really understand it still, but it worked like a charm :)
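
For anyone else puzzled by the argument-splitting point, here is a small sketch of how the command string from the question might be split. It uses the standard shlex module for POSIX-style quoting and a plain whitespace split as a stand-in for a shell that does not treat single quotes as quoting characters. The output redirection is left out, and which behaviour actually applied on the original machine is an assumption.

```python
import shlex

# The command string from the question, with the placeholder values filled in.
command = (
    "snscrape --jsonl --max-results 100 twitter-search "
    "'from:username include:nativeretweets' 'from:XXX'"
)

# POSIX-style splitting: each quoted group becomes one argument.
posix_args = shlex.split(command)

# Whitespace-only splitting, as if single quotes had no special meaning.
naive_args = command.split()

# Everything after the subcommand name is what twitter-search receives.
posix_queries = posix_args[posix_args.index('twitter-search') + 1:]
naive_queries = naive_args[naive_args.index('twitter-search') + 1:]

print(posix_queries)
print(naive_queries)
```

Either way twitter-search receives more than the single query it accepts. The whitespace-only split leaves the quote characters attached to the fragments, which would be consistent with the shape of the reported error. Passing a list to subprocess.run avoids the question entirely, because each list element is handed over as exactly one argument.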

JSMboli commented 3 years ago

So is it possible to scrape the number of times a tweet has been liked and retweeted, and the content of the retweet?

xmainguyen commented 3 years ago

The two options for retweets are:

  • snscrape twitter-search 'from:username include:nativeretweets' – This only works for retweets from the past 7 days (and only returns normal tweets further back).
  • snscrape twitter-profile username – This only returns the ~3200 most recent tweets, including retweets among those (which may go back further than 7 days).

I am not aware of any way to get retweets beyond these two methods. GetOldTweets3 seems to have used the (old design) web search just like snscrape does, so it should have had the same 7-day limitation.

Replies are normal tweets and extracted with the standard twitter-user scraper or the equivalent twitter-search from:username. twitter-profile also returns them but with the same limitation as above.

Hi, would you mind sharing how to use twitter-profile to scrape 3,200 tweets using the Python wrapper?

JustAnotherArchivist commented 3 years ago

@xmainguyen I assume you're asking about using the profile scraper from a Python script (instead of the CLI).

import snscrape.modules.twitter

for tweet in snscrape.modules.twitter.TwitterProfileScraper('username').get_items():
    # Do something with the tweet object, e.g.
    print(tweet.url)
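
If only a fixed number of tweets is wanted, one option is to cap the iteration. This is a rough sketch that assumes get_items() can be consumed lazily, as the loop above suggests. The helper name take is hypothetical.

```python
import itertools


def take(items, limit):
    """Return at most `limit` items from an iterable without consuming more."""
    return list(itertools.islice(items, limit))


# Usage with the scraper from the snippet above (needs snscrape installed):
# import snscrape.modules.twitter
# scraper = snscrape.modules.twitter.TwitterProfileScraper('username')
# tweets = take(scraper.get_items(), 100)
```

Because itertools.islice stops pulling from the source once the limit is reached, the scraper should not be asked for more items than requested.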

tsaaii commented 2 years ago

I can scrape retweets now using tweet.retweetedTweet.content and it works just fine. However, I haven't tested it on a large number of tweets.
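
A rough sketch of how that attribute could be used to pick out retweets, assuming retweetedTweet is None (or absent) on items that are not retweets. That assumption is not confirmed in this thread, and the helper name iter_retweets is hypothetical.

```python
def iter_retweets(tweets):
    """Yield (retweet, original) pairs for items that carry a retweetedTweet."""
    for tweet in tweets:
        # Assumption: non-retweets have retweetedTweet set to None or missing.
        original = getattr(tweet, 'retweetedTweet', None)
        if original is not None:
            yield tweet, original


# Usage (needs snscrape installed and network access):
# import snscrape.modules.twitter
# scraper = snscrape.modules.twitter.TwitterProfileScraper('username')
# for retweet, original in iter_retweets(scraper.get_items()):
#     print(retweet.url, original.content)
```

Used with the profile scraper, this would still only cover retweets among the roughly 3200 most recent tweets mentioned earlier in the thread.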

Is there a way to find the liked tweets of a user? I guess liked tweets are the same as favourite tweets, as mentioned by a previous user.