MartinKBeck / TwitterScraper

Repository containing all files relevant to my basic and advanced tweet scraping articles.
196 stars, 117 forks

Filtering RTs and Replies #3

Closed by jaredbach 3 years ago

jaredbach commented 3 years ago

Hi Martin,

Thank you so much for your Medium blog on this tool. This tool is super useful, and you did a great job describing how to use snscrape. I am just curious, do you know if you can filter retweets and replies with this module? Or if there is a way to know if the Tweet you are getting back is a RT, a reply, a part of a thread, etc.

Thanks so much in advance.

JB

MartinBeckUT commented 3 years ago

Hi Jared,

Yes, you can. In my article https://medium.com/better-programming/how-to-scrape-tweets-with-snscrape-90124ed006af there's a section with a screenshot of the attributes in the tweet object. A couple of them carry this information, specifically retweetedTweet and quotedTweet. You can check these: if both are None, you'll know it's an original tweet. Otherwise, whichever one is populated holds the original tweet being referenced.
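A rough sketch of that check, in case it helps. `FakeTweet` and `classify_tweet` are hypothetical names made up for illustration; the stand-in class only mimics the two attributes mentioned above so the snippet runs without scraping anything, and I'm assuming the real tweet objects expose them under the same names.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class FakeTweet:
    """Hypothetical stand-in for a scraped tweet object (illustration only)."""
    content: str
    retweetedTweet: Optional[Any] = None
    quotedTweet: Optional[Any] = None


def classify_tweet(tweet: Any) -> str:
    """Return 'retweet', 'quote', or 'original' from the two reference fields."""
    # getattr with a default keeps this safe if an attribute is missing.
    if getattr(tweet, "retweetedTweet", None) is not None:
        return "retweet"
    if getattr(tweet, "quotedTweet", None) is not None:
        return "quote"
    return "original"


original = FakeTweet(content="Hello world")
quote = FakeTweet(content="Worth a read", quotedTweet=original)
retweet = FakeTweet(content="RT @someone: Hello world", retweetedTweet=original)

# Keep only tweets that reference no other tweet.
originals_only = [
    t for t in (original, quote, retweet) if classify_tweet(t) == "original"
]
```

With real scraped tweets you would pass each item from the scraper into `classify_tweet` instead of the `FakeTweet` samples.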

It's been a minute, but I also believe the content field explicitly starts with RT if it's a retweet.
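If you want to try the text-based check, something like this would do it. This is only a heuristic sketch: `looks_like_retweet` is a hypothetical helper, and it assumes retweets are rendered in the classic "RT @user: ..." form, which I haven't re-verified.

```python
def looks_like_retweet(content: str) -> bool:
    """Heuristic: treat text that begins with 'RT @' as a retweet."""
    # lstrip() tolerates stray leading whitespace in the scraped text.
    return content.lstrip().startswith("RT @")


samples = [
    "RT @someone: Hello world",
    "Hello world",
    "Great RT @someone did earlier",  # 'RT' mid-sentence should not match
]
flags = [looks_like_retweet(text) for text in samples]
```

Checking the dedicated attributes is more reliable than matching on text, so I'd treat this as a fallback only.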

As for threads, you can check whether a tweet has any mentioned users, which can give you an idea of whether it's potentially part of a thread. Unfortunately, you'll have to sift through the results to figure out if it's actually a thread, because no field states that explicitly.
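Here's a minimal sketch of that first pass, which only narrows down what you need to review by hand. `FakeTweet` and `thread_candidates` are hypothetical names for illustration, and I'm assuming the mentioned users live in a `mentionedUsers` attribute that is None or empty when nobody is mentioned.

```python
from dataclasses import dataclass
from typing import Any, Iterable, List, Optional


@dataclass
class FakeTweet:
    """Hypothetical stand-in for a scraped tweet object (illustration only)."""
    content: str
    mentionedUsers: Optional[List[str]] = None


def thread_candidates(tweets: Iterable[Any]) -> List[Any]:
    """Return tweets that mention at least one user, for manual review."""
    # None and [] are both falsy, so either one filters the tweet out.
    return [t for t in tweets if getattr(t, "mentionedUsers", None)]


tweets = [
    FakeTweet(content="Standalone thought"),
    FakeTweet(content="@someone following up on this", mentionedUsers=["someone"]),
    FakeTweet(content="No mentions here", mentionedUsers=[]),
]
candidates = thread_candidates(tweets)
```

Anything in `candidates` is only a maybe; a mention can just as easily be an ordinary reply or shout-out.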