Leibniz-HBI / Social-Media-Observatory

This repository is the central communication and project management interface for the Social Media Observatory hosted by the Leibniz Institute for Media Research | Hans-Bredow-Institute
https://leibniz-hbi.github.io/SMO/
Creative Commons Attribution 4.0 International

Find/develop solution to retrieve all replies to a list of tweets #25

Closed FlxVctr closed 3 years ago

FlxVctr commented 4 years ago

e.g. based on this:

https://gist.github.com/edsu/54e6f7d63df3866a87a15aed17b51eaf

or simply via scraping.

Got asked for a solution to this quite a few times this year.

manilevian commented 4 years ago

Do I understand correctly: do they want to get replies for a fixed set of comment IDs, or do they want to specify a user? I don't really understand where they want to start. In theory, a scraper would probably be almost as time-intensive as doing this over the API. I would rather go the "scrape way" (maybe because I'm getting more familiar with standard scraping).

I could probably code a scraper where you specify a list of usernames/user links, which the scraper then visits to scrape their comments and replies. It would then export those to whatever file format is wanted. Should be pretty solid with Selenium. But it would definitely take more time than an Instagram scraper (the source code of a Twitter page is a bit more complex, and finding the right spots seems to be time-consuming).

FlxVctr commented 4 years ago

Getting replies to a specific user would be easy. Getting all replies to an arbitrary list of tweets is more challenging. That's why people are asking for a solution. We don't have to prioritise this now. It is just something I wanted to note for later.

I'd be more comfortable with the API approach, which is actually pretty easy. But we can also decide later to do both. In any case, this does not have to happen before our "Twitter Semester".

FlxVctr commented 4 years ago

You would start, e.g., with a keyword search, and then you want to get all replies and replies to replies to the resulting Tweets.
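A minimal sketch of the "replies and replies to replies" step, assuming the tweets have already been collected into a flat list of dicts. The field names `id` and `in_reply_to_id` and the function `collect_replies` are hypothetical, chosen for illustration; real API or scraper output would need to be mapped onto them first.

```python
from collections import defaultdict, deque


def collect_replies(seed_ids, tweets):
    """Return every tweet that replies, directly or indirectly, to a seed tweet.

    `tweets` is a flat list of dicts with the hypothetical keys
    "id" and "in_reply_to_id" (None for tweets that are not replies).
    """
    # Index replies by the tweet they answer.
    children = defaultdict(list)
    for tweet in tweets:
        parent = tweet.get("in_reply_to_id")
        if parent is not None:
            children[parent].append(tweet)

    # Breadth-first walk from the seed tweets down through the reply tree.
    seen = set(seed_ids)
    queue = deque(seed_ids)
    found = []
    while queue:
        current = queue.popleft()
        for reply in children.get(current, []):
            if reply["id"] not in seen:
                seen.add(reply["id"])
                found.append(reply)
                queue.append(reply["id"])
    return found


if __name__ == "__main__":
    sample = [
        {"id": "2", "in_reply_to_id": "1"},   # reply to the seed tweet
        {"id": "3", "in_reply_to_id": "2"},   # reply to a reply
        {"id": "4", "in_reply_to_id": None},  # unrelated tweet
        {"id": "5", "in_reply_to_id": "9"},   # reply to a tweet outside the seed set
    ]
    for reply in collect_replies(["1"], sample):
        print(reply["id"])
```

The seed IDs would come from the keyword search; the hard part in practice is obtaining the flat list of candidate replies, which this sketch does not address.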

manilevian commented 4 years ago

So probably the best way is: hashtag search > fetch those comments > fetch their replies > option to extract into one or multiple files. At the moment I don't see the time for doing it, as we have a lot to do with the wiki and more coming in 2020. But if I catch some spare time, I will be very pleased to get into the issue and the Twitter sphere :)

FlxVctr commented 4 years ago

This should solve the problem: https://twittercommunity.com/t/new-conversation-id-field-and-operator-makes-it-possible-to-easily-retrieve-complete-conversation-threads/139613
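A rough sketch of how the `conversation_id` operator could be used, assuming the Twitter API v2 recent search endpoint and a valid bearer token. It only builds the request URL; the helper names `conversation_params` and `conversation_url` are made up for illustration, and the selection of `tweet.fields` is one plausible choice, not a requirement.

```python
from urllib.parse import urlencode

# Twitter API v2 recent search endpoint (covers roughly the last 7 days).
SEARCH_URL = "https://api.twitter.com/2/tweets/search/recent"


def conversation_params(conversation_id, next_token=None):
    """Build query parameters that request all tweets in one conversation thread."""
    params = {
        # The conversation_id operator matches every reply in the thread.
        "query": f"conversation_id:{conversation_id}",
        # Extra fields needed to rebuild who replied to whom.
        "tweet.fields": "conversation_id,in_reply_to_user_id,referenced_tweets,author_id,created_at",
        "max_results": 100,
    }
    if next_token:
        # Token taken from the previous response's metadata, used for paging.
        params["next_token"] = next_token
    return params


def conversation_url(conversation_id, next_token=None):
    """Return the full request URL for one page of a conversation thread."""
    return f"{SEARCH_URL}?{urlencode(conversation_params(conversation_id, next_token))}"


if __name__ == "__main__":
    # The request itself would be sent with an "Authorization: Bearer <token>" header.
    print(conversation_url("1234567890"))
```

For a list of tweets, one would first look up each tweet's `conversation_id` and then run one such search per conversation, following `next_token` until no further pages are returned.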