DocNow / hydrator

Turn Tweet IDs into Twitter JSON & CSV from your desktop!
MIT License

Reconstructing Threads #107

Open mcolemann opened 3 years ago

mcolemann commented 3 years ago

Hi everyone!

Does anyone have code to reconstruct the threads?

Thanks a lot!

edsu commented 3 years ago

Hydrator doesn't do that currently, but it could be a good option to add. If you need to do it now, you can use twarc's conversation command.

mcolemann commented 3 years ago

Thank you very much! Is twarc compatible with the .csv files from the Hydrator?

edsu commented 3 years ago

It depends on what you are doing. Do you want to get all the conversation threads in the CSV file you generated?

mcolemann commented 3 years ago

Actually I am trying to identify some good case studies for the conversation threads. I have 33 .csv files (each around 300-500MB) and I would like to reconstruct all the threads (to better identify the case studies). Do you know how I can do this?

edsu commented 3 years ago

Do you have access to the Academic Research product track, which allows searching the historical archive?

In theory it ought to be possible if you extract the tweet ids from your CSVs into a file, e.g. ids.txt, and then run twarc conversations ids.txt --archive conversations.json to collect all the threads. It could take a while depending on the sizes of the threads you encounter. But these are all questions for the twarc issue tracker, I guess.
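For the id extraction step, a minimal Python sketch could look like the following. It assumes each CSV has a header row with a column named id holding the tweet id (check your own files, the column name is an assumption here), and the function name extract_ids is just made up for illustration. It streams row by row, so the 300-500MB files never have to fit in memory at once.

```python
import csv


def extract_ids(csv_paths, out_path, id_column="id"):
    """Write unique tweet IDs from several CSV files to out_path, one per line.

    Assumption: every CSV has a header row containing `id_column`.
    Returns the number of unique IDs written.
    """
    seen = set()
    with open(out_path, "w", encoding="utf-8") as out:
        for path in csv_paths:
            with open(path, newline="", encoding="utf-8") as f:
                # DictReader reads one row at a time, so large files are fine.
                for row in csv.DictReader(f):
                    tweet_id = (row.get(id_column) or "").strip()
                    # Skip blank cells and IDs already written.
                    if tweet_id and tweet_id not in seen:
                        seen.add(tweet_id)
                        out.write(tweet_id + "\n")
    return len(seen)
```

Something like extract_ids(sorted(glob.glob("*.csv")), "ids.txt") would then produce the ids.txt file mentioned above. Keeping the ids as strings avoids any rounding of the large id numbers.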

mcolemann commented 3 years ago

Unfortunately, I don't think I have access to the Academic Research product track. How can I get access to it?

Thanks a lot! Then I will open an issue for twarc.

edsu commented 3 years ago

If you are studying or working at a university you can apply. The main difference is that you can access 10 million tweets a month from Twitter's V2 API (usually limited to 100,000/month). The V2 API includes things like reply_count for tweets, as well as the conversation_id for a tweet, which lets you easily collect all the tweets in a thread. Most importantly, the Academic Research track lets you search the full archive of tweets rather than just the last week.
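To illustrate how conversation_id helps, here is a rough sketch of grouping already-collected tweets into threads. It assumes a file with one tweet JSON object per line, each carrying conversation_id and created_at fields as in the V2 API; raw API responses that batch several tweets together would need to be flattened first. The function name group_by_conversation is hypothetical, not part of twarc or Hydrator.

```python
import json
from collections import defaultdict


def group_by_conversation(lines):
    """Group tweet JSON lines into threads keyed by conversation_id.

    Assumption: each non-empty line is one tweet object with
    `conversation_id` and an ISO 8601 `created_at` string.
    """
    threads = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        tweet = json.loads(line)
        threads[tweet["conversation_id"]].append(tweet)
    # ISO 8601 timestamps in the same format sort chronologically as strings.
    for tweets in threads.values():
        tweets.sort(key=lambda t: t["created_at"])
    return dict(threads)
```

Passing an open file handle as lines works, since the function only iterates over it. Sorting the resulting threads by length would be one way to spot larger conversations worth a closer look as case studies.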