echen102 / COVID-19-TweetIDs

The repository contains an ongoing collection of tweet IDs associated with the novel coronavirus COVID-19 (SARS-CoV-2), which commenced on January 28, 2020.

Full version of the data #1

Closed: IreneZihuiLi closed this issue 4 years ago

IreneZihuiLi commented 4 years ago

Hi,

Is it possible to release the tweet text instead of the Tweet IDs? As it stands, I think I still need to use the API to download the original tweets.

Thanks.

echen102 commented 4 years ago

Hi!

Unfortunately, we aren't able to publicly release the text of the Tweets we've collected, as per Twitter's Developer Terms of Service (see the "content redistribution" section).

However, it's fairly simple to rehydrate the Tweet IDs to recover the full text. Hydrator provides an intuitive GUI that manages your Twitter rate limit for you and returns a JSON file (it can also generate a CSV file) containing all of the Tweets. You'll need to apply for a Twitter Developer account to generate your own access tokens, but beyond that you shouldn't need to write any code to retrieve the original Tweets.
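For anyone curious what a hydrator does under the hood: it reads the ID files and sends the IDs to Twitter's lookup endpoint in batches (the v1.1 `statuses/lookup` endpoint accepts at most 100 IDs per request). Below is a minimal sketch of just the batching step, assuming one Tweet ID per line. The function name `batch_tweet_ids` is hypothetical, this is not Hydrator's actual code, and it makes no network calls.

```python
from typing import Iterable, Iterator, List


def batch_tweet_ids(lines: Iterable[str], size: int = 100) -> Iterator[List[str]]:
    """Yield lists of at most `size` unique Tweet IDs (assumes one ID per input line)."""
    if size < 1:
        raise ValueError("size must be positive")
    seen = set()
    batch: List[str] = []
    for line in lines:
        tweet_id = line.strip()
        # Skip blank lines, non-numeric junk, and IDs that were already queued.
        if not tweet_id.isdigit() or tweet_id in seen:
            continue
        seen.add(tweet_id)
        batch.append(tweet_id)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly smaller, batch


if __name__ == "__main__":
    # Hypothetical usage: each batch could be joined with commas to form the
    # `id` parameter of a single lookup request.
    for batch in batch_tweet_ids(["1\n", "2\n", "\n", "2\n", "3\n"], size=2):
        print(",".join(batch))
```

Deduplicating before batching avoids spending part of your rate limit on IDs you have already requested.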

echen102 commented 4 years ago

Hi,

@edsu has just added a script that will help you hydrate the tweets from the command line. I'll be adding instructions for using his script to our README.
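If you hydrate from the command line and want a spreadsheet-friendly file like the CSV that Hydrator can produce, a small post-processing step can flatten the results. The sketch below assumes the hydrated Tweets are saved as JSON Lines (one v1.1 Tweet object per line); that output format, the function name `jsonl_to_csv`, and the chosen columns are all assumptions for illustration, not part of the script or of Hydrator.

```python
import csv
import io
import json
from typing import Iterable, TextIO

# Hypothetical column selection; v1.1 Tweet objects carry many more fields.
FIELDS = ["id_str", "created_at", "screen_name", "lang", "text"]


def jsonl_to_csv(lines: Iterable[str], out: TextIO) -> int:
    """Write selected fields of Tweet objects (one JSON object per line) as CSV.

    Returns the number of data rows written (the header is not counted).
    """
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    count = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        tweet = json.loads(line)
        writer.writerow({
            "id_str": tweet.get("id_str", ""),
            "created_at": tweet.get("created_at", ""),
            "screen_name": tweet.get("user", {}).get("screen_name", ""),
            "lang": tweet.get("lang", ""),
            # Extended-mode lookups return `full_text`; compatibility mode returns `text`.
            "text": tweet.get("full_text", tweet.get("text", "")),
        })
        count += 1
    return count


if __name__ == "__main__":
    sample = ['{"id_str": "1", "full_text": "hello", "user": {"screen_name": "a"}}']
    buf = io.StringIO()
    jsonl_to_csv(sample, buf)
    print(buf.getvalue())
```

For a real file you would pass an open input file as `lines` and an output file opened with `newline=""` as `out`, which is what the `csv` module expects.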

I hope you're not running into too many issues hydrating the Tweet IDs. I'm going to close this issue, but please feel free to reach out to me if you are!