DocNow / hydrator

Turn Tweet IDs into Twitter JSON & CSV from your desktop!
MIT License

Tweet id file error #36

Closed gskarp closed 4 years ago

gskarp commented 4 years ago

While trying to start hydrating a txt file with only the tweet ids, I get the following message

[image: screenshot of the error message]

edsu commented 4 years ago

What does the first line of your file corona_tweets_01.txt look like?

gskarp commented 4 years ago

[image: screenshot of the first lines of corona_tweets_01.txt]

edsu commented 4 years ago

Hmmm, where did you get that from? It looks like the id file has been opened with Excel or something that truncated all the IDs (see how they all end in zero?).
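
For context on why the IDs would end in zeros: tweet IDs are 18 or 19 digits long, which is more than a 64-bit floating-point number can hold exactly, and Excel keeps only 15 significant digits of a number. Here is a minimal Python sketch of both effects. The sample IDs are invented for illustration, and `looks_truncated` is a hypothetical helper, not part of Hydrator.

```python
# Sample IDs are invented for illustration only.
original = "1240727808080412673"

# A 64-bit float cannot represent every 19-digit integer, so a round trip
# through float changes the value.
round_tripped = int(float(original))
print(round_tripped == int(original))  # False


def looks_truncated(ids, kept_digits=15):
    """Heuristic: True if every ID longer than kept_digits has only zeros after that point."""
    long_ids = [i.strip() for i in ids if len(i.strip()) > kept_digits]
    if not long_ids:
        return False
    return all(set(i[kept_digits:]) == {"0"} for i in long_ids)


print(looks_truncated(["1240727808080410000", "1240727808114020000"]))  # True
print(looks_truncated([original]))  # False
```

A single genuine ID can end in zeros by chance, so the check is only meaningful when it holds for the whole file.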

Are you sure there isn't something else on line 1? That error gets thrown when it finds a line without a number on it: https://github.com/DocNow/hydrator/blob/master/app/utils/twitter.js#L15
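
If it helps to find the offending line before loading the file, a rough check along these lines could be run first. This is only a sketch of the idea, not the code in twitter.js: `find_bad_lines` is a hypothetical name, and treating blank lines as errors is an assumption.

```python
import re

# One tweet ID per line, ASCII digits only.
ID_PATTERN = re.compile(r"[0-9]+")


def find_bad_lines(lines):
    """Return (line_number, text) for every line that is not a bare number."""
    bad = []
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        if not ID_PATTERN.fullmatch(text):
            bad.append((number, text))
    return bad


sample = ["123\n", "tweet_id\n", "1.24073E+18\n", "456"]
print(find_bad_lines(sample))  # [(2, 'tweet_id'), (3, '1.24073E+18')]
```

A header row or a value in scientific notation (which a spreadsheet can write out for long numbers) would both be reported this way.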

If you are able to upload the id file here or send it to me via email ehs@pobox.com I can try to debug further.

gskarp commented 4 years ago

Yes, I actually opened it in Excel to erase a second column, which produced the same error. I originally downloaded the CSV files from here: https://ieee-dataport.org/open-access/corona-virus-covid-19-tweets-dataset

gskarp commented 4 years ago

I realize that there was a mistake in the way I used Excel. With the CSV, things are completely different.

edsu commented 4 years ago

Yes, unfortunately Excel is known to mangle numbers and dates. Definitely be wary of it. I would install csvkit and then cut out the column you want into a new file.

csvcut -c tweet_id corona_tweets_01.csv > corona_tweet_ids_01.csv

We've talked about adding functionality to allow people to load tweet ids from arbitrary CSV files by having people select the column. But until that's available I'm afraid you will need to select out the data in some other way.
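
One such other way, if csvkit is not an option, is Python's standard `csv` module, which reads every cell as a string and so never rounds the IDs. The sketch below assumes the file has a header row with a column named `tweet_id` (as in the csvcut command above); `extract_ids` and `write_ids` are hypothetical helper names.

```python
import csv


def extract_ids(lines, column="tweet_id"):
    """Return the values of `column` as strings, skipping empty cells."""
    reader = csv.DictReader(lines)
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"column {column!r} not found in header {reader.fieldnames}")
    ids = []
    for row in reader:
        value = (row[column] or "").strip()
        if value:
            ids.append(value)
    return ids


def write_ids(src_path, dst_path, column="tweet_id"):
    """Write one ID per line, without a header row, to dst_path."""
    with open(src_path, newline="", encoding="utf-8") as src:
        ids = extract_ids(src, column)
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write("\n".join(ids) + "\n")
```

Calling `write_ids("corona_tweets_01.csv", "corona_tweet_ids_01.txt")` would then produce a file with one ID per line. Leaving out the header row is deliberate, since a non-numeric first line is what triggers the error discussed above.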

edsu commented 4 years ago

I'm going to close this, but please reopen if it doesn't seem resolved.