Number of hydrated tweets vastly smaller than rows in resulting CSV

DocNow / hydrator

Turn Tweet IDs into Twitter JSON & CSV from your desktop!

MIT License


Number of hydrated tweets vastly smaller than rows in resulting CSV #110

Closed soursunrise closed 2 years ago

soursunrise commented 2 years ago

Hi,

I recently hydrated 8,742,424 tweet IDs. The output shows 5,550,148 tweets extracted (the deletion rate was high). However, when I load the resulting CSV in R, it shows 10,495,674 rows/tweets. How can this be?

Thanks for your answer!

igorbrigadir commented 2 years ago

How exactly are you loading the file in R?

Someone else on the DocNow Slack had an issue with R, but it was resolved by using tidyverse instead of read.csv().

soursunrise commented 2 years ago

I've been using read_csv, yes, but it's part of tidyverse, as far as I understand? What would you suggest?

edsu commented 2 years ago

I'm curious: does each row in the DataFrame have an id?
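One way to run that check is to count how many parsed rows carry a plausible tweet id. The sketch below is in Python rather than R, and the sample data and column names (`id`, `text`) are hypothetical stand-ins for the hydrated CSV. It assumes a valid id is a string of digits, so any row failing that test is probably a fragment of a split record rather than a tweet.

```python
import csv
import io

# Hypothetical sample standing in for the hydrated CSV. The record for id 1002
# has been split in two by an unquoted line break, leaving a fragment row.
sample = (
    "id,text\n"
    "1001,first tweet\n"
    "1002,broken\n"
    "tail of the tweet,\n"
    "1003,third tweet\n"
)

rows = list(csv.DictReader(io.StringIO(sample)))

# Assumption: a real tweet id consists only of digits.
valid = [row for row in rows if (row["id"] or "").isdigit()]

print(len(rows), len(valid))
```

If the two counts differ, the extra rows are a parsing artifact and not additional tweets.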

soursunrise commented 2 years ago

Turns out that it was a delimiter problem: the file had been written with write.csv2() and was then loaded incorrectly. Now the number makes sense. It is still a bit smaller than the count given by Hydrator, but I guess that was already looked at in another thread, and the difference is small in any case.
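For anyone hitting the same symptom, here is a minimal sketch of how a separator mismatch can inflate the row count. It uses Python's standard csv module rather than R, and the sample rows are hypothetical. The file is written with a semicolon separator, which is what write.csv2() uses, and then parsed once with the matching separator and once with a comma. With the wrong separator the quotes around the tweet text are no longer recognised, so the line break inside the text starts a new row.

```python
import csv
import io

# Hypothetical sample: two tweets, one with a comma and a line break in its text.
rows = [
    ["id", "text"],
    ["1", "hello, world\nsecond line"],
    ["2", "plain text"],
]

# Write with a semicolon separator, as R's write.csv2() does.
buffer = io.StringIO()
csv.writer(buffer, delimiter=";", lineterminator="\n").writerows(rows)
data = buffer.getvalue()


def count_rows(text, delimiter):
    """Count parsed records, excluding the header row."""
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    return sum(1 for _ in reader) - 1


right = count_rows(data, ";")  # separator matches the file
wrong = count_rows(data, ",")  # separator does not match the file

print(right, wrong)  # 2 3
```

In R, the equivalent fix is to read a file written by write.csv2() with a semicolon-aware reader such as read.csv2() or readr's read_csv2().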