Run Hydrator for multiple CSV files

DocNow / hydrator

Turn Tweet IDs into Twitter JSON & CSV from your desktop!

MIT License


Run Hydrator for multiple CSV files #94

Open grrigore opened 3 years ago

grrigore commented 3 years ago

I have a folder with 100 .csv files of different sizes. Is there a way to hydrate those files without manually adding each file into the Hydrator app?

edsu commented 3 years ago

Do your CSV files only contain a column of numbers? Or do they include other columns as well? Also what operating system are you using?

grrigore commented 3 years ago

My .csv files contain tweet IDs. I am using Ubuntu.

edsu commented 3 years ago

Do the CSV files have a column header? Or are the files just lines of numbers?

grrigore commented 3 years ago

This is a preview from a .csv file. The columns are ID and TextBlob score (I can remove the score):

1385449730818285569,0.125
1385449730981842946,0
1385449730981957635,-0.0062500000000000056
1385449730948288516,0.26666666666666666
1385449731132989440,-0.016666666666666677
1385449731086708736,0
1385449731267178496,0.3

I am using data from here

edsu commented 3 years ago

You will want to ensure that your input file is a text file where each line contains a tweet ID and nothing else. So the TextBlob score will need to be removed, as will any column headers.
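For a folder of files like the preview above, the cleanup could be scripted rather than done by hand. The following is only a minimal sketch, assuming the tweet ID is always the first column and that any header row starts with non-numeric text. The function names `extract_ids` and `convert_folder` and the folder names are hypothetical, not part of Hydrator or twarc.

```python
import csv
from pathlib import Path


def extract_ids(csv_path, out_path):
    """Write the first column of csv_path to out_path, one tweet ID per line.

    Assumption: the ID is in the first column. Rows whose first cell is not
    purely numeric (for example a header row) are skipped.
    """
    count = 0
    with open(csv_path, newline="") as src, open(out_path, "w") as dst:
        for row in csv.reader(src):
            if not row:
                continue  # skip blank lines
            first = row[0].strip()
            if first.isdigit():
                dst.write(first + "\n")
                count += 1
    return count


def convert_folder(in_dir, out_dir):
    """Convert every .csv file in in_dir to an ID-only .txt file in out_dir.

    Returns a dict mapping each input file name to the number of IDs written.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    results = {}
    for csv_file in sorted(Path(in_dir).glob("*.csv")):
        target = out / (csv_file.stem + ".txt")
        results[csv_file.name] = extract_ids(csv_file, target)
    return results


if __name__ == "__main__":
    # Hypothetical folder names; adjust to your own layout.
    for name, n in convert_folder("csv_files", "id_files").items():
        print(f"{name}: {n} IDs")
```

Each resulting .txt file contains one tweet ID per line and nothing else, which is the input format described above. The originals are left untouched, so the TextBlob scores are not lost.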

I don't actually see data with that format in the dataset you linked to. If you are working with a very large dataset (hundreds of millions of tweets) you might want to use twarc instead of Hydrator.

edsu commented 3 years ago

Sorry, I should have left this open to see if you have any more questions.

grrigore commented 3 years ago

No problem. 🙂 I think twarc is a better tool for what I want. Thank you.