computermacgyver / redhen_twitter

Processing and summarising Twitter data for RedHen

File format and naming #2

Closed. computermacgyver closed this issue 5 years ago.

computermacgyver commented 5 years ago

Most data on RedHen appears to be separated with pipes (|).

computermacgyver commented 5 years ago

Update: text fields that may contain a | must be enclosed in double quotation marks so that R can read them easily. We will adopt this format.
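A minimal sketch of this quoting rule using Python's standard csv module. The column layout (id, timestamp, text) is hypothetical and only for illustration. With QUOTE_MINIMAL, a field is wrapped in double quotes only when it contains the delimiter, a double quote, or a line break, and embedded double quotes are escaped by doubling them:

```python
import csv
import io


def write_pipe_delimited(rows):
    """Serialise rows as pipe-delimited text, quoting fields that need it."""
    buf = io.StringIO()
    # QUOTE_MINIMAL quotes a field only if it contains "|", '"' or a line
    # break; a '"' inside a quoted field is written as '""'.
    writer = csv.writer(buf, delimiter="|", quotechar='"',
                        quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


def read_pipe_delimited(text):
    """Parse text produced by write_pipe_delimited back into lists of strings."""
    reader = csv.reader(io.StringIO(text), delimiter="|", quotechar='"')
    return [row for row in reader]


# Hypothetical rows: tweet id, timestamp, tweet text.
rows = [
    ["123", "2015-06-01 14:03:22", "plain tweet text"],
    ["124", "2015-06-01 14:05:10", "text with a | pipe in it"],
]
text = write_pipe_delimited(rows)
print(text)
# 123|2015-06-01 14:03:22|plain tweet text
# 124|2015-06-01 14:05:10|"text with a | pipe in it"
```

Only the second text field is quoted, so lines without a pipe in the text stay as plain pipe-separated values.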

Given the size of the files, we will split the data into hourly files named in the format YYYY-MM-DD_HH00_WW_Twitter_Spritzer.twt
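A minimal sketch of how a tweet timestamp could be mapped to its hourly file name. Two assumptions are not stated in the issue: timestamps are binned in UTC (naive datetimes are treated as already UTC), and "WW" is a literal field of the pattern, exposed here as a hypothetical region parameter in case it is meant to vary:

```python
from datetime import datetime, timezone


def hourly_filename(ts: datetime, region: str = "WW") -> str:
    """Return the name of the hourly file that timestamp ts falls into.

    Assumption: bins are in UTC; naive datetimes are taken to be UTC already.
    Assumption: "WW" is a literal field; region is a hypothetical override.
    """
    if ts.tzinfo is not None:
        ts = ts.astimezone(timezone.utc)
    # %H gives the hour; the literal "00" supplies the HH00 minutes component,
    # so every timestamp within the same hour maps to the same file.
    return f"{ts.strftime('%Y-%m-%d_%H00')}_{region}_Twitter_Spritzer.twt"


print(hourly_filename(datetime(2015, 6, 1, 14, 3, 22, tzinfo=timezone.utc)))
# 2015-06-01_1400_WW_Twitter_Spritzer.twt
```

Because the name depends only on the date and hour, all tweets from the same hour end up in the same file, which keeps each file to one hour of data.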

For compression, see #4.