File format and naming

Most data on RedHen appears to be separated with pipes (|).

For Twitter summary data, we can also use pipes to separate fields, but we need to decide what to do with pipe characters that appear in tweets or free-text location fields. Should these be replaced with another character or escaped (e.g., \|)?
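To make the escaping option concrete, here is a minimal sketch of how it could work. The function names are hypothetical, and it assumes a backslash is the escape character, so literal backslashes have to be escaped as well for the encoding to be reversible.

```python
def escape_field(value: str) -> str:
    """Escape backslashes first, then pipes, so the result can be decoded unambiguously."""
    return value.replace("\\", "\\\\").replace("|", "\\|")


def join_record(fields):
    """Build one pipe-separated line from a list of raw field values."""
    return "|".join(escape_field(f) for f in fields)


def split_record(line: str):
    """Split a pipe-separated line, honouring backslash escapes."""
    fields, current, i = [], [], 0
    while i < len(line):
        ch = line[i]
        if ch == "\\" and i + 1 < len(line):
            # Escaped character: keep the next character literally.
            current.append(line[i + 1])
            i += 2
        elif ch == "|":
            # Unescaped pipe: end of the current field.
            fields.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    fields.append("".join(current))
    return fields


record = ["123", "tweet with a | pipe", "London | UK"]
line = join_record(record)
print(line)
print(split_record(line) == record)
```

The trade-off is that readers of the files can no longer split on `|` naively; replacing the pipe with another character keeps parsing trivial but loses the original text.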
Currently, summary files contain the text of tweets, which makes them quite large. Should they be compressed?
To date, I've used gzip compression, but bz2 appears to produce slightly smaller files. Any preference on which to use?
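One way to settle the gzip versus bz2 question is to measure both on a real summary file. The sketch below uses only Python's standard `gzip` and `bz2` modules; the function name and the sample data are made up for illustration, and the actual ratio will depend on the real files.

```python
import bz2
import gzip


def compressed_sizes(data: bytes) -> dict:
    """Return the raw, gzip, and bz2 sizes in bytes for the same payload."""
    return {
        "raw": len(data),
        "gzip": len(gzip.compress(data)),
        "bz2": len(bz2.compress(data)),
    }


# Illustrative stand-in for a summary file; replace with the bytes of a real one.
sample = "\n".join(
    f"{i}|user{i % 50}|London|example tweet text number {i}" for i in range(5000)
).encode("utf-8")

for name, size in compressed_sizes(sample).items():
    print(f"{name}: {size} bytes")
```

Size is only one factor: decompression speed and which tools the rest of the pipeline already supports are worth checking too.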
What should the final name of files be? I'm thinking YYYY-MM-DD_0000_WW_Twitter_Spritzer.twt. This would signify the date/time at which the data starts (roughly midnight UTC), that it is world-wide (WW), from Twitter, and a result of the Spritzer sample. Other datasets would have a unique identifier in place of 'Spritzer'.
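As a sketch of the proposed naming scheme, the helper below builds a file name from a start time. The function and parameter names are hypothetical, and it assumes the `0000` component is the start time as HHMM in UTC.

```python
from datetime import datetime, timezone


def summary_filename(start: datetime, region: str = "WW",
                     source: str = "Twitter", dataset: str = "Spritzer") -> str:
    """Build a name like YYYY-MM-DD_HHMM_WW_Twitter_Spritzer.twt.

    `start` should be a timezone-aware datetime; it is converted to UTC first.
    """
    start_utc = start.astimezone(timezone.utc)
    return f"{start_utc:%Y-%m-%d_%H%M}_{region}_{source}_{dataset}.twt"


print(summary_filename(datetime(2015, 3, 1, 0, 0, tzinfo=timezone.utc)))
```

Another dataset would pass its own identifier as `dataset` in place of `Spritzer`.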

computermacgyver / redhen_twitter