geysertimes / geysertimes-r-package

R package for accessing and analyzing the GeyserTimes database

Data Column Names #4

Open spkaluzny opened 5 years ago

spkaluzny commented 5 years ago

I think we should consider the column names for the eruption data in R. The names from the TSV data file are:

eruptionID, geyser, eruption_time_epoch, has_seconds, exact, ns, ie, E, A, wc, ini, maj, min, q, duration, entrant, observer, eruption_comment, time_updated, time_entered, associated_primaryID, other_comments

It would be good to have descriptive names with consistent character case. Names of similar length would be good as well.

I realize that the data has been available from the archive under the above names for some time, and I don't know whether using different names in R would have any ramifications.

taltstidl commented 5 years ago

I agree, the current column names are a product of historical developments and are not quite normalized or self-explanatory. I'm including a draft of possible new names here along with a short description (coming later 😉):

While ideally we would change the column names directly within the source TSV files, I'm a bit reluctant as it might break things for people already using our archive files. I'll bring it up at our next meeting though. Also, I'll be looking into adding our parsed durations (a numerical value of the duration in seconds) to the archive files.

taltstidl commented 5 years ago

@spkaluzny I've updated the column descriptions. We've decided against renaming the columns within the archive files, so it's probably best to map the names within the gt_get_data function. If there's anything else I can do, please let me know.
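
For illustration, here is a minimal sketch of how such a mapping could be applied on the R side once the data is loaded. It is not the package's actual code: `column_map` and `rename_columns` are hypothetical names, the new names on the right-hand side are placeholders rather than the names drafted in this thread, and the sample row is made up. Only the archive names on the left-hand side come from the list above.

```r
# Hypothetical lookup table: archive column name -> name exposed in R.
# The right-hand side values are placeholders, not the agreed names.
column_map <- c(
  eruptionID           = "eruption_id",
  associated_primaryID = "associated_primary_id"
)

# Rename the columns of `data` that appear in `map`.
# Columns without an entry in `map` keep their archive name.
rename_columns <- function(data, map = column_map) {
  idx <- match(names(data), names(map))
  known <- !is.na(idx)
  names(data)[known] <- unname(map[idx[known]])
  data
}

# Made-up sample row that uses a few of the archive column names.
archive <- data.frame(
  eruptionID = 1L,
  geyser = "Old Faithful",
  eruption_time_epoch = 0,
  E = 1L,
  stringsAsFactors = FALSE
)

renamed <- rename_columns(archive)
print(names(renamed))
```

Keeping the mapping in a single named vector means the archive files stay untouched, and anyone who needs the original names can skip the renaming step or invert the lookup.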