Open jamesrco opened 6 years ago
Some of the timestamp issues are taken care of lubridate in R. I think the function is parse_datetime(), it lets you specify multiple timestamp formats and set how many levels of detail can be truncated before throwing an error.
Thanks @jpshanno -- I have played around with lubridate quite a bit but will give it another look! I've found the primary issue is often that information such as the correct time zone/offset is easily lost when different packages or functions are used for interconversion of POSIXlt and POSIXct objects. Unfortunately, I've found the use of multiple functions/packages necessary since they don't all play nice with ISO 8601... something I've found surprising.
Yeah, multiple packages can definitely wreak some havoc, maybe we can find a solution this week.
If I remember right the functionality was a little buried, but I have some lubridate code that uses the multiple formats that I can share with you.
I often find I'm given multiple, timestamped datasets that belong to the same or complementary projects. Future users of these data (including me) often want to compare/evaluate these data side-by-side since they're related, but issues with the timestamps sometimes make this a huge pain. I end up spending a large amount of time manually reformatting timestamps or cajoling them into the "right" format. I've found a lot of the existing tools in R, Python, and MATLAB which purport to assist with these kinds of tasks to be lacking.
The data I receive often have one or more of the following features:
I propose developing a set of tools in R and/or Python to assist in ingesting/formatting, and storing timestamps in a common format. Could make use of some existing functionality in existing R or Python packages; our product could even been a modification/contribution to the existing package code.