Open whshannon opened 4 years ago
Awesome! Let’s review so I make sure to get it right. For the timezone, I found this vocab, but I’m not seeing EDT or anything catching daylight savings in there. Are those necessary? Is this on the right track?
https://ddialliance.org/Specification/DDI-CV/TimeZone_1.0.html
@mbiddle-bcodmo, thoughts? I'm not sure how best to handle daylight savings time.
yeah, that's tricky. I think we do need to capture that if we are going down this road. Especially when the place decides to change the timezone reference. https://en.wikipedia.org/wiki/Time_in_Venezuela
One of the packages I've used for timezones is Python's pytz package (http://pytz.sourceforge.net/), which apparently uses this resource as its timezone database: http://www.iana.org/time-zones
Not sure if that helps any.
OK, I searched for an easy way to pull the IANA time zone DB (including from DBpedia, but the lists there were incomplete), so I attempted to scrape the list at Wikipedia into a Google Sheet that we can reuse to source a controlled vocab. Can you check out this Sheet and let me know your thoughts?
that looks alright. There's got to be an easier way though.
easier might be to select the UTC value? Like the timezone is UTC+5:00 or UTC-8:45.
see: https://www.timeanddate.com/time/zones/
OWL Time Ontology doesn't provide them either, but says this site above is one potential source: https://www.w3.org/TR/owl-time/#h-note12
personally, I like that approach better than trying to use a vocabulary we can't quite pin down.
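For what it's worth, here is a minimal sketch of how a UTC offset label like the ones above could be turned into something Python can use. The helper name `parse_utc_offset` and the accepted label shapes are my own assumptions, not part of any vocabulary or library discussed here.

```python
import re
from datetime import timedelta, timezone

# Hypothetical helper (not from any library): turn a label such as
# "UTC+5:00", "UTC-8:45" or "UTC-4" into a fixed-offset tzinfo.
_OFFSET_RE = re.compile(r"^UTC([+-])(\d{1,2})(?::?(\d{2}))?$")


def parse_utc_offset(label: str) -> timezone:
    match = _OFFSET_RE.match(label.strip())
    if match is None:
        raise ValueError(f"not a UTC offset label: {label!r}")
    sign, hours, minutes = match.groups()
    # Minutes are optional, so "UTC-4" is read as "UTC-4:00".
    delta = timedelta(hours=int(hours), minutes=int(minutes or 0))
    return timezone(-delta if sign == "-" else delta)
```

A fixed offset like this carries no daylight saving rules, which is the trade-off of this approach: whoever records the offset has to know whether DST applied.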
So I'm clear, the following datum in EDT would be described as follows:
datum = "01/14/2019 8:25 AM"
units = unitless
date_format = "%m/%d/%Y %I:%M %p"
date_format_convention = "Python datetime strftime"
time_zone = "UTC-4"
Also note, I don't think we need an independent time_format and time_format_convention attribute as those can be captured in the date_format attribute.
Maybe we should change the name to date_time_format?
Oh, good point. I was thinking of cases where date and time are in separate fields.
As for the time zones, I think selecting the UTC offset is fine assuming the PI provides that info. Otherwise, there may be times when the DM will have to look up the offset to see if daylight savings time applies.
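As a rough check of the datum example above, this sketch parses the string with the Python convention and attaches the UTC-4 offset. It uses `%I` rather than `%H` for the hour, since `strptime` only applies `%p` when the hour is parsed with `%I`. The variable names are just for illustration.

```python
from datetime import datetime, timedelta, timezone

datum = "01/14/2019 8:25 AM"
# %I (12-hour clock) is paired with %p; with %H, strptime ignores AM/PM.
date_format = "%m/%d/%Y %I:%M %p"
utc_offset = timezone(timedelta(hours=-4))  # time_zone = "UTC-4"

# strptime returns a naive datetime, so the offset is attached afterwards.
local = datetime.strptime(datum, date_format).replace(tzinfo=utc_offset)
as_utc = local.astimezone(timezone.utc)

print(local.isoformat())
print(as_utc.isoformat())
```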
Let’s create three data types for these, mapped to the XSD datatype vocabulary
Date (xsd:date), Datetime (xsd:dateTime), Time (xsd:time)
Each can have a: format (xsd:token) and format-type (Controlled vocab)
The easiest way to manage timezone inside the data is to set the UTC offset (-5:00). I've been thinking about using the named region as a proxy for these offsets. If we did, we’d need to stay on top of any change to a region to make sure its UTC offset is correct. Then I thought about a dataset where each row might be in a different timezone, depending on whether it crossed into a different region. I was wondering if we should make this data instead of metadata? Also thought that lat/lon could help us determine the correct timezone?
Then I thought about a dataset where each row might be in a different timezone, depending on whether it crossed into a different region.
I'm not sure I've ever seen this in a dataset. Usually, scientists pick a single time zone to stick with throughout a cruise.
Also thought that lat/lon could help us determine the correct timezone?
It can. But, I'd still defer to the PI to provide the time zone info if it's local time.
The easiest way to manage timezone inside the data is to set the UTC offset (-5:00).
I think that makes sense. My initial thinking was that if times are provided as local time, we may want a time_zone field to capture the time zone. But, since we are converting dates/times to ISO format, you're right, I think we can just capture that in the UTC offset.
The only concern I have is one super special edge case, CARIACO, which had its timezone changed during the time series of the dataset. For example, the data covered 2004-2018 and the local time zone changed from UTC-4 to UTC-5 on some date in the middle of the data (say 2012-01-01). So, in this very special case, a single UTC offset for the whole dataset would be problematic, bringing the conversation back to each datum requiring its own offset.
Luckily the provider gave us UTC time as well, so it was a moot point. But, is this something we need to consider?
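If we ever did have to handle a case like that without a UTC column, a per-datum offset could be picked from the timestamp itself. A minimal sketch, using only the made-up values from the example above (the change date and offsets are illustrative, not the real CARIACO history), with `local_to_utc` as a hypothetical helper:

```python
from datetime import datetime, timedelta, timezone

# Illustrative values only, taken from the "say 2012-01-01" example above;
# they are not the real change date or offsets for CARIACO.
CHANGE_DATE = datetime(2012, 1, 1)
OFFSET_BEFORE = timezone(timedelta(hours=-4))
OFFSET_AFTER = timezone(timedelta(hours=-5))


def local_to_utc(naive_local: datetime) -> datetime:
    """Attach the offset in force at that local time, then convert to UTC."""
    offset = OFFSET_BEFORE if naive_local < CHANGE_DATE else OFFSET_AFTER
    return naive_local.replace(tzinfo=offset).astimezone(timezone.utc)
```

This keeps the offset out of the metadata entirely and derives it row by row, which is essentially the "data instead of metadata" idea raised earlier.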
Something that might be worth considering. In the ERDDAP date parser "The parser can handle time zones in the format 'Z', "UTC", "GMT", ±XX:XX, ±XXXX, and ±XX formats."
They only deal with the characters "Z", "UTC", and "GMT". Then, they use time offsets for other timezones.
@ashepherd, based on Monday's DM meeting, we decided we don't need to capture the local time zone offset in a structured way within the data. If local time is important, DMs will keep the original local time column in whatever format provided (and document the time zone in the metadata), but we'll also add an ISO_DateTime_UTC column using laminar, with the format we currently use yyyy-mm-ddTHH:MM:SSZ
ERDDAP will then use the ISO date/time column as its time column.
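To make the decision concrete, here is a plain-Python sketch of adding an ISO_DateTime_UTC column while keeping the original local time column. This is not how laminar does it; the function name, the list-of-dicts row shape, and the `local_time` column name in the usage note are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone

ISO_UTC_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # yyyy-mm-ddTHH:MM:SSZ


def add_iso_datetime_utc(rows, local_column, local_format, utc_offset_hours):
    """Return copies of rows with an added ISO_DateTime_UTC value.

    rows is a list of dicts; the original local time column is left as-is.
    """
    local_tz = timezone(timedelta(hours=utc_offset_hours))
    out = []
    for row in rows:
        local = datetime.strptime(row[local_column], local_format)
        utc = local.replace(tzinfo=local_tz).astimezone(timezone.utc)
        out.append({**row, "ISO_DateTime_UTC": utc.strftime(ISO_UTC_FORMAT)})
    return out
```

For example, `add_iso_datetime_utc([{"local_time": "01/14/2019 8:25 PM"}], "local_time", "%m/%d/%Y %I:%M %p", -4)` would roll the date over to the next day in UTC.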
So, I think what we'll need is date_time_format and date_time_format_convention, e.g.
date_time_format = "%m/%d/%Y %I:%M %p"
date_time_format_convention = "Python datetime strftime"
meeting notes: https://docs.google.com/document/d/1N9fnTPJRWXFlHVD9CeNZuxdXpbqWla6Nh8rMGeNyjSg/edit#heading=h.9gcrhsenjuwa
Decided we'd annotate the ISODateTime Variable (Dataset Parameter Type) with the ISO Format 'yyyy-mm-ddThh:mm:ssZ'
Would it be possible to use this exact string: yyyy-MM-dd'T'HH:mm:ssZ? That's how ERDDAP likes to see the format. Although I can just hardcode the format for ERDDAP, so it doesn't really matter.
See more here: https://github.com/BCODMO/ERDDAP-BCODMO/issues/17
date_format would allow DMs to specify the format of any date parameter in the metadata
date_format_convention would specify the convention used in date_format
example:
units = unitless
date_format = "M/d/yyyy"
date_format_convention = "Java DateTimeFormatter"
or
units = unitless
date_format = "%m/%d/%Y"
date_format_convention = "Python datetime strftime"
Similarly,
time_format = the format of the time parameter
time_format_convention = the convention used in time_format
Also, add a time_zone field for any time parameter. (make list a controlled vocab)
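One reason to record date_format_convention alongside date_format is that a consumer can then pick the right parser. A rough sketch of that idea follows, assuming the two convention strings from the examples above. The token table is deliberately partial, and `java_to_strftime` and `to_strftime` are hypothetical helpers, not part of any library.

```python
import re
from datetime import datetime

# Partial, illustrative mapping covering only pattern letters seen in this
# thread. Equivalent for parsing only: Java "M" prints unpadded months,
# while "%m" prints zero-padded ones.
_JAVA_TO_STRFTIME = {
    "yyyy": "%Y", "MM": "%m", "M": "%m", "dd": "%d", "d": "%d",
    "HH": "%H", "mm": "%M", "ss": "%S",
}
# A quoted literal, a run of one repeated letter, or any other character.
_TOKEN_RE = re.compile(
    r"'(?P<literal>[^']*)'|(?P<letters>([A-Za-z])\3*)|(?P<other>.)"
)


def java_to_strftime(pattern: str) -> str:
    parts = []
    for m in _TOKEN_RE.finditer(pattern):
        if m.group("literal") is not None:
            parts.append(m.group("literal").replace("%", "%%"))
        elif m.group("letters") is not None:
            run = m.group("letters")
            if run not in _JAVA_TO_STRFTIME:
                raise ValueError(f"unsupported pattern letters: {run!r}")
            parts.append(_JAVA_TO_STRFTIME[run])
        else:
            parts.append(m.group("other").replace("%", "%%"))
    return "".join(parts)


def to_strftime(date_format: str, convention: str) -> str:
    """Dispatch on the recorded convention string."""
    if convention == "Python datetime strftime":
        return date_format
    if convention == "Java DateTimeFormatter":
        return java_to_strftime(date_format)
    raise ValueError(f"unknown convention: {convention!r}")
```

With this, `datetime.strptime("1/14/2019", to_strftime("M/d/yyyy", "Java DateTimeFormatter"))` and the same call with `"%m/%d/%Y"` and the Python convention should parse to the same date.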