BCODMO / Ocean-Data-Ontology

Application-level ontology for describing oceanographic datasets

Add date_format and time_format fields #13

Open whshannon opened 4 years ago

whshannon commented 4 years ago

See more here: https://github.com/BCODMO/ERDDAP-BCODMO/issues/17

date_format would allow DMs to specify the format of any date parameter in the metadata.
date_format_convention would specify the convention used in date_format.

example:
units = unitless
date_format = "M/d/yyyy"
date_format_convention = "Java DateTimeFormatter"

or

units = unitless
date_format = "%m/%d/%Y"
date_format_convention = "Python datetime strftime"

Similarly:
time_format = the format of the time parameter
time_format_convention = the convention used in time_format

Also, add a time_zone field for any time parameter (and make the list of allowed values a controlled vocab).
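To illustrate how a consumer might use the proposed date_format and date_format_convention pair, here is a minimal Python sketch. The `param_metadata` dictionary and the `parse_date` helper are hypothetical names made up for this example; only the "Python datetime strftime" convention is handled, and other conventions would need their own parser.

```python
from datetime import date, datetime

# Hypothetical metadata record using the field names proposed in this issue.
param_metadata = {
    "units": "unitless",
    "date_format": "%m/%d/%Y",
    "date_format_convention": "Python datetime strftime",
}


def parse_date(value: str, metadata: dict) -> date:
    """Parse a date string using the format declared in the metadata."""
    convention = metadata["date_format_convention"]
    if convention != "Python datetime strftime":
        # Other conventions (e.g. Java DateTimeFormatter) are out of scope here.
        raise ValueError(f"Unsupported date_format_convention: {convention}")
    return datetime.strptime(value, metadata["date_format"]).date()


parsed = parse_date("1/14/2019", param_metadata)
print(parsed.isoformat())
```

Recording the convention next to the format string is what lets a reader tell "M/d/yyyy" (Java) apart from "%m/%d/%Y" (Python) without guessing.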

ashepherd commented 4 years ago

Awesome! Let’s review so I make sure to get it right. For the timezone, I found this vocab, but I’m not seeing EDT or anything catching daylight savings in there. Are those necessary? Is this on the right track?

https://ddialliance.org/Specification/DDI-CV/TimeZone_1.0.html

whshannon commented 4 years ago

@mbiddle-bcodmo, thoughts? I'm not sure how best to handle daylight savings time.

MathewBiddle commented 4 years ago

yeah, that's tricky. I think we do need to capture that if we are going down this road. Especially when the place decides to change the timezone reference. https://en.wikipedia.org/wiki/Time_in_Venezuela

One of the packages I've used for timezones is Python's pytz package (http://pytz.sourceforge.net/), which apparently uses this resource as its timezone database: http://www.iana.org/time-zones

Not sure if that helps any.

ashepherd commented 4 years ago

OK, I searched for an easy way to pull the IANA timezone DB (including from DBpedia, but the lists were incomplete), so I attempted to scrape the list at Wikipedia into a Google Sheet that we can reuse to source a controlled vocab. Can you check out this Sheet and let me know your thoughts?

https://docs.google.com/spreadsheets/d/1nMw4TsiFSiIYinOInJBBc-5NUJ4YtXZRT17C4hPF2So/edit#gid=362858527

MathewBiddle commented 4 years ago

that looks alright. There's got to be an easier way though.

ashepherd commented 4 years ago

easier might be to select the UTC value? Like the timezone is UTC+5:00 or UTC-8:45.

see: https://www.timeanddate.com/time/zones/

OWL Time Ontology doesn't provide them either, but says this site above is one potential source: https://www.w3.org/TR/owl-time/#h-note12

MathewBiddle commented 4 years ago

personally, I like that approach better than trying to use a vocabulary we can't quite pin down.

So I'm clear, the following datum in EDT would be described as follows:

datum = "01/14/2019 8:25 AM"
units = unitless
date_format = "%m/%d/%Y %H:%M %p"
date_format_convention = "Python datetime strftime"
time_zone = "UTC-4"
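As a rough check of how such a description could be consumed, the sketch below parses the datum and attaches the declared offset using only the Python standard library. The `parse_utc_offset` helper is hypothetical, not part of any BCO-DMO tool. One assumption differs from the format string above: the sketch uses `%I` (12-hour clock) instead of `%H`, because Python's `strptime` only lets `%p` affect the hour when `%I` is used, so a PM value parsed with `%H` would keep its morning hour.

```python
import re
from datetime import datetime, timedelta, timezone


def parse_utc_offset(label: str) -> timezone:
    """Turn a label such as 'UTC-4' or 'UTC+5:30' into a fixed-offset tzinfo."""
    match = re.fullmatch(r"UTC([+-])(\d{1,2})(?::(\d{2}))?", label)
    if match is None:
        raise ValueError(f"Unrecognized time_zone value: {label!r}")
    sign = 1 if match.group(1) == "+" else -1
    hours = int(match.group(2))
    minutes = int(match.group(3) or 0)
    return timezone(sign * timedelta(hours=hours, minutes=minutes))


datum = "01/14/2019 8:25 AM"
fmt = "%m/%d/%Y %I:%M %p"  # %I rather than %H so that AM/PM is honored
local = datetime.strptime(datum, fmt).replace(tzinfo=parse_utc_offset("UTC-4"))
print(local.isoformat())
```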

MathewBiddle commented 4 years ago

Also note, I don't think we need an independent time_format and time_format_convention attribute as those can be captured in the date_format attribute.

Maybe we should change the name to date_time_format?

whshannon commented 4 years ago

Oh, good point. I was thinking of cases where date and time are in separate fields.

As for the time zones, I think selecting the UTC offset is fine assuming the PI provides that info. Otherwise, there may be times when the DM will have to look up the offset to see if daylight savings time applies.

ashepherd commented 4 years ago

Let’s create three data types for these, mapped to the XSD datatype vocabulary

Date (xsd:date)
Datetime (xsd:dateTime)
Time (xsd:time)

Each can have a: format (xsd:token) and format-type (Controlled vocab)

The easiest way to manage timezone inside the data is to set the UTC offset (-5:00). I've been thinking about using the named region as a proxy for these offsets. If we did, we'd need to stay on top of any change to a region to make sure its UTC offset is correct. Then I thought about a dataset where each row might be in a different timezone, depending on whether it crossed into a different region. I was wondering if we should make this data instead of metadata? Also thought that lat/lon could help us determine the correct timezone?

whshannon commented 4 years ago

> Then I thought about a dataset where each row might be in a different timezone, depending on whether it crossed into a different region.

I'm not sure I've ever seen this in a dataset. Usually, scientists pick a single time zone to stick with throughout a cruise.

> Also thought that lat/lon could help us determine the correct timezone?

It can. But, I'd still defer to the PI to provide the time zone info if it's local time.

> The easiest way to manage timezone inside the data is to set the UTC offset (-5:00).

I think that makes sense. My initial thinking was that if times are provided as local time, we may want a time_zone field to capture the time zone. But, since we are converting dates/times to ISO format, you're right, I think we can just capture that in the UTC offset.

MathewBiddle commented 4 years ago

The only concern I have is one super special edge case, CARIACO, which had its timezone changed during the time series of the dataset. For example, the data covered 2004-2018 and the local time zone changed from UTC-4 to UTC-5 on some date in the middle of the data (say 2012-01-01). So, in this very special case, one assignment of a UTC offset would be problematic, bringing the conversation back to each datum requiring its appropriate offset.

Luckily the provider gave us UTC time as well, so it was a moot point. But, is this something we need to consider?
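To make the edge case concrete, here is a small Python sketch of per-datum offset selection. The cutover date and the two offsets are the illustrative values from the comment above ("say 2012-01-01", UTC-4 to UTC-5), not the real CARIACO history, and `local_to_utc` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

# Illustrative values only, taken from the hypothetical example in this thread.
CUTOVER = datetime(2012, 1, 1)
OFFSET_BEFORE = timezone(timedelta(hours=-4))
OFFSET_AFTER = timezone(timedelta(hours=-5))


def local_to_utc(naive_local: datetime) -> datetime:
    """Pick the offset in force at the local timestamp, then convert to UTC."""
    offset = OFFSET_BEFORE if naive_local < CUTOVER else OFFSET_AFTER
    return naive_local.replace(tzinfo=offset).astimezone(timezone.utc)


before = local_to_utc(datetime(2011, 12, 31, 12, 0))
after = local_to_utc(datetime(2012, 1, 1, 12, 0))
print(before.isoformat())  # noon local under UTC-4
print(after.isoformat())   # noon local under UTC-5
```

A single dataset-level offset would shift one of these two rows by an hour, which is why the offset would have to be chosen per datum in a case like this.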

MathewBiddle commented 4 years ago

Something that might be worth considering. In the ERDDAP date parser "The parser can handle time zones in the format 'Z', "UTC", "GMT", ±XX:XX, ±XXXX, and ±XX formats."

They only deal with the characters "Z", "UTC", and "GMT". Then, they use time offsets for other timezones.
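For reference, the zone designators listed in that quote can be recognized with a few lines of Python. This is only a sketch of the idea under the assumption that the designator has already been split off the end of the timestamp; it is not ERDDAP's implementation, and `parse_zone` is a hypothetical name.

```python
import re
from datetime import timedelta, timezone

# Matches +XX:XX, +XXXX and +XX (and the "-" variants).
_OFFSET = re.compile(r"([+-])(\d{2})(?::?(\d{2}))?")


def parse_zone(suffix: str) -> timezone:
    """Map 'Z', 'UTC', 'GMT' or a numeric offset to a fixed-offset tzinfo."""
    if suffix in ("Z", "UTC", "GMT"):
        return timezone.utc
    match = _OFFSET.fullmatch(suffix)
    if match is None:
        raise ValueError(f"Unrecognized zone designator: {suffix!r}")
    sign = -1 if match.group(1) == "-" else 1
    delta = timedelta(hours=int(match.group(2)), minutes=int(match.group(3) or 0))
    return timezone(sign * delta)


for text in ("Z", "GMT", "-05:00", "+0530", "-08"):
    print(text, parse_zone(text).utcoffset(None))
```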

whshannon commented 4 years ago

@ashepherd, based on Monday's DM meeting, we decided we don't need to capture the local time zone offset in a structured way within the data. If local time is important, DMs will keep the original local time column in whatever format provided (and document the time zone in the metadata), but we'll also add an ISO_DateTime_UTC column using laminar, with the format we currently use yyyy-mm-ddTHH:MM:SSZ

ERDDAP will then use the ISO date/time column as its time column.

So, I think what we'll need is date_time_format and date_time_format_convention, e.g.
date_time_format = "%m/%d/%Y %H:%M %p"
date_time_format_convention = "Python datetime strftime"

meeting notes: https://docs.google.com/document/d/1N9fnTPJRWXFlHVD9CeNZuxdXpbqWla6Nh8rMGeNyjSg/edit#heading=h.9gcrhsenjuwa
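The workflow described above (keep the original local time column, add an ISO_DateTime_UTC column) could look roughly like the Python sketch below. This is not how laminar does it; the row layout, the `local_time` column name, the UTC-4 offset, and the `add_iso_utc` helper are all assumptions for illustration. The sketch also assumes `%I` instead of `%H` in the format, since Python's `strptime` only applies `%p` to the hour when `%I` is used.

```python
from datetime import datetime, timedelta, timezone

# Assumed values for this sketch: a 12-hour format and a UTC-4 local offset
# that the DM would have documented in the metadata.
DATE_TIME_FORMAT = "%m/%d/%Y %I:%M %p"
LOCAL_OFFSET = timezone(timedelta(hours=-4))


def add_iso_utc(row: dict, local_column: str) -> dict:
    """Return a copy of the row with an added ISO_DateTime_UTC value."""
    local = datetime.strptime(row[local_column], DATE_TIME_FORMAT)
    local = local.replace(tzinfo=LOCAL_OFFSET)
    out = dict(row)  # the original local time column is kept as provided
    out["ISO_DateTime_UTC"] = local.astimezone(timezone.utc).strftime(
        "%Y-%m-%dT%H:%M:%SZ"
    )
    return out


rows = [
    {"station": "A1", "local_time": "01/14/2019 8:25 AM"},
    {"station": "A2", "local_time": "01/14/2019 9:40 PM"},
]
converted = [add_iso_utc(row, "local_time") for row in rows]
for row in converted:
    print(row["local_time"], "->", row["ISO_DateTime_UTC"])
```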

ashepherd commented 4 years ago

Decided we'd annotate the ISODateTime Variable (Dataset Parameter Type) with the ISO Format 'yyyy-mm-ddThh:mm:ssZ'

MathewBiddle commented 4 years ago

Would it be possible to use this exact string yyyy-MM-dd'T'HH:mm:ssZ? That's how ERDDAP likes to see the format. Although I can just hardcode the format for ERDDAP, so it doesn't really matter.