EarthScope / mseed3-utils

Validator and other utilities for the xSEED data format
Apache License 2.0

time formatting #24

Closed crotwell closed 1 year ago

crotwell commented 3 years ago

Would be useful if time output in mseed2text printed nanoseconds instead of microseconds.

Also, perhaps the output time should be more ISO friendly, like YYYY-DDD instead of YYYY,DDD. See "Ordinal Dates" in ISO8601.

Maybe add Z to be clear date is UTC.

Change:

             start time: 2012,001,00:00:00.000000

to

             start time: 2012-001T00:00:00.000000000Z

Note mseed2json already does some of this:

    "StartTime": "2012-01-01T00:00:00.000000000Z",

but might be good if it used day of year instead of month day since that is what is stored in the header?

chad-earthscope commented 1 year ago

1- This is a SEED-defined time string: start time: 2012,001,00:00:00.000000

2- This is a compliant variation of ISO 8601: 2012-001T00:00:00.000000000Z

3- This is an RFC 3339 (and ISO 8601) compliant time string: 2012-01-01T00:00:00.000000000Z

ISO 8601 is a broad standard that includes lots of variations on date-time strings. RFC 3339 is a refinement of ISO 8601 (and RFC 2822) that defines a subset, specifically the most commonly used form and the one we're familiar with: https://www.rfc-editor.org/rfc/rfc3339.html#section-5.6.
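To make the three variants concrete, here is a minimal Python sketch that parses the SEED form (1) and emits the ISO 8601 ordinal form (2) and the RFC 3339 form (3). It is not part of mseed3-utils or libmseed; the function name `seed_to_variants` is hypothetical. Python's `datetime` only resolves microseconds, so the fractional digits are carried as a string and padded to nanoseconds.

```python
from datetime import datetime
from typing import Tuple

def seed_to_variants(seed_time: str) -> Tuple[str, str]:
    """Convert a SEED 'YYYY,DDD,HH:MM:SS.ffffff' string to
    (ISO 8601 ordinal, RFC 3339) strings with nanosecond digits.

    Hypothetical helper for illustration only.
    """
    # Split off the fractional seconds; datetime cannot hold nanoseconds.
    whole, _, frac = seed_time.partition(".")
    # %j parses the day of year (001-366) and, with %Y, fixes the calendar date.
    dt = datetime.strptime(whole, "%Y,%j,%H:%M:%S")
    # Pad (or cut) the fraction to exactly nine digits.
    frac9 = frac.ljust(9, "0")[:9]
    ordinal = f"{dt:%Y-%jT%H:%M:%S}.{frac9}Z"
    rfc3339 = f"{dt:%Y-%m-%dT%H:%M:%S}.{frac9}Z"
    return ordinal, rfc3339

ordinal, rfc3339 = seed_to_variants("2012,001,00:00:00.000000")
print(ordinal)
print(rfc3339)
```

The trailing `Z` assumes the SEED time is UTC, as discussed above.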

Personally I find variation 2 a bad option; it needs to be read very carefully by eyeballs to understand it's an ordinal. I've recently been steering towards RFC 3339, option 3, as that seems the sanest option in terms of smaller scope and clarity. Importantly, it is also what is needed for JSON Schema, which does not support all of ISO 8601, which is why it's in the JSON above. The only downside is that these are not the values stored in miniSEED, which uses the ordinal (day of year) values. It's very nice to see the values that are supposed to be in the data directly for diagnostics. This leads to the mixed use being reported and (obviously) I haven't reconciled them in my mind yet.

One option would be to add the RFC 3339 format in the places where the SEED format is printed, redundant but hopefully clearer?

start time: 2012,001,00:00:00.000000 (2012-01-01T00:00:00.000000Z)

crotwell commented 1 year ago

For humans I agree 3339 seems the clearest, my old eyeballs don't really like option 2 either. Maybe instead of repeating the whole time string, go with 3 but add just the day of year, so something like:

start time: 2012-01-01T00:00:00.000000000Z (day: 001)

or even just

start time: 2012-01-01T00:00:00.000000000Z (001)

Another option, since this is just a textual representation of the record, is two lines, so:

start time: 2012-01-01T00:00:00.000000000Z
      year: 2012 day-of-year: 1 hour: 0 min: 0 sec: 0 nano: 0

The first is for humans, the second is for diagnostics? That would also mean that the decimal digits in the seconds could be 3, 6, or 9 digits to have fewer zeros?

chad-earthscope commented 1 year ago

Fixed in libmseed commit: https://github.com/EarthScope/libmseed/commit/a7cea36bde0508d65c00f32a488de046e503c94b

The time string generated in msr3_print() is now this format:

             start time: 2022-06-05T20:32:38.123000Z (156)

crotwell commented 1 year ago

How do you feel about printing microsecond vs nanoseconds when extra digits are zero?

With latest commit, it looks like the format prints micros even though precision is only to milliseconds. Looks like it will print nanos if they are not 000.

Maybe it would be good to either always print full nanos (maybe too verbose, but more closely shows actual data) or trim ending zeros to one of milli, micro or nano precision?
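The trimming alternative could look roughly like the Python sketch below. This is only an illustration of the proposal, not code from libmseed; `trim_fraction` is a hypothetical name, and the input is assumed to be the nanosecond part of the time (0 to 999,999,999).

```python
def trim_fraction(nanos: int) -> str:
    """Return the fractional-second digits trimmed to milli, micro,
    or nano precision, whichever is the shortest that loses nothing.

    Hypothetical helper illustrating the proposal in this comment.
    """
    if not 0 <= nanos < 1_000_000_000:
        raise ValueError("nanos must be in [0, 1e9)")
    digits = f"{nanos:09d}"
    if nanos % 1_000_000 == 0:
        return digits[:3]   # millisecond precision is enough
    if nanos % 1_000 == 0:
        return digits[:6]   # microsecond precision is enough
    return digits           # full nanoseconds needed

for n in (123_000_000, 123_456_000, 123_456_789):
    print(f"2022-06-05T20:32:38.{trim_fraction(n)}Z")
```

A zero fraction still prints three digits here (`.000`), so the width is always one of 3, 6, or 9.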

chad-earthscope commented 1 year ago

It's a balance of a few things.

The highest resolution in SEED 2.4 is microseconds, whereas the minimum specified resolution is 0.0001 seconds (tenths of milliseconds). So the choice in this software stack was to print microseconds and not bother with milliseconds. I'd like to continue that.

Nanoseconds are going to be an exception for most data most of the time, i.e. they will be zeros, and so not usefully included, plus they are visually noisy.
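As a rough Python sketch of the resulting policy (microseconds by default, nanoseconds only when they are non-zero, day of year in parentheses): this mirrors the behavior described in this thread and is not the actual C code in `msr3_print()`. The function name `format_start_time` and the nanosecond-epoch input are assumptions made for the example.

```python
from datetime import datetime, timezone

def format_start_time(epoch_ns: int) -> str:
    """Format nanoseconds since the Unix epoch as an RFC 3339 UTC
    string followed by the day of year, e.g. '... (156)'.

    Hypothetical helper; prints 6 fractional digits unless the
    sub-microsecond part is non-zero, in which case it prints 9.
    """
    seconds, nanos = divmod(epoch_ns, 1_000_000_000)
    dt = datetime.fromtimestamp(seconds, tz=timezone.utc)
    if nanos % 1_000 == 0:
        frac = f"{nanos // 1_000:06d}"   # microseconds are sufficient
    else:
        frac = f"{nanos:09d}"            # keep the non-zero nanoseconds
    return f"{dt:%Y-%m-%dT%H:%M:%S}.{frac}Z ({dt:%j})"

base = int(datetime(2022, 6, 5, 20, 32, 38, tzinfo=timezone.utc).timestamp())
print(format_start_time(base * 1_000_000_000 + 123_000_000))
print(format_start_time(base * 1_000_000_000 + 123_000_001))
```

The first call reproduces the sample line above; the second shows the nine-digit form appearing only when the extra digits carry information.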