COMCIFS / Powder_Dictionary

CIF definitions for powder diffraction
4 stars 4 forks source link

Fix timestamps #28

Closed jamesrhester closed 1 year ago

jamesrhester commented 1 year ago

This pull request fixes all example timestamps, and the accompanying text descriptions, to match exactly RFC3339. Please consider carefully as these could be considered to change the format of the _pd_meas.datetime_* and _pd_proc.info_* datetime items. I believe very few if any software packages would actually parse these items so the change should have minimal effect.

The change to _pd_block.id is not so significant as this was always intended to be an opaque identifier and time is used to make it unique; the particular format of the time is not relevant.

Two example lines still fail checking, but these are clearly mistakes in the checking software, see issue that I've raised here.

vaitkus commented 1 year ago

@jameshester, are you sure there is a mistake in the checking software? Timestamps reported as invalid seem to be missing the mandatory seconds part:

'2015-10-30T22:45-02:00' instead of '2015-10-30T22:45:00-02:00'
'2015-10-30T22:45-02:00' instead of '2015-10-30T22:45:00-02:00'
jamesrhester commented 1 year ago

Alas, you are right, not sure how I missed that. I will update the pull request and I think we can close the related issue.

vaitkus commented 1 year ago

Ok, thank you for the changes.

rowlesmr commented 1 year ago

I think that requiring a fully compliant RFC3339 datetime is too onerous, especially as this field will probably be being added manually by most people.

The definition for DateTime is

     DateTime

; A timestamp. Text formats must use date-time or full-date productions of RFC 3339 ABNF ;

Does this mean: "(date-time or full-date productions) of RFC 3339 ABNF" or "date-time or (full-date productions of RFC 3339 ABNF)"?

If the later, what is a date-time? I can't find any definition of it.

vaitkus commented 1 year ago

I think that requiring a fully compliant RFC3339 datetime is too onerous, especially as this field will probably be being added manually by most people.

While full datetime values may not be important in some applications, in other applications they are crucial (e.g. sorting). Also, if we decouple the DDLm datetime definition from the RFC3339 one, we would then have to explicitly specify what deviations are allowed, e.g. seconds may be omitted, offsets may be omitted, both seconds and offsets may be omitted, etc. This would then also mean that RFC3339 compliant date parsing libraries no longer work with all possible DDLm datetime values. Seems like a bit of a headache.

I understand that manually always adding the "placeholder" ':00' seconds is a bit cumbersome, but is this extra bit of convenience really worth the change in specification?

Does this mean: "(date-time or full-date productions) of RFC 3339 ABNF" or "date-time or (full-date productions of RFC 3339 ABNF)"?

The first option. The terms "date-time" and "full-date" productions are taken from the ABNF grammar provided in section 5.6 of RFC3339.

rowlesmr commented 1 year ago

The terms "date-time" and "full-date" productions are taken from the ABNF grammar provided in section 5.6 of RFC3339.

Thanks, my parsing that statement was a little ambiguous :). The syntax checking is incorrectly flagging 1979-09-01 (in the example above) as an incorrect format, as this is a full-date.

I understand that manually always adding the "placeholder" ':00' seconds is a bit cumbersome, but is this extra bit of convenience really worth the change in specification?

and a timezone.

I can see why you want to maintain strict compatability, but we also want to promote ease of use, and requiring times to be specified to the second seems a little overkill; you're already allowing both a date-time and full-date. Also, I'm coming from PD land, where there isn't much help from instrument manufacturers in automatically outputting data in CIF format. If this is a solved problem in SCXRD world, then maybe we should agitate more.

yyyy-mm-dd yyyy-mm-ddThh:mm yyyy-mm-ddThh:mm:ss yyyy-mm-ddThh:mm:ss.dd yyyy-mm-ddThh:mmQ yyyy-mm-ddThh:mm:ssQ yyyy-mm-ddThh:mm:ss.ddQ

(where Q == Z | [+-]zz:zz)

vaitkus commented 1 year ago

Thanks, my parsing that statement was a little ambiguous :). The syntax checking is incorrectly flagging 1979-09-01 (in the example above) as an incorrect format, as this is a full-date.

Sorry, I'm not sure what "example above" you are referring to. Value '1979-09-01' is treated as a proper DDLm DateTime by the validator and no validation issues are triggered at all by this PR. Could you please clarify?

I can see why you want to maintain strict compatability, but we also want to promote ease of use, and requiring times to be specified to the second seems a little overkill; you're already allowing both a date-time and full-date.

If I recall correctly, the full-date construct was included to provide backwards compatibility with existing older CIF files, but the overall goal was to more or less standardise and homogenise the way dates and timestamps are recorded, preferably benefiting from an already established standard. I am all pro usability, however, I am unsure if redefining an existing content type is the right approach as some users and dictionary developers may already depend on the current stricter definition and the corresponding program behaviour that comes with it.

Also, I'm coming from PD land, where there isn't much help from instrument manufacturers in automatically outputting data in CIF format. If this is a solved problem in SCXRD world, then maybe we should agitate more.

I must admit that the lack of support for CIF which you described is quite surprising and saddening. But I would guess that there are at least some effective converter programs capable of producing CIFs from the native output of the instrument(s)?

Anyway, any final decision regarding the modification of the DateTime content type lies with @jamesrhester and the COMCIFS Core Dictionary Maintenance Group.

rowlesmr commented 1 year ago

Thanks, my parsing that statement was a little ambiguous :). The syntax checking is incorrectly flagging 1979-09-01 (in the example above) as an incorrect format, as this is a full-date.

Sorry, I'm not sure what "example above" you are referring to. Value '1979-09-01' is treated as a proper DDLm DateTime by the validator and no validation issues are triggered at all by this PR. Could you please clarify?

Ah. I now see that James changed the first example (the one that I saw), and didn't change the second one.

.

I agree with you. Imposing an already accepted standard for something as complex as a date-time is good, and ISO8601 has too much variation. I can't really validate my response past "it feels a little icky" with regards mandating timzones (as well as seconds).

But I would guess that there are at least some effective converter programs capable of producing CIFs from the native output of the instrument(s)?

Nope. Bruker EVA doesn't export to CIF. PowDLL doesn't convert to CIF, although that may be a good place to start. I don't know about highscore/panalytical data viewer. TOPAS doesn't read CIF data, but I've written some macros to export pdCIF files from a refinement. I'm pretty sure GSAS both reads and writes CIFs.

jamesrhester commented 1 year ago
  1. The very latest version of GSAS-II (as of this morning) only writes _pd_proc_info_datetime in a form not including seconds. This could be trivially patched. It does not explicitly read _pd_meas_datetime_initiated but would probably pass it on together with anything else if it read in a CIF file. Certainly some facility instruments are likely to generate the non-seconds form.

  2. Several RFC3339 parsers are available, see here

  3. The original text of the definition did optionally allow seconds and somewhat underspecified timezone so CIF readers will have been written with seconds in mind (but not the letter 'Z').

For these reasons I would advocate accepting the pull request as the impact is relatively minor. We need a strategy in general to communicate such minor updates to the community, I have long dreamt about a CIF newsletter but never had the time to get it to happen.

rowlesmr commented 1 year ago

I don't know about highscore/panalytical data viewer.

The data viewer (and therefore, probably HighScore) doesn't save to CIF, but the XML file in which the data are saved does have a fully qualified RFC3339 datetime for both starting and finishing data collection.

Also, parser can be lenient on reading, as long as they a strict on writing.