ga4gh-beacon / beacon-verifier

Tool to verify that a Beacon implementation follows the specification
2 stars 3 forks source link

`date-time` tests problem #11

Open mbaudis opened 3 years ago

mbaudis commented 3 years ago

The verifier throws an error for standard ISO8601 dates against the date-time format:

ERROR beacon_verifier::utils    >    ERROR: Format { format: "date-time" } - "2001-01-01" is not a "date-time" (/createDateTime)

AFAIK an ISO date fulfils the date-time requirement?

mbaudis commented 3 years ago

Addendum: also errors w/ time expansion:

 ERROR beacon_verifier::utils    >    ERROR: Format { format: "date-time" } - "2020-06-01T00:00:00.000000" is not a "date-time" (/updateDateTime)
MrRobb commented 3 years ago

This is not on my control to change since date-times in RFC3339 Section 5.6 are defined as the full date, a 'T', the time with optional sub-second precision and the time offset.

If it should only be a date, or something more custom with a regex pattern, then the proper changes should be made in the model to reflect that decision. But if you specify the date-time format of json-schemas, the data should be complete to be valid.

mbaudis commented 3 years ago

Partial date-time formats are valid under ECMA script specifications and should verify.

Also, the only other solution AFAIK is to drop the format flag and to use only string with description for its use, i.e. no verification.

More details

While RFC3339 (July 2002) details the specification of a date-time format as the complete representation of an ISO8601 date & time string and recommend this or its parts for "new internet standards", JSON/ECMA ... definitions use an ISO8601 date-time format in a slightly different manner: same ISO string format (obviously), but allowing atomic representations. This is being documented e.g. in

Now, while the forced use of strict RFC 3339 times may make sense for system-generated timestamps (document updates etc.) it does not seem sensible for usually more fuzzy entries such as sample collection times - time there may be needed, but not in common scenarios. But anyway it seems sensible to verify against the common interpretation (i.e. ISO 8601 partial).

jrambla commented 3 years ago

We are following Schemablocks suggestion here: RFC3339. Relaxing a standard usage is actually not using the standard. Converting that in an open string will remove the purpose of the data type. Being conformant to the data type is a matter of formatting the output, seems easy to me.

mbaudis commented 3 years ago

Sorry, @jrambla - SchemaBlocks documentation of time formats does not reference RFC3339 (as some recommendation how to check/implement) but ISO 8601 (as the format). And I wrote the {S}[B] page myself (RFC3339 isn't mentioned anywhere).

AFAIK RFC3339 is mentioned in timestamp docs such as updateDateTime / updated, for timestamp purposes (and to make it explicit that ISO strings are valid, in contrast to system time / time since epoch etc. - many discussions especially with PXF devs about the difference between format & implementation of timestamp expressions).

Summary

And yes, we can go with "date-time has to be fully RFC3339 date-time compliant" in which case data has to be converted during I/O. But IMO this introduces data errors, since data is being changed from low to higher granularity, since most dates and times for measurements, collections, observations will be rather fuzzy!

(comment updated)

jrambla commented 3 years ago

As I understand the issue: The current schemas include the format: "date-time" as part of the property definition. As such, dates should be date-times and they should follow an strict rule for the format. This is a simple formatting issue that has the garnularity consequence that you mention.

Expressing the date in other formats, using a relaxed validation, does not sound right to me, as we could not rely on the "automatic" validation of the type anymore, and we must revert to trust the implementers. Remember that using the verifier would not be mandatory for all Beacon instances in the Univers, hence, the schema MUST be as precise as necessary.

Therefore, if I need to choose, given that the schemas are out, I will choose to stick to the current strict one.

mbaudis commented 3 years ago

But then the format should be dropped for all those fields where it can be anything from date +- time space, e.g. relatedvto collection or observation points, since they frequently will be date only (even year), but in some instances time will be used/needed. Forcing conversion to strict, complete date-time by implementers will introduce errors (e.g. adding wrong precision), i.e. be destructive.

I guess this is why newer standards recognize validation of partial.

And to iterate: "1999" is perfect ISO; "1999-01-01T00:00:00.000Z" checks out RFC but is about a year away from what Prince was meaning.

mbaudis commented 2 years ago

Oh well, JSON docs for current drafts https://json-schema.org/understanding-json-schema/reference/string.html?highlight=datetime#id7 really are ambiguous, now having date, time, and date-time, but no statement about the handling (besides that format definitions are hints, but verifiers may follow them...). Which leaves the interpretation ambiguous.

Which leaves us with the problem of fields with varying granularity...

As @jrambla mentioned the option of patterns: Here is mine for ISO date +/- time:

^\d{4}-\d{2}-\d{2}([Tt]\d{2}:\d{2}:\d{2}(\.\d{1,3})?([Zz]|[+-]\d{2}:\d{2})?)?$ or ^\d{4}-\d\d-\d\d([Tt]\d\d:\d\d:\d\d(\.\d{1,3})?([Zz]|[+-]\d\d:\d\d)?)?$

This does not verify the sanity of the dates or times, just the string format. Happy to get a better one!