ehn-dcc-development / hcert-schema

Electronic Health Certificates Payload Schema

date-time and CBOR #17

Closed martin-lindstrom closed 3 years ago

martin-lindstrom commented 3 years ago

I suspect that many of us will generate source code from the JSON schema. Where a date and time is represented, it is expressed as follows in the JSON schema:

                "dts": {
                    "title": "Date and time sample",
                    "description": "Date and time when the sample for the test was collected",
                    "type": "string",
                    "format": "date-time",
                    "example": "2021-02-20T12:34:56+00:00"
                },

Many Java users will probably want to use FasterXML's Jackson CBOR extension, and in those cases a date-time will be encoded as a CBOR string (tag 9), but it should really be tag 0 (string representation) or tag 1 (epoch-based).

You could argue that FasterXML has a bug and that it should understand that a java.time.Instant should be represented using either tag 0 or 1, but I just wanted to raise the concern about possible interoperability issues concerning dates if the parsing side isn't liberal enough to accept any of the tags 0, 1 and 9 where a date-time is expected to be found.
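To make the three candidate encodings concrete, here is a minimal hand-rolled sketch (no CBOR library; the helper names are hypothetical and not taken from any implementation in this project). It assumes the RFC 7049 layout, where an untagged text string is major type 3 and tags 0 and 1 are major type 6 wrappers, and it only compares the resulting byte lengths for the schema's example value.

```python
import struct

def cbor_head(major: int, value: int) -> bytes:
    """Encode the initial byte and argument for a CBOR major type."""
    if value < 24:
        return bytes([(major << 5) | value])
    if value < 2**8:
        return bytes([(major << 5) | 24, value])
    if value < 2**16:
        return bytes([(major << 5) | 25]) + struct.pack(">H", value)
    if value < 2**32:
        return bytes([(major << 5) | 26]) + struct.pack(">I", value)
    return bytes([(major << 5) | 27]) + struct.pack(">Q", value)

def text_string(s: str) -> bytes:
    """Untagged UTF-8 text string (major type 3)."""
    data = s.encode("utf-8")
    return cbor_head(3, len(data)) + data

def unsigned_int(n: int) -> bytes:
    """Unsigned integer (major type 0)."""
    return cbor_head(0, n)

def tagged(tag: int, item: bytes) -> bytes:
    """Wrap an already encoded item in a tag (major type 6)."""
    return cbor_head(6, tag) + item

DTS = "2021-02-20T12:34:56+00:00"   # example value from the schema
EPOCH = 1613824496                  # the same instant as seconds since 1970

plain_string = text_string(DTS)              # what a generic JSON-to-CBOR mapping emits
tag0_string = tagged(0, text_string(DTS))    # tag 0: date-time string
tag1_epoch = tagged(1, unsigned_int(EPOCH))  # tag 1: epoch-based date-time

for name, enc in [("plain", plain_string), ("tag 0", tag0_string), ("tag 1", tag1_epoch)]:
    print(name, len(enc), enc.hex())
```

The point of the sketch is only that the same logical value has at least three distinct byte representations, so a parser that expects exactly one of them will reject the other two.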

jschlyter commented 3 years ago

A document (in this repo) that defines the JSON to CBOR mappings would be useful. Such a document should also state that one should be strict in emitting and liberal when parsing. If this is needed, I can write a draft.

martin-lindstrom commented 3 years ago

The more I think about it, the more I see that the coding of date-time is something we need to be very explicit about. The JSON schema states type=string and format=date-time. One implementer may think "OK, let's encode this as a string (tag 9)", another one sees date-time and encodes it as a CBOR date-time string (tag 0), and a third one who really wants to save space goes for the CBOR numeric date-time encoding (tag 1).

There is really a gap between JSON and CBOR here.

So. Should we require a particular encoding in the resulting CBOR?

jschlyter commented 3 years ago

If one implements a direct JSON dictionary (which the schema constrains) to CBOR mapping, I believe one would end up with plain strings (tag 9) even for date-time objects. It is of course, as you write, possible to be sneaky and encode the date-time strings defined in the JSON schema as native date-time (tag 1) or date-time string (tag 0).

We should add a short document that goes along with the eu_hcert_v1 schema, and I suggest we write that signers SHOULD be strict and encode date-time as strings (tag 9) and that verifiers should be liberal in what they accept and accept other encodings. Would that work @martin-lindstrom?

martin-lindstrom commented 3 years ago

Or maybe just add a clarification in the comment for each dateTime element directly in the schema file?

I think that the best thing to do would be to require that CBOR-producers code date-times as CBOR date-time strings (tag 0), and also have a text that says that CBOR-consumers should also handle the case when a dateTime object is coded as a string (tag 9).

That is my best interop advice.

dirkx commented 3 years ago

On 14 Apr 2021, at 23:33, Martin Lindström wrote:

> The more I think about it, the more I see that the coding of date-time is something we need to be very explicit about. The JSON schema states type=string and format=date-time. One implementer may think "OK, let's encode this as a string (tag 9)", another one sees date-time and encodes it as a CBOR date-time string (tag 0), and a third one who really wants to save space goes for the CBOR numeric date-time encoding (tag 1).
>
> There is really a gap between JSON and CBOR here.
>
> So. Should we require a particular encoding in the resulting CBOR?

Beware that the DE proposal now calls for seconds since 1970.

Which is nice and crisp, but may need rounding to 24x60x60 to ensure privacy.

With kind regards,

Dw

martin-lindstrom commented 3 years ago

DE proposal? The SAP-guys?

Using numeric time encoding would save some space, but my concern is that it will lead to interop-issues. Looking at the implementations in ehn-digital-green-development many of them would fail as it is now.

martin-lindstrom commented 3 years ago

The issued-at and expires of the CWT are encoded using the numeric date format, but without a tag.

jschlyter commented 3 years ago

> I think that the best thing to do would be to require that CBOR-producers code date-times as CBOR date-time strings (tag 0), and also have a text that says that CBOR-consumers should also handle the case when a dateTime object is coded as a string (tag 9).

This will result in different encodings for date-time and date, which may be unfortunate.

martin-lindstrom commented 3 years ago

> > I think that the best thing to do would be to require that CBOR-producers code date-times as CBOR date-time strings (tag 0), and also have a text that says that CBOR-consumers should also handle the case when a dateTime object is coded as a string (tag 9).
>
> This will result in different encodings for date-time and date, which may be unfortunate.

Yes. But the payload is one thing where people usually will use JSON and then go to CBOR, and CWT is pretty special and no-one will attempt to do anything automatic there. So, let's separate the payload from the CWT in coding discussions.

jschlyter commented 3 years ago

If they use JSON and go to CBOR, date-time will turn into a string, yes? At least that is what happens in my world. If I go from a generic map/dictionary directly to CBOR, date-time will turn into a date-time string.

martin-lindstrom commented 3 years ago

Yes. Since a date-time in JSON is a string, a generic CBORFunc.fromJson(jsonString) will generate a CBOR string (plain or UTF8). And that could be seen as wrong.

jkiddo commented 3 years ago

So, who makes the call on this one? The easiest (for me) is to keep it as a string in the payload, as I don't have to change anything, but that doesn't necessarily make it the right decision.

martin-lindstrom commented 3 years ago

I vote for representing all dateTime elements as integers (seconds since epoch). This gives the advantages that there will be no interop-issues concerning different dateTime to/from string conversions and it also saves space.
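As an illustration of what this implies for an issuer that currently holds RFC 3339 strings, a minimal sketch using only the Python standard library could look like the following. The function names are hypothetical, and the choice to reject values without a UTC offset is an assumption of this sketch, not something the thread decided.

```python
from datetime import datetime, timezone

def to_epoch_seconds(rfc3339: str) -> int:
    """Convert an RFC 3339 date-time string to whole seconds since 1970-01-01T00:00:00Z."""
    # Older Python versions do not accept a trailing "Z", so normalise it first.
    if rfc3339.endswith("Z"):
        rfc3339 = rfc3339[:-1] + "+00:00"
    dt = datetime.fromisoformat(rfc3339)
    if dt.tzinfo is None:
        # Assumption: a value without an offset is ambiguous and is rejected.
        raise ValueError("date-time must carry a UTC offset")
    return int(dt.timestamp())

def from_epoch_seconds(seconds: int) -> str:
    """Render epoch seconds as an RFC 3339 string in UTC."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).isoformat()

print(to_epoch_seconds("2021-02-20T12:34:56+00:00"))
print(to_epoch_seconds("2021-02-20T13:34:56+01:00"))  # same instant, different offset
print(from_epoch_seconds(1613824496))
```

Two strings with different offsets that denote the same instant map to the same integer, which is the interop property argued for above.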

jschlyter commented 3 years ago

> I vote for representing all dateTime elements as integers (seconds since epoch). This gives the advantages that there will be no interop-issues concerning different dateTime to/from string conversions and it also saves space.

I agree, this is a lot simpler.

Razumain commented 3 years ago

I also totally agree. seconds since epoch is in my view the safest path to interop.

dirkx commented 3 years ago

+1 with the advice to round this to a 24x60x60 or similar granular boundary when needed for privacy.
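A sketch of the rounding suggested here, assuming "round" means truncating downwards to the start of the enclosing interval (the thread does not say whether to round down, up, or to nearest). The function name and default are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def round_down(epoch_seconds: int, granularity: int = SECONDS_PER_DAY) -> int:
    """Truncate an epoch timestamp to a multiple of `granularity` seconds."""
    if granularity <= 0:
        raise ValueError("granularity must be positive")
    return epoch_seconds - (epoch_seconds % granularity)

sample = 1613824496  # 2021-02-20T12:34:56Z
print(round_down(sample))         # start of that UTC day
print(round_down(sample, 3600))   # start of that hour
```

Note that truncating to a 24x60x60 boundary yields midnight UTC, not midnight in the issuer's local timezone.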

jkiddo commented 3 years ago

👍

Razumain commented 3 years ago

A question: is there a similar need for date? That is, when the actual time is not of interest, only the date, and where it is important that the date does not differ depending on the verifier's local timezone?

martin-lindstrom commented 3 years ago

> A question: is there a similar need for date? That is, when the actual time is not of interest, only the date, and where it is important that the date does not differ depending on the verifier's local timezone?

I think that we should stick with the "YYYY-MM-DD" string formats for dates. I have rarely seen numeric representations of dates.

Razumain commented 3 years ago

> > A question: is there a similar need for date? That is, when the actual time is not of interest, only the date, and where it is important that the date does not differ depending on the verifier's local timezone?
>
> I think that we should stick with the "YYYY-MM-DD" string formats for dates. I have rarely seen numeric representations of dates.

I strongly agree. I was just making sure. Using time as date can get really ugly.

martin-lindstrom commented 3 years ago

> +1 with the advice to round this to a 24x60x60 or similar granular boundary when needed for privacy.

Depends on what we use dates for. As long as they are used to mark when you received a vaccination shot, isn't it a concern for the issuing side? A Dutch issuer may want to use hour-granularity and a German issuer second-granularity.

dirkx commented 3 years ago

So a "Calendar dates" from RFC 3339 section 5.6 / ISO 8601 as a String. Exactly 8 digits or 8 digits + two dashes long.

martin-lindstrom commented 3 years ago

Yes

Razumain commented 3 years ago

> > +1 with the advice to round this to a 24x60x60 or similar granular boundary when needed for privacy.
>
> Depends on what we use dates for. As long as they are used to mark when you received a vaccination shot, isn't it a concern for the issuing side? A Dutch issuer may want to use hour-granularity and a German issuer second-granularity.

No. If it is the date/time you received a shot, then this is a time.

If you, on the other hand, are after the date of birth, then you want the exact date. And that date must not differ depending on the time zone of the reader.
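The time-zone hazard can be shown with a small sketch: one and the same epoch value renders as different calendar dates depending on the reader's UTC offset. The instant and the offsets below are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# A hypothetical instant late in the evening, UTC.
instant = datetime(2021, 2, 20, 23, 30, tzinfo=timezone.utc)
epoch = int(instant.timestamp())

def local_date(epoch_seconds: int, utc_offset_hours: int) -> str:
    """Calendar date that a reader at the given fixed UTC offset would see."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    return datetime.fromtimestamp(epoch_seconds, tz=tz).date().isoformat()

for offset in (-5, 0, 2):
    print(offset, local_date(epoch, offset))
```

A reader two hours ahead of UTC sees the following day, which is why a pure date such as a date of birth is better carried as a date string than derived from a timestamp.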

jschlyter commented 3 years ago

> So a "Calendar dates" from RFC 3339 section 5.6 / ISO 8601 as a String. Exactly 8 digits or 8 digits + two dashes long.

Date is "YYYY-MM-DD" (always with dashes)

dirkx commented 3 years ago

OK, so let's lock it down to that, as the spec allows both (and the X.509 world commonly uses it without).
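A strict check for the dashed form could be sketched as follows. The function name is hypothetical; the pattern is applied before parsing because newer Python versions of `date.fromisoformat` also accept the undashed `YYYYMMDD` form.

```python
import re
from datetime import date

# Exactly four digits, a dash, two digits, a dash, two digits (ASCII digits only).
DATE_PATTERN = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")

def parse_strict_date(value: str) -> date:
    """Accept only "YYYY-MM-DD" and reject anything else, including "YYYYMMDD"."""
    if not DATE_PATTERN.fullmatch(value):
        raise ValueError(f"not in YYYY-MM-DD form: {value!r}")
    # fromisoformat also rejects impossible dates such as 2021-02-30.
    return date.fromisoformat(value)

print(parse_strict_date("2021-02-20"))
```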

martin-lindstrom commented 3 years ago

https://github.com/ehn-digital-green-development/hcert-schema/pull/20

martin-lindstrom commented 3 years ago

OK. Now that PR #20 states that we should use seconds since epoch for all elements that represent a timestamp, we need to have something written about the CBOR representation. The time could be represented either as a plain CBOR int or as a CBOR numeric date-time (tag 1).

If your implementation generates CBOR from a JSON-representation the elements representing times will be plain integers, and I think we should recommend this. And have a wording in the spec that consumers should be able to handle both types of encodings.

jschlyter commented 3 years ago

Perhaps something like this in the Schemata README:

Implementation Notes

CBOR Encoding

Concise Binary Object Representation (CBOR), specified in RFC 7049, defines a number of major data types. The following types are RECOMMENDED to be used by parties creating electronic health certificate payloads:

Parties validating payloads are strongly advised to follow the robustness principle and be liberal in what you accept from others.
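A liberal consumer along these lines might be sketched as below. This is a hand-rolled illustration with hypothetical function names, not a proposed reference implementation: it reads a single timestamp item and accepts a plain unsigned integer, a text string, or either of those wrapped in tag 0 or tag 1. Negative integers, floats, and indefinite-length strings are deliberately left out of the sketch.

```python
from datetime import datetime

def read_head(data: bytes, pos: int = 0):
    """Return (major type, argument, next position) for the CBOR item at pos."""
    initial = data[pos]
    major, info = initial >> 5, initial & 0x1F
    pos += 1
    if info < 24:
        return major, info, pos
    if info > 27:
        raise ValueError("unsupported additional information")
    size = 1 << (info - 24)  # 1, 2, 4 or 8 argument bytes
    return major, int.from_bytes(data[pos:pos + size], "big"), pos + size

def decode_timestamp(data: bytes) -> int:
    """Decode one timestamp item to seconds since epoch, tolerating several encodings."""
    major, value, pos = read_head(data)
    if major == 6:  # tagged item: only the two date-time tags are accepted
        if value not in (0, 1):
            raise ValueError(f"unexpected tag {value}")
        major, value, pos = read_head(data, pos)
    if major == 0:  # unsigned integer: already epoch seconds
        return value
    if major == 3:  # text string: parse as an RFC 3339 date-time
        dt = datetime.fromisoformat(data[pos:pos + value].decode("utf-8"))
        if dt.tzinfo is None:
            raise ValueError("date-time string must carry a UTC offset")
        return int(dt.timestamp())
    raise ValueError(f"unsupported major type {major}")

as_int = b"\x1a" + (1613824496).to_bytes(4, "big")
print(decode_timestamp(as_int), decode_timestamp(b"\xc1" + as_int))
```

All four encodings of the same instant decode to the same integer, so a verifier built this way would not break if an issuer's library adds or omits a tag.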