ehn-dcc-development / eu-dcc-schema

Schema for the ehn DCC payload
Apache License 2.0

Dates & DateTimes as integers since Epoch (1/1/1970) #57

Closed vitorpamplona closed 3 years ago

vitorpamplona commented 3 years ago

I know this is not standard for JSON on the web, but since the schemas have so many date and datetime fields, more efficient DateTime representations can bring significant size savings to the QR code.

Ideally, you want to use a format that marks the scale (days, hours, minutes, seconds, milliseconds or ticks) and uses an integer number of that scale between the given time and an arbitrary epoch (1/1/1970). This lets it send things like date values in a minimal number of bytes.

Even a simple second scale for all dates to the epoch (1/1/1970) represents gains of ~10% on the size of the Covid Test QR.
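
As a rough illustration of the representation proposed here, the following minimal Python sketch converts ISO 8601 values into integers counted from a fixed epoch. The function names and the choice of whole seconds and whole days as scales are assumptions made for this example, not part of the schema.

```python
from datetime import date, datetime, timezone

# Assumption: the epoch is a constant fixed by the spec, independent of the host OS.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def seconds_since_epoch(iso_datetime: str) -> int:
    """Convert an ISO 8601 date-time that carries a UTC offset to whole seconds since EPOCH."""
    moment = datetime.fromisoformat(iso_datetime)
    return int((moment - EPOCH).total_seconds())


def days_since_epoch(iso_date: str) -> int:
    """Convert an ISO 8601 calendar date to whole days since EPOCH."""
    return (date.fromisoformat(iso_date) - EPOCH.date()).days


sample = "2021-04-13T14:20:00+00:00"
print(len(sample), seconds_since_epoch(sample))  # 25 1618323600
print(days_since_epoch("2009-01-28"))            # 14272
```

The 25-character string becomes a 10-digit integer, and a date of birth becomes a 5-digit day count.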

jschlyter commented 3 years ago

This was considered, but dismissed as most libraries used to convert JSON-like data to CBOR (and back) don't support it. I agree it would be better from an encoding point of view.

vitorpamplona commented 3 years ago

Agree, this does require manual conversion after the JSON is assembled by CBOR. Also, there is an open proposal for it.

Unix timestamps of this kind are not rare in JSON files, but they are not ISO 8601.

It depends on how desperate you are for QR space. :)

gabywh commented 3 years ago

It is also:

  1. not according to spec
  2. problematic cross-platform (Windows, for example, has a different epoch than *nix)
  3. already answered several times - I'll add it to the https://github.com/ehn-digital-green-development/ehn-dgc-schema/wiki/FAQ

gabywh commented 3 years ago

Please see FAQ entry: https://github.com/ehn-digital-green-development/ehn-dgc-schema/wiki/FAQ#why-not-use-seconds-since-epoch-instead-of-the-iso-8601-format-for--date-and-date-time

vitorpamplona commented 3 years ago

@gabywh the second (different epoch times) and third points (size analysis) are incorrect.

For 2. The spec can define the Epoch it wants to use. The operating system has no say in this. This point is irrelevant.

For 3. Using integers, seconds, since Epoch offers gains of ~10%, which is more than what Base45 offers as compared to using Base32 (6%). How do I know? I actually implemented it.

I am also not sure why you are saying that "seconds since epoch" is a "string". It's not a string. It's a positive integer. Maybe that's why your analysis said they are the same?

gabywh commented 3 years ago

For 2. The spec can define the Epoch it wants to use.

...is an option... which goes notoriously wrong in practice - and hence one of the motivating factors for ISO8601 to have been generated in the first place

The operating system has no say in this. This point is irrelevant.

Tell that to the operating system ;)

For 3. Using integers, seconds, since Epoch offers gains of ~10%, which is more than what Base45 offers as compared to using Base32 (6%). How do I know? I actually implemented it. ... I am also not sure why you are saying that "seconds since epoch" is a "string". It's not a string. It's a positive integer.

Because integers as binary entities do not exist in JSON. JSON is string.

vitorpamplona commented 3 years ago

...is an option... which goes notoriously wrong in practice - and hence one of the motivating factors for ISO8601 to have been generated in the first place

Sure, but that is not what the FAQ is saying :)

Tell that to the operating system ;)

I don't get it. The implementer would just fix the epoch on the conversion code. It's literally a constant in the code.
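
To make the "constant in the code" point concrete, here is a small sketch in Python. It assumes the differing Windows epoch mentioned earlier refers to FILETIME (100-nanosecond ticks counted from 1601-01-01 UTC); the function names are hypothetical.

```python
from datetime import datetime

# Seconds between the FILETIME epoch (1601-01-01) and the Unix epoch (1970-01-01).
FILETIME_TO_UNIX_OFFSET_S = 11_644_473_600
# FILETIME counts 100-nanosecond ticks, so there are ten million per second.
FILETIME_TICKS_PER_SECOND = 10_000_000


def filetime_to_unix_seconds(filetime: int) -> int:
    """Convert a FILETIME tick count to whole seconds since 1970-01-01 UTC."""
    return filetime // FILETIME_TICKS_PER_SECOND - FILETIME_TO_UNIX_OFFSET_S


def unix_seconds_to_filetime(unix_seconds: int) -> int:
    """Convert whole seconds since 1970-01-01 UTC to a FILETIME tick count."""
    return (unix_seconds + FILETIME_TO_UNIX_OFFSET_S) * FILETIME_TICKS_PER_SECOND


# The offset can be cross-checked against the calendar.
offset = int((datetime(1970, 1, 1) - datetime(1601, 1, 1)).total_seconds())
print(offset == FILETIME_TO_UNIX_OFFSET_S)  # True
```

Whichever clock the platform exposes, the value written into the payload is counted from the epoch the spec names.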

Because integers as binary entities do not exist in JSON. JSON is string.

WHAT???? You literally have the number of doses as Integers in YOUR JSON. How can you say that "it doesn't exist"?

CBOR has integers as well.
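
A quick way to check both claims is sketched below in Python. The JSON key `doses` is a made-up stand-in for the dose count field, and the two encoder functions are a hand-written subset of CBOR (unsigned integers and text strings, following the head rules in RFC 8949), not a full library.

```python
import json
import struct


def cbor_head(major_type: int, argument: int) -> bytes:
    """Encode a CBOR head for a major type and a non-negative argument (RFC 8949)."""
    prefix = major_type << 5
    if argument < 24:
        return bytes([prefix | argument])
    if argument < 2**8:
        return bytes([prefix | 24, argument])
    if argument < 2**16:
        return bytes([prefix | 25]) + struct.pack(">H", argument)
    if argument < 2**32:
        return bytes([prefix | 26]) + struct.pack(">I", argument)
    return bytes([prefix | 27]) + struct.pack(">Q", argument)


def cbor_uint(value: int) -> bytes:
    """Major type 0: unsigned integer."""
    return cbor_head(0, value)


def cbor_text(text: str) -> bytes:
    """Major type 3: UTF-8 text string, prefixed with its byte length."""
    raw = text.encode("utf-8")
    return cbor_head(3, len(raw)) + raw


# JSON numbers parse to integers, not strings ("doses" is a hypothetical key).
print(type(json.loads('{"doses": 1}')["doses"]).__name__)  # int

# Same instant, encoded as an ISO 8601 text string and as seconds since epoch.
print(len(cbor_text("2021-04-13T14:20:00+00:00")), len(cbor_uint(1618323600)))  # 27 5

# Same date, encoded as text and as days since epoch.
print(len(cbor_text("2009-01-28")), len(cbor_uint(14272)))  # 11 3
```

Under these assumptions each date-time field drops from 27 to 5 bytes before compression; the saving on the final QR code is smaller because zlib already compresses the repeated date text.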

Here's the full implementation of the HC1 with Date as Integers (HC1DInt): https://github.pathcheck.org/eu.dgc.html

Basically this: "HC1:" + Base45(zlib(cose(cbor(dateAsIntegers(json)))))

It yields a 7% size reduction on the COVID Test Payload. It can be better if you don't use seconds as the scale and go for days since epoch, for instance.
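
The dateAsIntegers step of that pipeline might look roughly like the Python sketch below. This is not the code behind the linked implementation. The field names `dob`, `sc` and `dr` come from the sample payload in this thread; encoding `dob` as days and the other two as seconds is an assumption chosen for illustration, and a verifier would have to know which scale each field uses.

```python
from datetime import date, datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Assumption: which fields hold dates, and which scale each one gets.
DATE_FIELDS = {"dob"}             # encoded as days since EPOCH
DATETIME_FIELDS = {"sc", "dr"}    # encoded as seconds since EPOCH


def date_as_integers(node):
    """Return a copy of a parsed JSON payload with known date fields as integers."""
    if isinstance(node, list):
        return [date_as_integers(item) for item in node]
    if not isinstance(node, dict):
        return node
    converted = {}
    for key, value in node.items():
        if key in DATE_FIELDS and isinstance(value, str):
            converted[key] = (date.fromisoformat(value) - EPOCH.date()).days
        elif key in DATETIME_FIELDS and isinstance(value, str):
            moment = datetime.fromisoformat(value)
            converted[key] = int((moment - EPOCH).total_seconds())
        else:
            converted[key] = date_as_integers(value)
    return converted


payload = {
    "dob": "2009-01-28",
    "t": [{"sc": "2021-04-13T14:20:00+00:00", "dr": "2021-04-13T14:40:01+00:00", "co": "NL"}],
}
print(date_as_integers(payload))
```

The result would then be passed to the CBOR, COSE, zlib and Base45 stages unchanged.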

vitorpamplona commented 3 years ago

I think it is important to realize that the 2 dates of the HCERT CWT that encompasses the Schema already use the proposed seconds-since-epoch idea.

Looks like there are no problems between operating systems.

{
  "dataType": "Map",
  "value": [
    [ 6, 1620154654 ],      //  <- This is a Date
    [ 4, 1683172800 ],      //  <- This is a Date
    [
      -260,
      {
        "dataType": "Map",
        "value": [
          [
            1,
            {
              "ver": "1.0.0",
              "nam": {
                "fn": "d'Arsøns - van Halen",
                "gn": "François-Joan",
                "fnt": "DARSONS<VAN<HALEN",
                "gnt": "FRANCOIS<JOAN"
              },
              "dob": "2009-01-28",      //  <- This is a Date
              "t": [
                {
                  "tg": "840539006",
                  "tt": "LP217198-3",
                  "tr": "260415000",
                  "ma": "1232",
                  "sc": "2021-04-13T14:20:00+00:00",      //  <- This is a Date
                  "dr": "2021-04-13T14:40:01+00:00",      //  <- This is a Date
                  "tc": "GGD Fryslân, L-Heliconweg",
                  "co": "NL",
                  "is": "Ministry of VWS",
                  "ci": "urn:uvci:01:NL:GGD/81AAH16AZ"
                }
              ]
            }
          ]
        ]
      }
    ]
  ]
}
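
For reference, the two integer claims at the top of this payload can be turned back into readable dates with a few lines of Python. Keys 4 and 6 are the `exp` and `iat` claims defined for CWT in RFC 8392; the helper name is hypothetical.

```python
from datetime import datetime, timezone

# CWT claim keys from RFC 8392: 4 is "exp" (expiration time), 6 is "iat" (issued at).
CWT_DATE_CLAIMS = {4: "exp", 6: "iat"}


def describe_cwt_dates(claims: dict) -> dict:
    """Map the numeric date claims of a decoded CWT to ISO 8601 strings in UTC."""
    return {
        CWT_DATE_CLAIMS[key]: datetime.fromtimestamp(value, tz=timezone.utc).isoformat()
        for key, value in claims.items()
        if key in CWT_DATE_CLAIMS
    }


print(describe_cwt_dates({6: 1620154654, 4: 1683172800}))
# {'iat': '2021-05-04T18:57:34+00:00', 'exp': '2023-05-04T04:00:00+00:00'}
```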

jschlyter commented 3 years ago

One should keep in mind that the implementor notes state: "Parties validating payloads are strongly advised to follow the robustness principle and be liberal in what you accept from others.". If someone would encode date-time content as CBOR timestamps, I'd expect them to be parsed correctly.
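
A verifier that is liberal in this sense could branch on the decoded type. The Python sketch below assumes the CBOR decoder hands back either a string (ISO 8601) or a number (seconds since 1970-01-01 UTC); the function name and the decision to treat offset-less strings as UTC are illustrative choices, not something the specification prescribes.

```python
from datetime import datetime, timezone


def parse_date_time(value) -> datetime:
    """Accept an ISO 8601 string or a numeric epoch timestamp and return an aware datetime."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python; reject it explicitly.
        raise TypeError("booleans are not timestamps")
    if isinstance(value, (int, float)):
        return datetime.fromtimestamp(value, tz=timezone.utc)
    if isinstance(value, str):
        # Note: before Python 3.11, fromisoformat() does not accept a trailing "Z".
        parsed = datetime.fromisoformat(value)
        if parsed.tzinfo is None:
            # Assumption: values without an offset are interpreted as UTC.
            parsed = parsed.replace(tzinfo=timezone.utc)
        return parsed
    raise TypeError(f"unsupported date-time value: {value!r}")


print(parse_date_time(1618323600) == parse_date_time("2021-04-13T14:20:00+00:00"))  # True
```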

gabywh commented 3 years ago

One should keep in mind that the implementor notes state: "Parties validating payloads are strongly advised to follow the robustness principle and be liberal in what you accept from others.". If someone would encode date-time content as CBOR timestamps, I'd expect them to be parsed correctly.

Sorry, no.

  1. The formally approved eHealthNetwork specification is very clear on this: format is ISO8601. Please implement accordingly.
  2. This is a mis-application of Postel's law. I will add examples to the FAQ.

vitorpamplona commented 3 years ago

We will definitely accept/verify both formats because:

  1. We already need to do it for the CWT anyway.
  2. Coding the two ways is super easy.
  3. Knowing which one is which is also easy.
  4. CBOR is moving in this direction as well (copying protobuf).
  5. Space saved helps to make some use cases possible.

gabywh commented 3 years ago

Feel free to accept and also create any format you care to choose - that is precisely the intention of the "business rules" stages in e.g. https://github.com/ehn-digital-green-development/ehn-dgc-schema/wiki/FAQ#what-do-the-typical-processing-stages-look-like-for

However, the date / date-time format fields in the vaccination, test and recovery certificates in themselves shall be ISO8601 according to the formally approved and published eHealthNetwork guidelines. If you want to put another format in there and risk a broken implementation, that's entirely up to you. The eHealthNetwork specification is clear on this point. If you make a conscious decision not to follow those guidelines, then please be aware that any lack of interoperability is entirely at your own risk.

vitorpamplona commented 3 years ago

Yep, all good. As a universal verifier, we don't get to choose how people issue their certificates. We simply accept every type of payload we can. It doesn't matter if it follows a spec or not.