ehn-dcc-development / eu-dcc-hcert-spec

Electronic Health Certificates Specification

Develop a URI Scheme for the ZLib/Base45 Representation #59

Closed vitorpamplona closed 3 years ago

vitorpamplona commented 3 years ago

I would like to suggest using a URI scheme to help QR code readers/verifiers understand what this QR is without having to guess.

Something like:

ZLIB45:<Base45 payload>

or even a

DGC:<Base45 payload>

The idea is to use the URI scheme to tell QR readers how to unbundle the rest of the QR. Knowing that this is a ZLIB45, the reader can then decode, decompress, and see the payload.
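The unbundling chain described above can be sketched in Python. This is a minimal illustration, not the project's reference implementation: it assumes the prefix is followed by a Base45 string (alphabet as defined in the Base45 specification) that wraps a zlib stream, and the function names `base45_encode`, `base45_decode` and `unbundle` are hypothetical.

```python
import zlib

# Base45 alphabet: digits, uppercase letters, then nine special characters.
BASE45_ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:"


def base45_encode(data: bytes) -> str:
    """Encode bytes as Base45: 2 bytes -> 3 chars, a trailing byte -> 2 chars."""
    out = []
    for i in range(0, len(data), 2):
        chunk = data[i:i + 2]
        if len(chunk) == 2:
            n = chunk[0] * 256 + chunk[1]
            c, n = n % 45, n // 45
            d, e = n % 45, n // 45
            out += [BASE45_ALPHABET[c], BASE45_ALPHABET[d], BASE45_ALPHABET[e]]
        else:
            n = chunk[0]
            out += [BASE45_ALPHABET[n % 45], BASE45_ALPHABET[n // 45]]
    return "".join(out)


def base45_decode(text: str) -> bytes:
    """Decode Base45 text; raises ValueError on invalid input."""
    values = [BASE45_ALPHABET.index(ch) for ch in text]  # ValueError if unknown char
    out = bytearray()
    for i in range(0, len(values), 3):
        chunk = values[i:i + 3]
        if len(chunk) == 3:
            n = chunk[0] + chunk[1] * 45 + chunk[2] * 45 * 45
            if n > 0xFFFF:
                raise ValueError("triplet does not fit in two bytes")
            out += n.to_bytes(2, "big")
        elif len(chunk) == 2:
            n = chunk[0] + chunk[1] * 45
            if n > 0xFF:
                raise ValueError("pair does not fit in one byte")
            out.append(n)
        else:
            raise ValueError("dangling single character")
    return bytes(out)


def unbundle(qr_text: str, prefix: str = "HC1:") -> bytes:
    """Strip the prefix, Base45-decode, then zlib-decompress (assumed chain)."""
    if not qr_text.startswith(prefix):
        raise ValueError("unexpected prefix")
    return zlib.decompress(base45_decode(qr_text[len(prefix):]))


# Round trip with a made-up payload; a real payload would be the signed CBOR.
sample = "HC1:" + base45_encode(zlib.compress(b"example payload"))
print(unbundle(sample))
```

The point of the prefix is visible in `unbundle`: the verifier picks the decoding chain from the prefix alone, before touching the payload.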

If accepted, the representation will work nicely with all other QRs (99% of QRs have a URI inside them) and be compatible with most, if not all, QR readers.

dirkx commented 3 years ago

We already have a HC1: prefix for very much this reason (that also 'blocks' automatic browser lookups or similar leaking this data into remote log files).

Is that not good enough?

asitplus-pteufl commented 3 years ago

@vitorpamplona HC1 would link to a complete algorithm suite that defines the decoding chain. If the chain changed in the future, one would use HC2 and define the new chain there, so that a validation app would know which validation suite to use to read/validate the data. Does this cover your requirements?

vitorpamplona commented 3 years ago

oh, nice! Apologies, I couldn't find it anywhere.

vitorpamplona commented 3 years ago

Do you have a spec file for the HC1 scheme itself? A document that defines exactly all the steps to encode and decode?

asitplus-pteufl commented 3 years ago

Well, for now this is just the 1.05 spec in this repo (MD, PDF, Word). But I guess when everything is final, there should be a clear link stating that HC1 refers to this document and other related aspects (validation etc.).

vitorpamplona commented 3 years ago

I strongly suggest making sure that the scheme spec exists and that it is registered with IANA.

It doesn't need to be a scheme that represents the whole DGC spec. It could be a simpler spec that just unbundles the payload to a CBOR file, and the CBOR file then has the reference to the DGC spec.

It makes the lives of independent verifiers much easier.

jschlyter commented 3 years ago

@vitorpamplona I'm quite sure that IANA will not register HC1 as a URL scheme; it exists only to prohibit scanning. Keep in mind that the payload following the HC1: preamble is not URL-encoded; it's Base45.

vitorpamplona commented 3 years ago

Yep, it's a simple blob URI, but using Base45. Very similar to other credentials that are using Base32 with CBOR.

Have you reached out to IANA? They were quite open to registering these new Credentials last time we spoke with them.

vitorpamplona commented 3 years ago

BTW, I am sure you have thought about this, but Base45 has / and :, which may lead to some pretty weird and potentially invalid URIs.

chris2286266 commented 3 years ago

Yes, but it will not be rendered as URLs. The reason for the "strange-looking" character set is the definition of "mode 2" of QR codes (the alphanumeric mode), where it needs just 5.5 bits per character.

dirkx commented 3 years ago

They are meant for QR codes; if you are in URL land, it is safer and wiser to use a more compact URL-safe Base64 encoding.

vitorpamplona commented 3 years ago

No, the alphanumeric character set is not the issue @chris2286266 .

You are not realizing that both URIs and URLs have reserved characters that Base45 interferes with. If you are looking to render links from the content, you will have to percent-encode certain characters in the Base45.

For instance, the Base45 alphabet includes these non-alphanumeric characters:

Space $ % * + - . / :

Of those, these should not be in a URL or URI:

Space $ * + / :

So, in theory, you need to use a Base39, not a Base45. Percent-encoding a Base45 effectively makes it a Base39.

Otherwise, you would have crazy URI/URLs like these:

HC1::::::::::::
HC1:///////////
HC1:$:+:+::+/::/+ *:+:/:*::

These would never render as a link.
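The counting in the comment above can be checked with a short Python sketch. It takes the six characters exactly as listed in the comment (an assumption carried over from the thread, not a reading of the URI RFC) and uses the standard library's `urllib.parse.quote` to show what percent-encoding does to them. Note that `quote` with `safe=""` also escapes `%` itself, which the list above does not count.

```python
from urllib.parse import quote

BASE45_ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:"

# The six characters flagged in the comment as unsuitable for a URL or URI.
FLAGGED = set(" $*+/:")

url_friendly = [ch for ch in BASE45_ALPHABET if ch not in FLAGGED]
print(len(BASE45_ALPHABET), len(url_friendly))  # 45 39


def percent_encode(payload: str) -> str:
    """Percent-encode everything except letters, digits and '_.-~'."""
    return quote(payload, safe="")


print(percent_encode("$:+:/ *"))  # %24%3A%2B%3A%2F%20%2A
```

Each escaped character grows from one character to three, and the `%` escapes are only usable because `%` is itself part of the Base45 alphabet.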

vitorpamplona commented 3 years ago

No @dirkx. Base64 is the worst encoding for QRs because it forces the use of Binary Mode while not using 2 bits of every byte. Since you are only representing 64 options in a byte (256 options), 2 of every 8 bits, or 25% of the space, are wasted.
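The space argument can be worked through as arithmetic. The sketch below is a simplification: it ignores the QR mode indicator, the character-count field and error correction, and assumes Base64 text must use byte mode (8 bits per character, because lowercase letters are not in the alphanumeric set) while Base45 text fits alphanumeric mode (11 bits per pair of characters). The function name is hypothetical.

```python
from fractions import Fraction

ALNUM_BITS_PER_CHAR = Fraction(11, 2)  # alphanumeric mode: 11 bits per 2 chars
BYTE_BITS_PER_CHAR = Fraction(8)       # byte mode: 8 bits per char


def qr_bits_per_payload_byte(chars_per_group, bytes_per_group, bits_per_char):
    """QR data bits spent per byte of original payload."""
    return Fraction(chars_per_group) * bits_per_char / bytes_per_group


# Base45 maps 2 bytes to 3 characters; Base64 maps 3 bytes to 4 characters.
base45_bits = qr_bits_per_payload_byte(3, 2, ALNUM_BITS_PER_CHAR)  # 33/4
base64_bits = qr_bits_per_payload_byte(4, 3, BYTE_BITS_PER_CHAR)   # 32/3

for name, bits in (("Base45", base45_bits), ("Base64", base64_bits)):
    wasted = 1 - Fraction(8) / bits  # share of QR data bits carrying no payload
    print(f"{name}: {float(bits):.2f} bits per payload byte, "
          f"{float(wasted):.1%} of bits wasted")
```

Under these assumptions Base45 spends 8.25 QR bits per payload byte and Base64 about 10.67, so roughly 3% versus 25% of the data bits carry no payload.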

dirkx commented 3 years ago

Right. Totally agreed.

Base45 is the 'best' format you can package in the Mode 0010/2 of a QR.

So we want to put Base45 in the QR and reduce the risk that the QR is seen as a link. That is why we have the HC1: prefix: it tries to stop the content from being a valid link, while looking enough like one that Siri/Google/etc. do not run a search on the payload. We really do not want this QR to go into log files at search engines, caches and digital assistants.

And that is why, if you were to include this HCERT in a URL, you should use URL-safe Base64 to make it safe as a parameter.
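For the URL case, a minimal sketch using Python's standard `base64` module is shown below. The helper names and the idea of dropping the `=` padding are illustrative assumptions; the thread does not define a URL format for HCERT data.

```python
import base64


def to_url_param(raw: bytes) -> str:
    """Encode raw bytes with the URL-safe Base64 alphabet (A-Z a-z 0-9 - _)."""
    # Padding is stripped here so that no '=' appears in the parameter value.
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def from_url_param(param: str) -> bytes:
    """Restore the padding and decode back to the original bytes."""
    padding = "=" * (-len(param) % 4)
    return base64.urlsafe_b64decode(param + padding)


raw = bytes([0xFB, 0xFF, 0x00, 0x10, 0x3E])  # made-up bytes, not a real HCERT
token = to_url_param(raw)
print(token)
assert from_url_param(token) == raw
```

Unlike Base45, every character this produces can be placed in a query parameter without percent-encoding.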

vitorpamplona commented 3 years ago

@dirkx But then the problem is that a Base45 representation can inadvertently create a link.

HC1:ASASDFETYJG434SF234SDF42KH

Is turned into a link in both iOS and Android.

vitorpamplona commented 3 years ago

@dirkx BTW, both Google and Apple use server-based AI to enhance their find-in-picture algorithms. The image of the QR is already going to Google/Apple no matter what you do.

dirkx commented 3 years ago

Which exact versions? That very link (and all the others we've tried so far) appears to yield just a Google lookup or (if you monitor all traffic through a proxy) a Siri lookup for "HC1", an error, or no offer to scan.

Which versions of iOS and Android are you seeing this on? And what is emitted on the DNS or HTTP layer, and to where?

dirkx commented 3 years ago

As to the AI scanning: would you know which versions of the OS are doing this? We've not yet found this in a proxy capture of what we hope is all traffic.

vitorpamplona commented 3 years ago

AI scanning doesn't happen immediately. The phone stores information and will send it as a bigger payload later when it's time to sync. To the best of my knowledge, all versions of both OSs do some level of it.

On the link, I don't have my test bench at the moment, but I stress-tested this when creating our own QRs, and Base45 just allows everything you can possibly think of to happen. Some will not link at all, others will link all the time, and some weird URIs will crash older iOS versions. It's very inconsistent.

If you SMS this message to yourself, Google turns it into a link after 5 mins on my Pixel3:

HC1:TESTER.CO/IDOHAVEAIDS

TESTER.CO/IDOHAVEAIDS is a valid Base45 and the pattern A.B/C will happen more frequently than you think.
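One way to see how often the A.B/C shape shows up is to scan payloads for it. The sketch below is only an illustration: the regular expression is a hypothetical stand-in for whatever heuristics Android or iOS actually apply, which are not documented in this thread, and the alphabet check only confirms that the characters belong to Base45, not that the string decodes.

```python
import re

BASE45_ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:"

# Hypothetical "looks like host.tld/path" pattern built from Base45 characters.
LINK_LIKE = re.compile(r"[0-9A-Z]+\.[A-Z]{2,}/[0-9A-Z]*")


def uses_base45_alphabet(text: str) -> bool:
    """True if every character belongs to the Base45 alphabet."""
    return all(ch in BASE45_ALPHABET for ch in text)


def looks_like_link(text: str) -> bool:
    """True if the text contains an A.B/C shaped substring."""
    return LINK_LIKE.search(text) is not None


for candidate in ("TESTER.CO/IDOHAVEAIDS", "ASASDFETYJG434SF234SDF42KH"):
    print(candidate, uses_base45_alphabet(candidate), looks_like_link(candidate))
```

An issuer could run a check like this over generated payloads to estimate how many contain a link-shaped substring, keeping in mind that real link detectors may match more or fewer cases.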

vitorpamplona commented 3 years ago

You might have seen it already, but Apple's implementation is out: https://developer.apple.com/videos/play/wwdc2021/10089/

Importing "SHC:" URIs into the Health App from the main Camera App

jschlyter commented 3 years ago

As mentioned in #64, please open a new issue if you want to contribute new encoding schemes.

It is most unfortunate that Apple and other vendors have not taken part in the discussion with the EU development team over the last couple of months.