ehn-dcc-development / hcert-schema

Electronic Health Certificates Payload Schema
Name needs "givenName" and "familyName" #1

Closed chris2286266 closed 3 years ago

chris2286266 commented 3 years ago

Hi, from our experience, and in the opinion of many colleagues, the property "name" should be split into "givenName" and "familyName". All known (travel) documents (passports, ID cards ...) work this way. The information in the certificate also needs to be checked against a travel document.

If a reference to e.g. HL7 is needed, 2.24.2.12 HumanName can be used, viz. https://www.hl7.org/fhir/datatypes.html#humanname

e.g.

    "properties": {
        "gn": {
            "title": "Given name",
            "description": "The given name/s of the person addressed in the certificate",
            "type": "string",
            "example": "Tolvan"
        },
        "fn": {
            "title": "Family name",
            "description": "The family name/s of the person addressed in the certificate",
            "type": "string",
            "example": "Tolvansson"
        }
    }

Thanks

jschlyter commented 3 years ago

Reusing an existing schema seems like a good idea, especially if its fields are easy to match with IDs. One could question whether the name is required at all if a unique identifier, e.g., a passport number, is provided. The name itself is most likely not unique, so another identifier should be included anyway.

Note: One should keep in mind that these properties may not be generic - but that may be another issue.

dirkx commented 3 years ago

Note that this certificate does not need to be an identity document in its own right. During travel, even in Schengen, citizens are expected to carry their real IDs.

For what it is worth - the Netherlands is up till now taking the approach that a full name (or passport number) is NOT desirable at all. Nor needed. Nor always an option - as most National health systems tend to use different internal identifiers (that generally have much longer lifetimes than that of a passport or identity card - and often are 'born', quite literally really, well prior to such being assigned).

That said - there needs to be sufficient information to bind the data that is signed with a person (who has an identity card or passport at the border).

We are currently accomplishing this by including, in the signed payload, just enough information to bind strongly - but not so much that it becomes PII data of value itself.

We do this by taking a mix of the first letter of the first name, the first letter of the family name, and the day and month of birth (but not the year or sex). From these 4 values we select a subset, depending on the prevalence of that permutation in society, that strikes a balance between how hard it is to find someone with an identical mix and how much of your privacy you lose.

The gory details & statistics can be found at https://github.com/minvws/nl-covid19-coronacheck-app-coordination/tree/main/architecture/identity and are server side (So they can be tuned).
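
A rough sketch of the kind of selection described above, for readers who want something concrete. This is not the NL implementation (that lives in the linked repository); the field names, the `min_crowd` threshold and the toy population are all assumptions made for illustration. The idea shown: disclose as many of the four values as possible while at least `min_crowd` people still share the disclosed combination.

```python
from itertools import combinations

# Hypothetical field names for the four values mentioned above.
FIELDS = ("first_initial", "family_initial", "birth_day", "birth_month")


def crowd_size(person, population, fields):
    """Count the people in `population` who share `person`'s values for `fields`."""
    return sum(all(other[f] == person[f] for f in fields) for other in population)


def choose_disclosure(person, population, min_crowd=2):
    """Pick the largest field subset still shared by at least `min_crowd` people.

    Among subsets of equal size, the one with the smallest crowd (the
    strongest binding) wins. Returns an empty dict if no subset qualifies.
    """
    best, best_crowd = (), len(population)
    for size in range(1, len(FIELDS) + 1):
        for fields in combinations(FIELDS, size):
            crowd = crowd_size(person, population, fields)
            if crowd < min_crowd:
                continue  # too identifying: the holder would stand out
            if size > len(best) or crowd < best_crowd:
                best, best_crowd = fields, crowd
    return {f: person[f] for f in best}


# Toy population (assumption): two people with a common combination,
# one person with the rare first initial "Q".
population = [
    {"first_initial": "J", "family_initial": "S", "birth_day": 25, "birth_month": 5},
    {"first_initial": "J", "family_initial": "S", "birth_day": 25, "birth_month": 5},
    {"first_initial": "Q", "family_initial": "B", "birth_day": 3, "birth_month": 2},
    {"first_initial": "A", "family_initial": "B", "birth_day": 9, "birth_month": 2},
]
```

With this toy data, the holder with the common combination gets all four values disclosed, while the holder with the rare initial Q only gets the family initial and birth month disclosed.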

The reason for doing this is to prevent the HCERT from, in effect, becoming a human cookie with enough data to base surveillance upon.

jschlyter commented 3 years ago

If you are also required to show an ID to prove the binding, isn't that the cookie already?

dirkx commented 3 years ago

IMHO - no - and the crucial difference is the QR on the HCERT. It cannot be read by a human (or its signature validated) and therefore has to be processed by a device.

Showing a passport or ID card and having it checked with human eyeballs is not processed by a device. Nor is comparing that to the data visible on the screen of the scanner. Keep in mind that for a Schengen trip there is no capture of the ID card at any stage. The airline may just look at it for the execution of the travel contract.

So the data processed and potentially retained itself is not suitable as a human cookie.

It would be if the ID card itself were scanned (or, in general, if any card is scanned and processed). But that is something quite different & generally under a lot of governance.

But if one did that -and- kept the medical data - you'd be violating the EC directive directly.

jschlyter commented 3 years ago

Even though I believe the NL approach is nice and clever, I'm not sure how it could be used with hcert v1 unless we define these partial name and birthdate attributes from the start (but that is of course possible).

dirkx commented 3 years ago

Actually - I think (not sure) that it is perfectly possible to state that each MS shall populate these fields with sufficient information to provide a good binding (within their country). Or perhaps 'this field' for the full name or any part thereof and the same for birthday information.

And then leave it to each country to figure out what they put there. But make it mandatory that the month and (Julian) day are the same as on the ID types recognised for Schengen travel.

In NL we simply 'show what is there' and leave it to the human to align this.

And since this is fully server side, clients in the field are unaware of any of these choices.

So it can look like { "name": "J S", "dob": "25 MAY" } for one country and { "name": "Q*", "dob": "FEB" } for another (in NL a first name starting with Q is very rare, especially combined with the 12 months).

This does assume some sort of nationality reveal perhaps (right now - in NL we are using this - but just for NL).

jschlyter commented 3 years ago

If we go down this path, I say we should add dedicated properties for partial name and partial dob (as the date format is fixed as YYYY-MM-DD).
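
To illustrate why a dedicated property would be needed, here is a small sketch. The function names and the accepted partial format (an optional day followed by a three-letter month, as in the "25 MAY" and "FEB" examples above) are assumptions, not anything defined by the schema. A parser for the fixed YYYY-MM-DD format rejects partial values, so they cannot share the same property.

```python
import re
from datetime import date

MONTHS = ("JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC")

# Assumed partial format: optional day, then a three-letter month.
PARTIAL_DOB = re.compile(r"(?:(\d{1,2}) )?([A-Z]{3})")


def parse_full_dob(value: str) -> date:
    """Parse the fixed YYYY-MM-DD format; raises ValueError otherwise."""
    return date.fromisoformat(value)


def parse_partial_dob(value: str):
    """Parse a partial dob such as "25 MAY" or "FEB" into (day or None, month)."""
    match = PARTIAL_DOB.fullmatch(value)
    if match is None or match.group(2) not in MONTHS:
        raise ValueError(f"not a partial date of birth: {value!r}")
    day = int(match.group(1)) if match.group(1) else None
    if day is not None and not 1 <= day <= 31:
        raise ValueError(f"day out of range: {value!r}")
    return day, MONTHS.index(match.group(2)) + 1
```

Here `parse_full_dob("25 MAY")` raises `ValueError`, while `parse_partial_dob("25 MAY")` accepts it, which is the argument for keeping the two in separate properties.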

jschlyter commented 3 years ago

@chris2286266 can you file a PR for your initial issue so we can handle partial matching separately?

jschlyter commented 3 years ago

partial matching raised as #3

Razumain commented 3 years ago

I think the partial ID approach has several problems. The problem here is that we have not established the requirements that are used to select a suitable solution. As such, is it true that:

1) Matching of an ID card with data from the hcert may be done by a human person?
2) We should prevent people from being able to find someone who is willing to lend them (possibly for money) a hcert that matches their identity?

These are the two most obvious problems. It will be really hard to establish a standard that would allow humans to be able to do this comparison with any accuracy, especially if there are national variations.

If I want to go to a concert and I convince people that I already had covid, I may be able to find a "helper" via my social network, or via a black market to lend me a matching hcert.

Given the birthday paradox, the uniqueness is actually quite low in your scheme.
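
For reference, the birthday-paradox calculation can be sketched as follows. It assumes every combination is equally likely, which real name and birthday distributions are not; skewed distributions make collisions more likely, so this is an optimistic estimate. The function name is made up for illustration.

```python
def collision_probability(group_size: int, combinations: int) -> float:
    """Probability that at least two of `group_size` people share a combination,
    assuming `combinations` equally likely values (the classic birthday problem)."""
    if group_size > combinations:
        return 1.0  # pigeonhole: a collision is certain
    p_all_unique = 1.0
    for i in range(group_size):
        p_all_unique *= (combinations - i) / combinations
    return 1.0 - p_all_unique
```

The classic case, `collision_probability(23, 365)`, is already slightly above 0.5; the same function can be applied to the number of distinct partial-identity combinations a country actually uses.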

In my view, the security here lies in the fact that the holder willingly chooses to show the hcert only to those to whom he/she is willing to submit his/her identity. This is done on a daily basis when traveling, and I can't possibly see why an identity in the hcert makes that any worse.

If the ID matching is done by a machine, since this is too hard for humans, then the ID is in any case machine read and tracking can occur.


Edit note: On one hand people think it is important to do revocation of hcert because misuse must be prevented at all cost, and on the other hand here is a proposal that makes it fairly easy, and likely, that a hcert can be lent to someone other than the person it was issued to.

These views do not add up. I think we need to be somewhere in between.

dirkx commented 3 years ago

Our assumption in NL is that it is done by a human -- and the birthday paradox - that is exactly the reason why we made those tables with the prevalence & calculated/iteratively ensured that for each permutation the right balance between privacy & security was hit (except for 2.5%; they are slightly more subject to unblinding).

Razumain commented 3 years ago

OK, but have you calculated in how many families in NL two people would actually be able to share the same certificate? I bet there are quite a few of those. Randomness strikes both ways. What's unlikely for one case is very likely in a group of many cases.
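
The "unlikely for one, likely among many" point can be made concrete with a two-line calculation. This sketch assumes each family independently has the same probability of containing a matching pair; both the independence and any numbers plugged in are assumptions, not data from NL.

```python
def at_least_one(p_single: float, cases: int) -> float:
    """Probability that an event of probability `p_single` occurs at least once
    in `cases` independent cases."""
    return 1.0 - (1.0 - p_single) ** cases


def expected_count(p_single: float, cases: int) -> float:
    """Expected number of cases in which the event occurs."""
    return p_single * cases
```

For example, an event with a 1-in-1000 chance per family is more likely than not to occur somewhere among 1000 families: `at_least_one(0.001, 1000)` is roughly 0.63.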

dirkx commented 3 years ago

Yes - that is what we did. Not sure how common this is - but we have a cultural anthropology/language institute that does research into ‘name’ use in The Netherlands and how this changes over time & generation. The 2014-2017 set we have used now has been used ‘as is’; we’ve not done things like assuming that a certain cohort (e.g. older people, younger people, a certain sex or people from a certain social stratum) is more likely to cheat. But we have assumed that the social circle in which you try to find someone to cheat with is dominated by your own family (as those are hardest; your classmates are more ‘distant’ name wise). So we may need to tweak this (it is all done on the backend) if we find that fraud is not an equal opportunity thing.

Razumain commented 3 years ago

I don't doubt your research Dirkx. I just simply can't get myself to agree on the priority.

1) I don't believe that humans will, in the first place, do a very good job of comparing "J S 25 May" with an ID. I think identity cards will be scanned by readers, destroying any attempt to prove that the ID cannot be registered by a device. It will be totally hopeless to reach international agreement on this in this short time frame, so a person checking this manually at an airport will have to expect different formats and data from different countries. They will certainly need a lot of education and training to understand these concepts.
2) There are so many instruments of travel that have my ID and name attached to them, like my boarding card, my parking reservation, my hotel reservation and my credit card slip from the bar in the departure hall. Why are all these cases OK but a hcert is not, when I only show it to those I choose and trust?
3) Listening to the airline business people, it is clear that most of this type of information should be provided by the traveler from home before even coming to the airport. The ability to submit the hcert electronically will therefore likely be an important requirement very soon for this to scale. In that case my identity will be associated with the hcert regardless, but it will be very hard for the airline receiving the upload to determine whether it truly belongs to the traveler.
4) I'm not a criminal expert, but I would fear that there could be a black market of hcerts where you can buy a genuine cert matching your name and birth date, or that people would try to advertise their hcert for money on privacy-protected forums. I think that is very likely to happen.
5) Why bother with all this security, signing and validation of hcert issuers, when we don't think we need security?

dirkx commented 3 years ago

Understood - so all I can add is that we've done this analysis in depth in the Netherlands, covering civil, privacy, criminal, civil-rights/human-rights and state-actor risks. And in that context this is a workable compromise that provides sufficient security (within the purpose / objective) and still provides enough privacy.

Specifically:

1) We found this fine in large-scale field testing.
2) Because when you are eyeballing a pass (with human eyeballs) you are not processing in the sense of the GDPR; a QR code always has to be processed by a processor with a controller relation, as humans cannot decode it in their head; if the other data (e.g. boarding card) is processed, this is done in the context of the performance of the contract and this is determined to be proportional/etc.
3) We do not believe so, based on our testing/pilots in this setting - the binding is sufficient for that purpose.
4) Unfortunately we have ample experience with this & so far think that we have sufficient controls and measures to keep this sufficiently under the levels needed.
5) Because human/civil rights and the GDPR require us to make a tradeoff between security and privacy that is proportional to the goal and risks.

But it may be that other countries are more criminal/have enforcement or other issues - and that they have a demonstrable need to reduce the privacy of their citizens.