Closed martin-lindstrom closed 3 years ago
Agreed.
Can someone verify that the example calculations is indeed correct? Iβve tried to calculate the checksum using the algorithms stated at https://en.wikipedia.org/wiki/Luhn_mod_N_algorithm and my checksum character ends up being β2β when calculated on the string β01 LUX/18737512422923"
With reference to the example on https://github.com/ehn-digital-green-development/ehn-dgc-schema/wiki/FAQ#example
I can verify that that 01:SE:EHM/123456789KOOZC6RM7VZ
ends up having the checksum char as #
@martin-lindstrom
Yes, but I don't think that we should include the '#' character in the set. It's not allowed anywhere else than as the delimiter for the checksum character.
Right ... makes sense - but will you crosscheck that the example is correct? I don't think it is
No.
01LUX/18737512422923
-> Z
if we include the #
in the set and E
if you don't use it.
/** The valid characters of an UVCI (excluding delimiter characters). */
private static final char[] UVCI_CHAR_SET = {
'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L',
'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X',
'Y', 'Z', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9' };
/** The valid characters of an UVCI including delimiters. */
private final static String UVCI_CHAR_SET_W_DELIMS = new String(UVCI_CHAR_SET) + "#/:";
/**
* Implementation stolen from https://en.wikipedia.org/wiki/Luhn_mod_N_algorithm#Algorithm_in_Java.
*
* @param input
* the input
* @return the control character
*/
private static char generateLuhnModNCheckCharacter(final String input) {
int factor = 2;
int sum = 0;
int n = UVCI_CHAR_SET_W_DELIMS.length();
// Starting from the right and working leftwards is easier since
// the initial "factor" will always be "2".
for (int i = input.length() - 1; i >= 0; i--) {
int codePoint = UVCI_CHAR_SET_W_DELIMS.indexOf(input.charAt(i));
int addend = factor * codePoint;
// Alternate the "factor" that each "codePoint" is multiplied by
factor = (factor == 2) ? 1 : 2;
// Sum the digits of the "addend" as expressed in base "n"
addend = (addend / n) + (addend % n);
sum += addend;
}
// Calculate the number that must be added to the "sum"
// to make it divisible by "n".
int remainder = sum % n;
int checkCodePoint = (n - remainder) % n;
return UVCI_CHAR_SET_W_DELIMS.charAt(checkCodePoint);
}
So 'No' - as in that you agree that the example is wrong?
Yes.
@gabywh π
.... and the format. Does that follow the urn form as in eg. https://github.com/eu-digital-green-certificates/dgc-testdata/blob/main/common/2DCode/raw/CO2.json#L22 - (which seems to lack the checksum separator and therefor also the checksum character)
Does the / need to be used between version country code and opaque part?
We are currently going for this format: 01BEVL7890ABCDEFGHIJ12345678#3 (where 3 is the Luhm checksum for the complete part before the #)
-- Don't know if the 3 above is actually the correct checksum char. I am just stating the above, for discussing about the format.
Hi! I'm struggling to figure out what is right in this. I have this code, sometimes it matches https://dotnetfiddle.net/xDhpaY
Is there any update on correcting the example and confirming the exact charset? My calculations (using node) agree with Martin's above.
@gabywh @dirkx Update the wiki and remove the #
from the character set and provide correct examples?
Perhaps also elaborate on the urn:uvci
prefix.
πis the removal of '#'
from the character set considered as part of 1.0.0 or is that part of the next version?
is the removal of
'#'
from the character set considered as part of 1.0.0 or is that part of the next version?
won't ever be part of 1.0.0 - could be patch 1.0.1 or probably in next minor 1.1.0. I'm going to try to look at this today (2021-05-14).
.... and the format. Does that follow the urn form as in eg. https://github.com/eu-digital-green-certificates/dgc-testdata/blob/main/common/2DCode/raw/CO2.json#L22 - (which seems to lack the checksum separator and therefor also the checksum character)
A previous format, I think. Am reviewing all of these as part of #59.
@gabywh @dirkx Update the wiki and remove the
#
from the character set and provide correct examples?Perhaps also elaborate on the
urn:uvci
prefix.
As per Annex 2 in https://ec.europa.eu/health/sites/default/files/ehealth/docs/vaccination-proof_interoperability-guidelines_en.pdf which - as far as I understand it - has:
[2 digit version]:[country code - where possible 2 digit ISO 3166]:[payload/pay/load]#[checksum]
field separators:
:
- version / country/
- used with alphanumeric payload field#
- used at end to separate block from its checksum The prefix URN:UVCI:
shall be present when stored in the JSON schema, but is not required to be displayed for the human-readable / printed version
@gabywh or anyone, question concerning the checksum part - is it required to provide? In the QR code examples not every country provides it now.
The checksum was specifically ment for human entry - e.g. from paper.
And not anticipated in the digital version - as all data is generally going through digitally signed channels - and thus integrity protected.
The documents are not terribly clear though (as these detailed appendixes did not make it in version 1.0 at the time).
Prefix:
URN:UVCI:[the rest...]
Notes
URN:UVCI:
prefix - so the prefix has to be in uppercaseWhat is the value of urn:uvci
except more bytes? If the field is clearly defined it is unnecessary bloat. Also, it is not clearly defined, so that some include it, other don't. I still don't read from the specs that this prefix is expected.
Okay, someone put it in and it is till totally useless. We fight for bytes and then we include an identifier that has an unnecessary prefix?
The appendix details why this prefix needs to be there - and how it helps with resolution in scenario 3 - and how it helps guarantee global uniqueness in an open system - basically capturing the industry best practices that the internet has learned the hard way(rfc 1736, rfc 1737, rfc 8141) . Keeping this field to be a valid URI therefore is valuable.
You could argue of course that this is a short lived system in a closed world. Sometimes that works - but on the internet open-systems tend to win from closed world. Or, worse, give rise to a open world replacement.
Fixed with commit 2af8559
According to the wiki the Luhn-Mod-N charset used during calculation is
ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789#/:
. This means that the#
character whose only purpose is to be used as the delimiter for the UVCI and its checksum may end up as the checksum character, e.g.:01:SE:EHM/123456789KOOZC6RM7VZ##
Is this really want we want? I think it is sensible to exclude the
#
character from the charset since it can not appear anywhere in the UVCI.