eMRTD EF.COM gives corrupted stream - high tag number < 31 found

zienit commented 2 years ago

Hi,

I found a problem with the ASN1InputStream. It fails to read an EF.COM data group from an eMRTD (defined in https://www.icao.int/publications/Documents/9303_p10_cons_en.pdf, section 4.6.1).

This structure uses the tag '5F01'. That is the high-tag-number form, but the tag number it encodes is 1, which is obviously < 31. A commit to ASN1InputStream on 16 June 2021 added the check that throws the exception in the title.

I'm no expert on ASN.1 specs, but it appears that ICAO didn't find anything in the specs that prevented them from requiring the tag '5F01' on billions of passports. I think this Exception should not be thrown.

peterdettman commented 2 years ago

Probably there is some confusion here between the identifier octets encoding and the tag number itself.

Firstly, AFAIK it is correct that in BER, low tag numbers cannot use the high-tag-number form. See X.690 8.1.2, in particular:

> 8.1.2.2 For tags with a number ranging from zero to 30 (inclusive), the identifier octets shall comprise a single octet [...].

However that's probably not what is happening. I don't know anything about this specification (or how BER-TLV stuff is generally presented), but a straightforward reading would suggest that '5F01' is actually the tag number (24321 in decimal) and not the identifier octets themselves, e.g. if the ASN.1 field has type [24321] IMPLICIT PrintableString

then an example encoding would be: 9F81BE01 04 31353939

which is shown split into the identifier octets, length octet(s), and contents octets. 9F marks it as a CONTEXT class high tag number. The actual tag number (0x5F01) encoding then follows: 81BE01 (encoded per X.690 8.1.2.4).
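To make the difference between the tag number and the identifier octets concrete, here is a minimal Java sketch of the X.690 8.1.2 identifier encoding. X690Identifier and its methods are hypothetical names (this is not Bouncy Castle's API); tagClass is assumed to be the class bits already in position (0x00 universal, 0x40 application, 0x80 context, 0xC0 private). It reproduces the 9F81BE01 identifier above for [24321], and shows that tag number 1 gets a single octet.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical helper sketching X.690 8.1.2 identifier-octet encoding.
final class X690Identifier {

    static byte[] encode(int tagClass, boolean constructed, int tagNumber) {
        if (tagNumber < 0) {
            throw new IllegalArgumentException("negative tag number");
        }
        int first = tagClass | (constructed ? 0x20 : 0x00);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (tagNumber <= 30) {
            // 8.1.2.2: tag numbers 0..30 use a single identifier octet
            out.write(first | tagNumber);
            return out.toByteArray();
        }
        // 8.1.2.4: high-tag-number form, bits 5-1 of the first octet all set
        out.write(first | 0x1F);
        // Collect base-128 digits, least significant first
        int[] digits = new int[5];
        int n = 0;
        int v = tagNumber;
        do {
            digits[n++] = v & 0x7F;
            v >>>= 7;
        } while (v != 0);
        // Emit most significant digit first; bit 8 set on all but the last octet
        for (int i = n - 1; i >= 0; i--) {
            out.write(i > 0 ? (digits[i] | 0x80) : digits[i]);
        }
        return out.toByteArray();
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex(encode(0x80, false, 0x5F01))); // 9F81BE01
        System.out.println(hex(encode(0x40, false, 1)));      // 41
    }
}
```

Because the base-128 digits are emitted without leading zero digits, the first subsequent octet never has bits 7 to 1 all zero, which is the minimality requirement discussed further down.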

I guess another possibility is that you are trying to read a Simple-TLV encoding as if it were BER-TLV or something like that. There is some possibly useful discussion in this stackoverflow answer: https://stackoverflow.com/a/18932655/264294 . In particular there is an example of an invalid BER-TLV encoding (5F0F 05 48656C6C6F) similar to the one involved here.

zienit commented 2 years ago

Thanks for your quick response!

I think the ICAO spec made a mistake in its choice of high tags. They specify the following tags for the EF.COM structure:

'60' (= app. spec & constructed)
'5F01' (= app. spec & high tag number)
'5F36' (idem)
'5C' (= app. spec)
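Those annotations come from the first identifier octet alone: bits 8-7 give the class, bit 6 the constructed flag, and bits 5-1 either a tag number or the 11111 high-tag marker. A small Java sketch (FirstOctet is a hypothetical helper name) decodes the three distinct first octets:

```java
// Hypothetical helper: describes only the first identifier octet of a tag.
final class FirstOctet {
    private static final String[] CLASSES = {"universal", "application", "context-specific", "private"};

    static String describe(int octet) {
        String cls = CLASSES[(octet >> 6) & 0x03];  // bits 8-7: class
        boolean constructed = (octet & 0x20) != 0;  // bit 6: constructed flag
        int low = octet & 0x1F;                     // bits 5-1
        String number = (low == 0x1F)
                ? "high-tag-number marker (tag number in following octets)"
                : "tag number " + low;
        return cls + ", " + (constructed ? "constructed" : "primitive") + ", " + number;
    }

    public static void main(String[] args) {
        for (int octet : new int[] {0x60, 0x5F, 0x5C}) {
            System.out.printf("%02X: %s%n", octet, describe(octet));
        }
    }
}
```

So '60' is application class, constructed, tag number 0; '5C' is application class, primitive, tag number 28; and '5F' only says that the tag number follows in the next octet(s).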

This is an actual EF.COM read from a passport:

60 17 5f 01 04 30 31 30 37 5f 36 06 30 34 30 30 30 30 5c 05 61 75 6f 63 6e

Apart from the illegal tag 5F01 this looks BER-TLVish to me.

peterdettman commented 2 years ago

The more I read, the stranger it looks, yes. Bearing in mind I don't have access to the complete ISO standards, it appears ISO/IEC 7816 references ISO/IEC 8825 to define BER (and this latter spec. looks like it simply copies X.690; there could be minor differences, but at least I have found it says the same thing about tag number encoding as X.690).

However, based on this: https://cardwerk.com/iso7816-4-annex-d-use-of-basic-encoding-rules-asn-1/ , Annex D.2 describes tag numbers differently: it says (paraphrasing) "this encoding means that tag" rather than the X.690 wording "if the tag is this, it shall be encoded thus". In particular, the D.2 text would not rule out using a "high tag" of 1.

That is not the only difference from actual BER, though. Note that section D.3 says the "indefinite length" form is not to be used in BER-TLV, and, worse, section D.1 says "Before, between or after BER-TLV data objects, '00' or 'FF' bytes without any meaning may occur".

Taken together, my impression is that you are going to need a dedicated BER-TLV parser to handle this.
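For comparison, here is a minimal Java sketch of the kind of dedicated BER-TLV reader meant here, assuming only the Annex D rules quoted above: tag bytes are kept opaque (so '5F01' is accepted without any X.690 minimality check), '00'/'FF' padding between objects is skipped, and the indefinite length form is rejected. SimpleBerTlv and its members are hypothetical names, not part of Bouncy Castle, and long-form lengths are limited to 3 bytes for simplicity. It is run on the EF.COM bytes read from the passport earlier in the thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical ISO/IEC 7816-4 Annex D style BER-TLV reader (not a Bouncy Castle class).
final class SimpleBerTlv {

    static final class Tlv {
        final String tag;          // raw tag bytes as uppercase hex, e.g. "5F01"
        final boolean constructed; // bit 6 of the first tag byte
        final byte[] value;

        Tlv(String tag, boolean constructed, byte[] value) {
            this.tag = tag;
            this.constructed = constructed;
            this.value = value;
        }
    }

    static List<Tlv> parse(byte[] data) {
        List<Tlv> out = new ArrayList<>();
        int pos = 0;
        while (pos < data.length) {
            int first = data[pos] & 0xFF;
            // Annex D.1: meaningless '00' or 'FF' bytes may occur around objects
            if (first == 0x00 || first == 0xFF) {
                pos++;
                continue;
            }
            // Tag: bits 5-1 all set means more tag bytes follow, until one has bit 8 clear
            int tagStart = pos++;
            if ((first & 0x1F) == 0x1F) {
                int t;
                do {
                    if (pos >= data.length) throw new IllegalArgumentException("truncated tag");
                    t = data[pos++] & 0xFF;
                } while ((t & 0x80) != 0);
            }
            String tag = hex(Arrays.copyOfRange(data, tagStart, pos));

            // Length: short form, or long form with up to 3 length bytes in this sketch
            if (pos >= data.length) throw new IllegalArgumentException("missing length");
            int lenByte = data[pos++] & 0xFF;
            int len;
            if (lenByte < 0x80) {
                len = lenByte;
            } else if (lenByte == 0x80) {
                // Annex D.3: the indefinite length form is not used in BER-TLV
                throw new IllegalArgumentException("indefinite length not allowed");
            } else {
                int count = lenByte & 0x7F;
                if (count > 3) throw new IllegalArgumentException("length field too long for this sketch");
                len = 0;
                for (int i = 0; i < count; i++) {
                    if (pos >= data.length) throw new IllegalArgumentException("truncated length");
                    len = (len << 8) | (data[pos++] & 0xFF);
                }
            }
            if (len > data.length - pos) throw new IllegalArgumentException("value overruns input");
            out.add(new Tlv(tag, (first & 0x20) != 0, Arrays.copyOfRange(data, pos, pos + len)));
            pos += len;
        }
        return out;
    }

    static byte[] fromHex(String s) {
        String clean = s.replace(" ", "");
        byte[] out = new byte[clean.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(clean.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) sb.append(String.format("%02X", b & 0xFF));
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] efCom = fromHex("60 17 5f 01 04 30 31 30 37 5f 36 06 30 34 30 30 30 30 5c 05 61 75 6f 63 6e");
        Tlv outer = parse(efCom).get(0); // '60': constructed wrapper
        for (Tlv child : parse(outer.value)) {
            System.out.println(child.tag + " -> " + hex(child.value));
        }
    }
}
```

On the posted dump, main prints three children: 5F01 -> 30313037, 5F36 -> 303430303030 and 5C -> 61756F636E. Keeping tags as opaque byte strings sidesteps the question of what "tag number" '5F01' denotes, which is what an X.690 parser has to answer.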

zienit commented 2 years ago

Yes, it seems ISO 7816 BER-TLV does a few things differently (I'll BER this in mind :-). This is quite confusing because ICAO also uses proper ASN.1 definitions in other parts of the specs (e.g. the stuff supported in package org.bouncycastle.asn1.icao.*).

Thanks for your help. I will follow your advice and find a BER-TLV parser to read this object.

zienit commented 2 years ago

Hi,

So reading up a bit on X.690, I think 8.1.2.4.2 bullet c only requires the first subsequent tag octet of a high tag number to be > 0, but not > 30. I think where the surrounding text refers to 'tag numbers greater than 30', it conceptually also includes the 5 bits (11111) from the first octet into the tag number.

So, I still need to use a different parser for the ISO 7816 BER stuff (for the superfluous 00's and FF's), but I think the Exception 'high tag number < 31' is based on an (imho) incorrect interpretation of X.690.

(excerpt from https://www.itu.int/ITU-T/studygroups/com17/languages/X.690-0207.pdf)

8.1.2.4.2 The subsequent octets shall encode the number of the tag as follows:
a) bit 8 of each octet shall be set to one unless it is the last octet of the identifier octets;
b) bits 7 to 1 of the first subsequent octet, followed by bits 7 to 1 of the second subsequent octet, followed in turn by bits 7 to 1 of each further octet, up to and including the last subsequent octet in the identifier octets shall be the encoding of an unsigned binary integer equal to the tag number, with bit 7 of the first subsequent octet as the most significant bit;
c) bits 7 to 1 of the first subsequent octet shall not all be zero.

peterdettman commented 2 years ago

> So reading up a bit on X.690, I think 8.1.2.4.2 bullet c only requires the first subsequent tag octet of a high tag number to be > 0, but not > 30. I think where the surrounding text refers to 'tag numbers greater than 30', it conceptually also includes the 5 bits (11111) from the first octet into the tag number.

No, the 1F is definitely just a marker. The > 0 for the first subsequent octet (8.1.2.4.2c) is to ensure a minimal-length encoding for the high tag number. 8.1.2.4.2b is clear that the tag number is made up of only (the lower bits of) the subsequent octets. The minimal-length aspect also reinforces that the high-tag-number form shouldn't be used when the tag would fit into the first identifier octet (i.e. < 31).
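A strict decoder of the identifier octets, following 8.1.2.2 and 8.1.2.4.2 a) to c) as read here, makes the consequence concrete. This is a hypothetical Java sketch (StrictIdentifier is not Bouncy Castle's ASN1InputStream code), though it reuses the exception message from the issue title. Under these rules '5F36' from the same EF.COM decodes to tag number 54 and is accepted; only '5F01' (tag number 1 in high-tag-number form) is rejected.

```java
// Hypothetical strict X.690 identifier decoder (illustrative, not Bouncy Castle's code).
final class StrictIdentifier {

    /** Returns the tag number encoded by the identifier octets in id. */
    static int readTagNumber(byte[] id) {
        if (id.length == 0) throw new IllegalArgumentException("empty identifier");
        int low = id[0] & 0x1F;
        if (low != 0x1F) {
            return low; // single-octet form carries tag numbers 0..30
        }
        // 8.1.2.4.2 c): bits 7 to 1 of the first subsequent octet shall not all be zero
        if (id.length < 2 || (id[1] & 0x7F) == 0) {
            throw new IllegalArgumentException("missing or zero-padded high tag number");
        }
        int tagNumber = 0;
        for (int pos = 1; ; pos++) {
            if (pos >= id.length) throw new IllegalArgumentException("truncated identifier");
            if (tagNumber > (Integer.MAX_VALUE >>> 7)) throw new IllegalArgumentException("tag number too large");
            int b = id[pos] & 0xFF;
            // 8.1.2.4.2 b): bits 7 to 1 of each subsequent octet, most significant first
            tagNumber = (tagNumber << 7) | (b & 0x7F);
            // 8.1.2.4.2 a): bit 8 is zero only on the last identifier octet
            if ((b & 0x80) == 0) break;
        }
        // 8.1.2.2: numbers 0..30 shall use the single-octet form, so this is non-minimal
        if (tagNumber < 31) {
            throw new IllegalArgumentException("high tag number < 31 found");
        }
        return tagNumber;
    }

    public static void main(String[] args) {
        System.out.println(readTagNumber(new byte[] {0x5F, 0x36})); // 54
        System.out.println(readTagNumber(new byte[] {(byte) 0x9F, (byte) 0x81, (byte) 0xBE, 0x01})); // 24321
        try {
            readTagNumber(new byte[] {0x5F, 0x01});
        } catch (IllegalArgumentException e) {
            System.out.println("5F01 rejected: " + e.getMessage());
        }
    }
}
```

The two checks play different roles: rule c) rejects leading zero octets such as 5F 80 36, while the final < 31 check rejects the high-tag-number form for numbers that fit in the first octet.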

Our parser returns these tag numbers to code that matches them with tag numbers declared in proper ASN.1 types. It's very clear-cut. Of course it's quite possible that whoever chose the tag '5F01' also thought that the tag number "conceptually also includes the 5 bits", but it does not (in X.690, which we are following).

IMO it is the text of Annex D.2 in ISO7816-4 (which I mentioned in my previous comment) that distorted the original X.690 rule, and I am unsure if that was an intentional weakening or whether the restriction got lost in translation somehow.

zienit commented 2 years ago

I agree 1F is just a marker and not part of the tag. What I intended to say is that I think that 'tag numbers greater than 30' is just informal language, because it is necessary to recruit extra octets if a tag greater than 30 is required (which imho does not imply the inverse).

> The > 0 for the first subsequent octet (8.1.2.4.2c) is to ensure a minimal length encoding for the high tag number.

I'm not sure if I agree on this one or understand you correctly. It is only the lower 7 bits of the first subsequent octet that must be > 0. The rule does not influence the value of bit 8 which acts as the marker for the last subsequent octet.

peterdettman commented 2 years ago

> 8.1.2.2 For tags with a number ranging from zero to 30 (inclusive), the identifier octets shall comprise a single octet [...].

"shall" means MUST (not trying to shout at you, but it has to be read as though the spec is shouting at you). It's not optional.

> I'm not sure if I agree on this one or understand you correctly. It is only the lower 7 bits of the first subsequent octet that must be > 0. The rule does not influence the value of bit 8 which acts as the marker for the last subsequent octet.

Yes, I wrote lazily, so I will restate it: "bits 7 to 1 of the first subsequent octet shall not all be zero" is there to prevent arbitrarily long representations of a given tag. Informally: you have to use the minimal-length encoding for high tags and indeed, per the other rules, for all tags (because for tags <= 30 the identifier octets shall comprise a single octet).

zienit commented 2 years ago

Thanks for the clarification. FWIW, based on your arguments I agree with your conclusion. I appreciate that 8.1.2.4.2 c is there to prevent the construction of tags with "leading zeros".

bcgit / bc-java, issue #1081