DoubangoTelecom / ultimateMRZ-SDK

Machine-readable zone/travel document (MRZ / MRTD) detector and recognizer using deep learning
https://www.doubango.org/webapps/mrz/

Parsing Mrz Issue #40

Closed imranbaloch closed 4 years ago

imranbaloch commented 4 years ago

I have the following MRZ:

P<**********<*****<<*******<<<<<<<<<<<<<<<<<
********<*******************<<<<<<<<<<<<<<<0

According to the parser at https://github.com/DoubangoTelecom/ultimateMRZ-SDK/blob/1ef4b1782414d246153a5a718a10579a65bf1cef/samples/c%2B%2B/mrz_parser.h#L303-L304

this MRZ is invalid because of its last two characters, but your cloud application accepts it. Here is the passport:

image

Is the cloud application using the same parser?

imranbaloch commented 4 years ago

According to https://en.wikipedia.org/wiki/Machine-readable_passport

| Positions | Length | Characters | Meaning |
|-----------|--------|------------|---------|
| 1–9 | 9 | alpha+num+< | Passport number |
| 10 | 1 | numeric | Check digit over digits 1–9 |
| 11–13 | 3 | alpha+< | Nationality (ISO 3166-1 alpha-3 code with modifications) |
| 14–19 | 6 | numeric | Date of birth (YYMMDD) |
| 20 | 1 | numeric | Check digit over digits 14–19 |
| 21 | 1 | alpha+< | Sex (M, F or < for male, female or unspecified) |
| 22–27 | 6 | numeric | Expiration date of passport (YYMMDD) |
| 28 | 1 | numeric | Check digit over digits 22–27 |
| 29–42 | 14 | alpha+num+< | Personal number (may be used by the issuing country as it desires) |
| 43 | 1 | numeric+< | Check digit over digits 29–42 (may be < if all characters are <) |
| 44 | 1 | numeric | Check digit over digits 1–10, 14–20, and 22–43 |

So the 43rd character can be either < or a digit. Is that correct?
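
For reference, the check digits listed in the table follow the ICAO 9303 scheme: digits count as themselves, A-Z as 10-35, < as 0, each value is multiplied by the repeating weights 7, 3, 1, and the sum is taken modulo 10. The following is a minimal C++17 sketch of that computation, not code from the SDK or its sample parser. The names `mrzCharValue`, `mrzCheckDigit` and `td3CompositeCheckDigit` are made up for illustration, and the test line is the specimen second line from ICAO Doc 9303.

```cpp
#include <cstddef>
#include <string_view>

// Repeating weights used by every MRZ check digit.
constexpr int kWeights[3] = {7, 3, 1};

// Value of one MRZ character: '0'-'9' -> 0-9, 'A'-'Z' -> 10-35, '<' -> 0.
// Returns -1 for any character that is not allowed in an MRZ.
constexpr int mrzCharValue(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'Z') return c - 'A' + 10;
    if (c == '<') return 0;
    return -1;
}

// Check digit over `data`, returned as a character '0'-'9'.
// Returns '?' if `data` contains a character outside [A-Z0-9<].
constexpr char mrzCheckDigit(std::string_view data) {
    int sum = 0;
    for (std::size_t i = 0; i < data.size(); ++i) {
        const int v = mrzCharValue(data[i]);
        if (v < 0) return '?';
        sum += v * kWeights[i % 3];
    }
    return static_cast<char>('0' + sum % 10);
}

// Final check digit (position 44) of the second TD3 line, computed over
// positions 1-10, 14-20 and 22-43. Returns '?' on malformed input.
constexpr char td3CompositeCheckDigit(std::string_view line2) {
    if (line2.size() != 44) return '?';
    // {0-based offset, length} of the three covered spans.
    constexpr std::size_t spans[3][2] = {{0, 10}, {13, 7}, {21, 22}};
    int sum = 0;
    std::size_t k = 0;  // running index: the weights continue across spans
    for (const auto& span : spans) {
        for (std::size_t i = 0; i < span[1]; ++i, ++k) {
            const int v = mrzCharValue(line2[span[0] + i]);
            if (v < 0) return '?';
            sum += v * kWeights[k % 3];
        }
    }
    return static_cast<char>('0' + sum % 10);
}

// Compile-time checks against the ICAO 9303 specimen line.
constexpr std::string_view kSpecimen =
    "L898902C36UTO7408122F1204159ZE184226B<<<<<10";
static_assert(mrzCheckDigit(kSpecimen.substr(0, 9)) == kSpecimen[9]);    // passport number
static_assert(mrzCheckDigit(kSpecimen.substr(13, 6)) == kSpecimen[19]);  // date of birth
static_assert(mrzCheckDigit(kSpecimen.substr(21, 6)) == kSpecimen[27]);  // expiration date
static_assert(td3CompositeCheckDigit(kSpecimen) == kSpecimen[43]);       // position 44
```

Because < contributes 0 to the sum, a personal number made only of fillers has a computed check digit of 0.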

DoubangoTelecom commented 4 years ago

The cloud doesn't check the validity of a field's content; we only check its size. For example, for the fields at positions 43 and 44 we only check that each has size 1. The code on the cloud looks like this:

#define REGEX_ANYCHAR                   "[A-Z0-9<]"
static const std::regex __TD3_line1(
        REGEX_ANYCHAR_GROUPMATCH(9) /*1: Document number*/ \
        REGEX_ANYCHAR_GROUPMATCH(1) /*2: H*/ \
        REGEX_ANYCHAR_GROUPMATCH(3) /*3: nationality*/ \
        REGEX_ANYCHAR_GROUPMATCH(6) /*4: Birth date*/ \
        REGEX_ANYCHAR_GROUPMATCH(1) /*5: H*/ \
        REGEX_ANYCHAR_GROUPMATCH(1) /*6: S*/ \
        REGEX_ANYCHAR_GROUPMATCH(6) /*7: Expiry date*/ \
        REGEX_ANYCHAR_GROUPMATCH(1) /*8: H*/ \
        REGEX_ANYCHAR_GROUPMATCH(14) /*9: Personal Number (Optional)*/ \
        REGEX_ANYCHAR_GROUPMATCH(1) /*10: H*/\
        REGEX_ANYCHAR_GROUPMATCH(1) /*11: FH*/
    );
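
The snippet above does not show how `REGEX_ANYCHAR_GROUPMATCH` is defined. The sketch below assumes it expands to one capture group of exactly `n` characters from `REGEX_ANYCHAR`; that definition, the name `kTd3Line2Sketch` and the helper `matchesTd3Line2` are hypothetical and only illustrate what a size-only check accepts. The eleven groups correspond to the fields of the second TD3 line in the table quoted earlier.

```cpp
#include <cassert>
#include <regex>
#include <string>

#define REGEX_ANYCHAR "[A-Z0-9<]"
// ASSUMPTION: not taken from the SDK. One capture group of exactly n
// characters, built by string-literal concatenation, e.g. "([A-Z0-9<]{9})".
#define REGEX_ANYCHAR_GROUPMATCH(n) "(" REGEX_ANYCHAR "{" #n "})"

static const std::regex kTd3Line2Sketch(
    REGEX_ANYCHAR_GROUPMATCH(9)   /* 1: document number */
    REGEX_ANYCHAR_GROUPMATCH(1)   /* 2: check digit */
    REGEX_ANYCHAR_GROUPMATCH(3)   /* 3: nationality */
    REGEX_ANYCHAR_GROUPMATCH(6)   /* 4: birth date */
    REGEX_ANYCHAR_GROUPMATCH(1)   /* 5: check digit */
    REGEX_ANYCHAR_GROUPMATCH(1)   /* 6: sex */
    REGEX_ANYCHAR_GROUPMATCH(6)   /* 7: expiry date */
    REGEX_ANYCHAR_GROUPMATCH(1)   /* 8: check digit */
    REGEX_ANYCHAR_GROUPMATCH(14)  /* 9: personal number (optional) */
    REGEX_ANYCHAR_GROUPMATCH(1)   /* 10: check digit */
    REGEX_ANYCHAR_GROUPMATCH(1)   /* 11: final check digit */
);

// Returns true if `line` has the right length and alphabet for a TD3
// second line. On success, m[1]..m[11] hold the fields; `line` must
// outlive `m`. No field content (dates, check digits) is validated.
bool matchesTd3Line2(const std::string& line, std::smatch& m) {
    if (!std::regex_match(line, m, kTd3Line2Sketch)) return false;
    assert(m.size() == 12);  // whole match plus 11 capture groups
    return true;
}
```

With a pattern like this, a line whose 43rd character is < matches just as well as one with a digit there, since every position accepts any of `[A-Z0-9<]`.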

Yes, the 43rd char can be < according to https://www.icao.int/publications/Documents/9303_p4_cons_en.pdf page 19. The data validation and the parser are not part of the SDK for a good reason.
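
For anyone adding their own validation on top of the SDK output, the rule for position 43 could be sketched as follows: accept < only when the personal number (positions 29-42) consists entirely of fillers, and otherwise require the digit computed with the 7, 3, 1 weights. This is a minimal C++17 sketch with made-up names (`checkDigit731`, `personalNumberCheckOk`), not the parser from the samples; the check-digit helper is included so the block stands alone.

```cpp
#include <cstddef>
#include <string_view>

// Check digit with repeating weights 7, 3, 1 ('<' counts as 0, 'A'-'Z' as
// 10-35). Returns '?' if `data` contains a character outside [A-Z0-9<].
constexpr char checkDigit731(std::string_view data) {
    constexpr int weights[3] = {7, 3, 1};
    int sum = 0;
    for (std::size_t i = 0; i < data.size(); ++i) {
        const char c = data[i];
        int v = 0;
        if (c >= '0' && c <= '9') v = c - '0';
        else if (c >= 'A' && c <= 'Z') v = c - 'A' + 10;
        else if (c != '<') return '?';
        sum += v * weights[i % 3];
    }
    return static_cast<char>('0' + sum % 10);
}

// `field` is positions 29-42, `check` is the character at position 43.
// '<' is accepted only when the whole field is fillers; a digit must
// equal the computed check digit.
constexpr bool personalNumberCheckOk(std::string_view field, char check) {
    if (check == '<') {
        return field.find_first_not_of('<') == std::string_view::npos;
    }
    return check >= '0' && check <= '9' && check == checkDigit731(field);
}
```

Under this rule the line from the original report, whose personal number is all fillers followed by < at position 43, passes; an all-filler field followed by 0 passes as well, because the computed check digit of fourteen fillers is 0.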

DoubangoTelecom commented 4 years ago

Check https://github.com/DoubangoTelecom/ultimateMRZ-SDK/commit/f7069c73d824983e86fbef4f3f362d9aa8ab5a5b