Arg0s1080 / mrz

Machine Readable Zone generator and checker for official travel documents sizes 1, 2, 3, MRVA and MRVB (Passports, Visas, national id cards and other travel documents)
GNU General Public License v3.0

mrz.checker: Indian TD3: optional_data_hash fails when optional_data is empty (all '<') and optional_data_hash is '<' #3

Closed: Arg0s1080 closed this issue 5 years ago

Arg0s1080 commented 5 years ago

Also, the first example MRZ is an instance of another false positive, where the optional data hash check fails when the optional data is empty (all <) and the optional data hash is < (instead of 0). I think it may be an error according to the specs, but it exists in the real world.
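For reference, the ICAO 9303 check digit weights each character by 7, 3, 1 (repeating), counting digits as their value, A to Z as 10 to 35 and the filler '<' as 0, then takes the sum modulo 10. An all-filler optional data field therefore has check digit 0, which is why a '<' in that position fails a strict check. A minimal sketch; check_digit and tolerant_optional_hash are hypothetical helpers, not the mrz API:

```python
def check_digit(field: str) -> str:
    """Return the ICAO 9303 check digit of an MRZ field as a one-character string."""
    weights = (7, 3, 1)
    total = 0
    for i, char in enumerate(field):
        if char in "0123456789":
            value = int(char)
        elif "A" <= char <= "Z":
            value = ord(char) - ord("A") + 10  # A=10 ... Z=35
        elif char == "<":
            value = 0  # filler counts as zero
        else:
            raise ValueError(f"invalid MRZ character: {char!r}")
        total += value * weights[i % 3]
    return str(total % 10)


def tolerant_optional_hash(optional_data: str, digit: str) -> bool:
    """Hypothetical: also accept '<' as check digit when the optional data is all filler."""
    if set(optional_data) == {"<"} and digit == "<":
        return True  # the real-world variant described above
    return check_digit(optional_data) == digit


# Fields taken from the Indian example MRZ later in this thread
print(check_digit("K2578285<"))               # 7 (document number check digit)
print(tolerant_optional_hash("<" * 14, "<"))  # True (Indian variant)
print(tolerant_optional_hash("<" * 14, "0"))  # True (strict form)
```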

Originally posted by @tahajahangir in https://github.com/Arg0s1080/mrz/issues/1#issuecomment-439771574

Arg0s1080 commented 5 years ago

In this case mrz.checker works correctly. However, after looking at several Indian travel documents (passports, type A visas and type B visas), I have verified that what @tahajahangir describes can happen. For example:

Passports with errors in the identifier and the optional data hash: [image] [image]

However, in Indian visas the identifier is printed correctly: Visa MRVB [image]; Visa MRVA (although it seems that a final hash is used in this case) [image]. (Images taken from Google and pixelated to safeguard privacy.)

Although it seems clear that this is a bad implementation of the ICAO specifications, I think it would be a good idea to add the option to disable checks for certain fields in future versions of mrz.checker. (India is not the only country that does not comply with the specifications.)
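One way the "disable checks for certain fields" idea could look is sketched below. This is a hypothetical design, not an existing mrz feature: ConfigurableChecker, its disabled parameter and the field names are illustrative assumptions. Each field check is a named callable, and the overall result ignores any name listed as disabled.

```python
from typing import Callable, Dict, Iterable, List


class ConfigurableChecker:
    """Hypothetical checker that runs named field checks, skipping any listed in `disabled`."""

    def __init__(self, checks: Dict[str, Callable[[], bool]], disabled: Iterable[str] = ()):
        self.checks = checks
        self.disabled = set(disabled)
        unknown = self.disabled.difference(checks)
        if unknown:  # catch typos in field names early
            raise ValueError(f"unknown checks: {sorted(unknown)}")

    def failures(self) -> List[str]:
        """Names of enabled checks that fail, in definition order."""
        return [name for name, check in self.checks.items()
                if name not in self.disabled and not check()]

    def __bool__(self) -> bool:
        return not self.failures()


# Illustrative results for a non-compliant passport: two fields fail
checks = {
    "identifier": lambda: False,
    "optional data hash": lambda: False,
    "final hash": lambda: True,
}
print(bool(ConfigurableChecker(checks)))  # False
print(bool(ConfigurableChecker(checks, disabled={"identifier", "optional data hash"})))  # True
```

Validating the disabled names up front is a design choice: a misspelled field name would otherwise silently disable nothing.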

Meanwhile, the only solution I can think of for particular cases is for someone to build a class that inherits from TD1CodeChecker, TD2CodeChecker or TD3CodeChecker and overrides some properties. Recently someone asked me by email about adding a check for an additional hash, and easily built a class called TD1DutchCodeChecker, inherited from TD1CodeChecker, which overrode the optional_data and optional_data_hash properties (and _all_hashes, so that the extra hash is included). Important note: the only requirement for a child class in mrz is that its name must have this format: <DocumentType>*Code*. For example: TD1INDCodeChecker, TD1Type1CodeChecker, MRVAUKCodeGenerator, TD2BRACodeGenerator, TD3CodeCheckerBlahBlah or similar.
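The naming rule can be read as a shell-style pattern. A minimal sketch, assuming * means "any text" and that the document types are the five formats mrz supports; how mrz itself interprets the class name is not shown in this thread, and follows_naming_rule is a hypothetical helper:

```python
from fnmatch import fnmatchcase

# Assumed document types: the formats mrz supports (TD1, TD2, TD3, MRVA, MRVB)
DOCUMENT_TYPES = ("TD1", "TD2", "TD3", "MRVA", "MRVB")


def follows_naming_rule(class_name: str) -> bool:
    """Literal reading of the rule <DocumentType>*Code*, with * matching any text."""
    return any(fnmatchcase(class_name, f"{doc}*Code*") for doc in DOCUMENT_TYPES)


for name in ("TD1INDCodeChecker", "TD1Type1CodeChecker", "MRVAUKCodeGenerator",
             "TD2BRACodeGenerator", "TD3CodeCheckerBlahBlah", "DutchTD1Checker"):
    print(name, follows_naming_rule(name))  # True for the first five, False for the last
```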

This is the class built to check Dutch ID cards:

from ..base.countries_ops import *
from ..base.functions import hash_is_ok
from .td1 import TD1CodeChecker

import mrz.base.string_checkers as check

__all__ = ["TD1DutchCodeChecker", "code_list", "countries_list", "countries_code_list", "code_country_list",
           "is_country", "is_code", "get_code", "get_country", "find_country"]

class TD1DutchCodeChecker(TD1CodeChecker):
    """
    Check the string code of the machine readable zone for Dutch TD1 documents

    __bool__() returns True if all fields are validated, False otherwise

    Params:
        mrz_code          (str):  MRZ string of a TD1. Must be 90 uppercase characters long (excluding line breaks)
        check_expiry     (bool):  If set to True, it is verified, and reported as a warning, that the
                                  document is not expired and that the expiry date is not more than 10 years ahead
        compute_warnings (bool):  If set to True, warnings count as failures (False)

    """
    def __init__(self, mrz_code: str, check_expiry=False, compute_warnings=False):
        TD1CodeChecker.__init__(self, mrz_code, check_expiry, compute_warnings)

    @property
    def optional_data(self) -> bool:
        """Return True if the format of the optional data field is validated, False otherwise."""
        s = self._optional_data
        return True if check.is_empty(s) else self._report("id number format", check.is_printable(s))

    @property
    def optional_data_hash(self):
        self._optional_data_hash = self.mrz_code.splitlines()[0][29]
        self._optional_data = self.mrz_code.splitlines()[0][15: 29]
        return self._report("id number hash", hash_is_ok(self._optional_data, self._optional_data_hash))

    def _all_hashes(self) -> bool:
        return (self.final_hash &
                self.document_number_hash &
                self.birth_date_hash &
                self.expiry_date_hash &
                self.optional_data_hash)

Usage:

from mrz.checker.td1_dutch import TD1DutchCodeChecker

mrz_code = ("I<NLDARRE84NB20123456789<<<<<7\n"  # '7' is an additional hash not included in TD1 ICAO specifications
            "9901236M3012235NLD<<<<<<<<<<<2\n"
            "SMITH<<JOHN<JOEY<<<<<<<<<<<<<<")

checker = TD1DutchCodeChecker(mrz_code)
print("Check: %s" % checker)

Output: Check: True
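The extra Dutch check digit can be recomputed by hand using the same slices as TD1DutchCodeChecker.optional_data_hash (line 1, characters [15:29], with the digit at [29]). A self-contained sketch; check_digit is a hypothetical helper implementing the standard ICAO 9303 weighting (7, 3, 1):

```python
# ICAO 9303 character values: digits as-is, A=10 ... Z=35, filler '<' = 0
_VALUES = {**{str(d): d for d in range(10)},
           **{chr(ord("A") + i): 10 + i for i in range(26)},
           "<": 0}


def check_digit(field: str) -> str:
    """ICAO 9303 check digit: weights 7, 3, 1 repeating, sum modulo 10."""
    return str(sum(_VALUES[c] * (7, 3, 1)[i % 3] for i, c in enumerate(field)) % 10)


line1 = "I<NLDARRE84NB20123456789<<<<<7"  # first line of the Dutch example above
optional_data, extra_hash = line1[15:29], line1[29]
print(optional_data, extra_hash, check_digit(optional_data))  # 123456789<<<<< 7 7
```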

Arg0s1080 commented 5 years ago

@tahajahangir this is a quick and simple solution for Indian passports:

import mrz.base.string_checkers as check
from mrz.checker.td3 import *
from mrz.checker._honorifics import titles
from mrz.base.functions import hash_is_ok

class TD3INDCodeChecker(TD3CodeChecker):
    @property
    def optional_data_hash(self) -> bool:
        """Return True if the optional data hash is valid (or '<' with empty optional data), False otherwise."""
        if check.is_empty(self._optional_data) and self._optional_data_hash == "<":
            ok = True
        else:
            ok = hash_is_ok(self._optional_data, self._optional_data_hash)
        return self._report("optional data hash", ok)

    @property
    def identifier(self) -> bool:
        """Return True if the identifier passes the checks, False otherwise."""
        full_id = self._identifier.rstrip("<")
        padding = self._identifier[len(full_id):]
        id2iter = full_id.split("<<")
        id_len = len(id2iter)
        primary = secondary = None
        if not check.is_printable(self._identifier):
            ok = False
        elif check.is_empty(self._identifier):
            self._report("empty identifier", kind=2)
            ok = False
        elif check.uses_nums(full_id):
            self._report("identifier with numbers", kind=2)
            ok = False
        else:
            if full_id.startswith("<<"):
                id2iter = id2iter[1:]
                id_len = len(id2iter)
                if id_len == len([i for i in id2iter if i]):
                    if id_len == 2:
                        primary, secondary = id2iter
                        ok = True
                    elif id_len == 1:
                        primary, secondary = id2iter[0], ""
                        self._report("only one identifier", kind=1)
                        ok = not self._compute_warnings
                    else:
                        self._report("more than two identifiers", kind=2)
                        ok = False
                else:  # too many '<' in id
                    self._report("invalid identifier format", kind=2)
                    ok = False
            else:  # if the identifier must start with "<<", this is reported as an error and ok is set to False
                   # IMPORTANT: I don't know the real requirements
                self._report("identifier doesn't begin by '<<'", kind=2)
                ok = False
        # print("Debug. id2iter ............:", id2iter)
        # print("Debug. (secondary, primary):", (secondary, primary))
        # print("Debug. padding ............:", padding)
        if ok:
            if primary.startswith("<") or secondary and secondary.startswith("<"):
                self._report("some identifier begin by '<'", kind=2)
                ok = False
            if not padding:
                self._report("possible truncating", kind=1)
                ok = False if self._compute_warnings else ok
            for i in range(id_len):
                for itm in id2iter[i].split("<"):
                    if itm:
                        for tit in titles:
                            if tit == itm:
                                if i:  # secondary id
                                    self._report("Possible unauthorized prefix or suffix in identifier", kind=1)
                                else:  # primary id
                                    self._report("Possible not recommended prefix or suffix in identifier", kind=1)
                                ok = False if self._compute_warnings else ok
        return self._report("identifier", ok)

Usage:

mrz_code = ("P<IND<<AHMADI<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<\n"
            "K2578285<7IND5601240F2202288<<<<<<<<<<<<<<<4")
passport_check = PassportINDCodeChecker(mrz_code)

print("CHECK...:%s" % passport_check)
print("WARNINGS:%s" % passport_check.report_warnings)

Output:

CHECK...:True
WARNINGS:['only one identifier']
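The core of the name handling above, stripped of the reporting, can be sketched as follows. This is a simplified, hypothetical split_identifier helper (not the mrz API), assuming the TD3 name field starts at position 6 of line 1 (index 5):

```python
def split_identifier(name_field: str):
    """Split a TD3 name field into (primary, secondary), tolerating a leading '<<'."""
    full_id = name_field.rstrip("<")  # drop trailing filler
    if full_id.startswith("<<"):      # Indian style: empty primary identifier
        full_id = full_id[2:]
    parts = full_id.split("<<")
    if len(parts) == 1:
        return parts[0], ""           # only one identifier (reported as a warning above)
    if len(parts) == 2:
        return parts[0], parts[1]
    raise ValueError("more than two identifiers")


line1 = "P<IND<<AHMADI" + "<" * 31     # first line of the Indian example (44 characters)
print(split_identifier(line1[5:]))              # ('AHMADI', '')
print(split_identifier("SMITH<<JOHN<JOEY<<<<"))  # ('SMITH', 'JOHN<JOEY')
```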

Best regards

ShadabShariff commented 5 years ago

Hi, I was looking into something similar and faced the same issue with Indian passport MRZs. Could you let me know where PassportINDCodeChecker() is defined in the above usage snippet?

Arg0s1080 commented 5 years ago

Hi @ShadabShariff

That snippet is just an example (it was done very quickly, so I'm sure it can be improved).

Just copy and paste the code into a file (e.g. td3_india.py), save it and use it.

Optionally, it can be installed and used as part of mrz: just copy td3_india.py into the mrz/checker folder and run setup.py.

For example, in Linux it could be done like this:

git clone https://github.com/Arg0s1080/mrz.git
cp td3_india.py mrz/mrz/checker/
cd mrz
sudo python3 setup.py

and then:

from mrz.checker.td3_india import TD3INDCodeChecker

mrz_code = ("P<IND<<AHMADI<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<\n"
            "K2578285<7IND5601240F2202288<<<<<<<<<<<<<<<4")
passport_check = TD3INDCodeChecker(mrz_code)

print("CHECK...:%s" % passport_check)
print("WARNINGS:%s" % passport_check.report_warnings)

Regards!

Arg0s1080 commented 5 years ago

@ShadabShariff maybe this would be better:

import mrz.base.string_checkers as check
from mrz.checker.td3 import *
from mrz.checker._honorifics import titles
from mrz.base.functions import hash_is_ok
from string import ascii_uppercase

class PassportINDCodeChecker(TD3CodeChecker): 

    @property
    def optional_data_hash(self) -> bool:
        """Return True if the optional data hash is valid (or '<' with empty optional data), False otherwise."""
        if check.is_empty(self._optional_data) and self._optional_data_hash == "<":
            ok = True
        else:
            ok = hash_is_ok(self._optional_data, self._optional_data_hash)
        return self._report("optional data hash", ok)

    @property
    def identifier(self) -> bool:
        """Return True if the identifier passes the checks, False otherwise."""
        full_id = self._identifier.rstrip("<")
        padding = self._identifier[len(full_id):]
        id2iter = full_id.lstrip("<").split("<<") if len(full_id) > 2 and full_id[2] in ascii_uppercase else full_id.split("<<")
        id_len = len(id2iter)
        primary = secondary = None
        if not check.is_printable(self._identifier):
            ok = False
        elif check.is_empty(self._identifier):
            self._report("empty identifier", kind=2)
            ok = False
        else:
            if id_len == len([i for i in id2iter if i]):
                if id_len == 2:
                    primary, secondary = id2iter
                    ok = True
                elif id_len == 1:
                    primary, secondary = id2iter[0], ""
                    self._report("only one identifier", kind=1)
                    ok = not self._compute_warnings
                else:
                    self._report("more than two identifiers", kind=2)
                    ok = False
            else:  # too many '<' in id
                self._report("invalid identifier format", kind=2)
                ok = False
        # print("Debug. id2iter ............:", id2iter)
        # print("Debug. (secondary, primary):", (secondary, primary))
        # print("Debug. padding ............:", padding)
        if ok:
            if not full_id.startswith("<<"):
                self._report("identifier doesn't starts with '<<'", kind=2)
                ok = False
                # If you want to report as warning instead of as error uncomment lines below
                # self._report("identifier doesn't starts with '<<'", kind=1)
                # ok = False if self._compute_warnings else ok
            if check.uses_nums(full_id):
                self._report("identifier with numbers", kind=2)
                ok = False
            if primary.startswith("<") or secondary and secondary.startswith("<"):
                self._report("some identifier begin by '<'", kind=2)
                ok = False
            if not padding:
                self._report("possible truncating", kind=1)
                ok = False if self._compute_warnings else ok
            for i in range(id_len):
                for itm in id2iter[i].split("<"):
                    if itm:
                        for tit in titles:
                            if tit == itm:
                                if i:  # secondary id
                                    self._report("Possible unauthorized prefix or suffix in identifier", kind=1)
                                else:  # primary id
                                    self._report("Possible not recommended prefix or suffix in identifier", kind=1)
                                ok = False if self._compute_warnings else ok
        return self._report("identifier", ok)

and then:

from mrz.checker.td3_india import PassportINDCodeChecker

mrz_code = ("P<IND<<AHMADI<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<\n"
            "K2578285<7IND5601240F2202288<<<<<<<<<<<<<<<4")
passport_check = PassportINDCodeChecker(mrz_code)

print("CHECK...:%s" % passport_check)
print("WARNINGS:%s" % passport_check.report_warnings)

P.S.: Read line 53 (the comment about reporting it as a warning instead of as an error).

ShadabShariff commented 5 years ago

Hi @Arg0s1080

Thanks for pointing out line 53. The Indian passport has the identifier starting either with or without '<<'.

We went with uncommenting the code you pointed out. Here is the block with the lines we commented and uncommented, just to check our understanding and whether it is fine:

if ok:
    if not full_id.startswith("<<"):
        # self._report("identifier doesn't starts with '<<'", kind=2)
        # ok = False
        # If you want to report as warning instead of as error uncomment lines below
        self._report("identifier doesn't starts with '<<'", kind=1)
        ok = False if self._compute_warnings else ok
    if check.uses_nums(full_id):
        self._report("identifier with numbers", kind=2)
        ok = False
    if primary.startswith("<") or secondary and secondary.startswith("<"):
        self._report("some identifier begin by '<'", kind=2)
        ok = False
    if not padding:
        self._report("possible truncating", kind=1)
        ok = False if self._compute_warnings else ok
    for i in range(id_len):
        for itm in id2iter[i].split("<"):
            if itm:
                for tit in titles:
                    if tit == itm:
                        if i:  # secondary id
                            self._report("Possible unauthorized prefix or suffix in identifier", kind=1)
                        else:  # primary id
                            self._report("Possible not recommended prefix or suffix in identifier", kind=1)
                        ok = False if self._compute_warnings else ok
return self._report("identifier", ok)

Also, I just wanted to confirm: if the name is changed, is there in general no hash or checksum evaluation of the identifier for a given document?

Arg0s1080 commented 5 years ago

Hi again @ShadabShariff

Yes, perfect. The two lines above must be commented out (I forgot that). If you don't want it reported either as an error (kind=2) or as a warning (kind=1), just delete the block:

if not full_id.startswith("<<"):
    # self._report("identifier doesn't starts with '<<'", kind=2)
    # ok = False
    # If you want to report as warning instead of as error uncomment lines below
    self._report("identifier doesn't starts with '<<'", kind=1)
    ok = False if self._compute_warnings else ok

(it should work too)

No, the identifier is never included in any checksum (passports, visas, etc.). It's really curious. I think the same as you: it should have its own hash, or at least be included in the final hash.
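To illustrate the answer above for TD3: the composite (final) check digit is computed from line 2 only, so nothing in line 1, where the name lives, affects any check digit. A minimal sketch, assuming the ICAO 9303 TD3 layout (composite over positions 1-10, 14-20 and 22-43 of line 2); check_digit and td3_final_check are hypothetical helpers, not the mrz API:

```python
# ICAO 9303 character values: digits as-is, A=10 ... Z=35, filler '<' = 0
_VALUES = {**{str(d): d for d in range(10)},
           **{chr(ord("A") + i): 10 + i for i in range(26)},
           "<": 0}


def check_digit(field: str) -> str:
    """ICAO 9303 check digit: weights 7, 3, 1 repeating, sum modulo 10."""
    return str(sum(_VALUES[c] * (7, 3, 1)[i % 3] for i, c in enumerate(field)) % 10)


def td3_final_check(line2: str) -> str:
    """TD3 composite check digit over positions 1-10, 14-20 and 22-43 of line 2.

    Only line 2 is an input, so the name field in line 1 cannot affect the result.
    """
    return check_digit(line2[0:10] + line2[13:20] + line2[21:43])


# Line 2 of the Indian example from this thread (44 characters)
line2 = "K2578285<7IND5601240F2202288" + "<" * 15 + "4"
print(td3_final_check(line2), line2[43])  # 4 4
```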

Oops! In a previous comment I forgot the 'install' argument for installation using setup.py: python3 setup.py install