Open Removed-5an opened 5 years ago
Hi @5an1ty !
It seems that Belgium has 3 types of ID Cards:
The first one complies with ICAO specs (Belgian Citizens). The other two (Kids and Foreigners) work as you have explained, so it seems that Belgium is "twisting" the ICAO specifications.
According to ICAO 9303-5 (TD1) 4.2.2.1, the document number occupies MRZ character positions 6 to 14 of line 1 and the document number hash occupies position 15, so this case is outside the scope of mrz. However, problems like this can usually be solved simply by overriding a property (see issue #3). In this case, that means overriding the document_number_hash property.
A possible solution could be:
#!/usr/bin/python3
# -*- coding: utf-8 -*-
from mrz.checker.td1 import TD1CodeChecker
from mrz.base.functions import hash_is_ok


class TD1BELCodeChecker(TD1CodeChecker):
    @property
    def document_number_hash(self) -> bool:
        """Return True if the hash of the document number is validated, False otherwise."""
        # A "<" in the hash position means the number continues in optional data
        if self._document_number_hash == "<":
            doc_number_fin = self._optional_data.rstrip("<")
            self._document_number = self._document_number + "<" + doc_number_fin[:-1]
            self._document_number_hash = doc_number_fin[-1]
        return self._report("document number hash", hash_is_ok(self._document_number, self._document_number_hash))
Usage:
from mrz.checker.td1_belgian import TD1BELCodeChecker
# CASE 1
code_citizens = ("IDBEL590330101085020100200<<<<\n"
"8502016F0901015BEL<<<<<<<<<<<8\n"
"VAN<DER<VELDEN<<GREET<HILDE<<<")
# CASE 2
mrz_code_kids = ("IDBEL000610035<7017<<<<<<<<<<<\n"
"0002015F0910190BEL000201002003\n"
"MAES<<SOPHIE<ANN<G<<<<<<<<<<<<")
td1_check_citz = TD1BELCodeChecker(code_citizens)
print("CASE 1:%s" % td1_check_citz)
td1_check_kids = TD1BELCodeChecker(mrz_code_kids)
print("CASE 2:%s" % td1_check_kids)
# CASE 3: Let's change document number hash
mrz_code_kids = ("IDBEL000610035<7010<<<<<<<<<<<\n"
"0002015F0910190BEL000201002003\n"
"MAES<<SOPHIE<ANN<G<<<<<<<<<<<<")
td1_check_kids = TD1BELCodeChecker(mrz_code_kids)
print("CASE 3:%s" % td1_check_kids)
print("FALSES CASE 3:")
print(td1_check_kids.report_falses)
Output:
CASE 1:True
CASE 2:True
CASE 3:False
FALSES CASE 3:
[('final hash', False), ('document number hash', False)]
This solution is valid for the 3 types of Belgian ID cards. It's a very quick solution, so I'm sure it can be improved. For example, if you want to report children's and foreigners' ID cards as a warning:
    @property
    def document_number_hash1(self) -> bool:
        """Return True if the hash of the document number is validated, False otherwise."""
        ok = True
        if self._document_number_hash == "<":
            doc_number_fin = self._optional_data.rstrip("<")
            self._document_number = self._document_number + "<" + doc_number_fin[:-1]
            self._document_number_hash = doc_number_fin[-1]
            self._report("Possible Kid or Foreigner ID Card", kind=1)
            ok = not self._compute_warnings
        return self._report("document number hash",
                            ok and hash_is_ok(self._document_number, self._document_number_hash))
Output:
CASE 2:True
WARNINGS CASE 2:
['Possible Kid or Foreigner ID Card']
I hope I've helped.
Regards.
PS: I'm thinking that maybe it could be a good idea to create a folder to store all these special cases outside of ICAO specs
Hi, thank you for replying!
It helps me a lot! However your explanation is not fully correct.
I have verified 3 full Belgian eID cards (not kids or foreigners) and they also don't follow the actual ICAO 9303-5 (TD1) 4.2.2.1 spec. They have the same exception as the kids and foreigners cards like you describe above. I guess it's mostly newer cards that have a high enough document number.
It would be nice indeed to also support special cases and have them in another folder.
By the way: Unrelated to this issue, but it would be great if there was a function in your library that returns a dict of the parsed mrz.
Hi again!
I understand. I'm from Spain and we also have 2 types of cards. In the old cards the national identification number is in the document_number field; in the new cards that number is assigned to the optional_data field, and the document_number field holds the number of the physical card itself (a real mess!)
I don't know if it's what you're looking for, but the library has several methods to report the result. For example, continuing with the previous example:
# CASE 3: Let's change document number hash
mrz_code = ("IDBEL000610035<7010<<<<<<<<<<<\n"
"0002015F0910190BEL000201002003\n"
"MAES<<SOPHIE<ANN<G<<<<<<<<<<<<")
td1_check = TD1BELCodeChecker(mrz_code)
print("CASE 3:%s" % td1_check)
print("\nList of tuples with all the fields analyzed:")
print(td1_check.report)
if not td1_check:
    print("\nList of tuples (same as above but only returns Falses):")
    print(td1_check.report_falses)
print("\nList with errors:")  # I've never liked it (it's possible that I will change or remove it)
print(td1_check.report_errors)
print("\nList with warnings:")  # same as above
print(td1_check.report_warnings)
for field, result in td1_check.report:
    print(field.title().ljust(30, "."), result)
Output:
CASE 3:False
List of tuples with all the fields analyzed:
[('final hash', False), ('document number hash', False), ('birth date hash', True), ('expiry date hash', True), ('document type format', True), ('valid country code', True), ('valid nationality code', True), ('birth date', True), ('expiry date', True), ('valid genre format', True), ('identifier', True), ('document number format', True), ('optional data format', True), ('optional data 2 format', True)]
List of tuples (same as above but only returns Falses):
[('final hash', False), ('document number hash', False)]
List with errors:
['false final hash', 'false document number hash']
List with warnings:
['Possible Kid or Foreigner ID Card']
Final Hash.................... False
Document Number Hash.......... False
Birth Date Hash............... True
Expiry Date Hash.............. True
Document Type Format.......... True
Valid Country Code............ True
Valid Nationality Code........ True
Birth Date.................... True
Expiry Date................... True
Valid Genre Format............ True
Identifier.................... True
Document Number Format........ True
Optional Data Format.......... True
Optional Data 2 Format........ True
Hi Arg0s1080, thank you for your nice code. I have the same problem when generating a Belgian ID card MRZ. As you know, the document number has 12 digits and this app doesn't accept it, for example 000590448 301. I tried putting ">301" in the first optional data field, but then the first check digit ends up next to the 8 (it should be next to the 1): IDBEL0005904480<301<<<<<<<<<<<. What is your idea for generating a Belgian ID card MRZ code?
Hi, what's up!
This issue was solved with a "special case". It's possible to check Belgian ID cards with this class, but I think there is nothing yet to generate their MRZ code.
I'm very busy right now. However let me re-study this issue again and when I have a little free time I will try to find a solution.
BR
Hi again @imanenter
Although the problem is not solved yet, I know how the Belgian ID card 'mechanism' works.
Taking your picture and two from above:
from mrz.generator.td1 import TD1CodeGenerator
# 000590448 301
print(TD1CodeGenerator("ID", # Document type
"Belgium", # Country
"000590448", # Document number
"850101", # Birth date
"F", # Genre
"170203", # Expiry date
"Belgium", # Nationality
"Le Meunier", # Surname
"Jennifer Anne", # Given name(s)
"3016", # Optional data 1
"85010100200")) # Optional data 2
# 000610035 7017
print(TD1CodeGenerator("ID", # Document type
"Belgium", # Country
"000610035", # Document number
"000201", # Birth date
"F", # Genre
"091019", # Expiry date
"Belgium", # Nationality
"Maes", # Surname
"Sophie Ann G", # Given name(s)
"7017", # Optional data 1
"00020100200")) # Optional data 2
# B10032650 08
print(TD1CodeGenerator("ID", # Document type
"BEL", # Country
"B10032650", # Document number
"821020", # Birth date
"F", # Genre
"060131", # Expiry date
"New Zealand", # Nationality
"Flores", # Surname
"Gema Caroline J", # Given name(s)
"08", # Optional data 1
"82102008472")) # Optional data 2
I got this output
IDBEL00059044803016<<<<<<<<<<<
8501019F1702035BEL850101002007
LE<MEUNIER<<JENNIFER<ANNE<<<<<
IDBEL00061003507017<<<<<<<<<<<
0002015F0910190BEL000201002003
MAES<<SOPHIE<ANN<G<<<<<<<<<<<<
IDBELB10032650008<<<<<<<<<<<<<
8210209F0601315NZL821020084722
FLORES<<GEMA<CAROLINE<J<<<<<<<
The result is (almost) correct:
As you can see, it was only necessary to set document_number to the first part, set optional_number_1 to the second part, and force document_number_hash to the "0" string.
It would then only be necessary to disable document_number_hash by using the "<" string.
All of this takes a long time. When I have some more free time I will continue working on it.
Regards
hi Arg0s1080 thank u so much for ur help yes it works thank u and best regards <3
Hi again @imanenter
Another weekend...
There are many ways to solve the problem. I chose the way that I think is most correct.
Some small details still need "polishing", but it is functional.
Taking your picture and two from above:
from mrz.special_cases.generator.belgium_id_card import TD1BELCodeGenerator
# 000590448 301
print(TD1BELCodeGenerator("ID", # Document type
"Belgium", # Country
"000590448 301", # Document number
"850101", # Birth date
"F", # Genre
"170203", # Expiry date
"Belgium", # Nationality
"Le Meunier", # Surname
"Jennifer Anne", # Given name(s)
"", # Optional data 1: This field is null. I still have to think what to do with it
"85010100200")) # Optional data 2
print()
# 000610035 701 7
print(TD1BELCodeGenerator("ID", # Document type
"Belgium", # Country
"000610035 701", # Document number
"000201", # Birth date
"F", # Genre
"091019", # Expiry date
"Belgium", # Nationality
"Maes", # Surname
"Sophie Ann G", # Given name(s)
"blahblah", # Optional data 1. Canceled
"00020100200")) # Optional data 2
print()
# B10032650 0 8
print(TD1BELCodeGenerator("ID", # Document type
"BEL", # Country
"B100326500", # Document number
"821020", # Birth date
"F", # Genre
"060131", # Expiry date
"New Zealand", # Nationality
"Flores", # Surname
"Gema Caroline J", # Given name(s)
"", # Optional data 1. CANCELLED
"82102008472")) # Optional data 2
Output:
IDBEL000590448<3016<<<<<<<<<<<
8501019F1702035BEL850101002007
LE<MEUNIER<<JENNIFER<ANNE<<<<<
IDBEL000610035<7017<<<<<<<<<<<
0002015F0910190BEL000201002003
MAES<<SOPHIE<ANN<G<<<<<<<<<<<<
IDBELB10032650<08<<<<<<<<<<<<<
8210209F0601315NZL821020084722
FLORES<<GEMA<CAROLINE<J<<<<<<<
Result is (totally) correct
As you can see above, the document_number field accepts 3 formats (with your sample):
"000590448301"
"000590448<301"
"000590448 301"
The hash is calculated automatically.
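For anyone curious how those three spellings could collapse into the same line 1 field, here is a minimal sketch. It is not the code of TD1BELCodeGenerator: `check_digit` and `belgian_document_field` are hypothetical names made up for illustration. It assumes the first 9 characters fill positions 6 to 14, that "<" fills position 15, and that the hash is computed over the whole string with "<" counted as 0, which reproduces the outputs shown above.

```python
def check_digit(field: str) -> int:
    """ICAO 9303 check digit: repeating weights 7, 3, 1, sum modulo 10."""
    def value(ch: str) -> int:
        if "0" <= ch <= "9":
            return int(ch)
        if "A" <= ch <= "Z":
            return ord(ch) - ord("A") + 10  # A=10 ... Z=35
        if ch == "<":
            return 0
        raise ValueError(f"invalid MRZ character: {ch!r}")

    weights = (7, 3, 1)
    return sum(value(c) * weights[i % 3] for i, c in enumerate(field)) % 10


def belgian_document_field(number: str) -> str:
    """Hypothetical helper: build 'first9<rest' plus its check digit."""
    compact = number.upper().replace(" ", "").replace("<", "")
    if len(compact) <= 9:
        raise ValueError("expected a long document number (more than 9 characters)")
    body = compact[:9] + "<" + compact[9:]
    return body + str(check_digit(body))


# The three accepted spellings give the same field
for sample in ("000590448301", "000590448<301", "000590448 301"):
    print(belgian_document_field(sample))  # 000590448<3016
```

With "B100326500" the same helper gives B10032650<08, matching the third card in the output above.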
I also want to include the ability to add the hash manually:
"0005904483016"
"000590448<3016"
"000590448 3016"
BR
wooooowww thank u vvvveeeeery much, its cool i really appreciate it mannn
@Arg0s1080 Is there any other country besides Belgium that does not follow the TD1 format?
Hi there, @vamshi-7
The problem with the TD1 format is that countries use it for national ID cards, driver's licenses and other non-international documents, so it's very probable that many countries do not strictly comply with the ICAO specs. That's why there are usually fewer problems with passports and visas.
Someone long ago reported a problem with German ID cards and a special case was created, but surely Belgium and Germany are not the only countries that "break or twist" the specs.
Why do you ask?
BR
Hey @Arg0s1080 ,
Firstly, thank you for the reply. I am a student at uni-koblenz, currently working as an intern. My research is on extracting text from travel documents. So far, in my limited experience, all TD3 documents follow the specs properly except Germany's. After seeing these Belgian cards, I am confused about the TD1 type.
But do other countries only break or twist the specs for the first line of the MRZ region? As far as I can see, apart from Germany, other countries are not twisting the specs for the second and third lines. Please correct me if I am wrong.
Moreover, apart from Google, are there any other open sources for obtaining a dataset of these images?
BR
Hi again,
I'm glad, and I hope everything goes well for you! In principle these problems should not exist: the specs are unobjectionable (strict enough and flexible enough). Problems usually appear when "national data" is moved into the document_number, optional_data and optional_data_2 fields (as in this Belgian issue), but it is rare to find such problems in passports (TD3) and visas (TD2).
I highly doubt that you will find a good dataset to train a neural network or to test a project at scale. Keep in mind that this is private and very sensitive data (that's why this project has been in beta for years). I know there have been students who have used mrz.generator to train a NN, so I guess they didn't find a better option.
Why do you say that Germany does not maintain proper ICAO specs? Is it because of its country code ("D": only one letter) or for another reason?
BR
Hi,
Yes, it's difficult to find data even to test the algorithm, especially for Belgian cards. And yes, I meant Germany's country code.
Thanks and BR.
Given these two TD1 MRZ values:
IDSLV0012345678<<<<<<<<<<<<<<<
9306026F2708252SLV<<<<<<<<<<<4
JOHN<SMEAGOL<<WENDY<LIESSETTEF
And then another one,
IDSLVOO12345678<<<<<<<<<<<<<<<
9306026F2708252SLV<<<<<<<<<<<4
JOHN<SMEAGOL<<WENDY<LIESSETTEF
When scanned via OCR, the document number can be read as either 0012345678 or OO12345678 and still pass all check digit checks. Now, which is which?
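To see why both readings pass, here is a small self-contained sketch of the ICAO 9303-3 check digit calculation (weights 7, 3, 1; digits keep their value, A to Z map to 10 to 35, "<" counts as 0). The function names are illustrative and not taken from any library. The letter O is worth 24, so two leading O's add 24*7 + 24*3 = 240 to the sum, a multiple of 10, and the check digit is the same as with two zeros.

```python
def char_value(ch: str) -> int:
    """Numeric value of one MRZ character."""
    if "0" <= ch <= "9":
        return int(ch)
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 10  # A=10 ... Z=35
    if ch == "<":
        return 0
    raise ValueError(f"invalid MRZ character: {ch!r}")


def check_digit(field: str) -> int:
    """Weighted sum with repeating weights 7, 3, 1, taken modulo 10."""
    weights = (7, 3, 1)
    return sum(char_value(c) * weights[i % 3] for i, c in enumerate(field)) % 10


# Positions 6 to 14 of line 1 in the two samples above
print(check_digit("001234567"))  # 8
print(check_digit("OO1234567"))  # 8
```

So the check digit alone cannot tell the two readings apart; any disambiguation would have to come from elsewhere, for example knowing the issuer's numbering format.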
According to the latest edition of ICAO 9303, Part 5 explains how to compute the check digit when the document number exceeds the original field size: https://www.icao.int/publications/Documents/9303_p5_cons_en.pdf

Part 5. Specifications for TD1 Size Machine Readable Official Travel Documents (MROTDs)
4.2.4 Check digits in the MRZ
The method of calculating check digits is given in Doc 9303-3. For the TD1, the data structure of the machine readable lines in Paragraph 4.2.2 provides for the inclusion of four check digits as follows:

Document number check digit
Character positions (upper MRZ line) used to calculate check digit: 6 – 14
Check digit position (upper MRZ line): 15

or Long document number check digit
Character positions (upper MRZ line) used to calculate check digit: 6 – 14, 16 – 28
Note: Position 15 contains "<" and is excluded from the check digit calculation. The position of the last digit of a long document number is in the range of 16 – 28.
Check digit position (upper MRZ line): 17 – 18 (one digit only)
Note: Since the check digit follows the last digit of the document number, its position is in the range of 17 – 29. The check digit is followed by "<".
Hi there, thank you for making this library!
I have an issue with TD1, specifically scanning Belgian ID cards. If the document_number_hash digit is "<" the document will not verify.
I have checked this with 3 different Belgian ID cards and they all have "<" on index 14 of line 0.
After a ton of googling and reading specs I found an issue with the way you check document_number_hash...
Normally a document number starts at position 5 and ends at position 13, but sometimes a document number exceeds the size of its slot and the optional fields are used. Let's take a look at this example:
IDBEL123456789<1233<<<<<<<<<<<
In this case the document number check is "<". When that happens, we need to look at the optional data (1233) and take its last non-empty value: 3. This is the actual hash number. After that we simply verify the hash of:
Document Number: 123456789<123 Hash: 3
And this should verify as True using your verify function.
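The steps described in this post can be sketched as a standalone function, without touching the library. This is only an illustration: `check_digit` and `verify_document_number` are hypothetical names, and the sketch assumes the "<" at index 14 stays in the string and counts as 0, exactly as in the "123456789<123" example.

```python
def check_digit(field: str) -> int:
    """ICAO 9303 check digit: repeating weights 7, 3, 1, sum modulo 10."""
    def value(ch: str) -> int:
        if "0" <= ch <= "9":
            return int(ch)
        if "A" <= ch <= "Z":
            return ord(ch) - ord("A") + 10  # A=10 ... Z=35
        if ch == "<":
            return 0
        raise ValueError(f"invalid MRZ character: {ch!r}")

    weights = (7, 3, 1)
    return sum(value(c) * weights[i % 3] for i, c in enumerate(field)) % 10


def verify_document_number(line1: str) -> bool:
    """Check the document number of TD1 line 1, including the long-number case."""
    number = line1[5:14]  # indexes 5..13
    digit = line1[14]     # index 14: check digit, or "<" for a long number
    if digit == "<":
        extra = line1[15:30].rstrip("<")
        if not extra:
            return False  # nothing to take the check digit from
        number = number + "<" + extra[:-1]
        digit = extra[-1]  # last non-empty character is the real check digit
    return digit.isdigit() and check_digit(number) == int(digit)


print(verify_document_number("IDBEL123456789<1233<<<<<<<<<<<"))  # True
print(verify_document_number("IDBEL000610035<7010<<<<<<<<<<<"))  # False
```

A card with a regular 9-character number, such as IDBEL590330101085020100200<<<<, goes through the normal branch and also verifies.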