fabiocaccamo / python-codicefiscale

:it: :credit_card: italian fiscal codes encoding, decoding and validation - codifica, decodifica e validazione del Codice Fiscale italiano.
MIT License

Wrong birthplace code error (missing date-range in the data-source). #113

Closed ncorona closed 1 year ago

ncorona commented 1 year ago

Python version 3.9

Package version 0.6.1

Current behavior (bug description)

Hi! I got this error when decoding a valid codicefiscale value.

    Traceback (most recent call last):
      File "", line 1, in
      File "/usr/local/lib/python3.9/site-packages/codicefiscale/codicefiscale.py", line 440, in decode
        raise ValueError(f"[codicefiscale] wrong birthplace code: '{birthplacecode}'")
    ValueError: [codicefiscale] wrong birthplace code: 'G133'

I think an entry is missing from the municipalities.json file, specifically for the range from 1928-11-27 to 1948-05-07.
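To illustrate why a missing range produces this error, here is a minimal sketch of a date-aware birthplace lookup. It is not the library's actual implementation: the function name `find_birthplace` and the sample list are hypothetical, and the field names follow the data excerpt quoted later in this thread.

```python
from datetime import datetime

# Hypothetical excerpt shaped like the entries in municipalities.json.
MUNICIPALITIES = [
    {"code": "G133", "name": "Ortacesus",
     "date_created": "1861-03-17T00:00:00",
     "date_deleted": "1928-11-27T00:00:00"},
    {"code": "G133", "name": "Ortacesus",
     "date_created": "1948-05-07T00:00:00",
     "date_deleted": "1974-08-19T00:00:00"},
]


def find_birthplace(code, birthdate, records=MUNICIPALITIES):
    """Return the record for `code` whose validity range contains `birthdate`."""
    for record in records:
        if record["code"] != code:
            continue
        created = datetime.fromisoformat(record["date_created"])
        deleted = datetime.fromisoformat(record["date_deleted"])
        if created <= birthdate <= deleted:
            return record
    # No range covers the birthdate, so the code looks "wrong" for that date.
    raise ValueError(f"wrong birthplace code: '{code}'")
```

With this sample data, a birthdate in 1920 resolves to Ortacesus, while a birthdate in 1939 raises `ValueError` even though the code itself is valid.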

fabiocaccamo commented 1 year ago

@ncorona thank you for reporting this.

Currently the data is taken from https://www.anagrafenazionale.interno.it, as it is probably the most reliable data source. Do you know a better alternative?

fabiocaccamo commented 1 year ago

@ncorona looking at the data, it seems there is a gap between 1928 and 1948 for the code G133:

    {
        "active": false,
        "code": "G133",
        "date_created": "1861-03-17T00:00:00",
        "date_deleted": "1928-11-27T00:00:00",
        "name": "Ortacesus",
        "name_alt": "",
        "name_alt_trans": "",
        "name_slugs": [
            "ortacesus"
        ],
        "name_trans": "Ortacesus",
        "province": "CA"
    },
    {
        "active": false,
        "code": "G133",
        "date_created": "1948-05-07T00:00:00",
        "date_deleted": "1974-08-19T00:00:00",
        "name": "Ortacesus",
        "name_alt": "",
        "name_alt_trans": "",
        "name_slugs": [
            "ortacesus"
        ],
        "name_trans": "Ortacesus",
        "province": "CA"
    },

Could you provide the code you are trying to decode?
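A gap like this one can be found mechanically by sorting each code's validity ranges and comparing neighbours. The sketch below assumes records shaped like the excerpt above; the function name `find_gaps` is hypothetical, and treating an empty `date_deleted` as "still active" is an assumption about the data, not something confirmed in this thread.

```python
from collections import defaultdict
from datetime import datetime

# Sample records copied from the excerpt above (only the fields used here).
SAMPLE = [
    {"code": "G133", "date_created": "1861-03-17T00:00:00",
     "date_deleted": "1928-11-27T00:00:00"},
    {"code": "G133", "date_created": "1948-05-07T00:00:00",
     "date_deleted": "1974-08-19T00:00:00"},
]


def find_gaps(records):
    """Yield (code, gap_start, gap_end) for holes between validity ranges."""
    by_code = defaultdict(list)
    for record in records:
        start = datetime.fromisoformat(record["date_created"])
        # Assumption: an empty or missing end date means "still active".
        raw_end = record.get("date_deleted")
        end = datetime.fromisoformat(raw_end) if raw_end else datetime.max
        by_code[record["code"]].append((start, end))
    for code, ranges in by_code.items():
        ranges.sort()
        latest_end = ranges[0][1]
        for start, end in ranges[1:]:
            if start > latest_end:
                yield code, latest_end, start
            latest_end = max(latest_end, end)
```

For the two G133 records, `list(find_gaps(SAMPLE))` reports a single gap from 1928-11-27 to 1948-05-07. A reported gap is only a candidate: a code can also be legitimately unused for a period.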

ncorona commented 1 year ago

@fabiocaccamo thank you for your fast reply! This is exactly what I was talking about: apparently the municipality is not associated with any code in that time frame. I checked the sources you pointed me to, and the data appears to be missing at the source (ISTAT). Unfortunately, G133 is not the only municipality with this problem! I looked for other sources to cross-check and integrate the data, but I didn't find anything suitable :( I can't share the codes here for obvious privacy reasons but they are valid tax codes belonging to real people with valid

fabiocaccamo commented 1 year ago

@ncorona yes, the problem is that unfortunately there is not a perfect data source.

> I can't share the codes here for obvious privacy reasons but they are valid tax codes belonging to real people with valid

No worries, I understand. What is the birth year in your code with G133?

ncorona commented 1 year ago

Birth date falls in the missing range. Here are some of the errors, with anonymized codes:

    [codicefiscale] wrong birthplace code: 'G133' / birthdate: '1939-04-04T00:00:00'. XXXXXX39D44G133X
    [codicefiscale] wrong birthplace code: 'F383' / birthdate: '1946-04-11T00:00:00'. XXXXXX46D11F383X
    [codicefiscale] wrong birthplace code: 'F383' / birthdate: '1950-06-18T00:00:00'. XXXXXX50H18F383X
    [codicefiscale] wrong birthplace code: 'L513' / birthdate: '1938-03-31T00:00:00'. XXXXXX38C71L513X
    [codicefiscale] wrong birthplace code: 'G133' / birthdate: '1943-03-14T00:00:00'. XXXXXX43C14G133X
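The birth date and birthplace can be read straight from the anonymized codes: characters 7-8 are the two-digit year, character 9 is the month letter, characters 10-11 are the day (plus 40 for women), and characters 12-15 are the birthplace code. The sketch below is an illustration only: `decode_birth_fields` is a hypothetical helper, not part of the library, and it ignores the letter substitutions used for homonymous codes (omocodia).

```python
# Month letters used by the codice fiscale, January through December.
MONTH_LETTERS = "ABCDEHLMPRST"


def decode_birth_fields(code):
    """Extract the birth-related fields from a 16-character codice fiscale.

    The year stays two-digit because the code alone does not
    identify the century.
    """
    year = int(code[6:8])
    month = MONTH_LETTERS.index(code[8]) + 1
    day = int(code[9:11])
    gender = "M"
    if day > 40:
        # Women have 40 added to the day of birth.
        day -= 40
        gender = "F"
    return {
        "year": year,
        "month": month,
        "day": day,
        "gender": gender,
        "birthplace_code": code[11:15],
    }
```

For example, `decode_birth_fields("XXXXXX38C71L513X")` gives year 38, month 3, day 31, gender "F" and birthplace code "L513", which matches the birthdate 1938-03-31 in the error above.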

fabiocaccamo commented 1 year ago

Ok, it's pretty obvious that a whole date-range is missing, thank you.

I'll try to look for another good data source; if you find one, feel free to paste links here.

ncorona commented 1 year ago

@fabiocaccamo ok! Thanks for your time.

fabiocaccamo commented 1 year ago

@ncorona looking at the original dataset, it seems more likely that the missing date range is caused by duplicated rows. As you can see, the rows are identical except for the dates (I don't think the municipality was suppressed in that time frame and then re-created):

    20589,"1861-03-17","1928-11-27","092081","G133","ORTACESUS","ORTACESUS","","",92,"092","20","","C","CA","","2016-06-17",""
    20588,"1948-05-07","1974-08-19","092081","G133","ORTACESUS","ORTACESUS","","",92,"092","20","","C","CA","","2016-06-17",""

ncorona commented 1 year ago

@fabiocaccamo that's exactly the problem. It is very likely that there are no data for that time interval and I would like to ask whoever created the dataset the reason for this missing data... but I wouldn't know who to address the request to.

fabiocaccamo commented 1 year ago

@ncorona I think the only solution is to automatically merge the data fetched from https://www.anagrafenazionale.interno.it/ with a manually managed .json file (as has already been done for deleted countries).
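A merge of this kind could look roughly like the sketch below. Everything here is an assumption made for illustration: the function name `merge_with_patch`, the choice of `(code, date_created)` as the record key, and the patch entry itself are hypothetical and do not reflect the library's actual code or patch file.

```python
# Records as fetched from the automatic data source (fields trimmed).
FETCHED = [
    {"code": "G133", "name": "Ortacesus",
     "date_created": "1861-03-17T00:00:00",
     "date_deleted": "1928-11-27T00:00:00"},
    {"code": "G133", "name": "Ortacesus",
     "date_created": "1948-05-07T00:00:00",
     "date_deleted": "1974-08-19T00:00:00"},
]

# Hypothetical manual patch entry that fills the missing range.
PATCH = [
    {"code": "G133", "name": "Ortacesus",
     "date_created": "1928-11-27T00:00:00",
     "date_deleted": "1948-05-07T00:00:00"},
]


def merge_with_patch(fetched, patch):
    """Combine fetched records with manually managed patch records.

    Records are keyed by (code, date_created): a patch record with an
    existing key replaces the fetched one, any other patch record is added.
    """
    merged = {(r["code"], r["date_created"]): r for r in fetched}
    for record in patch:
        merged[(record["code"], record["date_created"])] = record
    # ISO-8601 strings in the same format sort chronologically.
    return sorted(merged.values(),
                  key=lambda r: (r["code"], r["date_created"]))
```

Calling `merge_with_patch(FETCHED, PATCH)` returns three G133 records whose ranges run from 1861 to 1974 without a gap. Letting the patch win on key collisions keeps manual corrections from being overwritten when the fetched data is refreshed.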

fabiocaccamo commented 1 year ago

@ncorona in version 0.8.1 the codes you reported are decoded correctly for the missing date ranges.

Now the automatically-fetched municipalities data gets merged with the manually-managed patch data present in this file: https://github.com/fabiocaccamo/python-codicefiscale/blob/main/codicefiscale/data/municipalities-patch.json

So... until a perfect data source exists (I doubt it ever will), the patching mechanism can be a valid workaround.