EdyVision / pii-codex

A research Python package for detecting, categorizing, and assessing the severity of personally identifiable information (PII)
BSD 3-Clause "New" or "Revised" License

Error while detecting US SSN and US Bank account number. #34

Open HardKothari opened 1 year ago

HardKothari commented 1 year ago

It seems that whenever US_SSN is detected in a sentence, the analysis fails with the following error message:

"Exception: An error occurred while processing the detected entity US_SSN"

Traceback:

  File "C:\Python_Local\Cerebro\cerebro-flask-api\venv\lib\site-packages\pii_codex\services\analysis_service.py", line 75, in analyze_item
    analysis, sanitized_text = self._perform_text_analysis(
  File "C:\Python_Local\Cerebro\cerebro-flask-api\venv\lib\site-packages\pii_codex\services\analysis_service.py", line 280, in _perform_text_analysis
Exception: An error occurred while processing the detected entity US_SSN

After looking more closely at the code, it seems that this entity type is missing from the CSV shipped in the data folder.

file: pii_mapping_util.py

    def __init__(self):
        self._pii_mapping_data_frame = open_pii_type_mapping_csv("v1")

file: file_util.py

    file_path = get_relative_path(
        f"../data/{mapping_file_version}/{mapping_file_name}.csv"
    )

The file contains PII_Type = "US_SOCIAL_SECURITY_NUMBER" instead of "US_SSN".

The same exception occurs for the bank number entity as well: "Exception: An error occurred while processing the detected entity US_BANK_NUMBER"

  File "C:\Python_Local\Cerebro\cerebro-flask-api\venv\lib\site-packages\pii_codex\services\assessment_service.py", line 21, in assess_pii_type
    return PII_MAPPER.map_pii_type(detected_pii_type)
  File "C:\Python_Local\Cerebro\cerebro-flask-api\venv\lib\site-packages\pii_codex\utils\pii_mapping_util.py", line 45, in map_pii_type
    raise Exception(
Exception: An error occurred while processing the detected entity US_BANK_NUMBER

The file contains PII_Type = "US_BANK_ACCOUNT_NUMBER" instead of "US_BANK_NUMBER".
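For anyone who would rather not edit the installed package, the two mismatches above could also be handled by translating the detected name before the lookup. This is only a minimal sketch: `ENTITY_ALIASES` and `normalize_entity_type` are hypothetical names and not part of pii-codex, and the dictionary holds only the two pairs reported in this issue.

```python
# Hypothetical alias table: name emitted by the detector -> PII_Type value
# found in the mapping CSV. Only the two pairs reported in this issue.
ENTITY_ALIASES = {
    "US_SSN": "US_SOCIAL_SECURITY_NUMBER",
    "US_BANK_NUMBER": "US_BANK_ACCOUNT_NUMBER",
}


def normalize_entity_type(detected_type: str) -> str:
    """Return the CSV-side name for a detected entity, or the input unchanged."""
    return ENTITY_ALIASES.get(detected_type, detected_type)


csv_name = normalize_entity_type("US_SSN")
```

Names without an alias pass through unchanged, so entity types that already match the CSV are not affected.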

Hope this helps.

Thank you

HardKothari commented 1 year ago

This also happens for US_Driver_licence. After modifying all three of these in the CSV file, the error goes away.
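A rough sketch of that CSV edit done in code, assuming the mapping file has a `PII_Type` column as described above. The `rename_pii_types` helper, the `Note` column, and the sample rows are made up for illustration, and only the two renames whose CSV-side names appear in this issue are included.

```python
import csv
import io

# CSV-side name -> name emitted by the detector (pairs reported in this issue).
RENAMES = {
    "US_SOCIAL_SECURITY_NUMBER": "US_SSN",
    "US_BANK_ACCOUNT_NUMBER": "US_BANK_NUMBER",
}


def rename_pii_types(csv_text, renames=RENAMES, column="PII_Type"):
    """Return csv_text with values in `column` replaced according to `renames`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Leave any value without a rename entry untouched.
        row[column] = renames.get(row[column], row[column])
        writer.writerow(row)
    return out.getvalue()


# Made-up sample data; the real file has more columns and rows.
sample = "PII_Type,Note\nUS_SOCIAL_SECURITY_NUMBER,placeholder\nEMAIL_ADDRESS,placeholder\n"
patched = rename_pii_types(sample)
```

To apply this to the real file, read its text, pass it through the function, and write the result back, keeping a backup of the original first.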

HardKothari commented 1 year ago

This is happening for AU_MEDICARE entity type as well.

EdyVision commented 1 year ago

Thanks for bringing that up. The common_types are referenced in the CSV, so in doing the lookup for the common type, the CSV reference is found, although as stated not all types are supported. The mapping of types is being changed in a future release.

xqrt commented 11 months ago

> This also happens for US_Driver_licence. After modifying all three of these in the CSV file, the error goes away.

@HardKothari I assume you modified this file: https://github.com/EdyVision/pii-codex/blob/main/pii_codex/data/v1/pii_type_mappings.csv

And how did you reload it? (I'm using the notebook.)

GP

HardKothari commented 11 months ago

I am using this locally on my PC, so I just replaced the file in the library folder of my virtual environment.

I am not sure how it would work with a notebook 😞
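One thing that might work in a notebook: since the mapper reads the CSV in its `__init__`, the edited file should only be picked up once the modules are imported again. Restarting the kernel is the most reliable way to do that. As an untested alternative, the sketch below reloads the modules named in the tracebacks above. `reload_if_loaded` is a hypothetical helper, and it is an assumption that reloading these two modules is enough to rebuild the mapper.

```python
import importlib
import sys


def reload_if_loaded(module_names):
    """Reload each named module that is already imported; return the names reloaded."""
    reloaded = []
    for name in module_names:
        module = sys.modules.get(name)
        if module is not None:
            importlib.reload(module)
            reloaded.append(name)
    return reloaded


# Module paths taken from the tracebacks in this issue. The mapping util is
# reloaded first so that the service module re-imports the fresh version.
reloaded = reload_if_loaded([
    "pii_codex.utils.pii_mapping_util",
    "pii_codex.services.assessment_service",
])
```

Any object created before the reload still holds the old mapping data, so services built earlier in the notebook would need to be created again.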