Metadata and data identification tool and Python library. Identifies PII, common identifiers, and language-specific identifiers. Fully customizable, flexible rules.
Right now the report includes only the field name, data type, tags, semantic type ID, and registry URL.
Sometimes additional information is required, and some of it is already collected during the matching process.
Consider adding the following data to the report (already collected):
[x] number of unique values
[x] share of unique values
[x] minimum length
[x] maximum length
[x] average length
[ ] minimum value
[ ] maximum value
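
As an illustration of the statistics listed above, here is a minimal sketch of how they could be computed for one field's sampled values. The function name `column_stats` and the dictionary keys are hypothetical and are not the tool's actual API; it also assumes the values are hashable and mutually comparable.

```python
from statistics import mean


def column_stats(values):
    """Compute per-field statistics for a list of sampled values.

    Hypothetical helper for illustration only; not part of the library.
    """
    # Skip missing values so they do not distort counts or lengths.
    present = [v for v in values if v is not None]
    if not present:
        return None

    # Lengths are measured on the string form of each value.
    lengths = [len(str(v)) for v in present]
    unique = set(present)  # assumes values are hashable

    return {
        "n_unique": len(unique),
        "share_unique": len(unique) / len(present),
        "min_len": min(lengths),
        "max_len": max(lengths),
        "avg_len": mean(lengths),
        # min/max assume all values are of one comparable type
        "min_value": min(present),
        "max_value": max(present),
    }


stats = column_stats(["AB12", "CD345", "AB12", None])
print(stats)
```

The minimum and maximum values are compared with Python's default ordering, so strings are ordered lexicographically; a real report would likely want type-aware comparison.
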
Consider collecting the following info and adding it to the report:
[x] has alphas
[x] has digits
[x] has special chars
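
The three flags above could be derived in a single pass over the sample, roughly as sketched below. The name `char_flags` is hypothetical, and the definition of "special" (anything that is not a letter, digit, or whitespace) is an assumption rather than the tool's documented behavior.

```python
def char_flags(values):
    """Report which character classes occur anywhere in the sampled values.

    Hypothetical helper for illustration only; not part of the library.
    """
    flags = {"has_alphas": False, "has_digits": False, "has_special": False}
    for value in values:
        for ch in str(value):
            if ch.isalpha():
                flags["has_alphas"] = True
            elif ch.isdigit():
                flags["has_digits"] = True
            elif not ch.isspace():
                # Assumption: "special" = not a letter, digit, or whitespace.
                flags["has_special"] = True
    return flags


phone_flags = char_flags(["+7 495 123-45-67"])
word_flags = char_flags(["abc", "def"])
print(phone_flags, word_flags)
```
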
If possible, add the following:
[ ] reconstructed regexp - a regular expression reconstructed from the data sample
[ ] named entities - named entities extracted by a named-entity detection tool such as Microsoft Presidio, Slovnet, or others
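
One simple way the regexp reconstruction could work is sketched below: map each character to a class, collapse runs of the same class, and emit a pattern only when every sample has the same sequence of classes. This is a deliberately naive approach chosen for illustration, not the method the tool uses or plans to use; the names `reconstruct_regex`, `_char_class`, and `_runs` are hypothetical.

```python
import re
from itertools import groupby


def _char_class(ch):
    """Map one character to a regex fragment (ASCII digit, ASCII letter, or literal)."""
    if ch.isascii() and ch.isdigit():
        return r"\d"
    if ch.isascii() and ch.isalpha():
        return "[A-Za-z]"
    return re.escape(ch)


def _runs(value):
    """Collapse a string into (class, run_length) pairs."""
    classes = (_char_class(ch) for ch in value)
    return [(cls, len(list(group))) for cls, group in groupby(classes)]


def reconstruct_regex(samples):
    """Return an anchored regex matching all samples, or None if their shapes differ."""
    shapes = [_runs(s) for s in samples]
    if not shapes:
        return None
    first = [cls for cls, _ in shapes[0]]
    # Give up when samples do not share one sequence of character classes.
    if any([cls for cls, _ in shape] != first for shape in shapes):
        return None

    parts = []
    for i, cls in enumerate(first):
        counts = [shape[i][1] for shape in shapes]
        lo, hi = min(counts), max(counts)
        if lo == hi == 1:
            quantifier = ""
        elif lo == hi:
            quantifier = "{%d}" % lo
        else:
            quantifier = "{%d,%d}" % (lo, hi)
        parts.append(cls + quantifier)
    return "^" + "".join(parts) + "$"


samples = ["AB-1234", "XYZ-98"]
pattern = reconstruct_regex(samples)
print(pattern)
```

Returning `None` for mixed shapes keeps the sketch honest: a single pattern is reported only when the sample is homogeneous, and a fuller implementation might instead cluster shapes and report the most common ones.
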