JohnSnowLabs / langtest

Deliver safe & effective language models
http://langtest.org/
Apache License 2.0
492 stars, 36 forks

Add tests for De-Identification #954

Open dcecchini opened 8 months ago

dcecchini commented 8 months ago

We already have tests for NER models, but we should add a category for De-Identification so that we can test models for this specific capability.

We could extend it to cover clinically relevant de-identification tasks, such as checking compliance with HIPAA.

chakravarthik27 commented 8 months ago

De-Identification examples

| Category | Test Type | Original | Test Case | Expected | Actual | Pass |
| --- | --- | --- | --- | --- | --- | --- |
| DeIdentification | Simple Masking | Patient John Doe was admitted on 01/01/2024. | Mask names and dates | PATIENT [MASK] was admitted on [MASK]. | PATIENT [MASK] was admitted on [DATE]. | True |
| DeIdentification | HIPAA Compliance | The patient's address is 123 Main St, Anytown. | Mask address according to HIPAA | The patient's address is [MASK] [MASK] [MASK]. | The patient's address is [MASK] [MASK] CA. | False |
| DeIdentification | Redaction vs. Replacement | The patient suffered from depression. | Redact mental health conditions | The patient suffered from [MENTAL_CONDITION]. | The patient suffered from [REDACTED]. | True |
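
The examples do not say how the Pass column would be computed. One possible rule, sketched below, treats every bracketed placeholder as equivalent (so `[DATE]` or `[REDACTED]` counts the same as `[MASK]`) and passes only if the expected and actual strings then match exactly. The names `normalize_placeholders` and `deid_pass` are hypothetical and not part of the langtest API; the rule is an assumption that happens to reproduce the three rows above.

```python
import re

# Assumption: a placeholder is any upper-case tag in square brackets,
# e.g. [MASK], [DATE], [MENTAL_CONDITION], [REDACTED].
PLACEHOLDER = re.compile(r"\[[A-Z_]+\]")


def normalize_placeholders(text: str) -> str:
    """Collapse every placeholder tag to a single generic [MASK] token."""
    return PLACEHOLDER.sub("[MASK]", text)


def deid_pass(expected: str, actual: str) -> bool:
    """Hypothetical pass rule: outputs match once placeholder labels are ignored."""
    return normalize_placeholders(expected) == normalize_placeholders(actual)


# (expected, actual) pairs taken from the three example rows.
examples = [
    ("PATIENT [MASK] was admitted on [MASK].",
     "PATIENT [MASK] was admitted on [DATE]."),
    ("The patient's address is [MASK] [MASK] [MASK].",
     "The patient's address is [MASK] [MASK] CA."),
    ("The patient suffered from [MENTAL_CONDITION].",
     "The patient suffered from [REDACTED]."),
]

results = [deid_pass(expected, actual) for expected, actual in examples]
print(results)
```

Under this rule the second row fails because the literal token `CA` survives in the output where a placeholder was expected. A stricter test type could additionally require the placeholder label itself to match.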