8400TheHealthNetwork / HebSafeHarbor

Hebrew PHI identification and redaction toolkit
MIT License

Update the HOSPITALS file #21

Open edengby opened 1 year ago

edengby commented 1 year ago

If I want to add values to HOSPITALS (hebsafeharbor/lexicons/hospital_lexicon.py), how should it be done? I noticed that the lexicon is not referenced anywhere in the code. How come? I tried running the docker-compose-development build and up, but the new values were still not recognized.

omri374 commented 1 year ago

Looks like the hospitals lexicon isn't used by default. I would suggest adding it to the PhiIdentifier class, similarly to how diseases and medications are handled. First, add it to the imports here: https://github.com/8400TheHealthNetwork/HebSafeHarbor/blob/4a0ba72e0f0ab5e8421f7b45637c463decaddbe3/hebsafeharbor/identifier/phi_identifier.py#L30

Then, instantiate it similar to this: https://github.com/8400TheHealthNetwork/HebSafeHarbor/blob/4a0ba72e0f0ab5e8421f7b45637c463decaddbe3/hebsafeharbor/identifier/phi_identifier.py#L163
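To illustrate the idea of wiring a lexicon into the detection signals, here is a minimal, self-contained sketch. It does not use HebSafeHarbor's actual classes: `ToyLexiconRecognizer`, `Match`, the `"HOSPITAL"` entity name, and the two-entry `HOSPITALS` list are all hypothetical stand-ins, and `ner_signals` is modelled as a plain list, which is an assumption about how the real code is structured.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical stand-in for the list in hebsafeharbor/lexicons/hospital_lexicon.py.
HOSPITALS = ["שערי צדק", "הדסה עין כרם"]


@dataclass
class Match:
    entity_type: str
    start: int
    end: int
    score: float


class ToyLexiconRecognizer:
    """Toy recognizer (not the library's class): flags exact lexicon terms in text."""

    def __init__(self, entity_type: str, lexicon: List[str], score: float = 0.5):
        if not lexicon:
            raise ValueError("lexicon must contain at least one term")
        self.entity_type = entity_type
        self.score = score
        # Try longer terms first so multi-word names win over shorter overlaps.
        terms = sorted(set(lexicon), key=len, reverse=True)
        self._pattern = re.compile("|".join(re.escape(t) for t in terms))

    def analyze(self, text: str) -> List[Match]:
        return [
            Match(self.entity_type, m.start(), m.end(), self.score)
            for m in self._pattern.finditer(text)
        ]


# Assumed shape: the identifier keeps a list of signals and runs each one.
ner_signals = []
ner_signals.append(ToyLexiconRecognizer("HOSPITAL", HOSPITALS))

text = "גדעון לבנה הגיע היום לבית החולים שערי צדק עם תלונות על כאבים בחזה"
results = [m for signal in ner_signals for m in signal.analyze(text)]
for m in results:
    print(m.entity_type, text[m.start:m.end], m.score)
```

In the real project, the equivalent step would be done with the recognizer class the linked lines already use for diseases and medications, so check its constructor arguments there rather than relying on this sketch.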

Hope this helps

edengby commented 1 year ago

Thank you for the quick response. If I'm not mistaken, the diseases and medications are not anonymized by default. What else should be changed? Example run (text taken from the lexicon files):

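גדעון לבנה הגיע היום לבית החולים שערי צדק עם תלונות על כאבים בחזה קיבל אנטיטריפסין והרספטין בנוסף אובחן עם רגל שטוחה ללא היסטוריה קודמת

(Roughly, in English: "Gideon Levana arrived today at Shaare Zedek hospital with complaints of chest pain, received antitrypsin and Herceptin, and was additionally diagnosed with flat foot, with no prior history.")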

[image]

omri374 commented 1 year ago

I haven't tested it, but adding it to ner_signals should add it to the detection pipeline. Perhaps the confidence score is too low, causing some results to be removed?
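To show how a low confidence score could make a detection disappear, here is a small standalone sketch of threshold filtering. The `Detection` type, the `filter_by_score` helper, the scores, and the 0.5 threshold are all made up for illustration; the project's actual filtering logic and threshold values may differ.

```python
from typing import List, NamedTuple


class Detection(NamedTuple):
    entity_type: str
    text: str
    score: float


def filter_by_score(detections: List[Detection], threshold: float) -> List[Detection]:
    """Keep only detections whose score reaches the threshold."""
    return [d for d in detections if d.score >= threshold]


# Made-up scores: a lexicon match is assumed to score lower than a model-based one.
detections = [
    Detection("PERSON", "גדעון לבנה", 0.85),
    Detection("HOSPITAL", "שערי צדק", 0.3),
]

kept = filter_by_score(detections, threshold=0.5)
dropped = [d for d in detections if d not in kept]

for d in dropped:
    print(f"dropped {d.entity_type} ({d.text!r}) with score {d.score}")
```

If something like this is happening, either raising the score the lexicon recognizer assigns or lowering the threshold would let the hospital match through; which knob exists in the project is something to confirm in the code.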