8400TheHealthNetwork / HebSafeHarbor

Hebrew PHI identification and redaction toolkit
MIT License

a few questions #13

Closed: omrilidor closed this issue 2 years ago

omrilidor commented 2 years ago
  1. In the example web page, I can see that city names are not masked, and that only the day is masked in dates. Is there a way to pass an options variable to the hsh function so that the user decides what to mask?
  2. After implementing your example code, the output printed in the terminal is: [<hebsafeharbor.common.document.Doc object at 0x0000028B95AE0B20>]. Is there something I'm missing? I've pip-installed all the requirements listed in the README file.
  3. Is there a way to pass a pandas data frame as input?
omri374 commented 2 years ago

Hi @omrilidor, for (2), the call returns a list of Doc objects, so printing the list only shows their default representation. You can get the anonymized text by calling:

from hebsafeharbor import HebSafeHarbor

hsh = HebSafeHarbor()

text = """שרון לוי התאשפזה ב02.02.2012 וגרה בארלוזרוב 16 רמת גן"""
doc = {"text": text}

output = hsh([doc])
print(output[0].anonymized_text.text)

Output:

<שם_> התאשפזה ב<יום_>.02.2012 וגרה <מיקום_> 16 רמת גן

Or if you'd like to get the full output from all the intermediate steps:

print(output[0].__dict__)
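To illustrate why the original print showed `<... Doc object at 0x...>`: printing a list calls `repr()` on each element, and a class without its own `__repr__` falls back to Python's default. The sketch below uses a stand-in `Doc` class written only for this example; it is not the real `hebsafeharbor.common.document.Doc`, and its single `text` attribute is an assumption made for the demonstration.

```python
class Doc:
    """Stand-in result object for illustration; NOT the real hebsafeharbor class."""

    def __init__(self, text):
        self.text = text


docs = [Doc("example")]

# Printing a list calls repr() on each element. Without a custom __repr__,
# Python falls back to "<module.Class object at 0x...>".
list_repr = repr(docs)
print(list_repr)

# Reading an attribute returns the actual content.
first_text = docs[0].text
print(first_text)

# vars(obj) returns the same mapping as obj.__dict__.
attributes = vars(docs[0])
print(attributes)
```

So nothing is missing from the installation; the result just has to be read through its attributes rather than printed as a whole list.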
omri374 commented 2 years ago

For (3), you can convert texts in a pandas data frame to Docs and call HebSafeHarbor on a list of docs:

import pandas as pd
from hebsafeharbor import HebSafeHarbor

hsh = HebSafeHarbor()

# Data frame with texts
df = pd.DataFrame({
    "text": [
        "יוסי כהן כיהן בתפקיד שנים רבות",
        "שרון לוי התאשפזה ב02.02.2012 וגרה בארלוזרוב 16 רמת גן",
    ]
})

# Convert the text column to a list of docs:
docs = [{"text": text} for text in df["text"]]

# Call HebSafeHarbor
outputs = hsh(docs)

# Add the anonymized text to the data frame:
df["anonymized"] = [output.anonymized_text.text for output in outputs]

print(df)

Output:

                                                text                                         anonymized
0                     יוסי כהן כיהן בתפקיד שנים רבות                        <שם_> כיהן בתפקיד שנים רבות
1  שרון לוי התאשפזה ב02.02.2012 וגרה בארלוזרוב 16...  <שם_> התאשפזה ב<יום_>.02.2012 וגרה <מיקום_> 16...
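For a larger data frame, the same pattern can be wrapped in a small helper that sends the texts in batches. This is only a sketch: `anonymize_texts` and its `batch_size` parameter are hypothetical names, not part of HebSafeHarbor. It assumes, as in the snippets above, that the anonymizer is callable on a list of `{"text": ...}` dicts and returns objects exposing `anonymized_text.text`. A fake anonymizer stands in for `hsh` so the example runs without the library installed.

```python
from types import SimpleNamespace


def anonymize_texts(texts, anonymizer, batch_size=100):
    """Anonymize an iterable of strings in batches (hypothetical helper)."""
    texts = list(texts)
    results = []
    for start in range(0, len(texts), batch_size):
        # Build the list-of-dicts input used in the thread's examples.
        batch = [{"text": t} for t in texts[start:start + batch_size]]
        outputs = anonymizer(batch)
        results.extend(out.anonymized_text.text for out in outputs)
    return results


def fake_anonymizer(docs):
    """Stand-in for hsh: upper-cases the text instead of redacting it."""
    return [
        SimpleNamespace(anonymized_text=SimpleNamespace(text=d["text"].upper()))
        for d in docs
    ]


sample = anonymize_texts(["abc", "def", "ghi"], fake_anonymizer, batch_size=2)
print(sample)
```

With the real library, the call would presumably look like `df["anonymized"] = anonymize_texts(df["text"], hsh)`, which keeps the results in the same order as the rows.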