BCHSI / philter-ucsf

Open source clinical text de-identification
BSD 3-Clause "New" or "Revised" License

--outputformat "asterisk" not producing de-identified text #21

Status: Open. lesterlitch opened this issue 10 months ago.

lesterlitch commented 10 months ago

I installed the package with pip, then ran the command given in the README:

python -m philter_ucsf -i ./data/i2b2_notes/ -o ./data/i2b2_results/ -f ./configs/philter_delta.json --prod=True --outputformat "asterisk"

None of the PHI identified in the tags was replaced by asterisks.
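For reference, the output I expected from "asterisk" can be sketched as follows. This is a minimal illustration, not Philter's own code: it assumes PHI is available as hypothetical half-open (start, end) character spans and that every PHI character is replaced by one `*`, so the note keeps its original length.

```python
from typing import List, Tuple


def mask_with_asterisks(text: str, spans: List[Tuple[int, int]]) -> str:
    """Replace every character inside each (start, end) span with '*'.

    Hypothetical helper for illustration only; spans are half-open
    character offsets, the same convention as Python slicing.
    """
    chars = list(text)
    for start, end in spans:
        for i in range(start, end):
            chars[i] = "*"
    return "".join(chars)


# "Smith" occupies characters 12..16 of this note.
print(mask_with_asterisks("Seen by Dr. Smith on 3/4", [(12, 17)]))
# prints "Seen by Dr. ***** on 3/4"
```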

This happens because, in main.py at line 70, the output format is forced to i2b2 whenever the prod flag is set. Once I changed that value to asterisk, the PHI was replaced as expected.

Why does --prod=True need to override the output format?

An alternative would be to check whether the config provides an output format and use it if so, falling back to i2b2 otherwise.
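The proposed fallback could look roughly like this. It is a sketch, not a patch against main.py: the function name `resolve_output_format`, its parameters, and the config key `"outputformat"` are all assumptions, because the issue does not show how main.py reads these values. Letting an explicit `--outputformat` win before the config is consulted is also an extra assumption beyond the proposal above, chosen so that the README command behaves as documented.

```python
from typing import Any, Mapping, Optional

# Default used today when --prod=True is set.
DEFAULT_PROD_FORMAT = "i2b2"


def resolve_output_format(
    prod: bool,
    cli_format: Optional[str],
    config: Mapping[str, Any],
) -> Optional[str]:
    """Choose the output format instead of hard-coding i2b2 in prod mode.

    Hypothetical sketch: cli_format is None when --outputformat was not
    passed, and "outputformat" is an assumed key in the JSON config.
    """
    if not prod:
        # Behaviour outside prod mode is left unchanged.
        return cli_format
    if cli_format:
        # Assumption: an explicit command-line choice takes precedence.
        return cli_format
    configured = config.get("outputformat")
    if configured:
        # The config supplies a format, so use it.
        return configured
    # Only fall back to i2b2 when nothing else was specified.
    return DEFAULT_PROD_FORMAT
```

With this ordering, the README command keeps its asterisk output, and existing prod setups that pass no format still get i2b2.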