Open ewartj opened 2 years ago
Philter is essentially a whitelist approach: everything unknown is redacted by default. To handle a non-English language, you would need to translate (or re-create) everything in the config file and its patterns for that language. Doable, but a lot of work, and it would need testing of course.
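To illustrate why non-English text ends up fully masked, here is a toy sketch of whitelist redaction next to pattern-only redaction. This is not Philter's actual code or config format: `SAFE_WORDS`, `PHI_PATTERNS`, `redact_whitelist` and `redact_patterns_only` are all hypothetical names made up for the example, and the word list and date regex are deliberately tiny.

```python
import re

# Hypothetical safe-word list (assumption for illustration only;
# Philter's real whitelists are much larger and live in its config).
SAFE_WORDS = {"the", "patient", "was", "seen", "on", "for", "a", "cough"}

# Hypothetical PHI pattern: a simple numeric date such as 12/03/2019.
PHI_PATTERNS = [re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")]

TOKEN = re.compile(r"\S+")


def mask(text):
    """Replace every character with an asterisk, keeping the length."""
    return "*" * len(text)


def redact_whitelist(text):
    """Whitelist mode: keep a token only if it is a known safe word."""
    def repl(match):
        token = match.group(0)
        core = token.strip(".,;:!?").lower()
        return token if core in SAFE_WORDS else mask(token)
    return TOKEN.sub(repl, text)


def redact_patterns_only(text):
    """Pattern-only mode: mask regex matches, leave everything else."""
    for pattern in PHI_PATTERNS:
        text = pattern.sub(lambda m: mask(m.group(0)), text)
    return text


if __name__ == "__main__":
    for sentence in (
        "The patient was seen on 12/03/2019 for a cough.",
        "El enfermo fue visto el 12/03/2019.",
    ):
        print(redact_whitelist(sentence))
        print(redact_patterns_only(sentence))
```

With an English-only safe list, every token of the Spanish sentence is unknown, so whitelist mode masks all of it, while pattern-only mode masks just the date. Translating the safe list and patterns is what would make whitelist mode usable for another language.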
Hi, thanks for releasing this software. I was just wondering, is there any way of enabling Philter to process non-English text?
I had a quick try using default settings (python main.py -i ./data/i2b2_notes/ -o ./data/i2b2_results/ -f ./configs/philter_delta.json --prod=True --outputformat "asterisk") and it seems to anonymise everything by default. For example:
This:
Became:
Is there a way of modifying this so that only the regex patterns are anonymised?