8400TheHealthNetwork / HebSafeHarbor

Hebrew PHI identification and redaction toolkit
MIT License

Latest improvements V0.0.10 #17

admatis closed this 2 years ago

admatis commented 2 years ago

✨ New features and improvements
• consolidation improvements
• lexicon enhancements
• city recognition enhancements: handling of ambiguous cities

🔴 Bug fixes
• change the entity types of the date recognizers, so that predictions with the same boundaries from different recognizers are kept (a workaround for the analyzer's duplicate removal)
• ignore the HebSpacy recognition when at least two recognizers recognize an entity with the same boundaries but a different type
• refine the ambiguous cities list
• add anonymization for URLs (now replaced by <_קשר>)
• avoid recognizing float numbers as a TIME entity
• "קיבוץ" (kibbutz) was removed from the cities lexicon, to avoid recognizing it as a CITY entity
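The consolidation rule in the second bug fix (drop the HebSpacy prediction when two or more recognizers agree on a span's boundaries but not on its type) can be sketched roughly as follows. This is a minimal illustration, not the HebSafeHarbor implementation. The `Result` class, the `drop_conflicting_ner` function, the `"HebSpacy"` recognizer tag and the entity type names used in the example are all hypothetical stand-ins for the analyzer's real result objects.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Result:
    # Hypothetical, minimal stand-in for an analyzer result.
    entity_type: str
    start: int
    end: int
    recognizer: str


# Assumption: the tag attached to predictions from the HebSpacy NER model.
NER_RECOGNIZER = "HebSpacy"


def drop_conflicting_ner(results: List[Result]) -> List[Result]:
    """Drop NER predictions on spans where 2+ recognizers disagree on the type."""
    # Group predictions by their exact (start, end) boundaries.
    by_span = defaultdict(list)
    for r in results:
        by_span[(r.start, r.end)].append(r)

    kept = []
    for span_results in by_span.values():
        recognizers = {r.recognizer for r in span_results}
        types = {r.entity_type for r in span_results}
        # A conflict: same boundaries, at least two recognizers, different types.
        conflict = len(recognizers) >= 2 and len(types) > 1
        for r in span_results:
            if conflict and r.recognizer == NER_RECOGNIZER:
                continue  # defer to the rule/lexicon-based recognizers
            kept.append(r)

    # Stable sort keeps the original order within each span.
    return sorted(kept, key=lambda r: (r.start, r.end))


if __name__ == "__main__":
    results = [
        Result("CITY", 10, 15, "HebSpacy"),
        Result("PERSON", 10, 15, "LexiconRecognizer"),
        Result("DATE", 20, 30, "HebSpacy"),
    ]
    for r in drop_conflicting_ner(results):
        print(r)
```

In this sketch only the model-based prediction is discarded on a conflicting span. Non-conflicting spans, including HebSpacy-only spans and spans where every recognizer agrees on the type, pass through unchanged.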

👥 Contributors @admatis @omri374 @aya-bellicha @mnikahava