aphp / eds-pseudo

EDS-Pseudo is a hybrid model for detecting personally identifying entities in clinical reports
https://aphp.github.io/eds-pseudo

Feature request: [feature] RPPS anonymization #14

Open Pierre-Auguste-Beaucote opened 2 months ago

Pierre-Auguste-Beaucote commented 2 months ago

Feature type

Additional entity

Description

👋 Congrats on the repo, it is a crucial topic! Do you plan on adding RPPS (doctors' national identifiers) to the entity list?

Preventing the identification of health professionals in health data increases patients' privacy, and it also protects the health professionals' own privacy!

percevalw commented 1 month ago

Hi @Pierre-Auguste-Beaucote! We have not annotated RPPS ids in our private AP-HP dataset, so we currently have no way of evaluating the RPPS matching performance on real data. That said, in our documents, most RPPS seem to follow this format:

so adding a regular expression (e.g. in https://github.com/aphp/eds-pseudo/blob/main/eds_pseudo/pipes/pseudonymisation/patterns.py) should do the trick in most cases.
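
For illustration only, here is a minimal sketch of what such a regular expression could look like, written with Python's standard `re` module. It is not taken from `patterns.py`: the names `RPPS_PATTERN` and `find_rpps` are hypothetical, and the pattern rests on two assumptions, namely that an RPPS number has 11 digits (possibly separated by single spaces) and that it is introduced by an explicit "RPPS" label within a few characters.

```python
import re

# Hypothetical pattern, NOT the one used in eds-pseudo.
# Assumptions: the identifier has 11 digits, digits may be separated by
# single spaces, and an "RPPS" label appears shortly before the number.
RPPS_PATTERN = re.compile(
    r"\bRPPS\b"            # explicit label, e.g. "RPPS :" or "n° RPPS"
    r"\D{0,20}?"           # up to 20 non-digit characters (colon, spaces, ...)
    r"((?:\d[ ]?){10}\d)"  # 11 digits, each optionally followed by one space
    r"\b",                 # the number must not continue with more digits
    flags=re.IGNORECASE,
)


def find_rpps(text: str) -> list[str]:
    """Return RPPS candidates found in `text`, with inner spaces removed."""
    return [m.group(1).replace(" ", "") for m in RPPS_PATTERN.finditer(text)]


print(find_rpps("Dr Dupont, RPPS : 10 1234 5678 9"))
```

Because the sketch relies on the label, it deliberately skips bare digit runs such as phone numbers, and it would also miss identifiers that appear without any "RPPS" mention nearby.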

We can also annotate the fictitious templates and even add hard samples that would be difficult to match with a regular expression:

My doctor national identifier is the following:
10 00000 000

Do you have a use case in mind, and/or some public or private documents that could be used to build and evaluate this matcher?

Pierre-Auguste-Beaucote commented 1 month ago

Thanks for the quick answer. The RPPS is sometimes present on medical reports, and always on prescriptions, referral letters, transportation vouchers, and most documents that carry a doctor's stamp.

Unfortunately, I don't have such a dataset!