Feature request: [feature] RPPS anonymization

Pierre-Auguste-Beaucote commented 2 months ago

Feature type

Additional entity

Description

👋 Congrats on the repo, this is a crucial topic! Do you plan to add RPPS numbers (doctors' national identifiers) to the entity list?

Preventing the identification of health professionals in health data improves patients' privacy, and it also protects the health professionals' own privacy!

percevalw commented 1 month ago

Hi @Pierre-Auguste-Beaucote! We have not annotated RPPS ids in our private AP-HP dataset, so we currently have no way of evaluating RPPS matching performance on real data. That said, in our documents, most RPPS numbers seem to follow this format:

RPPS = 10000XXXXX
or N° RPPS : 10000YYYYY

so adding a regular expression (e.g. in https://github.com/aphp/eds-pseudo/blob/main/eds_pseudo/pipes/pseudonymisation/patterns.py) should do the trick in most cases.
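
As a rough illustration, a keyword-anchored pattern for the two formats above could look like the sketch below. The names `RPPS_PATTERN` and `find_rpps` are hypothetical and are not taken from `patterns.py`. The accepted length of 10 or 11 digits is also an assumption and should be checked against real identifiers.

```python
import re

# Hypothetical pattern, not taken from eds_pseudo/pipes/pseudonymisation/patterns.py.
# Assumption: the identifier is a run of 10 or 11 digits introduced by the
# keyword "RPPS", optionally preceded by "N°" and followed by "=" or ":".
RPPS_PATTERN = re.compile(
    r"(?:N°\s*)?RPPS\s*[=:]?\s*(?P<rpps>\d{10,11})(?!\d)",
    flags=re.IGNORECASE,
)


def find_rpps(text: str) -> list[str]:
    """Return the digit strings that directly follow an RPPS keyword in `text`."""
    return [match.group("rpps") for match in RPPS_PATTERN.finditer(text)]


if __name__ == "__main__":
    print(find_rpps("RPPS = 10000123456"))
    print(find_rpps("N° RPPS : 1000012345"))
```

Anchoring on the `RPPS` keyword keeps false positives low, at the cost of missing identifiers that appear without the keyword.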

We can also annotate the fictitious templates and even add hard samples that would be difficult to match with a regular expression:

My doctor national identifier is the following:
10 00000 000
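
To make the difficulty concrete, here is a small sketch (hypothetical names, assumed length of 10 or 11 digits) of a looser pattern that drops the keyword and tolerates single spaces between digits. It does catch the sample above, but it also catches any other digit run of similar length, such as a French phone number, which is why annotated examples would be needed to evaluate it.

```python
import re

# Hypothetical loose pattern: 10 or 11 digits, each optionally preceded by a
# single space, with no "RPPS" keyword required. The length is an assumption.
LOOSE_DIGITS = re.compile(r"(?<!\d)\d(?: ?\d){9,10}(?!\d)")


def loose_candidates(text):
    """Return candidate digit runs with all whitespace removed."""
    return [re.sub(r"\s", "", m.group()) for m in LOOSE_DIGITS.finditer(text)]


hard_sample = "My doctor national identifier is the following:\n10 00000 000"
phone_number = "Call the ward at 01 23 45 67 89"

if __name__ == "__main__":
    # Both the identifier and the phone number are returned as candidates.
    print(loose_candidates(hard_sample))
    print(loose_candidates(phone_number))
```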

Do you have a use case in mind, or some public or private documents that could be used to build and evaluate this matcher?

Pierre-Auguste-Beaucote commented 1 month ago

Thanks for the quick answer! The RPPS is sometimes present on medical reports, and always on prescriptions, referral letters and transportation vouchers, as well as on most documents that carry a doctor's stamp.

Unfortunately, I don't have such a dataset!

aphp / eds-pseudo

Feature request: [feature] RPPS anonymization #14
