Open Pierre-Auguste-Beaucote opened 2 months ago
Hi @Pierre-Auguste-Beaucote! We have not annotated RPPS ids in our private AP-HP dataset, so we currently have no way of evaluating RPPS matching performance on real data. That said, in our documents, most RPPS numbers seem to follow this format:
RPPS = 10000XXXXX
N° RPPS : 10000YYYYY
so adding a regular expression (e.g. in https://github.com/aphp/eds-pseudo/blob/main/eds_pseudo/pipes/pseudonymisation/patterns.py) should do the trick in most cases.
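As a rough illustration of that idea, here is a minimal sketch of such a pattern in plain Python. It assumes that the X/Y placeholders above stand for digits, that the identifier is 10 or 11 digits long, and that it always follows an "RPPS" label with a ":" or "=" separator. The names `RPPS_PATTERN` and `find_rpps` are hypothetical and are not taken from `patterns.py`.

```python
import re

# Assumptions (not confirmed by the thread): the placeholders are digits,
# the identifier is 10 or 11 digits long, and it follows an "RPPS" label,
# optionally prefixed by "N°", with ":" or "=" as the separator.
RPPS_PATTERN = re.compile(
    r"(?:N°\s*)?RPPS\s*[:=]\s*(?P<rpps>\d{10,11})\b",
    flags=re.IGNORECASE,
)


def find_rpps(text: str) -> list[str]:
    """Return the digit strings that follow an RPPS label in `text`."""
    return [match.group("rpps") for match in RPPS_PATTERN.finditer(text)]
```

A pattern like this only fires when the label is present, which keeps false positives low but misses identifiers written without it.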
We can also annotate the fictitious templates and even add hard samples that would be difficult to match with a regular expression:
My doctor national identifier is the following:
10 00000 000
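To show why a sample like this one is hard, here is a hedged sketch of a looser pattern that accepts single spaces between digits and ignores the label entirely. The names `SPACED_DIGITS` and `find_spaced_candidates` are hypothetical, and the 10 to 11 digit length is an assumption. The trade-off is that any similarly shaped number, such as a phone number, is returned as well, so context would still be needed to decide what is really an RPPS.

```python
import re

# Assumption: 10 or 11 digits, optionally separated by single (possibly
# non-breaking) spaces. No "RPPS" label is required, so this over-matches.
SPACED_DIGITS = re.compile(r"(?<!\d)\d(?:[ \u00a0]?\d){9,10}(?!\d)")


def find_spaced_candidates(text: str) -> list[tuple[int, int, str]]:
    """Return (start, end, digits_only) for each digit run that looks like an id."""
    candidates = []
    for match in SPACED_DIGITS.finditer(text):
        digits = re.sub(r"\D", "", match.group())  # drop the separators
        candidates.append((match.start(), match.end(), digits))
    return candidates
```

For example, calling `find_spaced_candidates` on the sample above yields one candidate whose digits are `1000000000`, but a string such as `06 12 34 56 78` is also returned as a candidate.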
Do you have a use case in mind and/or some public/private documents that could be used to build/evaluate this matcher?
Thanks for the quick answer. The RPPS is sometimes present on medical reports, and always on prescriptions, referral letters, transportation vouchers, and most documents that carry a doctor's stamp.
Unfortunately, I don't have such a dataset!
Feature type
Additional entity
Description
Congrats on the repo, it is a crucial topic! Do you plan on adding RPPS (doctors' national identifiers) to the entity list?
Preventing the identification of health professionals in health data increases patients' privacy, and it also protects the health professionals' own privacy!