Restrict the personnummer slice of Patient.identifier.value to only allow digits? Add regex for all known identifier types used nationally?

vjernelov commented 1 year ago

Currently the value of the personnummer slice (Patient.identifier.value) is an unconstrained string, which allows arbitrary formats (for example 19121212-1212, 191212121212, 121212-1212 or 1212121212). This could cause problems for applications and FHIR servers alike.

Given that the PU service expects a strict YYYYMMDDXXXX format, I suggest we enforce the same rule/restriction for the personnummer slice as well in the base profile.

vjernelov commented 1 year ago

For reference, image captured from PU-tjänsten's TKB, version 4.0

vjernelov commented 1 year ago

A suggestion for a regex for this is:

^(?:19|[2-9]\d)\d{2}(?:0[1-9]|1[012])(?:0[1-9]|[1-2]\d|3[0-1])\d{4}$
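
As a quick sanity check, this minimal Python sketch runs the suggested pattern against the four example formats from the issue description. Python's `re` module is used purely for illustration, and the name `PERSONNUMMER_RE` is made up for this example.

```python
import re

# The regex suggested above, copied verbatim.
PERSONNUMMER_RE = re.compile(
    r"^(?:19|[2-9]\d)\d{2}(?:0[1-9]|1[012])(?:0[1-9]|[1-2]\d|3[0-1])\d{4}$"
)

# The four formats mentioned in the issue description.
candidates = ["19121212-1212", "191212121212", "121212-1212", "1212121212"]

# Map each candidate to True (accepted) or False (rejected).
results = {value: bool(PERSONNUMMER_RE.match(value)) for value in candidates}

for value, accepted in results.items():
    print(f"{value}: {'accepted' if accepted else 'rejected'}")
```

Only the 12-digit YYYYMMDDXXXX form is accepted. Note that the pattern checks shape only: it does not verify the check digit, and it does not know how many days a given month has (30 February would pass).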

danka74 commented 1 year ago

Maybe unlikely, but if e.g. health data from a cohort of patients where some are born before 1900 would be put on FHIR, this regexp would stop that. FHIR is one of the standards recognised by e.g. TEHDAS as a means to share data for secondary purposes...

johlju commented 1 year ago

A thought. Would it be better to not use regex, but instead pass the OID that references the formatting? https://confluence.cgiostersund.se/display/PU/Identitetsformat

Then this would also support local replacement identification formats: https://confluence.cgiostersund.se/display/PU/Lokala+Reservidn

The application receiving/sending the information then knows how to parse the value (setting it and reading it).

vjernelov commented 1 year ago

> Maybe unlikely, but if e.g. health data from a cohort of patients where some are born before 1900 would be put on FHIR, this regexp would stop that. FHIR is one of the standards recognised by e.g. TEHDAS as a means to share data for secondary purposes...

This is a valid point. According to https://www4.skatteverket.se/rattsligvagledning/edition/2023.1/330242.html we should also support individuals born in the 1800s. We should update the regex to reflect that if we agree this is a good idea.

vjernelov commented 1 year ago

> A thought. Would it be better to not use regex, but instead pass the OID that references the formatting? https://confluence.cgiostersund.se/display/PU/Identitetsformat
>
> Then this would also support local replacement identification formats: https://confluence.cgiostersund.se/display/PU/Lokala+Reservidn
>
> The application receiving/sending the information then knows how to parse the value (setting it and reading it).

My suggestion actually aligns well with this, in that each slice would get its own regex. The advantage of describing the regex pattern in the actual profile is that validation of the resource can be done automatically using the HAPI library, meaning less work for the implementors (if I understand the consequences of your suggestion correctly, @johlju).

The idea is that each type of patient identifier (personnummer, samordningsnummer, nationellt reservnummer) would get its own regex pattern.
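
A rough sketch of what "one pattern per identifier type" could look like, written in Python rather than in the profile itself. The samordningsnummer pattern is an assumption based on the rule that a samordningsnummer adds 60 to the day of birth; it is not taken from this thread. Nationellt reservnummer is left out because its format is not described here. The names `PATTERNS` and `matching_types` are hypothetical.

```python
import re

# Shared year and month part. Including "18" follows the discussion above
# about supporting people born in the 1800s.
YEAR_MONTH = r"(?:18|19|[2-9]\d)\d{2}(?:0[1-9]|1[012])"

PATTERNS = {
    # Day 01-31, then four digits.
    "personnummer": re.compile(YEAR_MONTH + r"(?:0[1-9]|[12]\d|3[01])\d{4}"),
    # ASSUMPTION: day of birth plus 60, i.e. 61-91, then four digits.
    "samordningsnummer": re.compile(YEAR_MONTH + r"(?:6[1-9]|[78]\d|9[01])\d{4}"),
}


def matching_types(value: str) -> list[str]:
    """Return the names of all identifier types whose pattern matches the whole value."""
    return [name for name, pattern in PATTERNS.items() if pattern.fullmatch(value)]


print(matching_types("191212121212"))   # day 12
print(matching_types("191212721212"))   # day 72 (12 + 60)
print(matching_types("19121212-1212"))  # contains a hyphen
```

Because the day ranges do not overlap, a value can match at most one of these two patterns, which is what lets each slice carry its own constraint without ambiguity.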

RikardLovstrom commented 1 year ago

Supporting people born in the 1800s seems like a good idea as long as we have historical data to deal with. Both the earliest EHRs and national registers contain people born in the 1800s. The oldest living person with a verified age was, however, born as recently as 1907-03-04.

vjernelov commented 1 year ago

^(18|19|[2-9]\d)\d{2}(0[1-9]|1[012])([0-2]\d|3[0-1])\w{4} would satisfy the requirements, no? Ping @RikardLovstrom, @danka74 and @johlju

HL7Sverige commented 1 year ago

Why the \w{4} at the end? Is this supposed to match something other than personnummer as well?

vjernelov commented 1 year ago

You're right, it should be \d{4} for personnummer.

johlju commented 1 year ago

> ^(18|19|[2-9]\d)\d{2}(0[1-9]|1[012])([0-2]\d|3[0-1])\w{4} would satisfy the requirements, no? Ping @RikardLovstrom, @danka74 and @johlju

Maybe make it non-capturing groups: ^(?:18|19|[2-9]\d)\d{2}(?:0[1-9]|1[012])(?:[0-2]\d|3[0-1])\d{4}
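
A minimal Python sketch exercising this pattern. It illustrates two observations that are not decisions from this thread: the pattern as written has no trailing `$`, so a prefix match accepts over-long input unless the whole string is matched, and the simplified day group `[0-2]\d` also accepts day `00`. All test values are made up.

```python
import re

# The non-capturing pattern above, copied verbatim (note: no trailing "$").
NON_CAPTURING = re.compile(
    r"^(?:18|19|[2-9]\d)\d{2}(?:0[1-9]|1[012])(?:[0-2]\d|3[0-1])\d{4}"
)

born_1800s = "189912311234"    # made-up value with a birth year in the 1800s
too_long = "1912121212129999"  # 16 digits: a valid-looking prefix plus extra digits
day_zero = "191212001212"      # day "00"

checks = {
    # fullmatch() requires the entire string to match the pattern.
    "1800s, fullmatch": bool(NON_CAPTURING.fullmatch(born_1800s)),
    # match() only requires a match at the start of the string.
    "too long, prefix match": bool(NON_CAPTURING.match(too_long)),
    "too long, fullmatch": bool(NON_CAPTURING.fullmatch(too_long)),
    "day 00, fullmatch": bool(NON_CAPTURING.fullmatch(day_zero)),
}

for label, accepted in checks.items():
    print(f"{label}: {'accepted' if accepted else 'rejected'}")
```

Whether the missing `$` matters in practice depends on whether the validating engine applies a full or a partial match, so it may be safer to keep the anchor. Restoring the day group from the first suggestion (`0[1-9]|[1-2]\d|3[0-1]`) would reject day `00` again.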

johlju commented 1 year ago

> The idea is that each type of patient identifier (personnummer, samordningsnummer, nationellt reservnummer) would get its own regex pattern.

So if the identification number is a local replacement identity, take for example the format yyyymmdd-XYZW that identifies a patient (see Skåne here https://confluence.cgiostersund.se/display/PU/Lokala+Reservidn). Would it be possible for the implementor to create a new slice at runtime using the above format? Or does the profile need to change and be re-published with a new slice for it to work? I'm not familiar with creating profiles, so read my question with that in mind.

vjernelov commented 1 year ago

> > The idea is that each type of patient identifier (personnummer, samordningsnummer, nationellt reservnummer) would get its own regex pattern.
>
> So if the identification number is a local replacement identity, take for example the format yyyymmdd-XYZW that identifies a patient (see Skåne here https://confluence.cgiostersund.se/display/PU/Lokala+Reservidn). Would it be possible for the implementor to create a new slice at runtime using the above format? Or does the profile need to change and be re-published with a new slice for it to work? I'm not familiar with creating profiles, so read my question with that in mind.

@johlju So let's try to break down the answer to that question a bit:

The base profiles we create here under the HL7 Sweden umbrella are not intended to be used as is, but should instead be extended, meaning you technically inherit all the qualities of the parent profile into your profile, which you can then restrict further.
The base profile allows for any kind of identifier to be added in a profile that extends the base profile. I suppose it's similar to in some way?
If Region Skåne has a requirement to validate its local temporary identifiers, it should create its own profile, extending the base profile, and add that regex verification as part of its profile using the same pattern we've used here.

johlju commented 1 year ago

> If Region Skåne has a requirement to validate its local temporary identifiers, it should create its own profile, extending the base profile, and add that regex verification as part of its profile using the same pattern we've used here.

Since in theory all regions should support all of the local temporary identifiers (when moving patients between regions), then all systems that handle that patient would need to extend the profile to support different/all local temporary identifiers that Inera Personuppgiftstjänsten handles. This sounds like a potential issue when different vendors might do it differently.

In this case, wouldn't it be better to have a slice where we can put the identification number and another field where we put the OID that says how to format or evaluate the identification number? The downside is of course that each system needs to incorporate the regex for each identification number it should support, and also handle OIDs that are not supported. But at the same time, the system would otherwise need to extend the profile and handle all the logic that comes with that anyway (as mentioned above). By adding the OID, the profile could be used by everyone out of the box?
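
To make the trade-off concrete, here is a minimal Python sketch of the receiving side under this proposal: the system looks up the format by OID and must also decide what to do with OIDs it does not know. The registry, the `Outcome` enum and `check_identifier` are hypothetical names. The personnummer OID shown is an assumption that should be verified against the PU documentation, and `urn:oid:0.0.0` is a deliberately fake placeholder.

```python
import re
from enum import Enum


class Outcome(Enum):
    VALID = "valid"
    INVALID = "invalid"
    UNSUPPORTED = "unsupported"  # the OID is not known to this system


# Hypothetical local registry that each system would have to maintain.
# ASSUMPTION: 1.2.752.129.2.1.3.1 is the OID for personnummer.
FORMAT_BY_SYSTEM = {
    "urn:oid:1.2.752.129.2.1.3.1": re.compile(
        r"(?:18|19|[2-9]\d)\d{2}(?:0[1-9]|1[012])(?:0[1-9]|[12]\d|3[01])\d{4}"
    ),
}


def check_identifier(system: str, value: str) -> Outcome:
    """Check a value against the format registered for its system, if any."""
    pattern = FORMAT_BY_SYSTEM.get(system)
    if pattern is None:
        return Outcome.UNSUPPORTED
    return Outcome.VALID if pattern.fullmatch(value) else Outcome.INVALID


print(check_identifier("urn:oid:1.2.752.129.2.1.3.1", "191212121212"))
print(check_identifier("urn:oid:1.2.752.129.2.1.3.1", "19121212-1212"))
print(check_identifier("urn:oid:0.0.0", "20240101-ABCD"))  # fake placeholder OID
```

The third outcome is the crux: without patterns in a shared profile, every vendor has to build and update a table like this on its own and choose for itself how to treat unknown OIDs.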

vjernelov commented 1 year ago

Hmm, I wasn't aware that PU-tjänsten had explicit descriptions of the local identifiers they support, including regexes!:

https://confluence.cgiostersund.se/display/PU/Lokala+Reservidn

This opens up the possibility to add even more identifier types to the base Patient profile, thus enabling profiles downstream to limit the amount of "own" work. I think that sounds like a good way forward, what do you say @johlju ? Would that address your concerns?

johlju commented 1 year ago

Well, it would allow everyone to use the existing identifiers, but to use any additional identifier that is added, the base profile must be re-published. With my suggestion of passing an OID instead of a regex, we only need one slice (?), which would support all types of identifiers, present and future.

I'm not really seeing why having the regexes in the base profile is that beneficial.

vjernelov commented 1 year ago

> Well, it would allow everyone to use the existing identifiers, but to use any additional identifier that is added, the base profile must be re-published. With my suggestion of passing an OID instead of a regex, we only need one slice (?), which would support all types of identifiers, present and future.
>
> I'm not really seeing why having the regexes in the base profile is that beneficial.

I think we have different understandings of a couple of things related to this.

First of all, it's not a question of "regex vs URI (OID)", it's "regex AND URI (OID)" vs "only URI (OID)". The system element of the Identifier datatype already has the URI (OID) covered.
Secondly, the base profiles aren't supposed to be used as-is. They should cover the main use cases/needs that we typically can expect a majority would need. That means some profiling work is ALWAYS expected once you get close to implementation. The way the base profiles are designed, they allow any profile that is derived from them to add their own content. Another way of saying that is that the base profile never forces anyone to use anything, nor does it prohibit anyone from using anything. It simply says that "Hey, if you want to represent this concept (such as a Swedish personnummer), we've done that work for you.".
Thirdly, what regex in the profiles brings to the table that OIDs (or URIs) alone don't is that it allows implementors using the HAPI library to get syntax validation of the regexed strings for free using the $validate operation. It also means that implementors don't need to write the same kind of syntax checks themselves (thus risking complications when people interpret differently what, for instance, a Swedish personnummer looks like).
Fourthly(?), I think the main question is if the base profiles should take local identifiers/requirements under their wings or not. We need to be clear about the inclusion/exclusion criteria for the content in base profiles in general.

vadi2 commented 1 year ago

On point 4 - there is value in doing that. For an example have a look at Australian Base profiles which have many profiles on the Identifier datatype for various identifiers used in the country:

Thanks to this list, it is easy to 'mix and match' these off the shelf definitions in your own profiles. The Australian Core Patient profile does this, for example:

vjernelov commented 1 year ago

I took the liberty of showcasing how this can be done in the branch attached to this issue. Please have a look when you have the time to see if you like the approach.

vjernelov commented 8 months ago

@larbo4 check with the PU service that the OID list, the identity types and the regexes described at https://confluence.cgiostersund.se/display/PU/Lokala+Reservidn are the latest versions.

https://github.com/HL7Sweden/basprofiler-r4/blob/51-restrict-the-personnummer-slice-of-patientidentifiervalue-to-only-allow-digits-add-regex-for-all-known-identifier-types-used-nationally/input/fsh/patient.fsh leads you to the proposed Patient profile. Lines 90-102 contain the OIDs and URIs for the identity types.

The various regexes in use are compiled at https://github.com/HL7Sweden/basprofiler-r4/blob/51-restrict-the-personnummer-slice-of-patientidentifiervalue-to-only-allow-digits-add-regex-for-all-known-identifier-types-used-nationally/input/fsh/invariants/PatientIdentifierConformancePatternSE.fsh. If the PU service has regexes described for the identity types it supports, you can check them against the ones listed there and compile a "gap report" if there are any differences.

Niclas - check whether regex declarations via invariants are supported in the different technical implementations of HL7 FHIR.

tineri1 commented 7 months ago

> @larbo4 check with the PU service that the OID list, the identity types and the regexes described at https://confluence.cgiostersund.se/display/PU/Lokala+Reservidn are the latest versions.
>
> https://github.com/HL7Sweden/basprofiler-r4/blob/51-restrict-the-personnummer-slice-of-patientidentifiervalue-to-only-allow-digits-add-regex-for-all-known-identifier-types-used-nationally/input/fsh/patient.fsh leads you to the proposed Patient profile. Lines 90-102 contain the OIDs and URIs for the identity types.
>
> The various regexes in use are compiled at https://github.com/HL7Sweden/basprofiler-r4/blob/51-restrict-the-personnummer-slice-of-patientidentifiervalue-to-only-allow-digits-add-regex-for-all-known-identifier-types-used-nationally/input/fsh/invariants/PatientIdentifierConformancePatternSE.fsh. If the PU service has regexes described for the identity types it supports, you can check them against the ones listed there and compile a "gap report" if there are any differences.
>
> Niclas - check whether regex declarations via invariants are supported in the different technical implementations of HL7 FHIR.

There is an updated list of local reserve IDs, which you can now find here: Lokala Reservid - Öppen info: Personuppgiftstjänsten - Confluence (atlassian.net)

There are 5 reserve IDs in the list that were not in the earlier version: Region Västerbotten, Region Halland, Region Gävleborg, Region Dalarna, and one more for VGR.

The regexes described on the Confluence page for the other identifiers match the ones found here: https://github.com/HL7Sweden/basprofiler-r4/blob/51-restrict-the-personnummer-slice-of-patientidentifiervalue-to-only-allow-digits-add-regex-for-all-known-identifier-types-used-nationally/input/fsh/invariants/PatientIdentifierConformancePatternSE.fsh That is, there are no differences apart from the five added reserve IDs listed above.

vjernelov commented 7 months ago

There appear to be challenges with the generalizability of regexes when we write them in FSH. Niclas got a tip from Vadim about Regex101, which can be used so that each language/implementation can produce its own representation of the regex. We need to understand in more depth what the consequences of this are. If we find that the regex generated by SUSHI from FSH cannot be interpreted in the same way by all relevant implementation technologies (Java, .NET, C#, Rust, PHP, etc.), we will have to adopt a "softer" approach here.
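
One concrete example of the kind of engine difference that is worth checking, offered as an illustration and not as a finding from this thread: in Python 3, `\d` applied to text matches any Unicode decimal digit unless the `re.ASCII` flag is set, while other engines may default to ASCII digits only. The sketch below uses a made-up value written with Arabic-Indic digits.

```python
import re

SHAPE = r"\d{12}"  # twelve digits, the basic shape of the patterns in this thread

ascii_value = "191212121212"
# The same twelve digits written with Arabic-Indic numerals (U+0661, U+0669, U+0662).
arabic_indic_value = "\u0661\u0669" + "\u0661\u0662" * 5

results = {
    "ascii digits, default flags": bool(re.fullmatch(SHAPE, ascii_value)),
    "arabic-indic digits, default flags": bool(re.fullmatch(SHAPE, arabic_indic_value)),
    "arabic-indic digits, re.ASCII": bool(re.fullmatch(SHAPE, arabic_indic_value, re.ASCII)),
    "arabic-indic digits, [0-9] class": bool(re.fullmatch(r"[0-9]{12}", arabic_indic_value)),
}

for label, matched in results.items():
    print(f"{label}: {'match' if matched else 'no match'}")
```

Writing `[0-9]` instead of `\d` is one way to sidestep this particular difference, since an explicit character class behaves the same regardless of an engine's Unicode defaults.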

vjernelov commented 5 months ago

After looking into this further, we have concluded that the kind of expressions we use to enforce the format of the identities should not cause any problems from an implementation perspective.

claudiaehr commented 1 month ago

@vjernelov I have read the whole conversation, but I am not sure I follow where you landed. Have you added a regex for personnummer as proposed further up or not? If so, which one, since several alternatives were listed? Regexes were also brought up for samordningsnummer and nationellt reservnummer; have those been added?

PatientSEVendorLite has a regex for personnummer and samordningsnummer; perhaps you concluded that the same one should be used in the base profile, see https://commonprofiles.care/fhir/1.0.1/StructureDefinition-PatientSEVendorLite.html? Is the idea that one could use the regex for the national reserve ID specified at https://inera.atlassian.net/wiki/spaces/PU/pages/3353216812/Nationellt+Reservid?

vjernelov commented 1 month ago

> @vjernelov I have read the whole conversation, but I am not sure I follow where you landed. Have you added a regex for personnummer as proposed further up or not? If so, which one, since several alternatives were listed? Regexes were also brought up for samordningsnummer and nationellt reservnummer; have those been added?
>
> PatientSEVendorLite has a regex for personnummer and samordningsnummer; perhaps you concluded that the same one should be used in the base profile, see https://commonprofiles.care/fhir/1.0.1/StructureDefinition-PatientSEVendorLite.html? Is the idea that one could use the regex for the national reserve ID specified at https://inera.atlassian.net/wiki/spaces/PU/pages/3353216812/Nationellt+Reservid?

All the actions we took under the Patient profile update project can be found if you open the Patient project on the Project page, and then press the "Complete" status (yeah I know, a bit weird). That will give you a more detailed description of the reasoning and outcome of all issues.

Specifically regarding the patient ID types, we have the following:

claudiaehr commented 1 month ago

Thank you @vjernelov. I have now found your decision log and could see how you set the invariants under https://github.com/HL7Sweden/basprofiler-r4/blob/51-restrict-the-personnummer-slice-of-patientidentifiervalue-to-only-allow-digits-add-regex-for-all-known-identifier-types-used-nationally/input/fsh/invariants/PatientIdentifierConformancePatternSE.fsh

HL7Sweden / basprofiler-r4

Restrict the personnummer slice of Patient.identifier.value to only allow digits? Add regex for all known identifier types used nationally? #51