Using xs:ID for Organization, Provider, and Facility identifiers

edjez commented 11 years ago

Summary

The XSD for the IHE CSD profile uses the xs:ID type for identifiers. This appears to misapply xs:ID, which is intended to identify XML nodes within an XML document, not to tag data elements as semantically unique identifiers in some source dataset. (Here xs refers to the "http://www.w3.org/2001/XMLSchema" namespace.)

The xs:ID type renders a document invalid if more than one element anywhere in the document has the same ID. It denotes a 'key' by which to identify an XML node; it is not an indicator that some system treats the data field as a unique identifier.

Issue Description

The use of xs:ID has the following problems:

From a high-level perspective, it places restrictions on content (facility IDs) based on the choice of API format (XML). This is poor design sequencing, because other formats could impose other, arbitrary restrictions.
It restricts the format and type of identifiers that facilities can use. For example, Rwanda uses FOSA IDs, which are numeric, but xs:ID values cannot be numeric or start with a digit. Misused xs:ID restrictions are sometimes sidestepped by prepending a magic string to numeric identifiers (e.g. "FOSA123"), but this is a 'patch' that leads to other issues for API consumers: the string is not part of the identifier, so the convention has to be communicated out of band.
It makes it impossible to have an organization, a provider, and/or a facility with the same ID in the same returned document. Because xs:ID identifies XML nodes in a document, if an organization is identified as "PIH" then neither a provider nor a facility can be identified as such anywhere in the document, which breaks the goal of orthogonality of these data sets.
It makes it impossible for a returned document to include more than one copy of the same facility object (this would point to other issues with API design). Examples documented in the field that are impossible to implement include responding to a query of facilities nested by providers, or retrieving more than one past version of a facility object in the same request.
XPath navigability and other putative benefits of using xs:ID within an XML document can be attained with other constructs, such as xs:unique.
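As a rough illustration of the two constraints described above, the following Python sketch approximates what an xs:ID check rejects. The regular expression is an ASCII-only simplification of the NCName production that xs:ID inherits (the real production also admits many non-ASCII characters), and the function names and sample values are hypothetical.

```python
import re

# ASCII-only approximation of the XML NCName production that xs:ID inherits.
# Assumption: the real production also allows many non-ASCII characters.
NCNAME_ASCII = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")


def is_valid_xs_id(value: str) -> bool:
    """Return True if value is lexically acceptable as an xs:ID (ASCII subset)."""
    return NCNAME_ASCII.fullmatch(value) is not None


def duplicate_ids(ids):
    """Return values that appear more than once; xs:ID forbids these document-wide."""
    seen, dupes = set(), set()
    for value in ids:
        if value in seen:
            dupes.add(value)
        seen.add(value)
    return dupes


if __name__ == "__main__":
    print(is_valid_xs_id("123"))      # purely numeric identifier
    print(is_valid_xs_id("FOSA123"))  # same identifier with a prefix
    # An organization and a facility that both use "PIH" as their identifier
    print(duplicate_ids(["PIH", "clinic-7", "PIH"]))
```

The numeric value fails the lexical check while the prefixed one passes, and the shared "PIH" value is reported as a duplicate even though it belongs to two different kinds of entity.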
Recommendation

Remove xs:ID from CSD schema definitions.

Then, if the IHE group really feels the XML documents should restrict identifiers within a container, they could use xs:unique constraints appropriately scoped to the container element. Even then, some of the scenarios above would be constrained if the scope of the xs:unique constraints is not carefully designed, and there is very little value to the client or implementer in specifying that uniqueness in the schema, so it could be left to a future version.
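To make the scoping idea concrete, here is a minimal Python sketch that emulates container-scoped uniqueness outside the schema. It is not an XSD validator, and the element and attribute names in the sample document are hypothetical, not taken from the CSD schema.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical document: element and attribute names are illustrative only,
# not taken from the CSD schema.
SAMPLE = """
<registry>
  <organizations>
    <organization id="PIH"/>
    <organization id="MOH"/>
  </organizations>
  <facilities>
    <facility id="PIH"/>
    <facility id="123"/>
  </facilities>
</registry>
"""


def duplicates_within(container, child_tag, key_attr="id"):
    """Return key values repeated among the children of one container element.

    This mimics a uniqueness constraint scoped to the container: the same
    value may still appear in a different container.
    """
    keys = [c.get(key_attr) for c in container.findall(child_tag)]
    counts = Counter(k for k in keys if k is not None)
    return sorted(value for value, n in counts.items() if n > 1)


root = ET.fromstring(SAMPLE)
org_dupes = duplicates_within(root.find("organizations"), "organization")
fac_dupes = duplicates_within(root.find("facilities"), "facility")
print(org_dupes, fac_dupes)
```

With this scoping, an organization and a facility can both be called "PIH" and a facility can keep a numeric identifier, while a repeated value inside a single container is still reported.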

More Information

http://www.w3.org/TR/xml-id/ (Specifically http://www.w3.org/TR/xml-id/#processing)

djritz commented 11 years ago

The arguments are persuasive. The plan is to use the "identifier" complex type defined in CSD.xsd for all identifiers, including the unique identifiers for each registry (e.g. facilityID). The use of xs:ID presumed system-assigned GUIDs for all identifiers; this idea is being abandoned in favour of a more flexible approach.

The defined complex type for identifiers is intended to support a format for cross referencing identifiers that is compatible with the HL7 model. It is shown below:

<xs:complexType name="Identifiers">
    <xs:sequence>
        <xs:element name="entityID"
                    type="xs:ID" />
        <xs:element name="otherID"
                    type="xs:string" />
        <xs:element name="issuingAuthority"
                    type="xs:string"
                    minOccurs="0" />
        <xs:element name="type"
                    type="xs:string"
                    minOccurs="0" />
        <xs:element name="status"
                    type="xs:string"
                    minOccurs="0" />
    </xs:sequence>
</xs:complexType>

Such a model enables cross referencing between multiple identifiers and a single "base" entityID. As a convention, when used for the base ID itself, entityID and otherID could contain the same value.
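One possible reading of this cross-referencing model, sketched in Python: each record carries the base entityID plus one alternate identifier, and a lookup table maps an alternate identifier back to the base one. The class, the function, and all sample values (including the authority names) are hypothetical and only mirror the field names of the complex type.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Identifier:
    """Mirrors the fields of the complex type (optional ones default to None)."""
    entity_id: str
    other_id: str
    issuing_authority: Optional[str] = None
    type: Optional[str] = None
    status: Optional[str] = None


def build_xref(identifiers):
    """Map (issuing_authority, other_id) to the base entity_id."""
    return {(i.issuing_authority, i.other_id): i.entity_id for i in identifiers}


# Hypothetical records; the values and authority names are made up.
records = [
    Identifier("FAC-001", "FAC-001"),  # base ID: both fields hold the same value
    Identifier("FAC-001", "123", issuing_authority="FOSA"),
    Identifier("FAC-001", "HX-77", issuing_authority="DistrictHMIS"),
]
xref = build_xref(records)
print(xref[("FOSA", "123")])
```

Keying the table on the issuing authority as well as the value is a design choice made for this sketch, so that two authorities can issue the same string without colliding.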

ghost commented 11 years ago

Sorry for just dropping in out of the blue, this discussion caught my attention :)....

I like the idea of maintaining compatibility with HL7-defined identifier structures, particularly II (from HL7v3) and CX (from HL7v2). As many systems have already implemented v2 and/or v3, I think developers will be comfortable with the types. Derek: is there any plan to just use the definition for II (root,extension,assigningAuthority,use) and/or CX (identifier,assigningAuthority,use) in CSD, or has the above identifier structure been agreed upon?

IMO duplicating data on the entityID and otherID attributes for internally assigned identifiers is a nuance that might frustrate some implementers.

Cheers!

djritz commented 11 years ago

Hi Justin -- I would welcome suggestions on explicitly re-using the v2 or v3 ID spec, unmodified. Do you have a snippet that can be dropped into the XSD? Also -- I've been trying to maintain an XSD structure that lends itself to being "disaggregated" (see Ed's comments regarding the fact that there may be multiple physical registries which are, together, supporting a single shared data model for interlinking). This has caused me to repeat the base ID, rather than inherit it, for things that need to be related. Are you suggesting that this is overkill regarding cross referenced (other) IDs?

ghost commented 11 years ago

I don't have the v2 XML datatypes handy, but here are the relevant II types from v3 (NB: this is R1, not the ISO 21090 harmonized types, which have additional attributes that may be of use to CSD):

  <xs:simpleType name="uid">
    <xs:union memberTypes="oid uuid ruid"/>
  </xs:simpleType>
  <xs:simpleType name="oid">
    <xs:restriction base="xs:string">
      <xs:pattern value="[0-2](\.(0|[1-9][0-9]*))*"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="uuid">
    <xs:restriction base="xs:string">
      <xs:pattern value="[0-9a-zA-Z]{8}-[0-9a-zA-Z]{4}-[0-9a-zA-Z]{4}-[0-9a-zA-Z]{4}-[0-9a-zA-Z]{12}"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="ruid">
    <xs:restriction base="xs:string">
      <xs:pattern value="[A-Za-z][A-Za-z0-9\-]*"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:complexType name="II">
    <xs:attribute name="root" type="uid" use="optional"/>
    <xs:attribute name="extension" type="xs:string" use="optional"/>
    <xs:attribute name="assigningAuthorityName" type="xs:string" use="optional"/>
    <xs:attribute name="displayable" type="xs:boolean" use="optional"/>
  </xs:complexType>
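For readers who want to see which member of the uid union a given root value satisfies, here is a small Python sketch. The three patterns are copied from the simpleType definitions above; the dictionary and function names are made up for this example. XSD patterns are implicitly anchored, so the sketch uses re.fullmatch.

```python
import re

# Patterns copied from the simpleType definitions above. XSD patterns are
# implicitly anchored, so re.fullmatch is used rather than re.search.
UID_PATTERNS = {
    "oid": r"[0-2](\.(0|[1-9][0-9]*))*",
    "uuid": r"[0-9a-zA-Z]{8}-[0-9a-zA-Z]{4}-[0-9a-zA-Z]{4}-[0-9a-zA-Z]{4}-[0-9a-zA-Z]{12}",
    "ruid": r"[A-Za-z][A-Za-z0-9\-]*",
}


def classify_uid(value):
    """Return the names of the union member types whose pattern the value matches."""
    return [name for name, pattern in UID_PATTERNS.items()
            if re.fullmatch(pattern, value)]


if __name__ == "__main__":
    for candidate in ("1.3.6.1.4.1.33349.3.1.3.201303.0.0.0", "FOSA123", "123"):
        print(candidate, classify_uid(candidate))
```

Note that a bare numeric value such as "123" matches none of the three member types, so it would have to go in the extension attribute (a plain xs:string) rather than in root.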

A v3 system typically handles cross referencing as a list of identifiers. Here is an example from our PIX manager in v3 illustrating the available identifiers for a patient:

<subject1 typeCode="SBJ">
  <patient classCode="PAT">
    <id root="1.3.6.1.4.1.33349.3.1.3.201303.0.0.0" extension="15" assigningAuthorityName="A4H_OSCAR_A"/>
    <id root="1.3.6.1.4.1.33349.3.1.3.201303.0.0.1" extension="115" assigningAuthorityName="A4H_OSCAR_B"/>
    <id root="1.2.840.114350.1.13.99998.8734" extension="3051997106" assigningAuthorityName="GHHS"/>
    <id root="1.3.6.1.4.1.33349.3.1.100.2012.1.1.4" extension="992842-121125-1985L" assigningAuthorityName="MOH_CAAT_ENT"/>
    <id root="1.3.6.1.4.1.33349.3.1.2.2.0.0" extension="222" assigningAuthorityName="MOH_CAAT_CR"/>
    <statusCode code="active"/>
    <patientPerson classCode="PSN" determinerCode="INSTANCE">
      <name use="L">
        <family partType="FAM">LEE</family>
        <given partType="GIV">ROSE</given>
      </name>

Basically, ROSE LEE has 5 identifiers assigned to her from different systems (provider organizations that are custodians of those identifiers). However, at the affinity domain level, each system is configured to use only the identifier created by our assigning authority (MOH_CAAT_ENT) when communicating. This is configurable and can be changed at deployment time, but the premise is the same. I assume the same can be done with facilities (it is supported in v3 messages, and I believe in v2 messages, that facilities can have 1..* identifiers).

I don't think there is anything wrong with the structure you've defined, as a matter of fact I bet most developers wouldn't mind. Admittedly I am a little bit of a data nerd (and thus a stickler for normalization) so it just jumped out at me ... I had the same reaction to the XD* meta-data, but I still implemented it ;)

djritz commented 11 years ago

Thanks, Justin. I'm actually concerned with being able to indicate that there is a single agreed ID, for which there may be alternate IDs that cross reference to it. So I want to maintain the ELID, and then say there is a list of otherIDs, of type identifier, that are related to this ELID. It is not clear in the patient ID list which is the ECID.

Sorry if I'm missing something obvious...

ghost commented 11 years ago

The concept of an ECID isn't defined in PIX; rather, we treat one of the OIDs as the ECID (via configuration) to support our architecture (a federated affinity domain). For example, there is a bit of code in our test harness that looks like this:

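// Root OID that this deployment is configured to treat as the ECID domain
string ecid = ConfigurationManager.GetECID();
// Pick the identifier whose root matches that configured OID
II id = subject1.Id.Find(o => o.Root == ecid);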

This works because there is a rule in PIX that says only one system may ever be the custodian of any one identification domain. So the OID used by the PIX manager to generate patient identifiers can't be used by any other system (the PIX manager rejects the message if a system attempts to do this).
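The same lookup can be sketched in Python against a trimmed, well-formed version of the patient excerpt. This is only an illustration: the HL7 namespaces are omitted, ECID_ROOT is an assumed constant standing in for the configuration lookup, and the function name is made up.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed version of the patient excerpt (HL7 namespaces omitted).
PATIENT = """
<patient classCode="PAT">
  <id root="1.3.6.1.4.1.33349.3.1.3.201303.0.0.0" extension="15" assigningAuthorityName="A4H_OSCAR_A"/>
  <id root="1.2.840.114350.1.13.99998.8734" extension="3051997106" assigningAuthorityName="GHHS"/>
  <id root="1.3.6.1.4.1.33349.3.1.100.2012.1.1.4" extension="992842-121125-1985L" assigningAuthorityName="MOH_CAAT_ENT"/>
</patient>
"""

# Assumption: stands in for the configuration lookup in the C# snippet.
ECID_ROOT = "1.3.6.1.4.1.33349.3.1.100.2012.1.1.4"


def find_id_by_root(patient, root_oid):
    """Return the first <id> element whose root equals root_oid, or None."""
    for id_el in patient.findall("id"):
        if id_el.get("root") == root_oid:
            return id_el
    return None


patient = ET.fromstring(PATIENT)
ecid = find_id_by_root(patient, ECID_ROOT)
print(ecid.get("extension") if ecid is not None else "no ECID in this message")
```

Returning the first match is safe only because of the one-custodian-per-OID rule described above; without it, a root could appear on more than one identifier and the choice would be ambiguous.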

In other architectures (for example point-to-point XDR), a system can be configured to use the same PIX messages to look up the identifier of the document recipient (i.e. how does system X know this person?). Again, the rule applies that only one system may be the custodian for any one OID.

PIX might be a little simplistic compared to CSD as PIX is a standalone registry with no hard and fast rule for deployment (even though in practice deployments are pretty similar). If I remember correctly CSD does include a little more in-depth discussion on architecture and deployment.

There may be a way to use a slightly different structure to perform this functionality (although in the patient world <asOtherIds> has a different semantic meaning than <id>):

<id root="1.2.3.4.5" extension="1234" assigningAuthority="GOVT"/>
<asOtherIds>
    <id root="1.2.3.4.6" extension="12345" assigningAuthority="FAC_MAN"/>
</asOtherIds>
<asOtherIds>
    <id root="1.2.3.4.8" extension="34-40493" assigningAuthority="ORG_MAN"/>
...

I will take a look into the PRLO domain from CHI to see how they've modeled it. Apologies if I've taken this discussion off track ...

Cheers!

djritz commented 11 years ago

Thanks Justin -- not off track at all. CSD is all about interlinking registries. This is made doable through a set of definitive IDs for organization, facility, service and provider. The fact that there can be alternate IDs is supported, of course. But for cross referencing, the definitive ones are needed. That is why the current XSD contains a single element for the definitive facilityID and a complexType that can list the other IDs. Cross references are then defined using this definitive ID.
