Change to CE datatype to support messages encountered in the wild

timdlm commented 5 years ago

I encountered some ORU_R01 messages from my lab system that could not be parsed. Parsing them raised InvalidName exceptions with messages such as "Invalid name for SubComponent: CE_7". I found that these messages had three extra components in the CE fields of OBR segments (contrary to ANSI 2.5.1-2007). As a proof of concept, I made the changes below to v2_5_1/datatypes.py, after which these messages parsed successfully.

Is this a change you would accept?

diff --git a/hl7apy/v2_5_1/datatypes.py b/hl7apy/v2_5_1/datatypes.py
index 09e9f7a..158e1ae 100644
--- a/hl7apy/v2_5_1/datatypes.py
+++ b/hl7apy/v2_5_1/datatypes.py
@@ -29,6 +29,9 @@ DATATYPES = {
     'CE_4': ['leaf', None, 'ST', 'ALTERNATE_IDENTIFIER', None, -1],
     'CE_5': ['leaf', None, 'ST', 'ALTERNATE_TEXT', None, -1],
     'CE_6': ['leaf', None, 'ID', 'NAME_OF_ALTERNATE_CODING_SYSTEM', 'HL70396', -1],
+    'CE_7': ['leaf', None, 'ST', 'SECOND_ALTERNATE_IDENTIFIER', None, -1],
+    'CE_8': ['leaf', None, 'ST', 'SECOND_ALTERNATE_TEXT', None, -1],
+    'CE_9': ['leaf', None, 'ID', 'NAME_OF_SECOND_ALTERNATE_CODING_SYSTEM', 'HL70396', -1],
     'CF_1': ['leaf', None, 'ST', 'IDENTIFIER', None, -1],
     'CF_2': ['leaf', None, 'FT', 'FORMATTED_TEXT', None, -1],
     'CF_3': ['leaf', None, 'ID', 'NAME_OF_CODING_SYSTEM', 'HL70396', -1],
@@ -468,7 +471,10 @@ DATATYPES_STRUCTS = {
         ('CE_3', DATATYPES['CE_3'], (0, 1), 'CMP'),
         ('CE_4', DATATYPES['CE_4'], (0, 1), 'CMP'),
         ('CE_5', DATATYPES['CE_5'], (0, 1), 'CMP'),
-        ('CE_6', DATATYPES['CE_6'], (0, 1), 'CMP'),),
+        ('CE_6', DATATYPES['CE_6'], (0, 1), 'CMP'),
+        ('CE_7', DATATYPES['CE_7'], (0, 1), 'CMP'),
+        ('CE_8', DATATYPES['CE_8'], (0, 1), 'CMP'),
+        ('CE_9', DATATYPES['CE_9'], (0, 1), 'CMP'),),
     'CF': (
         ('CF_1', DATATYPES['CF_1'], (0, 1), 'CMP'),
         ('CF_2', DATATYPES['CF_2'], (0, 1), 'CMP'),
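A rough way to picture why the diff makes the error go away: a parser can only name the components listed in its CE definition, so a seventh or later component has no valid name. The stdlib-only sketch below models that with a set of names. The names `CE_V251`, `CE_PATCHED` and `unknown_components` are hypothetical, and this is an analogy, not how hl7apy is implemented internally.

```python
# Hypothetical sketch, NOT hl7apy's implementation: the set of component
# names a parser knows for CE decides which positions it can name.
CE_V251 = {f"CE_{i}" for i in range(1, 7)}       # CE_1..CE_6, per HL7 v2.5.1
CE_PATCHED = CE_V251 | {"CE_7", "CE_8", "CE_9"}  # after the proposed diff


def unknown_components(value, known, sep="^"):
    """Return the CE_n names of components that `known` does not define."""
    count = len(value.split(sep))
    return [f"CE_{i}" for i in range(1, count + 1) if f"CE_{i}" not in known]


nine = "1^2^3^4^5^6^7^8^9"  # a CE value with nine components, as seen in the wild
print(unknown_components(nine, CE_V251))     # ['CE_7', 'CE_8', 'CE_9']
print(unknown_components(nine, CE_PATCHED))  # []
```

The sketch ignores escape sequences, repetitions and subcomponents; it only counts `^`-separated positions.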

svituz commented 5 years ago

Hi @timdlm, I checked the v2.5.1 structures, and CE actually has only 6 components. We are aware that real-world HL7 doesn't strictly follow the official structure (sigh!), so for this kind of situation we've decided to use two levels of validation. You can set the validation level to VALIDATION_LEVEL.TOLERANT when parsing or creating the message. This way you should be able to parse the message and then iterate over the fields. You will be able to access components by name up to CE_6, but for the "unofficial" ones you will need to iterate.
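The two-level idea described above can be sketched in plain Python: a strict level rejects a CE value with more components than v2.5.1 defines, while a tolerant level keeps them all. The `Level` enum and `split_ce` function are hypothetical illustrations of the intended behavior, not hl7apy's API or internals; as later comments in this thread note, hl7apy did not actually behave this way for these messages.

```python
# Hypothetical sketch of two validation levels; names are illustrative
# and do not reproduce hl7apy's internals.
from enum import Enum

CE_COMPONENTS = 6  # number of CE components defined in HL7 v2.5.1


class Level(Enum):
    STRICT = "strict"
    TOLERANT = "tolerant"


def split_ce(value, level, sep="^"):
    """Split a CE value, rejecting extra components only at STRICT level."""
    parts = value.split(sep)
    if level is Level.STRICT and len(parts) > CE_COMPONENTS:
        raise ValueError(
            f"CE has {len(parts)} components, at most {CE_COMPONENTS} allowed"
        )
    return parts  # TOLERANT keeps every component, official or not


print(len(split_ce("a^b^c^d^e^f^g^h^i", Level.TOLERANT)))  # 9
```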

Here is a little example snippet:

from hl7apy.core import Field
from hl7apy import VALIDATION_LEVEL as VL
f = Field('ABS_9', version='2.5.1', validation_level=VL.TOLERANT)
f.value = 'a^b^c^d^e^f^g^h^i^l^m^n'

NB: I've chosen ABS since it's the first CE field I've found :)

Now if you print CE_6, you get:

>>> print(f.ce_6.value)
f

If you want to access the other ones, you need to iterate over the children:

>>> for c in f.children:
...     print(c.value)
...
a
b
c
d
e
f
g
h
i
l
m
n

Of course, if you print f.value, you'll get a^b^c^d^e^f^g^h^i^l^m^n.
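The same split between named and positional access can be mimicked on the raw string, which may help when reasoning about what hl7apy exposes. This is a stdlib-only sketch with a hypothetical helper name (`split_named_and_extra`), not hl7apy code, and it assumes the field contains no escape sequences, repetitions or subcomponents.

```python
# Hypothetical helper, not part of hl7apy: official CE components are
# looked up by name, the extra ones are reachable only by position.
CE_NAMES = (
    "identifier",
    "text",
    "name_of_coding_system",
    "alternate_identifier",
    "alternate_text",
    "name_of_alternate_coding_system",
)


def split_named_and_extra(value, sep="^"):
    parts = value.split(sep)
    named = dict(zip(CE_NAMES, parts))  # CE_1..CE_6 by name
    extras = parts[len(CE_NAMES):]      # anything past CE_6, by position
    return named, extras


named, extras = split_named_and_extra("a^b^c^d^e^f^g^h^i^l^m^n")
print(named["name_of_alternate_coding_system"])  # f
print(extras)  # ['g', 'h', 'i', 'l', 'm', 'n']
```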

Alternatively, you may have a message profile that describes the message structure you're using (e.g., IHE-based message structures). In that case, check the message profile section of the documentation.

So, long story short, no, we would not accept the change :)

Hope that helps, Vittorio

timdlm commented 5 years ago

OK. Thanks, Vittorio!

timdlm commented 5 years ago

To follow up on one detail: passing validation_level=VALIDATION_LEVEL.TOLERANT to parse_message does not fix the issue. Indeed, TOLERANT is the default validation level (as returned by get_default_validation_level()).

svituz commented 5 years ago

You're right, TOLERANT is the default level. I've been able to reproduce the behavior; I'll have to look into it more deeply to fix it.

crs4 / hl7apy, issue #44