DICOM files without complete patient and study info can cause incomplete DICOM database records

Andras Lasso Potential solution: Save patient, study, and series level fields (patient name, study description, etc.) at the image level. This would allow storage of all information in the DB without any data loss. At the end of any DB import or edit, go through all the updated patients and save the consolidated values (appended descriptions, most recent non-empty patient name, etc.) at the patient, study, and series levels.
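
A minimal sketch of this idea, assuming a simplified SQLite layout: the `ImageFields` and `Studies` tables, their columns, and the function names are hypothetical and do not reflect the actual ctkDICOMDatabase schema. Every file's fields are stored as read, and the study row is rebuilt from them, with the most recent non-empty value winning.

```python
import sqlite3

# Hypothetical tables and columns; the real ctkDICOMDatabase schema differs.
SCHEMA = """
CREATE TABLE ImageFields (
    SOPInstanceUID   TEXT PRIMARY KEY,
    StudyInstanceUID TEXT NOT NULL,
    StudyDescription TEXT,
    InstitutionName  TEXT
);
CREATE TABLE Studies (
    StudyInstanceUID TEXT PRIMARY KEY,
    StudyDescription TEXT,
    InstitutionName  TEXT
);
"""


def open_database():
    """Create an in-memory database with the illustrative schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def import_image(conn, sop_uid, study_uid, description, institution):
    """Store the fields exactly as read from one file, empty values included."""
    conn.execute(
        "INSERT OR REPLACE INTO ImageFields VALUES (?, ?, ?, ?)",
        (sop_uid, study_uid, description, institution),
    )


def consolidate_study(conn, study_uid):
    """Rebuild the study row from the image-level rows of that study."""
    rows = conn.execute(
        "SELECT StudyDescription, InstitutionName FROM ImageFields "
        "WHERE StudyInstanceUID = ? ORDER BY rowid",
        (study_uid,),
    ).fetchall()
    description = institution = ""
    for row_description, row_institution in rows:
        # A later non-empty value replaces an earlier one; empty values are skipped.
        description = row_description or description
        institution = row_institution or institution
    conn.execute(
        "INSERT OR REPLACE INTO Studies VALUES (?, ?, ?)",
        (study_uid, description, institution),
    )
    return description, institution
```

Because the study row is derived only from rows already in the database, `consolidate_study` can be called again after any import or edit without re-reading the files.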

2012-12-14 11:06 pieper Can you point to a study that has this property? (i.e. a study where the patient data is different between two series in the same study). Ideally if you could attach two files that demonstrate this issue it would help with testing.

I think the best solution here is to refactor the insert method of ctkDICOMDatabase (it is long and hard to understand). As part of that we could better isolate the code that deals with optimizing the insert mechanism and also double-check for inconsistencies in the data. Probably we should generate a warning for this situation, but also try to auto-correct it if the answer seems obvious (by obvious I mean that the patient name is blank for a given patient ID, but then later a non-blank name is provided).
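
The "obvious" auto-correction rule could look roughly like the sketch below. It is not taken from ctkDICOMDatabase: the function name, the `known_names` dictionary standing in for the patient table, and the choice to keep the first non-blank name on a conflict are all assumptions made for illustration.

```python
import logging

logger = logging.getLogger("dicom_insert")  # hypothetical logger name


def reconcile_patient_name(known_names, patient_id, new_name):
    """Return the name to keep for patient_id after seeing new_name.

    known_names maps PatientID -> name stored so far (an in-memory stand-in
    for the patient table, assumed for this sketch).
    """
    new_name = (new_name or "").strip()
    current = known_names.get(patient_id, "")
    if not current:
        # Obvious case: nothing stored yet, or only a blank name so far.
        known_names[patient_id] = new_name
    elif new_name and new_name != current:
        # Not obvious: two different non-blank names. Keep the stored one and warn.
        logger.warning(
            "PatientID %s: name %r conflicts with stored name %r",
            patient_id, new_name, current,
        )
    return known_names[patient_id]
```

A blank name arriving after a non-blank one leaves the stored name unchanged, so import order does not matter for the blank/non-blank case.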

2012-12-14 12:30 Andras Lasso All RT studies are like this: the CT series typically contains much more information than the RT dose, structure set, etc. series. Fields that are present in CT but usually missing from RT series: institution, referring physician, study description, ...

Specific example: https://subversion.assembla.com/svn/slicerrt/trunk/SlicerRt/data/pinnacle3-9.9-phantom-imrt/

CT:
(0008,0020) DA [20070315] # 8, 1 StudyDate
(0008,0030) TM [131213] # 6, 1 StudyTime
(0008,0080) LO [PrincessMargaretHosp] # 20, 1 InstitutionName
(0008,1030) LO [test] # 4, 1 StudyDescription

RTDOSE:
(0008,0020) DA (no value available) # 0, 0 StudyDate
(0008,0030) TM (no value available) # 0, 0 StudyTime

2012-12-14 12:35 Andras Lasso I think a good solution would be to store all the study information at the series level (because the same fields of the same study can have different values in different series). In addition to this, a consolidated value (combined from all the series in the study) could be stored at the study level. The consolidation step could happen at the end of each import step (after all the files have been imported), just using the contents that are already in the database.

2012-12-14 14:21 pieper Andras: when you say that 'all RT studies are like this' do you mean all the research cases, or is this standard practice in the clinic? If it is a standard convention, then we could implement a rule that ignores patient and study level information from all RT series because they are not reliable. On the other hand, if this is more of an ad-hoc research convention then we should make the user manually reconcile the information. Either way it seems like invalid DICOM data, but something we need to accommodate cleanly.

2012-12-14 14:26 Andras Lasso These are DICOM files produced by performing usual RT workflows on phantom data. I think all files conform to the DICOM standard. I think lossless storage of all fields at series level and storing composite values at study level is a good solution. What do you think?

2012-12-14 14:33 pieper Hmmm... I'm having trouble imagining a situation where the DICOM spec would endorse the idea of different patient-level information occurring within the same study. It's supposed to be a study of a particular person, so it seems like an error if things like patient name can change from series to series. (But the spec allows some odd things, so maybe we should consult someone more knowledgeable for advice). Keeping all the patient and study level data with each series seems like a lot of info to keep (and a complex schema to maintain). But we need to sort this out one way or another.
I'm thinking that we might need a less rigid schema if we anticipate a lot of this kind of variability in the wild.

2012-12-14 15:02 Andras Lasso You can create a patient on different systems (CT scanner, treatment planning system, MRI scanner, ...) with as many details as you want: you may enter patient birth date on the CT but not enter it when you do the MRI scan. This is perfectly valid, DICOM compliant. However, this means that you cannot fill the study or patient level information from only one series, but you have to have all the data and then decide what you display for the user.

For the consolidation of patient or study level field values into one displayable value you need to know the value stored in each series. So, you either have to re-read the values from the DICOM files on each insert and delete, or store all the values in the database. I think storing the values in the database is much more efficient and allows a simpler implementation of the consolidation mechanism.

There is nothing complicated in the schema: you store all the values that you read from a file (or series) at the lowest level in the database (at file or series level). At higher levels nothing has to be changed; you compose the values from the low-level fields according to the compositing logic. The advantage of storing all the field values before consolidation is that you can re-run the consolidation whenever you want (e.g., if you install a new plugin or a new software version you don't have to re-build the database).

The compositing logic could be just appending values with a "," separator, showing each value only once.
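
A minimal sketch of this compositing rule, assuming the per-series values are available as plain dictionaries (the function names and the dictionary layout are hypothetical, chosen only for illustration):

```python
def composite(values, separator=", "):
    """Join the distinct non-empty values, keeping first-seen order."""
    seen = []
    for value in values:
        value = (value or "").strip()
        if value and value not in seen:
            seen.append(value)
    return separator.join(seen)


def composite_study(series_fields):
    """Combine per-series field dictionaries into one study-level dictionary."""
    names = []
    for fields in series_fields:
        for name in fields:
            if name not in names:
                names.append(name)
    # A field missing from a series is treated the same as an empty value.
    return {
        name: composite(fields.get(name) for fields in series_fields)
        for name in names
    }
```

Since the functions only read the stored per-series values, the consolidation can be repeated at any time without going back to the DICOM files.
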
Example:
Series 1: Physician: A; Institution: K
Series 2: Physician: B; Institution undefined
Series 3: Physician: A; Institution undefined
=> Shown at study level: Physician: A, B; Institution: K

Asking the user to resolve conflicts or confirm merges can be quite tricky, because we would need to store all these confirmations persistently somewhere so that when the database is rebuilt these confirmations can be applied again.

2012-12-14 15:07 Andras Lasso I agree that we should double-check this with a DICOM guru before making a final decision. Maybe ask David Clunie or somebody from the DCMTK group?

2012-12-14 15:26 Andras Lasso Added a ticket on the CTK issue tracker: https://github.com/commontk/CTK/issues/276

2012-12-15 12:27 Csaba Pinter I think it would be very useful if we held a breakout session in SLC about these issues. We have quite a lot to discuss and decide on DICOM-related topics. (Andras could also attend over Skype or Hangout.) Steve, what do you think?

2012-12-17 13:42 pieper Yes, I think this is a great idea - something like the DICOM breakout we had in Boston at the summer meeting. We can be very focused though, and sort out this issue and maybe a few other high priority ones.

2012-12-17 14:27 pieper Hi Andras - Regarding the scenario you listed above, I just want to clarify. You said:

"You can create a patient on different systems (CT scanner, treatment planning system, MRI scanner, ...) with as many details as you want: you may enter patient birth date on the CT but not enter it when you do the MRI scan. This is perfectly valid, DICOM compliant. However, this means that you cannot fill the study or patient level information from only one series, but you have to have all the data and then decide what you display for the user."

In this case the MR and CT would have different Study UIDs because they are different studies. It's possible that the patient's name is misspelled or differently capitalized.
In this case we currently treat these as different patients even if they have the same PatientID field (the same way that if they had the same name, but different PatientIDs, they would be different patients). However, the issue in the RT examples we have been looking at is that the CT and the treatment plan have the same Study UID, which means that the RT data is based on the CT as its reference data. Since the RT data is derived from the CT it's not clear to me that it should be allowed to have different Patient or Study level data while sharing the same Study UID. Again, I can easily see how this could happen in practice, but I don't think it's valid ;)

2012-12-17 16:33 Gabor Fichtinger This continues to be a hairy issue… Traditionally, it is true that “RT data is based on the CT as its reference data” - however, this is expected to change, because adaptive RT plans will be based on multiple image data sets from different sources (CT, MRI) throughout the course of a series of treatment fractions. –Gabor

2012-12-17 18:16 Andras Lasso Yes, Steve, you are right that the patient creation on CT/MRI was not a good example, as the StudyInstanceUID would be different in that case. Anyway, there is inconsistent study information in CT and various RT IODs in data sets that we received from different hospitals, created with different treatment planning systems. I've also checked a couple of TPS DICOM conformance statements, and they all explicitly state the study fields that they store in the RT IODs. The stored fields include the mandatory fields and a few random optional fields - but several optional study fields are simply ignored.
See:
http://www.medical.siemens.com/siemens/en_GLOBAL/rg_marcom_FBAs/files/brochures/DICOM/syngovia/DCS_syngo_via_VA10A_Annex_RO_10.pdf
http://www.elekta.com/dms/elekta/elekta-assets/Elekta-Software/pdfs/dicom-conformance-statements/software/LEDDCMXIO0001/XiO%C2%AE%20%7C%20Release%204.61%20%7C%20Document%20No%3A%20LEDDCMXIO0001.pdf
http://www.nucletron.com/Resources/Documents/CIB-OSL192200%20Oncentra%20IMCON%20conf%20stat.pdf

You might consider this behavior "invalid", but this is what is specified in the DICOM conformance statements of several FDA-approved devices.

SlicerRt / SlicerRT

DICOM files without complete patient and study info can cause incomplete DICOM database records #1