dcmjs-org / dcmjs

Javascript implementation of DICOM manipulation
https://dcmjs.netlify.com/
MIT License
292 stars 111 forks source link

Allow backslashes in UIDs in order to support C-FIND in dcmjs-dimse #277

Closed richard-viney closed 2 years ago

richard-viney commented 2 years ago

When using this library with C-FIND requests via dcmjs-dimse, it is necessary to allow backslashes in UIDs because the spec says it is possible to query a list of UIDs by using a backslash as a delimiter.

Without this change UID list queries aren't possible in dcmjs-dimse because the backslashes get stripped out.

pieper commented 2 years ago

I haven't looked at DIMSE in a long time, but I would think that you would have a VM of greater than one if you have multiple UIDs in a query, and that we should keep the UI VR strictly matching the standard. That is, I think the backslash would be parsed at a different level.

richard-viney commented 2 years ago

Yep the backslash separator isn't part of the DICOM standard itself, but dcmjs already supports it on other VR types, e.g. IS (integer string) and DS (decimal string), which is quite handy, so UID seems to be the odd one out here in not supporting it (there may also be others, I haven't done an exhaustive search).

pieper commented 2 years ago

I didn't write this part of the code originally, so I'm reverse engineering here too, but at least for the decimal string (DS) case it appears that rather than treating the backslash as part of the token it splits the string and returns a list of numbers. Wouldn't you want the same behavior for UID?

https://github.com/dcmjs-org/dcmjs/blob/37082a6966a4a5763cdd2cf6635a6456923dc19b/src/ValueRepresentation.js#L529-L543

richard-viney commented 2 years ago

The splitting on backslash is already being done here:

https://github.com/dcmjs-org/dcmjs/blob/37082a6966a4a5763cdd2cf6635a6456923dc19b/src/DicomMessage.js#L226

So simply letting the backslash through the UniqueIdentifier parsing like this PR does is sufficient. For the decimal and integer string cases it's different because they need to convert to an actual number, or array of numbers, so the splitting is done in the ValueRepresentation subclass.

pieper commented 2 years ago

I'm still not convinced we should treat a string with backslashes in it as a valid UID VR. Can you provide more information and a test or example use case in dicom-dimse where this is required?

richard-viney commented 2 years ago

Yep, certainly. I've compared to pydicom and it works in a similar way, i.e. it treats backslashes as UID delimiters in any DICOM content it parses. This isn't strictly correct, because it should really check against the VM (value multiplicity) of the tag it's parsing. In a standard DICOM file UID fields generally have a multiplicity of 1 (though a private field could have any multiplicity), and in the case of DIMSE the UID fields allow any amount of multiplicity. Doing this additional check against the tag's multiplicity would be a much larger change to dcmjs, and the only benefit is preventing multiplicity in those contexts where it isn't technically meant to be allowed. pydicom doesn't seem to have bothered with this, I think justifiably so, and just allows multiplicity anywhere, which is the approach dcmjs has also taken on DS and IS VRs, it just wasn't implemented for UID VRs.

The relevant part of the specs are:

https://dicom.nema.org/medical/dicom/current/output/chtml/part04/sect_C.2.2.2.2.html https://dicom.nema.org/medical/dicom/current/output/chtml/part05/sect_6.4.html

I've updated the PR with these links to the spec, and added a test for reading a DICOM that has multiple values for UID and DS tags.

Thoughts?

richard-viney commented 2 years ago

Sure thing, the binary file has been removed and put here https://drive.google.com/file/d/180YkJ6cqHKm0RphxMGCeUR0aFsUNDL1u/view?usp=sharing