ImagingDataCommons / highdicom

High-level DICOM abstractions for the Python programming language
https://highdicom.readthedocs.io
MIT License
168 stars 35 forks source link

Deprecate big endian transfer syntaxes #173

Closed CPBridge closed 2 years ago

CPBridge commented 2 years ago

Big endian transfer syntaxes are retired in the standard, so we should not support them. Furthermore, they considerably complicate working with certain value representations.

Now, passing a Big endian transfer syntax to the highdicom.SOPClass constructor will raise a ValueError. As this is a breaking change, there is a brief description in the release notes.

hackermd commented 2 years ago

Thanks @CPBridge. That looks good to me.

How shall we deal with a big endian transfer syntax upon decoding? In theory, we would need to check the file_meta information for every pydicom.Dataset passed to the from_dataset() factory method of any class derived from highdicom.SOPClass, wouldn't we?

CPBridge commented 2 years ago

It's a good question... I think that may be stricter than we need to be. I don't see why we couldn't allow parsing of attributes that are either: a) unaffected by endianess or b) handled by pydicom, which presumably does correctly account for endianess correctly (e.g. pixel data in segs and pms). Maybe we should just put checks on the methods and properties that would actually be affected, e.g. .lut_data? In fact I think these LUTs may be the only part of the decoding API that would be handling this incorrectly, apart from possibly something in the ann, module, which I'd have to double check. What do you think?

hackermd commented 2 years ago

I think it's a bit more complex. There are several binary attributes in Annotations, Parametric Maps, Softcopy Presentation States, Segmentation, etc. We would need to check any (current and potential future) use of numpy.frombuffer().

CPBridge commented 2 years ago

I searched the entire repo for frombuffer earlier and only found usage in pr and `ann. The other binary attributes are decoded by pydicom

hackermd commented 2 years ago

Values of some data element (e.g., ICC Profile) are decoded by other libraries and we are not sure whether they are (will be) handled correctly. Therefore, I would argue that we assert little endian when parsing a given data set.

hackermd commented 2 years ago

There is the complication that a pydicom.Dataset may not contain file_meta information, e.g., when it was constructed from DICOM JSON metadata. So there may be situations where we cannot check the transfer syntax. In that case, we may want to log a warning or debug (?) message and inform the caller that little endian will be assumed.

CPBridge commented 2 years ago

@hackermd I have implemented checks for endianess as you proposed on the subclasses of SOPClass that have a from_dataset method. It will log a warning as suggested if the transfer syntax, and hence endianess, cannot be determined.