Enet4 / dicom-rs

Rust implementation of the DICOM standard
https://dicom-rs.github.io
Apache License 2.0
416 stars 81 forks source link

Failure when Explicit VR contains Implicit VR #521

Open naterichman opened 3 months ago

naterichman commented 3 months ago

So this is another one with an invalid dicom, which is a case I think should be handled.

This is one where a dicom comes in with ExplicitVRLittleEndian TS and some elements don't actually have VRs. The way the explicit VR decoder is doing that is by setting the VR to UN here which then causes it to read the two extra reserved bytes which may or may not be there depending on what the actual VR is supposed to be.

I've reproduced this by taking a basic pydicom test file, adding a private tag and manually deleting the VR bytes, so the section of the file looks like this:

00000316: 11 00 10 00 4c 4f 04 00                (0011, 0010) LO Length: 4
0000031e: 54 65 73 74                            b'Test'
00000322: 11 00 01 10 02 00 00 00                (0011, 1001) None Length: 2
0000032a: 31 32                                  b'12'

so this section of the file looks like this, which gives an error when I dicom-dump

➜  ~ $ dicom-dump ~/Downloads/mr_broken.dcm
/home/naterichman/Downloads/mr_broken.dcm:
Could not read data set token

Caused by these errors (recent errors listed first):
  1: Could not read 1585713 value bytes for element tagged (0011,1001)
  2: Could not read value from source at position 480
  3: failed to fill whole buffer

Part of the reason I think this should be supported is because pydicom supports it :wink: The way its done in pydicom is here where they first try to read explicit and then if the resulting VR is not alphabetical, they fall back to implicit.

I'm not sure how that would work with the design here and I fully understand that this library cannot be as dynamic as python, but I think at least the error should be clearer, i.e. if the VR is not present when reading when reading an explicit DS make that the warning instead of trying to parse the length from the next bytes.

Link to documents, I uploaded both the original and the one I edited to be broken: onedrive

Enet4 commented 3 months ago

Thank you for reporting. Can you validate that this is a duplicate of https://github.com/Enet4/dicom-rs/issues/169?

It's a rare scenario, but it's true that the parser currently misbehaves when entering nested sequences with the VR UN.

naterichman commented 3 months ago

Yep, it seems to be the same thing from what I can tell without having the original file of the issue author, except this is just an invalid element, not the small exception within sequences. I'm happy to take a crack at this one, except I don't think your proposed solution would work for this particular case

From the moment it enters a sequence of VR UN, it would remember to always use implicit VR little endian until it reaches the end of that sequence

since this isn't in a sequence. LMK your thoughts

Enet4 commented 3 months ago

So I see. This is something else all right!

It should be fine to introduce some simple logic there if it helps read that file to completion.