Kiyokawa / pydicom

Automatically exported from code.google.com/p/pydicom

Heuristic for unknown transfer syntax #85

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
Hello,

First of all, thanks for this module. I really appreciate the effort and enjoy working with it.

I encountered a problem reading a DICOM RT Ion Plan Storage file (UID 1.2.840.10008.5.1.4.1.1.481.8). I have a set of DICOM files exported from a treatment planning system: RT Dose Storage, CT Image Storage, RT Structure Set Storage, and the above-mentioned RT Ion Plan Storage.
When reading, for example, the CT and dose data, everything works fine:

Python 2.6.5 (r265:79063, Apr 16 2010, 13:09:56) 
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> 
>>> import dicom
>>> dicom.debug()
>>> 
>>> ct = dicom.read_file("CT.dcm")
Reading file 'CT.dcm'
Reading preamble
File is not a standard DICOM file; 'DICM' header is missing. Assuming no header and continuing
0008: (0008, 0005) None 000a                   'ISO_IR 100' 
001a: (0008, 0008) None 0018   'ORIGINAL\\SECONDARY\\AXIAL' 
003a: (0008, 0012) None 0008                     '20100428'
...
0072: (0008, 0016) None 001a '1.2.840.10008.5.1.4.1.1.2\x00'
...

>>> dose = dicom.read_file("DOSE.dcm")
Reading file 'DOSE.dcm'
Reading preamble
File is not a standard DICOM file; 'DICM' header is missing. Assuming no header and continuing
0008: (0008, 0005) None 000a                   'ISO_IR 100' 
001a: (0008, 0016) None 001e '1.2.840.10008.5.1.4.1.1.481.2\x00' 
...
0082: (0008, 0050) None 0000                             '' 
008a: (0008, 0060) None 0006                       'RTDOSE' 
...

I noticed, and confirmed with a hex editor, that the files do not have a header, but since the other files were read without problems, I do not think that is the issue.
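For reference, a DICOM Part 10 file normally starts with a 128-byte preamble followed by the four bytes `DICM`, and the file meta information (which carries the transfer syntax) comes right after. The minimal sketch below shows that check; the helper names `has_part10_prefix` and `file_has_part10_prefix` are hypothetical and not part of pydicom.

```python
def has_part10_prefix(data: bytes) -> bool:
    """Return True if data starts with a 128-byte preamble plus b"DICM"."""
    # Bytes 0-127 are the preamble (content is application-defined);
    # bytes 128-131 must be the magic string "DICM".
    return len(data) >= 132 and data[128:132] == b"DICM"


def file_has_part10_prefix(path: str) -> bool:
    """Read just enough of the file at `path` to run the check above."""
    with open(path, "rb") as f:
        return has_part10_prefix(f.read(132))


# The first bytes of RTIPLAN.dcm (from the hex dump in this report) start
# directly with a data element, so the check fails.
rtiplan_start = bytes.fromhex("08 00 05 00 43 53 0a 00")
print(has_part10_prefix(rtiplan_start))         # False
print(has_part10_prefix(bytes(128) + b"DICM"))  # True
```

When the check fails, nothing in the file states the transfer syntax, so a reader has to assume or guess one.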
I tried to open the RTIPLAN and got the following output:

>>> plan = dicom.read_file("RTIPLAN.dcm")
Reading file 'RTIPLAN.dcm'
Reading preamble
File is not a standard DICOM file; 'DICM' header is missing. Assuming no header and continuing
0008: (0008, 0005) None a5343 'ISO_IR 100\x08\x00\x12\x00DA\x08\x0020100428\x08\x00\x13\x00' 
>>> 
>>> data
(0008, 0005) Specific Character Set              CS: ['ISO_IR 100\x08\x00\x12\x00DA\x08\x0020100428\x08\x00\x13\x00TM\x06\x00142021\x08\x00\x14\x00UI\x12\x00
...

It looks like the whole file is read into the value of the first element. Matlab was able to read the file correctly, with the data I expected. The correct output from dcmdump (DCMTK, OFFIS) is:

# Dicom-File-Format

# Dicom-Meta-Information-Header
# Used TransferSyntax: UnknownTransferSyntax

# Dicom-Data-Set
# Used TransferSyntax: LittleEndianExplicit
(0008,0005) CS [ISO_IR 100]                             #  10, 1 SpecificCharacterSet
(0008,0012) DA [20100428]                               #   8, 1 InstanceCreationDate
(0008,0013) TM [142021]                                 #   6, 1 InstanceCreationTime
...
(0008,0016) UI [1.2.840.10008.5.1.4.1.1.481.8]          #  30, 1 SOPClassUID
...
(0008,0020) DA (no value available)                     #   0, 0 StudyDate
(0008,0030) TM (no value available)                     #   0, 0 StudyTime
(0008,0050) SH [1]                                      #   2, 1 AccessionNumber
(0008,0060) CS [RTPLAN]                                 #   6, 1 Modality
...

The hex output looks as follows:

00000000  08 00 05 00 43 53 0a 00  49 53 4f 5f 49 52 20 31  |....CS..ISO_IR 1|
00000010  30 30 08 00 12 00 44 41  08 00 32 30 31 30 30 34  |00....DA..201004|
00000020  32 38 08 00 13 00 54 4d  06 00 31 34 32 30 32 31  |28....TM..142021|
00000030  08 00 14 00 55 49 12 00    |....UI..
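The dump can be decoded by hand to confirm it is explicit VR little endian: each element is a 2-byte group, a 2-byte element number, a two-letter VR and, for most VRs, a 2-byte length. The sketch below is a minimal walker for that layout over the dumped bytes; `iter_explicit_le` is a hypothetical helper, it skips sequences and undefined lengths, and it is not how pydicom itself parses files.

```python
import struct

# VRs that use 2 reserved bytes plus a 4-byte length in explicit VR LE.
LONG_VRS = {"OB", "OD", "OF", "OL", "OV", "OW", "SQ", "SV", "UC", "UN", "UR", "UT", "UV"}

# The 56 bytes shown in the hex dump of RTIPLAN.dcm.
HEADER = bytes.fromhex(
    "08 00 05 00 43 53 0a 00 49 53 4f 5f 49 52 20 31 "
    "30 30 08 00 12 00 44 41 08 00 32 30 31 30 30 34 "
    "32 38 08 00 13 00 54 4d 06 00 31 34 32 30 32 31 "
    "08 00 14 00 55 49 12 00"
)


def iter_explicit_le(data: bytes):
    """Yield (group, element, vr, value) for explicit VR little endian data.

    Stops quietly at a truncated element, as at the end of the dump above.
    """
    offset = 0
    while offset + 8 <= len(data):
        group, elem = struct.unpack_from("<HH", data, offset)
        vr = data[offset + 4:offset + 6].decode("latin-1")
        if vr in LONG_VRS:
            if offset + 12 > len(data):
                return
            (length,) = struct.unpack_from("<I", data, offset + 8)
            offset += 12
        else:
            (length,) = struct.unpack_from("<H", data, offset + 6)
            offset += 8
        if offset + length > len(data):
            return  # value runs past the end of the buffer
        yield group, elem, vr, data[offset:offset + length]
        offset += length


for group, elem, vr, value in iter_explicit_le(HEADER):
    print(f"({group:04X},{elem:04X}) {vr} {value!r}")
# (0008,0005) CS b'ISO_IR 100'
# (0008,0012) DA b'20100428'
# (0008,0013) TM b'142021'
```

The three decoded elements match the first three lines of the dcmdump output; the fourth (a UI of length 18) is cut off by the end of the dump.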

I do not know whether this is a known problem or whether the file is simply corrupted, though the other files read fine. If necessary, I can provide the DICOM files, but I need to anonymise them first.

Thanks in advance for looking into this problem.

Cheers,
Andy

Original issue reported on code.google.com by sticktot...@googlemail.com on 30 Apr 2010 at 9:48

GoogleCodeExporter commented 8 years ago
I think I see the problem: with no file meta info to give the transfer syntax, pydicom assumes implicit VR little endian, but that file is explicit VR (your dcmdump output reports LittleEndianExplicit, and the VR characters are visible in the hex output).
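The byte arithmetic makes the failure concrete. Under implicit VR, the four bytes after the tag are a 32-bit length; under explicit VR, they are a two-letter VR followed by a 16-bit length. A small sketch applied to the first element from the hex dump (the helper names are illustrative, not pydicom functions):

```python
import struct

# First element of RTIPLAN.dcm, copied from the hex dump in the report.
FIRST_ELEMENT = bytes.fromhex("08 00 05 00 43 53 0a 00 49 53 4f 5f 49 52 20 31 30 30")


def read_implicit_header(data: bytes, offset: int = 0):
    """Interpret 8 bytes as implicit VR LE: group, element, 32-bit length."""
    return struct.unpack_from("<HHI", data, offset)


def read_explicit_short_header(data: bytes, offset: int = 0):
    """Interpret 8 bytes as explicit VR LE with a 2-byte length field."""
    group, elem = struct.unpack_from("<HH", data, offset)
    vr = data[offset + 4:offset + 6].decode("ascii")
    (length,) = struct.unpack_from("<H", data, offset + 6)
    return group, elem, vr, length


group, elem, length = read_implicit_header(FIRST_ELEMENT)
print(hex(length), length)  # 0xa5343 676675: "CS" and 0x000a read as one length
print(read_explicit_short_header(FIRST_ELEMENT))  # (8, 5, 'CS', 10)
```

The implicit reading yields the same `a5343` length that appears in the debug output for RTIPLAN.dcm, which is why roughly 660 KB of the file ends up in the Specific Character Set value.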

So we don't need any example files; I can create a test case based on your hex output.

I've changed the title of this issue to reflect the more general problem.

As a temporary workaround for reading this RT Ion file, you could add an extra optional argument to read_file in filereader.py, passed along to read_partial, that tells the reader the file is explicit VR. I'll see if I can work up something like that to add to these functions. Even with a good heuristic, there may still be cases it doesn't handle properly, so the user should be able to force the correct encoding.
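One possible shape for such a heuristic plus override, sketched under assumptions: `looks_explicit_vr`, `is_implicit_vr` and the `force_explicit` parameter are hypothetical names for illustration, not the pydicom API. The guess checks whether bytes 4-5 of the first element form a valid VR code; the caller's choice, when given, always wins.

```python
from typing import Optional

# Two-letter VR codes from DICOM PS3.5; an explicit VR element carries one of
# these in bytes 4-5, right after the tag.
VALID_VRS = {
    "AE", "AS", "AT", "CS", "DA", "DS", "DT", "FD", "FL", "IS", "LO", "LT",
    "OB", "OD", "OF", "OL", "OV", "OW", "PN", "SH", "SL", "SQ", "SS", "ST",
    "SV", "TM", "UC", "UI", "UL", "UN", "UR", "US", "UT", "UV",
}


def looks_explicit_vr(data: bytes, offset: int = 0) -> bool:
    """Guess the encoding from the element at `offset`.

    In implicit VR, bytes 4-5 are the low half of a 32-bit length, which can
    occasionally spell a valid VR by accident, so this is only a heuristic.
    """
    if len(data) < offset + 6:
        return False
    return data[offset + 4:offset + 6].decode("latin-1") in VALID_VRS


def is_implicit_vr(data: bytes, force_explicit: Optional[bool] = None) -> bool:
    """Return True for implicit VR; an explicit caller choice overrides the guess."""
    if force_explicit is not None:
        return not force_explicit
    return not looks_explicit_vr(data)


plan = bytes.fromhex("08 00 05 00 43 53 0a 00")  # RTIPLAN.dcm: (0008,0005) CS, length 10
ct = bytes.fromhex("08 00 05 00 0a 00 00 00")    # implicit VR: (0008,0005), length 10
print(is_implicit_vr(plan), is_implicit_vr(ct))   # False True
print(is_implicit_vr(plan, force_explicit=False))  # True (user override wins)
```

The `ct` bytes are reconstructed from the CT debug output above (implicit VR, length `000a`), not taken from the actual file. A length whose low two bytes happen to spell a VR (for example 0x5343, "CS") would fool the guess, which is exactly the case the override is for.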

Original comment by darcymason@gmail.com on 30 Apr 2010 at 1:35

GoogleCodeExporter commented 8 years ago
Hi,

Thanks for the quick answer. I was actually just reading up on implicit and explicit VR and can confirm your findings. All of the files in my output directory contain DICOM data encoded with implicit VR, except for the one RT Ion Plan file (don't ask me why!).

I just edited my RT Ion Plan with a hex editor, deleted the first few VRs and adjusted the encoding to match implicit VR, and was able to read the first few elements with pydicom. Thanks for that. But yes, I agree: it would be nice to let the user choose the encoding when the transfer syntax is unknown.
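The manual hex edit amounts to re-encoding each element: drop the two VR bytes and widen the length field to 32 bits, leaving the value untouched. A minimal sketch for a single definite-length element (the function name `explicit_to_implicit` is hypothetical, and sequences are not handled):

```python
import struct

# VRs whose explicit VR LE encoding uses 2 reserved bytes and a 4-byte length.
LONG_VRS = {"OB", "OD", "OF", "OL", "OV", "OW", "SQ", "SV", "UC", "UN", "UR", "UT", "UV"}


def explicit_to_implicit(data: bytes, offset: int = 0):
    """Re-encode one explicit VR LE element as implicit VR LE.

    Returns (implicit_bytes, next_offset). Definite-length elements only.
    """
    group, elem = struct.unpack_from("<HH", data, offset)
    vr = data[offset + 4:offset + 6].decode("latin-1")
    if vr in LONG_VRS:
        (length,) = struct.unpack_from("<I", data, offset + 8)
        start = offset + 12
    else:
        (length,) = struct.unpack_from("<H", data, offset + 6)
        start = offset + 8
    if length == 0xFFFFFFFF:
        raise ValueError("undefined-length elements are not handled in this sketch")
    value = data[start:start + length]
    # Implicit VR: tag, then a 32-bit length, then the unchanged value bytes.
    return struct.pack("<HHI", group, elem, length) + value, start + length


element = bytes.fromhex("08 00 05 00 43 53 0a 00 49 53 4f 5f 49 52 20 31 30 30")
converted, next_offset = explicit_to_implicit(element)
print(converted[:8].hex(), next_offset)  # 080005000a000000 18
```

Calling it repeatedly with `next_offset` walks a flat data set element by element, which is the scripted equivalent of the hex-editor change.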

I will contact customer support and ask whether there is a way to export the file meta information as well. That would probably be the most elegant solution.

Thanks again!

Cheers,
Andy 

Original comment by sticktot...@googlemail.com on 30 Apr 2010 at 1:53