Provide PixelData decompression (JPEG, RLE, MPEG, etc)

GoogleCodeExporter commented 9 years ago

pydicom can read JPEG-compressed files, but the pixel data remains compressed. Should
investigate incorporating JPEG code to at least allow decompression of images that
have been read.

Original issue reported on code.google.com by darcymason@gmail.com on 8 Oct 2008 at 12:40

GoogleCodeExporter commented 9 years ago

More info: the most important ones to add first are those that have not been
retired, especially the default transfer syntaxes.

First, a short list of the defaults; more details follow:
  * JPEG Lossless, Non-Hierarch, 1st-Order Prediction (Process 14 [Selection Value 1]) -- Default for Lossless JPEG Image Compression
  * JPEG 2000 Image Compression (Lossless Only)
  * JPEG Baseline (Process 1) -- Default for Lossy JPEG 8 Bit
  * JPEG Extended (Process 4) -- Default for Lossy JPEG 12 Bit
  * JPIP and JPIP deflate
  * RLE
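For the lossless default (Process 14, Selection Value 1), JPEG codes each sample as a difference from a prediction, and with Selection Value 1 the prediction is the sample immediately to the left. A minimal sketch of undoing that prediction, assuming the Huffman-coded differences have already been decoded into a list of rows, a single component, a point transform of 0 and no restart intervals; `reconstruct_sv1` is a hypothetical helper, not a pydicom function:

```python
def reconstruct_sv1(diffs, precision):
    """Undo JPEG lossless prediction with Selection Value 1 (predict from the left).

    diffs: list of rows, each a list of decoded difference values (ints).
    precision: sample precision P in bits (2 to 16).
    Assumes point transform Pt = 0, one component and no restart intervals.
    """
    rows = len(diffs)
    cols = len(diffs[0]) if rows else 0
    out = [[0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            if x == 0 and y == 0:
                pred = 1 << (precision - 1)  # very first sample: 2**(P - 1)
            elif x == 0:
                pred = out[y - 1][0]         # start of a later row: sample above (Rb)
            else:
                pred = out[y][x - 1]         # Selection Value 1: sample to the left (Ra)
            out[y][x] = (pred + diffs[y][x]) & 0xFFFF  # reconstruction is modulo 2**16
    return out
```

The two special cases follow ITU-T T.81 Annex H: the first row can only look left, and the first sample of each later row looks up instead.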

Full details (clipped from UID_dict):
  * 1.2.840.10008.1.2.4.50: JPEG Baseline (Process 1) -- Default Transfer Syntax for Lossy JPEG 8 Bit Image Compression
  * 1.2.840.10008.1.2.4.51: JPEG Extended (Process 2 and 4) -- Default Transfer Syntax for Lossy JPEG 12 Bit Image Compression (*Process 4 only*)
  * 1.2.840.10008.1.2.4.57: JPEG Lossless, Non-Hierarchical (Process 14)
  * 1.2.840.10008.1.2.4.70: JPEG Lossless, Non-Hierarchical, First-Order Prediction (Process 14 [Selection Value 1]) -- *Default Transfer Syntax for Lossless* JPEG Image Compression
  * 1.2.840.10008.1.2.4.80: JPEG-LS Lossless Image Compression
  * 1.2.840.10008.1.2.4.81: JPEG-LS Lossy (Near-Lossless) Image Compression
  * 1.2.840.10008.1.2.4.90: JPEG 2000 Image Compression (Lossless Only)
  * 1.2.840.10008.1.2.4.91: JPEG 2000 Image Compression
  * 1.2.840.10008.1.2.4.92: JPEG 2000 Part 2 Multi-component Image Compression (Lossless Only)
  * 1.2.840.10008.1.2.4.93: JPEG 2000 Part 2 Multi-component Image Compression
  * 1.2.840.10008.1.2.4.94: JPIP Referenced
  * 1.2.840.10008.1.2.4.95: JPIP Referenced Deflate
  * 1.2.840.10008.1.2.4.100: MPEG2 Main Profile @ Main Level
  * 1.2.840.10008.1.2.5: RLE Lossless
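If it helps to have the list above in code form, here is a sketch of a lookup table keyed by the UID strings; `COMPRESSED_TRANSFER_SYNTAXES` and `describe_transfer_syntax` are hypothetical names. The UID of a file would normally come from `ds.file_meta.TransferSyntaxUID`; the NUL stripping is there because DICOM pads UI values to an even length with a trailing NUL byte.

```python
# Compressed transfer syntaxes from the list above (UID -> short name).
COMPRESSED_TRANSFER_SYNTAXES = {
    "1.2.840.10008.1.2.4.50": "JPEG Baseline (Process 1)",
    "1.2.840.10008.1.2.4.51": "JPEG Extended (Process 2 and 4)",
    "1.2.840.10008.1.2.4.57": "JPEG Lossless, Non-Hierarchical (Process 14)",
    "1.2.840.10008.1.2.4.70": "JPEG Lossless, Non-Hierarchical, First-Order Prediction",
    "1.2.840.10008.1.2.4.80": "JPEG-LS Lossless",
    "1.2.840.10008.1.2.4.81": "JPEG-LS Lossy (Near-Lossless)",
    "1.2.840.10008.1.2.4.90": "JPEG 2000 (Lossless Only)",
    "1.2.840.10008.1.2.4.91": "JPEG 2000",
    "1.2.840.10008.1.2.4.92": "JPEG 2000 Part 2 Multi-component (Lossless Only)",
    "1.2.840.10008.1.2.4.93": "JPEG 2000 Part 2 Multi-component",
    "1.2.840.10008.1.2.4.94": "JPIP Referenced",
    "1.2.840.10008.1.2.4.95": "JPIP Referenced Deflate",
    "1.2.840.10008.1.2.4.100": "MPEG2 Main Profile @ Main Level",
    "1.2.840.10008.1.2.5": "RLE Lossless",
}


def describe_transfer_syntax(uid):
    """Return the short name of a listed compressed syntax, or None if it is not listed."""
    # UI values are padded to even length with a trailing NUL; drop it before the lookup.
    return COMPRESSED_TRANSFER_SYNTAXES.get(str(uid).rstrip("\0 "))
```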

Original comment by darcymason@gmail.com on 12 Oct 2008 at 12:17

Changed title: Provide PixelData decompression (JPEG, RLE, MPEG, etc)
Added labels: Milestone-Release1.0, Priority-High
Removed labels: Milestone-Release1.5, Priority-Medium

GoogleCodeExporter commented 9 years ago

Tried to find JPEG libraries to decode the pixel data (tried PIL, and libjpeg
directly). Decoding works for lossy 8-bit images, but the other images I tried
(lossless, lossy 12-bit) did not work in PIL.
An alternative is to incorporate C code, but I am trying to keep to pure Python, or at
most the installation of one other package such as PIL. So at this point the amount of
work involved is not worth it, and it is not core to what pydicom is meant to do.
Postponing JPEG decompression for now.

Original comment by darcymason@gmail.com on 15 Oct 2008 at 12:57

Changed state: WontFix

GoogleCodeExporter commented 9 years ago

Reopening this issue. The Medical Image Format FAQ has good info on the libraries
available:
http://www.dclunie.com/medical-image-faq/html/part7.html#SourceJPEG

PVRG looks good (note the link to PDF documentation in the section linked above;
otherwise there are LaTeX docs with the libraries themselves). Perhaps we could create
a separate package to wrap PVRG for Python, have it as an optional install that
pydicom could use if available, and maybe eventually merge it in (but ideally I would
like to keep pydicom a pure Python package; much simpler!).

Original comment by darcymason@gmail.com on 25 May 2009 at 2:26

Changed state: New
Added labels: Difficulty-Hard, Priority-Medium
Removed labels: Priority-High

GoogleCodeExporter commented 9 years ago

Found a pure Python Huffman encoder/decoder (Huffman coding is part of the JPEG
process) at http://pypi.python.org/pypi/huffman%20encoder%20%26%20decoder/0.3. It has
a detailed explanation [1] and a walk-through of optimizing the code [2]. But it is
GPL (per the PyPI page), so including it in pydicom would restrict pydicom's more
permissive MIT license.
His optimization didn't seem to look into Numpy at all; maybe there is more to gain
there.
Maybe PVRG (very open license) could be "converted" to Python (using Numpy in an
optimized way to get reasonable speed).
[1] http://gpolo.ath.cx:81/misc/huffman/
[2] http://gpolo.ath.cx:81/texts/opc
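Huffman decoding in JPEG uses canonical codes built from two lists carried in the DHT (Define Huffman Table) segment: BITS, the number of codes of each length from 1 to 16, and HUFFVAL, the symbols in code order (ITU-T T.81 Annex C). A minimal pure Python sketch of building such a table and decoding symbols from a byte string; it is not taken from the package above, the function names are illustrative, and it ignores byte stuffing (0xFF 0x00) and markers, which a real decoder must handle:

```python
def build_huffman_table(bits, huffval):
    """Map (code length, code) -> symbol from JPEG-style BITS/HUFFVAL lists.

    bits: 16 counts; bits[i] is the number of codes of length i + 1.
    huffval: the symbols, in order of increasing code length.
    """
    if sum(bits) != len(huffval):
        raise ValueError("BITS counts do not match the number of symbols")
    table = {}
    code = 0
    k = 0
    for length in range(1, 17):
        for _ in range(bits[length - 1]):
            table[(length, code)] = huffval[k]
            code += 1
            k += 1
        code <<= 1  # canonical codes: the next length starts at (last code + 1) * 2
    return table


def decode_symbols(data, table, count):
    """Decode `count` symbols from `data` (bytes), reading bits most significant first."""
    symbols = []
    bitpos = 0
    while len(symbols) < count:
        code = 0
        for length in range(1, 17):
            if bitpos >= len(data) * 8:
                raise ValueError("ran out of data in the middle of a code")
            bit = (data[bitpos // 8] >> (7 - bitpos % 8)) & 1
            bitpos += 1
            code = (code << 1) | bit
            if (length, code) in table:
                symbols.append(table[(length, code)])
                break
        else:
            raise ValueError("no code of 16 bits or fewer matched")
    return symbols
```

A dictionary lookup per bit is slow; this is where a NumPy or lookup-table approach would be needed for reasonable speed.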

Original comment by darcymason@gmail.com on 26 May 2009 at 3:44

GoogleCodeExporter commented 9 years ago

More research: code for the DCT (also needed for JPEG, I believe):
http://projects.scipy.org/scipy/ticket/733 -- proposed code for a DCT using numpy fft
http://whiter4bbit.blogspot.com/2008/11/dct-python-implementation-warning-code.html -- DCT using Numpy; quite short.
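Only the DCT-based processes need this (Baseline and Extended); lossless Process 14 uses prediction instead. For 8x8 blocks, the orthonormal 2-D DCT-II has the same scaling as the FDCT defined in ITU-T T.81, so a decoder's IDCT step can be sketched with NumPy as a pair of matrix products. This is an illustrative sketch, not code from either link above; `dct_matrix`, `C8`, `fdct_8x8` and `idct_8x8` are hypothetical names, and it omits the dequantization before the IDCT and the level shift after it (adding 2^(P-1), i.e. 128 for 8-bit samples):

```python
import numpy as np


def dct_matrix(n=8):
    """Orthonormal DCT-II basis: row k holds the k-th cosine basis vector."""
    k = np.arange(n).reshape(-1, 1)  # frequency index
    i = np.arange(n).reshape(1, -1)  # sample index
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)       # DC row carries the extra 1/sqrt(2) factor
    return c


C8 = dct_matrix(8)


def fdct_8x8(block):
    """Forward 2-D DCT of an 8x8 block (transform rows and columns)."""
    return C8 @ block @ C8.T


def idct_8x8(coeffs):
    """Inverse 2-D DCT; C8 is orthogonal, so its inverse is its transpose."""
    return C8.T @ coeffs @ C8
```

The matrix form is short and easy to verify; a fast IDCT (as in libjpeg) factors the same transform to use far fewer multiplications.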

Original comment by darcymason@gmail.com on 26 May 2009 at 4:08

GoogleCodeExporter commented 9 years ago

Just a suggestion -- libjpeg is widely installed in Unix/Linux land and is available
for Windows (although it's not installed by default). Why not make pydicom JPG
decompression conditional on libjpeg?

The same goes for PNG/libpng.
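A sketch of the "conditional" idea: probe for an optional backend at runtime and only advertise the transfer syntaxes it can handle. Here PIL stands in for libjpeg (PIL decodes JPEG through libjpeg); `optional_import`, `OPTIONAL_DECODERS` and `available_transfer_syntaxes` are hypothetical names, not pydicom API:

```python
import importlib


def optional_import(module_name):
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


# Hypothetical registry: transfer syntax UID -> optional module that could decode it.
OPTIONAL_DECODERS = {
    "1.2.840.10008.1.2.4.50": "PIL",  # JPEG Baseline, decoded by PIL via libjpeg
}


def available_transfer_syntaxes(registry=OPTIONAL_DECODERS):
    """Return the UIDs whose optional backend can be imported on this machine."""
    return {uid for uid, module in registry.items() if optional_import(module) is not None}
```

The core stays pure Python; installing the extra package simply enlarges the set of syntaxes that can be decoded.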

Original comment by NikitaTh...@gmail.com on 5 Jan 2010 at 3:46

GoogleCodeExporter commented 9 years ago

> Just a suggestion -- libjpeg is widely installed in Unix/Linux land and is available
> for Windows (although it's not installed by default). Why not make pydicom JPG
> decompression conditional on libjpeg?

I'd be happy to use it if it would work. I did look at this a long time ago (see the
comment from Oct 15, 2008), and only got it to work for 8-bit lossy images, which
isn't very complete given that 12- or 16-bit images are common and there are many
lossless techniques. It's a long time since I looked at this, but IIRC there were
some newsgroup discussions about the many issues that other libraries such as dcmtk
and PVRG have had to deal with. I believe dcmtk started with libjpeg but had to
modify it and add other code for the various JPEG flavours allowed by DICOM. If we
adopt any C code, it may make more sense to use theirs.

Original comment by darcymason@gmail.com on 6 Jan 2010 at 12:28

GoogleCodeExporter commented 9 years ago

For JPEG, how about the Jasper library?  I believe that OsiriX (Mac OS X DICOM 
viewer) uses it, and it's licensed under the MIT license.
http://www.ece.uvic.ca/~mdadams/jasper/

Original comment by mike...@gmail.com on 13 Sep 2010 at 2:02

GoogleCodeExporter commented 9 years ago

OsiriX also uses OpenJPEG, which is BSD licensed.  Not sure why the need for 
both.
http://www.openjpeg.org/

Original comment by mike...@gmail.com on 13 Sep 2010 at 2:25

GoogleCodeExporter commented 9 years ago

Any updates on this? I'm currently using pydicom together with GDCM, and I'm looking
to help incorporate a pure Python way to decompress JPEG images within DICOM.

Original comment by neuros...@gmail.com on 6 Mar 2014 at 10:54

GoogleCodeExporter commented 9 years ago

No, sorry, no update. I wonder, though... IIRC I looked at GDCM once (or was it
dcmtk?) and I remember seeing that they handled compressed images by calling a
command-line routine to convert them to uncompressed form in a temporary file, and
then loading that.

It would be fairly easy to add a "hook" in pydicom to call something like dcmtk 
to convert the file and then load it in. Would that be helpful?
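A sketch of such a hook, shelling out to dcmtk's command-line decompressors (`dcmdjpeg` for JPEG, `dcmdjpls` for JPEG-LS, `dcmdrle` for RLE, each taking an input and an output file name) and letting pydicom read the result, e.g. with `dicom.read_file(dst)`. The UID-to-tool table and the function names are assumptions for illustration; JPEG 2000 is left out because, as noted further down this thread, the free dcmtk has no JPEG 2000 codec. The `run` parameter lets the subprocess call be swapped out for testing:

```python
import subprocess

# Hypothetical mapping: transfer syntax UID -> dcmtk decompression tool.
DCMTK_TOOLS = {
    "1.2.840.10008.1.2.4.50": "dcmdjpeg",
    "1.2.840.10008.1.2.4.51": "dcmdjpeg",
    "1.2.840.10008.1.2.4.57": "dcmdjpeg",
    "1.2.840.10008.1.2.4.70": "dcmdjpeg",
    "1.2.840.10008.1.2.4.80": "dcmdjpls",
    "1.2.840.10008.1.2.4.81": "dcmdjpls",
    "1.2.840.10008.1.2.5": "dcmdrle",
}


def decompress_command(transfer_syntax_uid, src, dst):
    """Build the command line that converts `src` to an uncompressed `dst`."""
    tool = DCMTK_TOOLS.get(transfer_syntax_uid)
    if tool is None:
        raise NotImplementedError("no dcmtk decompressor for " + transfer_syntax_uid)
    return [tool, src, dst]


def decompress_file(transfer_syntax_uid, src, dst, run=subprocess.run):
    """Run the dcmtk tool and return the output path for pydicom to read."""
    run(decompress_command(transfer_syntax_uid, src, dst), check=True)
    return dst
```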

And... I'd be interested to know which specific compressed formats (transfer
syntaxes) people are encountering in the real world.

Original comment by darcymason@gmail.com on 7 Mar 2014 at 11:36

GoogleCodeExporter commented 9 years ago

I've used dcmtk in conjunction with pydicom. I frequently run into JPEG-LS and
JPEG 2000 variants. A downside to dcmtk is that there is no free module for
encoding/decoding JPEG 2000. That being said, OsiriX
(https://github.com/pixmeo/osirix) uses dcmtk and has its own way of handling
JPEG 2000, which you can look at in the source.

Original comment by ezaro...@gmail.com on 7 Mar 2014 at 11:39

GoogleCodeExporter commented 9 years ago

I'll try to have a look at OsiriX and see if there might be a way to build a C 
extension. If so, I could see that as a separate pydicom library (to keep the 
core pydicom pure python).

Just by way of explanation for anyone stumbling on this thread: my take on this
issue (as in issue 16) is that it seems like a large amount of work to add
decompression to pydicom (or a related package), which has always had the
philosophy of being light and very easy to install. This is why I've avoided it and
recommended that people pre-process compressed files with dcmtk or others.

To me, if the pydicom user has to go through the work of a complex installation 
with multiple dependencies, compiling C extensions, etc., then they might as 
well do so with one of the existing packages that already have that all worked 
out.

If there were a way to provide decompression in pure python or with minimal C, 
then I think that would be more in the spirit of pydicom. My hope was to 
perhaps handle a few of the most common compression formats that way.
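As one data point, RLE Lossless (1.2.840.10008.1.2.5) looks feasible in pure Python. Per DICOM PS3.5 Annex G, each frame starts with a 64-byte header of sixteen little-endian 32-bit values (the segment count, then up to 15 segment offsets), and each segment is PackBits-encoded; for 16-bit samples the most significant byte segment comes first. A sketch for one frame with SamplesPerPixel = 1, assuming the frame has already been pulled out of the encapsulated Pixel Data; the function names are illustrative:

```python
import struct


def unpackbits_segment(data):
    """Decode one PackBits-encoded RLE segment."""
    out = bytearray()
    i = 0
    while i < len(data):
        n = data[i]
        i += 1
        if n < 128:          # literal run: copy the next n + 1 bytes
            out += data[i:i + n + 1]
            i += n + 1
        elif n > 128:        # replicate run: repeat the next byte 257 - n times
            out += data[i:i + 1] * (257 - n)
            i += 1
        # n == 128 (-128 as a signed byte) is a no-op
    return bytes(out)


def decode_rle_frame(frame, rows, cols):
    """Decode one RLE Lossless frame (SamplesPerPixel = 1) into a flat list of ints.

    The segment count gives the bytes per sample: 1 for 8-bit, 2 for 16-bit.
    """
    header = struct.unpack("<16I", frame[:64])
    nseg = header[0]
    offsets = list(header[1:1 + nseg]) + [len(frame)]
    npix = rows * cols
    segments = []
    for k in range(nseg):
        seg = unpackbits_segment(frame[offsets[k]:offsets[k + 1]])
        if len(seg) < npix:
            raise ValueError("segment %d is too short" % (k + 1))
        segments.append(seg[:npix])  # drop any output beyond the pixel count
    pixels = []
    for p in range(npix):
        value = 0
        for seg in segments:  # most significant byte segment comes first
            value = (value << 8) | seg[p]
        pixels.append(value)
    return pixels
```

Colour images add one segment set per sample, and converting the list to a NumPy array (and reshaping to rows x cols) is left out to keep the sketch dependency-free.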

I welcome anyone's thoughts on this subject.

Having said all that, on a more positive note: the various comments in this 
issue do seem to be converging on OsiriX as a good source to model, if anyone 
had the ambition to try to take this on.  Realistically, it's not something I 
can see getting around to anytime soon.

Original comment by darcymason@gmail.com on 8 Mar 2014 at 12:32

GoogleCodeExporter commented 9 years ago

Darcy, I'm using pydicom and GDCM for my prototype web service dcmdb
(http://dcmdb.org), so you can see I'm attempting to capture as many transfer
syntaxes as possible (http://dcmdb.org/main/transfer_syntax). Using GDCM, I have not
had a transfer syntax issue as of yet.

I really want a pure Python way to get images out of DICOM files. I know Ruby DICOM
(http://dicom.rubyforge.org/) was able to accomplish that goal, so I'm going to start
really investigating solutions to this problem. If you have any documentation that
would be helpful on this journey, please let me know.

Original comment by neuros...@gmail.com on 9 Mar 2014 at 2:48

GoogleCodeExporter commented 9 years ago

Eric,  

I'm a bit confused. You're already using GDCM. Are you using its Python extension?
If so, what is the reason for a "pure Python" implementation?

In any case, OpenJPEG is really the best library out there, so you would need to do
one of two things:
- link to it, and then wrap an extension around it, which is what GDCM does. This is
not pure Python, but is still usable from Python.
- compile it for Python, similar to how kripken has compiled it to JS
(https://github.com/kripken/j2k.js). This would be pure Python.

HTH

Original comment by mj...@nephosity.com on 9 Mar 2014 at 6:50

GoogleCodeExporter commented 9 years ago

First, thank you to everyone for the discussion.  It's good to get these ideas 
out there and see what we can come up with.

@neurosnap: I found this comment on ruby dicom: 
https://groups.google.com/forum/#!topic/comp.protocols.dicom/NMdlfkzpaII
in which the author says it relies on ImageMagick but it only handles a subset 
of the JPEG variants. That comment is almost a year old though, so perhaps it 
has changed?

@mjpan: do you know any more about how the conversion to JS works? It mentions using
Emscripten, which I see is an LLVM-to-JS compiler. Is there a similar package to
convert for Python? And is there any hope of any of this working at an acceptable
speed?

To all: I've toyed with some ideas on pure Python implementations, but if you start
reading through the libjpeg (libijg) code (which dcmtk's JPEG support is based on),
it is incredibly intertwined. It is so heavily optimized for speed (and memory use)
that it is very difficult to follow or translate to another language. It has been
around a long time, so it comes from an era where speed and memory were very
important. Plus (IIRC) there were many workarounds for known poor implementations and
such. It would be very hard to replicate this in other code; I think that is why no
one (or very few) has succeeded in doing so. Everywhere I looked, I just saw the same
libijg code copied, pasted and minimally modified. In terms of difficulty, JPEG 2000
may be a different situation; I haven't really looked into that code.

Original comment by darcymason@gmail.com on 9 Mar 2014 at 9:57

GoogleCodeExporter commented 9 years ago

Thanks for the response. Ideally I would like an easier way to deploy my server,
something simpler to install than GDCM, but I may be out of luck. I'll investigate
OpenJPEG.

Original comment by neuros...@gmail.com on 10 Mar 2014 at 1:06

GoogleCodeExporter commented 9 years ago

Darcy,

I'm sure that compiling OpenJPEG to pure Python can be done via LLVM, but I 
don't think anyone has had the motivation to do so, as it's possible to run 
C/C++ code in Python via an extension, as opposed to Javascript, which requires 
JS code to run in the browser.

For anyone looking for Python -> OpenJPEG, I'd suggest checking out Glymur:
https://glymur.readthedocs.org/en/latest/introduction.html
although I can't imagine it being any easier than building the GDCM Python
extension; CMake really makes things easy, especially as you go across operating
systems.

Original comment by mj...@nephosity.com on 10 Mar 2014 at 8:48

GoogleCodeExporter commented 9 years ago

I just had a go at using Glymur to try to get pixel data out of a
1.2.840.10008.1.2.4.91 (JPEG 2000 Image Compression) file. I suspect I am
approaching the problems I have encountered incorrectly. However, I am posting this
comment here in case anyone can point out my mistake, or in case it helps anyone else
investigating Glymur.

From the initial error messages it seemed that there was no header data. This seems
consistent with Part 5 of the DICOM standard, section A.4.4 (p77),
http://medical.nema.org/Dicom/2011/11_05pu.pdf, which states:

  "The optional JP2 file format header shall NOT be included. "

So far I have found Annex I of ISO 15444-1:
http://www.jpeg.org/public/15444-1annexi.pdf. Hoping the DICOM pixel data contained a
Contiguous Codestream, I attempted to create the relevant headers so that it could
simply be read by the Glymur command for opening JPEG 2000 files.

import struct
import dicom

d = dicom.read_file("/usr/lib/python2.7/dist-packages/dicom/testfiles/JPEG2000.dcm")
pd = d.PixelData
f = open("jp2000.jp2", "w+b")

f.write(struct.pack(">I4s4s", 12, "jP  ", "\r\n\x87\n"))
f.write(struct.pack(">I4s4sI4s", 20, "ftyp", "jp2 ", 0, "jp2 "))
f.write(struct.pack(">I4s", 45, "jp2h"))
# Need to add 128 to d.BitsAllocated - 1 if pixel data is signed
f.write(struct.pack(">I4sIIHBbBB", 22, "ihdr", d.Rows, d.Columns,
                    d.SamplesPerPixel, d.BitsAllocated - 1, 7, 1, 0))
f.write(struct.pack(">I4sBBBI", 15, "colr", 1, 0, 0, 16))
f.write(struct.pack(">I4s", len(pd), "jp2c"))
f.write(pd)
f.flush()
f.close()

>>> jp2 = glymur.Jp2k("jp2000.jp2")
/home/martin/.virtualenvs/anat/local/lib/python2.7/site-packages/glymur/jp2box.py:135: UserWarning: Unrecognized box (����) encountered.
  warnings.warn(msg)
/home/martin/.virtualenvs/anat/local/lib/python2.7/site-packages/glymur/jp2box.py:147: UserWarning: ���� box has incorrect box length (2822332288)
  warnings.warn(msg)
>>> jp2.read()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/martin/.virtualenvs/anat/local/lib/python2.7/site-packages/glymur/jp2k.py", line 672, in read
    img = self._read_openjpeg(**kwargs)
  File "/home/martin/.virtualenvs/anat/local/lib/python2.7/site-packages/glymur/jp2k.py", line 713, in _read_openjpeg
    self._subsampling_sanity_check()
  File "/home/martin/.virtualenvs/anat/local/lib/python2.7/site-packages/glymur/jp2k.py", line 684, in _subsampling_sanity_check
    dxs = np.array(codestream.segment[1].xrsiz)
IndexError: list index out of range
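Two things in the script above look like likely culprits (untested against this file). First, a JP2 box's length field counts the whole box including its own 8-byte header, so the `jp2c` length should be `len(codestream) + 8` rather than `len(pd)`. Second, for encapsulated transfer syntaxes `PixelData` is not a bare codestream: it is a sequence of items (a Basic Offset Table item, then fragment items, each with an 8-byte item header), so those headers need stripping before the bytes are wrapped. A sketch of box assembly that derives every length from the content; the function names are hypothetical, it uses BitsStored rather than BitsAllocated for the bit depth (an assumption; what matters is that it agrees with the codestream's own precision), and it picks enumerated colour space 17 (greyscale) or 16 (sRGB) for the `colr` box:

```python
import struct


def jp2_box(box_type, payload):
    """One JP2 box: the length field counts the 8-byte header plus the payload."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


def wrap_codestream_in_jp2(codestream, rows, cols, samples, bits_stored, signed=False):
    """Wrap a bare JPEG 2000 codestream in minimal JP2 boxes (hypothetical helper)."""
    signature = jp2_box(b"jP  ", b"\r\n\x87\n")
    ftyp = jp2_box(b"ftyp", b"jp2 " + struct.pack(">I", 0) + b"jp2 ")  # brand, MinV, CL
    bpc = (bits_stored - 1) | (0x80 if signed else 0)  # high bit of BPC flags signed data
    # ihdr: height, width, components, BPC, compression type 7, UnkC = 0, IPR = 0
    ihdr = jp2_box(b"ihdr", struct.pack(">IIHBBBB", rows, cols, samples, bpc, 7, 0, 0))
    enum_cs = 17 if samples == 1 else 16  # enumerated colour space: 17 greyscale, 16 sRGB
    colr = jp2_box(b"colr", struct.pack(">BBBI", 1, 0, 0, enum_cs))  # METH 1, PREC, APPROX
    jp2h = jp2_box(b"jp2h", ihdr + colr)
    return signature + ftyp + jp2h + jp2_box(b"jp2c", codestream)
```

Glymur may also be able to open the extracted codestream directly if it is written to a `.j2k` file, which would avoid building the JP2 wrapper at all.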

Original comment by martin.s...@gmail.com on 18 Mar 2014 at 12:40

GoogleCodeExporter commented 9 years ago

If the image has TransferSyntaxUID = 1.2.840.10008.1.2.4.50
(http://www.dicomlibrary.com/dicom/transfer-syntax/), is there any way I can get
access to the raw JPEG data and just hand it to PIL to decode?

A naive call to: dicom.contrib.pydicom_PIL.show_PIL(dataset) did not work.
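With 1.2.840.10008.1.2.4.50 the JPEG bitstream is there, just wrapped. For encapsulated transfer syntaxes, Pixel Data holds a sequence of items: each starts with the item tag (FFFE,E000) and a 4-byte little-endian length; the first item is the Basic Offset Table (possibly empty), the remaining items are fragments, and a Sequence Delimitation Item (FFFE,E0DD) ends the sequence. A sketch that strips the item headers from a single-frame image, assuming the raw bytes are available as `ds.PixelData`; the function names are hypothetical, and the parser stops cleanly whether or not the delimiter is included:

```python
import struct

ITEM_TAG = (0xFFFE, 0xE000)
SEQUENCE_DELIMITER_TAG = (0xFFFE, 0xE0DD)


def iter_items(pixel_data):
    """Yield the value of each item in encapsulated Pixel Data (offset table first)."""
    pos = 0
    while pos + 8 <= len(pixel_data):
        group, element, length = struct.unpack("<HHI", pixel_data[pos:pos + 8])
        pos += 8
        if (group, element) == SEQUENCE_DELIMITER_TAG:
            return
        if (group, element) != ITEM_TAG:
            raise ValueError("unexpected tag (%04X,%04X) at offset %d" % (group, element, pos - 8))
        yield pixel_data[pos:pos + length]
        pos += length


def single_frame_bytes(pixel_data):
    """Concatenate all fragments after the Basic Offset Table (single-frame images only)."""
    items = list(iter_items(pixel_data))
    return b"".join(items[1:])
```

For a single-frame 8-bit baseline image, something like `Image.open(io.BytesIO(single_frame_bytes(ds.PixelData)))` with PIL should then decode it; multi-frame images need the Basic Offset Table (or similar) to group fragments into frames.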

Original comment by agrothberg on 2 Jul 2014 at 11:31

SeiictyUsui / pydicom

Provide PixelData decompression (JPEG, RLE, MPEG, etc) #16