cgohlke / imagecodecs

Image transformation, compression, and decompression codecs
https://pypi.org/project/imagecodecs
BSD 3-Clause "New" or "Revised" License

ljpeg (lj92.c) ge bug with example file #61

Closed tgeorge11 closed 1 year ago

tgeorge11 commented 1 year ago

I have been trying to decode some ljpeg images from here: http://www.eng.usf.edu/cvprg/mammography/database.html

I spent some time with gdcm and your library without much luck. I have run their binary on Linux and it decodes their files fine. It seems to be an old and rare bug, but older medical scans appear to have it. I feel that my skill set is not up to patching and compiling; unfortunately, I am more comfortable on Windows with precompiled binaries.

That being said, I think the key may be around line 555 of lj92.c or around line 560 (?) of jpegsof3.cpp. Decoding with either jpegsof3_decode or ljpeg_decode gives the same bad graduated pattern.

Attached are the PVRG patch they found and a sample file: stanford.pvrg.jpeg.ge_solaris_combo.patch.zip, test1-ljpeg.zip

cgohlke commented 1 year ago

All libraries I have tried decode the test1.ljpeg file with the same result. If gdcm is able to detect and work around the buggy ljpeg files, try the gdcm Python bindings or Windows binaries:

https://pypi.org/project/gdcm/#files
https://github.com/malaterre/GDCM/releases

cgohlke commented 1 year ago

I might have misunderstood: by "their binary", did you mean PVRG plus patches or GDCM?

I was able to build the PVRG jpeg binary in WSL on Windows and successfully convert the test1 file. It also compiles to a native Windows executable with the MinGW/UCRT compiler, but that executable does not work correctly.

cgohlke commented 1 year ago

See also #27.

According to https://deckard.duhs.duke.edu/ddsm_sql/c17.html, these LJPEG files can only be decoded by the software that generated them:

LJPEG Files

The .LJPEG files contain the image data. The files are compressed using the Lossless JPEG compression software developed at Stanford University. This software is notoriously and tragically flawed. Once an image is compressed using it, it may only be uncompressed using the same software. Other Lossless JPEG software is unable to undo the confusion caused by the Stanford code, but at least the Stanford code can undo itself. To further compound this, when each image was compressed, the row and column input parameters to the compressor were systematically reversed. This means that once you decompress a DDSM image with the Stanford software, it will report the rows and columns of the image reversed. However, the image data contained in the ics files is correct.
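The row/column swap described in the quote can be undone after decompression. The following is only a minimal sketch, assuming the decompressed samples are already available as a flat, row-major sequence; the helper names `corrected_shape` and `to_rows` are hypothetical and not part of any library mentioned in this thread. Where the .ics file gives the dimensions, those should take precedence.

```python
from typing import List, Sequence, Tuple


def corrected_shape(reported_rows: int, reported_cols: int) -> Tuple[int, int]:
    """Swap the dimensions reported by the decompressor.

    Assumption: the reported rows and columns are reversed, as the
    quoted DDSM documentation states.
    """
    return reported_cols, reported_rows


def to_rows(samples: Sequence[int], rows: int, cols: int) -> List[List[int]]:
    """Split a flat, row-major sample sequence into `rows` lists of `cols` values."""
    if len(samples) != rows * cols:
        raise ValueError(
            f"expected {rows * cols} samples, got {len(samples)}"
        )
    return [list(samples[r * cols:(r + 1) * cols]) for r in range(rows)]


# Toy data: the decompressor reports 3 rows x 2 columns,
# but the image is really 2 rows x 3 columns.
rows, cols = corrected_shape(3, 2)
image = to_rows([1, 2, 3, 4, 5, 6], rows, cols)
# image == [[1, 2, 3], [4, 5, 6]]
```

The sample type and byte order of the decompressed file are not stated in this thread, so reading the raw bytes is deliberately left out of the sketch.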

There's a Linux port of the original DDSM USF lossless JPEG compression program, JPEGv1.2.1_LINUXport.tgz, still available on the Wayback Machine: https://web.archive.org/web/20170604105845/http://www.cs.unibo.it/~roffilli/sw.html#DDSMUSF. It compiles and works without changes on WSL.

Or use the pvrg-jpeg program from Debian: pvrg-jpeg -d -g test1
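To call the same command from Python, something like the sketch below might work. It assumes a `pvrg-jpeg` executable is on the PATH (or that a path to a locally built binary is passed in); the function names are hypothetical, and the flags are copied verbatim from the command above. Where the decoder writes its output is not stated in this thread, so the sketch only runs the program and returns the process result.

```python
import subprocess
from pathlib import Path
from typing import List, Union


def build_pvrg_command(
    ljpeg_path: Union[str, Path], executable: str = "pvrg-jpeg"
) -> List[str]:
    """Build the argument list for `pvrg-jpeg -d -g <file>`."""
    return [executable, "-d", "-g", str(ljpeg_path)]


def decode_with_pvrg(
    ljpeg_path: Union[str, Path], executable: str = "pvrg-jpeg"
) -> subprocess.CompletedProcess:
    """Run the external decoder; raises CalledProcessError on a non-zero exit."""
    path = Path(ljpeg_path)
    if not path.is_file():
        raise FileNotFoundError(path)
    return subprocess.run(
        build_pvrg_command(path, executable),
        check=True,           # fail loudly if the decoder reports an error
        capture_output=True,  # keep stdout/stderr for inspection
        text=True,
    )
```

For example, `decode_with_pvrg("test1")` would run the command shown above, and `result.stdout` / `result.stderr` hold whatever the program printed.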

tgeorge11 commented 1 year ago

Thank you for all your help. I was working in this direction, but my Windows machine will not install WSL (some other cryptic error). I had been working on this for the last two weeks with little progress. Your clues helped me yesterday: I had some success in MSYS2, then lost another day trying to understand performance and why I could not compile with clang.

I am over the hump, and have been making good progress today.

Thanks again for your help and clues about compiling on Windows. I got a binary working on Windows, and Python is happy (as am I).

I am working on a cancer project, and it seems this database has the best 'marked' cancers (outlines).

Regards, Trent
