ImagingDataCommons / libdicom

C library for reading DICOM files
https://libdicom.readthedocs.io
MIT License
15 stars 7 forks source link

bioformats compatibility #80

Closed jcupitt closed 3 months ago

jcupitt commented 5 months ago

@FabianHoerst reports:

Our recent efforts have been centered around BioFormats. We performed a conversion on one of the CMU files using the following command:

./bfconvert -tilex 256 -tiley 256 -compression JPEG -noflat -precompressed ./CMU-2.svs ./CMU-2.dcm

  However, when attempting to load the resulting file with OpenSlide, we encountered the following error message:

openslide.lowlevel.OpenSlideUnsupportedFormatError: Unsupported or missing image file

  If you are interested in test files to replicate the error and adapt OpenSlide for the resulting BioFormats DICOM files, I'd be happy to provide them. I believe this could be beneficial as BioFormats currently lacks a Python API.

jcupitt commented 5 months ago

Hi again @FabianHoerst,

I tried here with bioformats 7.1.0:

john@banana ~/packages/bioformats/bioformats-7.1.0/bftools/x $ ../bfconvert -tilex 256 -tiley 256 -compression JPEG -noflat -precompressed ~/pics/openslide/CMU-2.svs ./CMU-2.dcm
/home/john/pics/openslide/CMU-2.svs
SVSReader initializing /home/john/pics/openslide/CMU-2.svs
Reading IFDs
Populating metadata
Populating OME metadata
[Aperio SVS] -> ./CMU-2.dcm [DICOM]
More than 4GB of pixel data, compression will need to be used
Tile size = 256 x 256
    Series 0: converted 1/1 planes (100%)
Tile size = 256 x 256
    Series 0: converted 1/1 planes (100%)
Tile size = 256 x 256
    Series 0: converted 1/1 planes (100%)
Tile size = 256 x 256
    Series 0: converted 1/1 planes (100%)
Tile size = 256 x 256
Decompressing resolution: series=0, resolution=4
    Series 0: converted 1/1 planes (100%)
Decompressed tile: series=1, resolution=0, x=0, y=0
    Series 1: converted 1/1 planes (100%)
Decompressed tile: series=2, resolution=0, x=0, y=0
    Series 2: converted 1/1 planes (100%)
[done]
19.877s elapsed (24.285715+2730.5715ms per plane, 437ms overhead

To make:

john@banana ~/packages/bioformats/bioformats-7.1.0/bftools/x $ ls
CMU-2_0_0.dcm  CMU-2_0_2.dcm  CMU-2_0_4.dcm  CMU-2_2_0.dcm
CMU-2_0_1.dcm  CMU-2_0_3.dcm  CMU-2_1_0.dcm

Then built libdicom, openslide, libvips and vipsdisp from git master. With:

john@banana ~/packages/bioformats/bioformats-7.1.0/bftools/x $ vipsdisp CMU-2_0_4.dcm

(you can use any of the .dcm files in the directory)

I see:

image

Aside from the hilariously broken colour, it all seems to work. If I view properties, it seems to have the metadata too:

image

I'll investigate the colour error.

FabianHoerst commented 5 months ago

Thanks for investigating! I will test this on my Ubuntu System, I used the same bf version as you on my Mac. Maybe I should carefully rebuild the OpenSlide library, as I used the release 4.0.0 from October 2023, with which I was able to load other dicom files (e.g., leica proprietary) without any problems.

jcupitt commented 5 months ago

It looks like the colour error is an RGB / YCbCr mixup. If you extract the raw JPEG tile from 2_0_4 (the lowest res pyr layer) it displays fine (it uses 4:2:0 YCbCr), but if you get tiles from 2_0_3 (the next larger layer) they are tagged as RGB but actually encoded as 4:4:4 YCbCr. It looks like openslide will need to override the JPEG colourspace with the DICOM one.

There's some kind of rounding issue with 2_0_4 as well, it's not seeing all the tiles. I'll keep poking it.

jcupitt commented 5 months ago

I think I've found the problem -- Aperio save higher resolution pyramid levels as RGB 4:4:4 (I suppose they want to avoid visible chroma subsampling), but they incorrectly tag these tiles as YCbCr, so when you decode, you see crazy colours. The solution is to override the JPEG colourspace with the photometric interpretation from the enclosing file.

openslide do this for SVS images here:

https://github.com/openslide/openslide/blob/main/src/openslide-decode-tiff.c#L245-L246

So the DICOM loader will need to do this too to handle dcms which have been derived directly from SVS.

I'll make a PR for openslide to fix this.

As a workaround, if you ask bfconvert to recode the JPEGs (by removing the -precompressed flag), it all works, since the tiles will then be tagged correctly:

image

There's a separate issue with (I think?) a rounding error in the lowest-res pyr level, I'll make another PR for that.

jcupitt commented 5 months ago

... this is slightly off-topic, but there's a vipsdisp binary here:

https://flathub.org/apps/org.libvips.vipsdisp

Which includes openslide and libdicom support. It should be easy to install on ubuntu (just a click).

jcupitt commented 5 months ago

I made a patch and bfconvert -precompressed now works! Link above.

However, bfconvert without -precompressed now fails with the opposite colour error, since (I think?) bfconvert is not setting photometric interpretation correctly :( We seem to be caught between two bugs.

jcupitt commented 5 months ago

Yes, I think bfconvert without -precompressed is saving tiles as regular YCbCr JPEGs, but setting DICOM PhotometricInterpretation to RGB.

Here's the PI it's setting:

john@orange ~/pics/openslide/dicom/bioformats-dcm/x $ grep -i photomet *.txt
CMU-2_0_0.dcm.txt:(0028,0004) PhotometricInterpretation | CS | 4 | 1 | RGB
CMU-2_0_1.dcm.txt:(0028,0004) PhotometricInterpretation | CS | 4 | 1 | RGB
CMU-2_0_2.dcm.txt:(0028,0004) PhotometricInterpretation | CS | 4 | 1 | RGB
CMU-2_0_3.dcm.txt:(0028,0004) PhotometricInterpretation | CS | 4 | 1 | RGB
CMU-2_0_4.dcm.txt:(0028,0004) PhotometricInterpretation | CS | 4 | 1 | RGB
CMU-2_1_0.dcm.txt:(0028,0004) PhotometricInterpretation | CS | 4 | 1 | RGB
CMU-2_2_0.dcm.txt:(0028,0004) PhotometricInterpretation | CS | 4 | 1 | RGB

Here's a sample tile pulled untouched from the file:

john@orange ~/pics/openslide/dicom/bioformats-dcm/x $ dcm-getframe -o x.jpg CMU-2_0_2.dcm 1

x

If you examine that JPEG you'll see it's a 4:2:0 YCbCr JPEG.

jcupitt commented 5 months ago

... though QuPath manages to display both versions correctly, there's some trick I'm missing.

FabianHoerst commented 5 months ago

... this is slightly off-topic, but there's a vipsdisp binary here:

https://flathub.org/apps/org.libvips.vipsdisp

Which includes openslide and libdicom support. It should be easy to install on ubuntu (just a click).

Thanks for the hint! Trying to follow up on your discussion on that. I really appreciate your effort!

jcupitt commented 5 months ago

... the zoom out error was a problem in vipsdisp, it's working now in git master.

I'll look into the colour issue again and see if I can find another way around this.

jcupitt commented 5 months ago

I've made some more progress. Here's what's happening:

bfconvert output with -precompressed

Tracing though libjpeg, it's assigning the incorrect colour space here:

https://github.com/libjpeg-turbo/libjpeg-turbo/blob/main/jdapimin.c#L158-L176

The Aperio file has the component_ids set to 0, 1, 2 when they should be either 1, 2, 3 or 82, 71, 66 (ascii RGB). Since the component ids are unknown, libjpeg and libjpeg-turbo both default to the incorrect value of YCbCr.

I checked get_sof():

https://github.com/libjpeg-turbo/libjpeg-turbo/blob/main/jdmarker.c#L286-L298

And the file really does have incorrect channel ids.

So: the original SVS file has incorrect channel ids leading to a mistaken guess of YCbCr (this is an aperio bug). The DICOM metadata has the correct RGB photometric interpretation. To read this case correctly, we need to override jpeg_color_space with the DICOM PI during load.

bfconvert output with no -precompressed flag

Now bfconvert will regenerate the JPEGs and tag the channels correctly as YCbCr (it normally writes JPEGs in this space).

However, bfconvert is NOT updating the photometricinterpretation in the DICOM metadata to match the PI they are writing in. To read this case correctly, we must NOT override jpeg_color_space.

Solution

openslide needs to check the component ids of tiles it fetches from DICOMs, and only override jpeg_color_space if the component ids are unknown.

How annoying! I'll try to make a PR.

jcupitt commented 5 months ago

I made a PR and all cases seem to work now. Thanks for reporting this, @FabianHoerst

FabianHoerst commented 5 months ago

Hello @jcupitt,

Thanks for your effort!!! Do you think that this could also be an issue with the Aperio files, which they could fix in their scanning software?

jcupitt commented 5 months ago

It's a very long-standing issue in SVS, so they probably can't change it without breaking compatibility. For example, openslide patch around this problem in their SVS reader here:

https://github.com/openslide/openslide/blob/main/src/openslide-decode-tiff.c#L245-L246

I think it's probably something bfconvert could improve -- IMO they shouldn't just pass JPEGs through from SVS, they ought to fix the component ids. It might be worth reporting the issue to them.

FabianHoerst commented 5 months ago

It could be beneficial; I'm considering bringing up this issue with them. Additionally, I've identified some other issues with their conversion process that I plan to address soon (especially when using -precompressed). However, I need to organize public data to help them replicate the issue beforehand.

jcupitt commented 5 months ago

That sounds like a useful thing to do. I'd be happy to assist if you need anything.

jcupitt commented 3 months ago

bioformats are also working on fixing their side, so I think this is resolved.