Closed jcupitt closed 3 months ago
Hi again @FabianHoerst,
I tried here with bioformats 7.1.0:
john@banana ~/packages/bioformats/bioformats-7.1.0/bftools/x $ ../bfconvert -tilex 256 -tiley 256 -compression JPEG -noflat -precompressed ~/pics/openslide/CMU-2.svs ./CMU-2.dcm
/home/john/pics/openslide/CMU-2.svs
SVSReader initializing /home/john/pics/openslide/CMU-2.svs
Reading IFDs
Populating metadata
Populating OME metadata
[Aperio SVS] -> ./CMU-2.dcm [DICOM]
More than 4GB of pixel data, compression will need to be used
Tile size = 256 x 256
Series 0: converted 1/1 planes (100%)
Tile size = 256 x 256
Series 0: converted 1/1 planes (100%)
Tile size = 256 x 256
Series 0: converted 1/1 planes (100%)
Tile size = 256 x 256
Series 0: converted 1/1 planes (100%)
Tile size = 256 x 256
Decompressing resolution: series=0, resolution=4
Series 0: converted 1/1 planes (100%)
Decompressed tile: series=1, resolution=0, x=0, y=0
Series 1: converted 1/1 planes (100%)
Decompressed tile: series=2, resolution=0, x=0, y=0
Series 2: converted 1/1 planes (100%)
[done]
19.877s elapsed (24.285715+2730.5715ms per plane, 437ms overhead
To make:
john@banana ~/packages/bioformats/bioformats-7.1.0/bftools/x $ ls
CMU-2_0_0.dcm CMU-2_0_2.dcm CMU-2_0_4.dcm CMU-2_2_0.dcm
CMU-2_0_1.dcm CMU-2_0_3.dcm CMU-2_1_0.dcm
Then built libdicom, openslide, libvips and vipsdisp from git master. With:
john@banana ~/packages/bioformats/bioformats-7.1.0/bftools/x $ vipsdisp CMU-2_0_4.dcm
(you can use any of the .dcm
files in the directory)
I see:
Aside from the hilariously broken colour, it all seems to work. If I view properties, it seems to have the metadata too:
I'll investigate the colour error.
Thanks for investigating! I will test this on my Ubuntu System, I used the same bf version as you on my Mac. Maybe I should carefully rebuild the OpenSlide library, as I used the release 4.0.0 from October 2023, with which I was able to load other dicom files (e.g., leica proprietary) without any problems.
It looks like the colour error is an RGB / YCbCr mixup. If you extract the raw JPEG tile from 2_0_4 (the lowest res pyr layer) it displays fine (it uses 4:2:0 YCbCr), but if you get tiles from 2_0_3 (the next larger layer) they are tagged as RGB but actually encoded as 4:4:4 YCbCr. It looks like openslide will need to override the JPEG colourspace with the DICOM one.
There's some kind of rounding issue with 2_0_4 as well, it's not seeing all the tiles. I'll keep poking it.
I think I've found the problem -- Aperio save higher resolution pyramid levels as RGB 4:4:4 (I suppose they want to avoid visible chroma subsampling), but they incorrectly tag these tiles as YCbCr, so when you decode, you see crazy colours. The solution is to override the JPEG colourspace with the photometric interpretation from the enclosing file.
openslide do this for SVS images here:
https://github.com/openslide/openslide/blob/main/src/openslide-decode-tiff.c#L245-L246
So the DICOM loader will need to do this too to handle dcms which have been derived directly from SVS.
I'll make a PR for openslide to fix this.
As a workaround, if you ask bfconvert
to recode the JPEGs (by removing the -precompressed
flag), it all works, since the tiles will then be tagged correctly:
There's a separate issue with (I think?) a rounding error in the lowest-res pyr level, I'll make another PR for that.
... this is slightly off-topic, but there's a vipsdisp binary here:
https://flathub.org/apps/org.libvips.vipsdisp
Which includes openslide and libdicom support. It should be easy to install on ubuntu (just a click).
I made a patch and bfconvert -precompressed
now works! Link above.
However, bfconvert
without -precompressed
now fails with the opposite colour error, since (I think?) bfconvert
is not setting photometric interpretation correctly :( We seem to be caught between two bugs.
Yes, I think bfconvert
without -precompressed
is saving tiles as regular YCbCr JPEGs, but setting DICOM PhotometricInterpretation to RGB.
Here's the PI it's setting:
john@orange ~/pics/openslide/dicom/bioformats-dcm/x $ grep -i photomet *.txt
CMU-2_0_0.dcm.txt:(0028,0004) PhotometricInterpretation | CS | 4 | 1 | RGB
CMU-2_0_1.dcm.txt:(0028,0004) PhotometricInterpretation | CS | 4 | 1 | RGB
CMU-2_0_2.dcm.txt:(0028,0004) PhotometricInterpretation | CS | 4 | 1 | RGB
CMU-2_0_3.dcm.txt:(0028,0004) PhotometricInterpretation | CS | 4 | 1 | RGB
CMU-2_0_4.dcm.txt:(0028,0004) PhotometricInterpretation | CS | 4 | 1 | RGB
CMU-2_1_0.dcm.txt:(0028,0004) PhotometricInterpretation | CS | 4 | 1 | RGB
CMU-2_2_0.dcm.txt:(0028,0004) PhotometricInterpretation | CS | 4 | 1 | RGB
Here's a sample tile pulled untouched from the file:
john@orange ~/pics/openslide/dicom/bioformats-dcm/x $ dcm-getframe -o x.jpg CMU-2_0_2.dcm 1
If you examine that JPEG you'll see it's a 4:2:0 YCbCr JPEG.
... though QuPath manages to display both versions correctly, there's some trick I'm missing.
... this is slightly off-topic, but there's a vipsdisp binary here:
https://flathub.org/apps/org.libvips.vipsdisp
Which includes openslide and libdicom support. It should be easy to install on ubuntu (just a click).
Thanks for the hint! Trying to follow up on your discussion on that. I really appreciate your effort!
... the zoom out error was a problem in vipsdisp, it's working now in git master.
I'll look into the colour issue again and see if I can find another way around this.
I've made some more progress. Here's what's happening:
bfconvert
output with -precompressed
Tracing though libjpeg, it's assigning the incorrect colour space here:
https://github.com/libjpeg-turbo/libjpeg-turbo/blob/main/jdapimin.c#L158-L176
The Aperio file has the component_id
s set to 0, 1, 2 when they should be either 1, 2, 3 or 82, 71, 66 (ascii RGB). Since the component ids are unknown, libjpeg and libjpeg-turbo both default to the incorrect value of YCbCr.
I checked get_sof()
:
https://github.com/libjpeg-turbo/libjpeg-turbo/blob/main/jdmarker.c#L286-L298
And the file really does have incorrect channel ids.
So: the original SVS file has incorrect channel ids leading to a mistaken guess of YCbCr (this is an aperio bug). The DICOM metadata has the correct RGB photometric interpretation. To read this case correctly, we need to override jpeg_color_space
with the DICOM PI during load.
bfconvert
output with no -precompressed
flagNow bfconvert
will regenerate the JPEGs and tag the channels correctly as YCbCr (it normally writes JPEGs in this space).
However, bfconvert
is NOT updating the photometricinterpretation in the DICOM metadata to match the PI they are writing in. To read this case correctly, we must NOT override jpeg_color_space
.
openslide needs to check the component ids of tiles it fetches from DICOMs, and only override jpeg_color_space
if the component ids are unknown.
How annoying! I'll try to make a PR.
I made a PR and all cases seem to work now. Thanks for reporting this, @FabianHoerst
Hello @jcupitt,
Thanks for your effort!!! Do you think that this could also be an issue with the Aperio files, which they could fix in their scanning software?
It's a very long-standing issue in SVS, so they probably can't change it without breaking compatibility. For example, openslide patch around this problem in their SVS reader here:
https://github.com/openslide/openslide/blob/main/src/openslide-decode-tiff.c#L245-L246
I think it's probably something bfconvert
could improve -- IMO they shouldn't just pass JPEGs through from SVS, they ought to fix the component ids. It might be worth reporting the issue to them.
It could be beneficial; I'm considering bringing up this issue with them. Additionally, I've identified some other issues with their conversion process that I plan to address soon (especially when using -precompressed). However, I need to organize public data to help them replicate the issue beforehand.
That sounds like a useful thing to do. I'd be happy to assist if you need anything.
bioformats are also working on fixing their side, so I think this is resolved.
@FabianHoerst reports:
Our recent efforts have been centered around BioFormats. We performed a conversion on one of the CMU files using the following command:
However, when attempting to load the resulting file with OpenSlide, we encountered the following error message:
If you are interested in test files to replicate the error and adapt OpenSlide for the resulting BioFormats DICOM files, I'd be happy to provide them. I believe this could be beneficial as BioFormats currently lacks a Python API.