amspath / libisyntax

BSD 2-Clause "Simplified" License

Dicom conversion #32

Open jonasteuwen opened 5 months ago

jonasteuwen commented 5 months ago

BigPicture uses DICOM (https://bigpicture.eu/news/bigpicture-raises-dicom-standards), and DICOM is likely to become an industry standard.

I can write a converter if you want, but it might be tricky to know which libraries to import.

Shall I draft something?

Falcury commented 5 months ago

Yeah, I agree it would probably be a good idea for something like this to exist.

(Should an iSyntax-to-DICOM converter then be within the scope of libisyntax, or maybe a separate project?)

As for libraries, if I were to go about it, I would probably use libjpeg-turbo for the JPEG encoding and try to use as few additional libraries as possible, as I did when adding (partial) support for reading DICOM WSIs in Slidescape (see the code in here). But there might be easier or better ways to do it.

jonasteuwen commented 5 months ago

It could also be simpler to have either OpenSlide bindings or Python bindings directly. Extending https://github.com/imi-bigpicture/wsidicomizer/tree/main/wsidicomizer would then make it very easy.

Writing it without dependencies is likely painful: https://github.com/GoogleCloudPlatform/wsi-to-dicom-converter

Falcury commented 5 months ago

Yes, I think extending an existing WSI to DICOM converter could be a good strategy. Then iSyntax would be just one more backend to support.

> Writing it without dependencies is likely painful: https://github.com/GoogleCloudPlatform/wsi-to-dicom-converter

I am fairly confident it would be painful, only thinking about the number of hours I already spent reading the DICOM standard... ;-)

erikogabrielsson commented 3 months ago

If OpenSlide incorporates support for iSyntax (through libisyntax), conversion to DICOM should be straightforward with what is already implemented in WsiDicomizer, as it already has OpenSlide support.

jonasteuwen commented 3 months ago

That's true, but that still seems to require quite a bit of work. It might be much easier at this point to make a Python binding. It's on my to-do list.

erikogabrielsson commented 3 months ago

I'm playing around with Python bindings using Cython at the moment, and have managed to read out tiles. I'm not sure how easy it will be to package, though.

jonasteuwen commented 3 months ago

I suppose you can use CMake or Meson to compile the library during install. If we convert it to DICOM, it might be good to look into the XML header; there are a lot of DICOM tags there.

It's probably also a good idea to create the levels > 0 ourselves, as there is an annoying offset between the levels.
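A minimal sketch of what "creating the levels > 0 ourselves" could look like, assuming a plain 2x2 box filter on a single-channel image stored as a list of rows. Because every level is derived from level 0 with top-left alignment, pixel (i, j) on level n always covers level-0 pixels starting at (i·2ⁿ, j·2ⁿ), so the stored levels' offsets never come into play. The function names are hypothetical, and real converters may use other filters and operate on RGB tiles.

```python
def downsample_2x(image):
    """Halve a single-channel image (a list of rows) by averaging 2x2 blocks.

    Edge blocks on odd-sized images average only the pixels that exist,
    so the result is ceil(h/2) rows by ceil(w/2) columns.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    out = []
    for y in range(0, height, 2):
        row = []
        for x in range(0, width, 2):
            block = [
                image[yy][xx]
                for yy in range(y, min(y + 2, height))
                for xx in range(x, min(x + 2, width))
            ]
            # Integer mean, rounding halves up
            row.append((sum(block) + len(block) // 2) // len(block))
        out.append(row)
    return out


def build_pyramid(level0, num_levels):
    """Derive every level > 0 from level 0 instead of using the stored levels."""
    levels = [level0]
    for _ in range(1, num_levels):
        levels.append(downsample_2x(levels[-1]))
    return levels


pyramid = build_pyramid([[0, 2, 4, 6], [2, 4, 6, 8]], 3)
print(pyramid[1], pyramid[2])  # [[2, 6]] [[4]]
```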

Another option would be to set up some GitHub Actions to package it as a .deb, similar to OpenSlide.

erikogabrielsson commented 3 months ago

I got a converter running using pyisyntax.

I will explore parsing metadata from the XML header. I have previously parsed some Philips XML metadata from iSyntax files converted to TIFF, and could get some DICOM-required attributes out of it.

jonasteuwen commented 3 months ago

This is pretty sweet! Does your DICOM converter also support its own downsampling starting at level 0? That would solve #36 (offset between levels).

An additional idea would be to have a .get_offset(level) function so we can manually offset annotations as well.
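To illustrate how a per-level offset would be used for annotations, here is a sketch that maps level-0 annotation points onto a lower level. `get_offset` stands in for the proposed `.get_offset(level)`, which does not exist yet; its return value (a `(dx, dy)` shift in that level's pixels) and the sign convention are assumptions, and the offsets in the usage line are made up for illustration.

```python
def level0_to_level(points, level, get_offset):
    """Map level-0 (x, y) annotation points onto pixel coordinates of `level`.

    `get_offset` is a stand-in for the proposed .get_offset(level); it is
    assumed to return the (dx, dy) by which `level` is shifted, in that
    level's pixels, relative to a plain 2**level downsample of level 0.
    """
    scale = 2 ** level
    dx, dy = get_offset(level)
    return [(x / scale - dx, y / scale - dy) for x, y in points]


def fake_offset(level):
    # Hypothetical offsets (half a pixel per level), for illustration only
    return 0.5 * level, 0.5 * level


print(level0_to_level([(400, 800)], 2, fake_offset))  # [(99.0, 199.0)]
```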

For the DICOM metadata, you can probably just make a few simple adjustments to the libisyntax code to expose those. Something like GetDicomTag.

Falcury commented 3 months ago

> For the DICOM metadata, you can probably just make a few simple adjustments to the libisyntax code to expose those. Something like GetDicomTag.

We could add some accessors to expose the relevant fields. Or maybe allow invoking a callback procedure while parsing the XML header.
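Both ideas (an accessor and a parse-time callback) can be sketched on top of one another, shown here in Python rather than libisyntax's C. The sketch assumes the header stores each value in an `<Attribute>` element carrying `Name`, `Group` and `Element` attributes with hex strings such as `"0x0008"`; that layout, the sample XML, and the helper names are assumptions to be checked against the Philips specification, not libisyntax's API.

```python
import xml.etree.ElementTree as ET


def walk_dicom_attributes(xml_text, callback):
    """Call callback(group, element, name, value) for every <Attribute>.

    iter() is recursive, so attributes nested inside arrays are visited too.
    """
    root = ET.fromstring(xml_text)
    for attr in root.iter("Attribute"):
        group = int(attr.get("Group", "0"), 16)  # int() accepts a "0x" prefix with base 16
        element = int(attr.get("Element", "0"), 16)
        value = (attr.text or "").strip()
        callback(group, element, attr.get("Name", ""), value)


def get_dicom_tag(xml_text, group, element):
    """Accessor built on the callback: first value stored under (group, element)."""
    found = []

    def collect(g, e, _name, value):
        if (g, e) == (group, element):
            found.append(value)

    walk_dicom_attributes(xml_text, collect)
    return found[0] if found else None


# Made-up header fragment in the assumed layout
SAMPLE_HEADER = """<DataObject ObjectType="DPUfsImport">
  <Attribute Name="DICOM_MANUFACTURER" Group="0x0008" Element="0x0070" PMSVR="IString">PHILIPS</Attribute>
  <Attribute Name="DICOM_DEVICE_SERIAL_NUMBER" Group="0x0018" Element="0x1000" PMSVR="IString">12345</Attribute>
</DataObject>"""

print(get_dicom_tag(SAMPLE_HEADER, 0x0008, 0x0070))  # PHILIPS
```

A callback keeps the parser free of any fixed list of tags, while the accessor covers the simple "give me one tag" case without the caller writing a visitor.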

erikogabrielsson commented 3 months ago

> This is pretty sweet! Does your DICOM converter also support its own downsampling starting at level 0? That would solve #36 (offset between levels).

It can re-create a full pyramid if levels are missing.
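Re-creating a full pyramid first requires deciding how many levels to produce. A minimal sketch, assuming each level halves the previous one (rounding up) until a single tile covers it; the 256-pixel tile size and the function name are illustrative assumptions, not what WsiDicomizer necessarily does.

```python
def pyramid_level_sizes(width, height, tile_size=256):
    """List (width, height) per level, halving (rounding up) until one tile fits."""
    sizes = [(width, height)]
    while width > tile_size or height > tile_size:
        width = (width + 1) // 2
        height = (height + 1) // 2
        sizes.append((width, height))
    return sizes


print(pyramid_level_sizes(1000, 600))  # [(1000, 600), (500, 300), (250, 150)]
```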

> An additional idea would be to have a .get_offset(level) function so we can manually offset annotations as well.

With annotations, do you mean graphical annotations? In what dimensions are those (pixels or meters)?

> For the DICOM metadata, you can probably just make a few simple adjustments to the libisyntax code to expose those. Something like GetDicomTag.

For converting to DICOM, there are some tags that are difficult to obtain and also hard for the user to supply, for example:

Is such metadata available in the iSyntax XML?

dregula commented 2 months ago

Erik, I never found the offset in the FIC (iSyntax XML). Interestingly, the lowest-level WSI image is never superimposed on the macro (JPEG) image in either the local PIFV viewer or IMS, which suggests that any alignment is imprecise and possibly scanner-specific. Both viewers display small entire-slide thumbnails that appear to be simply the macro image concatenated with the label image. My guess is that the line between those two JPEG images is treated as the image column margin.
The Slidescape team can probably confirm whether the WSI image matrix origin is the same as the "join" between the label and macro images. (I have glass slides I can measure to see how those JPEG images align with the true label edge.)

Falcury commented 1 month ago

> For converting to DICOM, there are some tags that are difficult to obtain and also hard for the user to supply, for example:

> Is such metadata available in the iSyntax XML?

The image orientation is described in Philips' file format specification document (see the attached document for details, specifically chapter 2 and the appendix). 4522 207 43941_2020_04_24 Pathology iSyntax image format.pdf

The origin of the pyramid is specified under the UFS_IMAGE_GENERAL_HEADERS tag. Here is a dump of the relevant part of the XML (dumped using Slidescape) for testslide.isyntax:

     DICOM: UFS_IMAGE_GENERAL_HEADERS                (0x301d, 0x2000), array
      Array
       DataObject ObjectType = UFSImageGeneralHeader
        DICOM: UFS_IMAGE_NUMBER_OF_BLOCKS               (0x301d, 0x2001), size:6        = 173421
        DICOM: UFS_IMAGE_DIMENSIONS_OVER_BLOCK          (0x301d, 0x2002), size:9        = 1 0 4 2 3
        DICOM: UFS_IMAGE_DIMENSIONS                     (0x301d, 0x2003), array
         Array
          DataObject ObjectType = UFSImageDimension
           DICOM: UFS_IMAGE_DIMENSION_NAME                 (0x301d, 0x2004), size:1        = x
           DICOM: UFS_IMAGE_DIMENSION_TYPE                 (0x301d, 0x2005), size:7        = spatial
           DICOM: UFS_IMAGE_DIMENSION_UNIT                 (0x301d, 0x2006), size:10       = MicroMeter
           DICOM: UFS_IMAGE_DIMENSION_SCALE_FACTOR         (0x301d, 0x2007), size:4        = 0.25
          element end: DataObject
          DataObject ObjectType = UFSImageDimension
           DICOM: UFS_IMAGE_DIMENSION_NAME                 (0x301d, 0x2004), size:1        = y
           DICOM: UFS_IMAGE_DIMENSION_TYPE                 (0x301d, 0x2005), size:7        = spatial
           DICOM: UFS_IMAGE_DIMENSION_UNIT                 (0x301d, 0x2006), size:10       = MicroMeter
           DICOM: UFS_IMAGE_DIMENSION_SCALE_FACTOR         (0x301d, 0x2007), size:4        = 0.25
          element end: DataObject
          DataObject ObjectType = UFSImageDimension
           DICOM: UFS_IMAGE_DIMENSION_NAME                 (0x301d, 0x2004), size:9        = component
           DICOM: UFS_IMAGE_DIMENSION_TYPE                 (0x301d, 0x2005), size:16       = colour component
           DICOM: UFS_IMAGE_DIMENSION_DISCRETE_VALUES_STRING (0x301d, 0x2008), size:13       = "Y" "Co" "Cg"
          element end: DataObject
          DataObject ObjectType = UFSImageDimension
           DICOM: UFS_IMAGE_DIMENSION_NAME                 (0x301d, 0x2004), size:5        = scale
           DICOM: UFS_IMAGE_DIMENSION_TYPE                 (0x301d, 0x2005), size:5        = scale
          element end: DataObject
          DataObject ObjectType = UFSImageDimension
           DICOM: UFS_IMAGE_DIMENSION_NAME                 (0x301d, 0x2004), size:11       = waveletcoef
           DICOM: UFS_IMAGE_DIMENSION_TYPE                 (0x301d, 0x2005), size:11       = waveletcoef
           DICOM: UFS_IMAGE_DIMENSION_DISCRETE_VALUES_STRING (0x301d, 0x2008), size:19       = "LL" "LH" "HL" "HH"
          element end: DataObject
         element end: Array
        element end: Attribute
        DICOM: UFS_IMAGE_DIMENSION_RANGES               (0x301d, 0x200a), array
         Array
          DataObject ObjectType = UFSImageDimensionRange
           DICOM: UFS_IMAGE_DIMENSION_RANGE                (0x301d, 0x200b), size:13       = 13531 1 52442
          element end: DataObject
          DataObject ObjectType = UFSImageDimensionRange
           DICOM: UFS_IMAGE_DIMENSION_RANGE                (0x301d, 0x200b), size:13       = 22053 1 96804
          element end: DataObject
          DataObject ObjectType = UFSImageDimensionRange
           DICOM: UFS_IMAGE_DIMENSION_RANGE                (0x301d, 0x200b), size:5        = 0 1 2
          element end: DataObject
          DataObject ObjectType = UFSImageDimensionRange
           DICOM: UFS_IMAGE_DIMENSION_RANGE                (0x301d, 0x200b), size:5        = 0 1 7
          element end: DataObject
          DataObject ObjectType = UFSImageDimensionRange
           DICOM: UFS_IMAGE_DIMENSION_RANGE                (0x301d, 0x200b), size:5        = 0 1 3
          element end: DataObject
         element end: Array
        element end: Attribute
       element end: DataObject
      element end: Array
     element end: Attribute

In the example of testslide.isyntax, the (padded) level 0 pyramid starts at (13531, 22053) and has a padded width/height of (38912, 74752).
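The numbers above follow from the first two UFS_IMAGE_DIMENSION_RANGE values, read as "start step end" with an inclusive end: 52442 − 13531 + 1 = 38912 and 96804 − 22053 + 1 = 74752. The inclusive reading is consistent with the other ranges in the dump ("0 1 2" gives the three components Y, Co, Cg; "0 1 3" gives the four wavelet coefficients LL, LH, HL, HH). A minimal sketch, where treating the middle value as a step is an assumption (it is 1 everywhere in this dump) and the helper names are hypothetical:

```python
def parse_range(text):
    """Split a UFS_IMAGE_DIMENSION_RANGE value "start step end" into ints."""
    start, step, end = (int(v) for v in text.split())
    return start, step, end


def range_count(text):
    """Number of positions in the range; `end` is treated as inclusive."""
    start, step, end = parse_range(text)
    return (end - start) // step + 1


def pyramid_origin_and_size(x_range, y_range):
    """Level-0 origin and padded size from the first two (spatial) ranges."""
    x0 = parse_range(x_range)[0]
    y0 = parse_range(y_range)[0]
    return (x0, y0), (range_count(x_range), range_count(y_range))


print(pyramid_origin_and_size("13531 1 52442", "22053 1 96804"))
# ((13531, 22053), (38912, 74752))
print(range_count("0 1 2"), range_count("0 1 3"))  # 3 4
```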

In libisyntax, the relevant part of the XML is parsed here: https://github.com/amspath/libisyntax/blob/b9f0cd980a93d07602caa3536c7fef9245ae2fd9/src/isyntax/isyntax.c#L520-L540

> Erik, I never found the offset in the FIC (iSyntax XML). Interestingly, the lowest-level WSI image is never superimposed on the macro (JPEG) image in either the local PIFV viewer or IMS, which suggests that any alignment is imprecise and possibly scanner-specific. Both viewers display small entire-slide thumbnails that appear to be simply the macro image concatenated with the label image. My guess is that the line between those two JPEG images is treated as the image column margin. The Slidescape team can probably confirm whether the WSI image matrix origin is the same as the "join" between the label and macro images. (I have glass slides I can measure to see how those JPEG images align with the true label edge.)

The macro image has its origin at (0, 0) in the coordinate system used by the iSyntax files. The offset of the WSI pyramid can be read from the XML header as described above. The label image is rotated 90 degrees compared to the macro image so that the text on the label is right-side up (see the specification document for details).
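Putting this together, a level-0 WSI pixel can be placed in the iSyntax coordinate system by adding the pyramid origin, and converted to micrometers with UFS_IMAGE_DIMENSION_SCALE_FACTOR (0.25 µm for testslide.isyntax). This sketch assumes that the iSyntax coordinates are level-0 pixel units and that WSI pixels are counted from the top-left of the padded level-0 image; mapping onward into macro-image pixels would also need the macro image's own resolution, which is not covered here. The helper names are hypothetical.

```python
def wsi_to_isyntax(x, y, origin):
    """Shift a padded level-0 WSI pixel into the iSyntax coordinate system."""
    return x + origin[0], y + origin[1]


def isyntax_to_micrometers(x, y, um_per_pixel=0.25):
    """Scale iSyntax coordinates (assumed level-0 pixels) to micrometers."""
    return x * um_per_pixel, y * um_per_pixel


origin = (13531, 22053)  # testslide.isyntax, from the XML dump above
gx, gy = wsi_to_isyntax(0, 0, origin)
print((gx, gy), isyntax_to_micrometers(gx, gy))
# (13531, 22053) (3382.75, 5513.25)
```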

The FIC files produced by the PIFV viewer are not very useful, I think; the information in them is incomplete.