NA-MIC / ProjectWeek

Website for NA-MIC Project Weeks
https://projectweek.na-mic.org

Proposal: WSI-DICOM Improvement - From Viewer to Analysis #846

Closed FabianHoerst closed 10 months ago

FabianHoerst commented 10 months ago

Project Description

Problem: Several tools already convert WSI data into DICOM, but there is a distinct lack of vendor-agnostic converters whose output works across the ecosystem. Current solutions fall short of generating DICOM files that are compatible with OpenSlide (4.0.0), the OHIF and SLIM viewers, and a PACS, which impedes seamless integration and compromises overall performance.

Objectives: This project aims to develop an open-source, community-maintained software solution for the vendor-agnostic conversion of WSI data into DICOM format. The tool must adhere to established software design patterns so that the community can contribute easily.

Idea: Our project aims to develop a vendor-agnostic WSI to DICOM conversion tool based on existing solutions. We plan to evaluate existing solutions comprehensively and build a test suite covering PACS (Orthanc), viewers (OHIF/SLIM), and Python integration (OpenSlide). The resulting DICOM-WSI should integrate with the OHIF viewer, offering a unified platform for pathology and radiology. Additionally, support for the SLIM viewer is necessary, as it supports adding annotations and visualizing analytics results (e.g., heatmaps).

fedorov commented 10 months ago

You may be interested in reviewing a similar project from last year: https://projectweek.na-mic.org/PW38_2023_GranCanaria/Projects/IDC_DICOM_WSI_workflow/.

I think @dclunie, @maxfscher, @DanielaSchacherer and I would all be interested in joining.

dclunie commented 10 months ago

I am aware of seven current openly available implementations for the conversion of WSI data into DICOM:

  1. bfconvert - Converting a file to different format — Bio-Formats 7.1.0 documentation. Available from: https://bio-formats.readthedocs.io/en/v7.1.0/users/comlinetools/conversion.html
  2. dicom_wsi
    • Gu Q, Prodduturi N, Jiang J, Flotte TJ, Hart SN. Dicom_wsi: A Python Implementation for Converting Whole-Slide Images to Digital Imaging and Communications in Medicine Compliant Files. J Pathol Inform. 2021;12(1):21. doi:10.4103/jpi.jpi_88_20
    • Hart SN. Steven-N-Hart/dicom_wsi. 2022. Available from: https://github.com/Steven-N-Hart/dicom_wsi
  3. GoogleCloudPlatform. WSI to DICOM Converter. Google Cloud Platform; 2022. Available from: https://github.com/GoogleCloudPlatform/wsi-to-dicom-converter
  4. Sectra AB. wsidicomizer. imi-bigpicture; 2021. Available from: https://github.com/imi-bigpicture/wsidicomizer
  5. Jodogne S, Lenaerts É, Marquet L, Erpicum C, Greimers R, Gillet P, et al. Open Implementation of DICOM for Whole-Slide Microscopic Imaging. In: Proceedings of the 12th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications. Porto, Portugal: SCITEPRESS - Science and Technology Publications; 2017. p. 81–7. Available from: https://orbi.uliege.be/handle/2268/204498 doi:10.5220/0006155100810087
  6. Clunie D. com.pixelmed.convert.TIFFToDicom. Available from: http://www.dclunie.com/pixelmed/software/javadoc/com/pixelmed/convert/TIFFToDicom.html
  7. Pocock J. wsic. 2023. Available from: https://github.com/John-P/wsic

This does not include commercial products.

Also of interest may be:

  1. Gupta Y, Costa C, Pinho E, Bastião Silva L. DICOMization of Proprietary Files Obtained from Confocal, Whole-Slide, and FIB-SEM Microscope Scanners. Sensors. 2022 Mar 17;22(6):2322. doi:10.3390/s22062322
  2. Accomazzi V, Colaco V. Methods and systems for the efficient acquisition, conversion, and display of pathology images. US11538578B1, 2022. Available from: https://patents.google.com/patent/US11538578B1/en
  3. Brundage D, Rosenthal J, Carelli R, Rand S, Umeton R, Loda M, et al. Whole Slide Image to DICOM Conversion as Event-Driven Cloud Infrastructure. arXiv:2203.13888 [cs]. 2022 Mar 25. Available from: http://arxiv.org/abs/2203.13888

dclunie commented 10 months ago

A key feature in any converter, IMHO, is to be able to losslessly convert (i.e., without decompressing JPEG or JPEG 2000 and recompressing) when possible, e.g., to take SVS tiles and copy them in their compressed form into DICOM frames. Several of the converters listed earlier have that feature.
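
As a rough illustration of what "copying tiles in their compressed form into DICOM frames" involves at the byte level, the sketch below wraps already-compressed tiles as the value of an encapsulated Pixel Data element: a Basic Offset Table item, one item per frame, and a sequence delimiter. It is not taken from any of the converters listed above. The function name `encapsulate_frames` is hypothetical, it assumes one fragment per frame, it assumes each tile is already a complete, self-contained bitstream, and padding odd-length tiles with a trailing zero byte is an assumption of this sketch.

```python
import struct

ITEM_TAG = b"\xfe\xff\x00\xe0"       # (FFFE,E000) Item, little endian
SEQ_DELIM_TAG = b"\xfe\xff\xdd\xe0"  # (FFFE,E0DD) Sequence Delimitation Item
MAX_UINT32 = 0xFFFFFFFF


def encapsulate_frames(compressed_tiles):
    """Wrap compressed tiles as encapsulated pixel data without decoding them.

    Hypothetical helper: each tile is copied byte for byte into one fragment.
    """
    offsets = []
    items = []
    position = 0  # offset from the first item after the Basic Offset Table
    for tile in compressed_tiles:
        if len(tile) % 2:
            tile = tile + b"\x00"  # fragments must have even length (padding choice is an assumption)
        if position > MAX_UINT32:
            # 32-bit offsets cannot address this frame; very large files need another mechanism
            raise ValueError("offset exceeds 32 bits; an Extended Offset Table would be needed")
        offsets.append(position)
        items.append(ITEM_TAG + struct.pack("<I", len(tile)) + tile)
        position += 8 + len(tile)  # 4-byte tag + 4-byte length + payload

    bot_value = struct.pack(f"<{len(offsets)}I", *offsets)
    bot_item = ITEM_TAG + struct.pack("<I", len(bot_value)) + bot_value
    return bot_item + b"".join(items) + SEQ_DELIM_TAG + struct.pack("<I", 0)
```

Because the tile bytes are never decoded, no generation loss can occur. The remaining work in a real converter is setting a matching Transfer Syntax and populating the metadata, which this sketch does not attempt.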

curtislisle commented 10 months ago

David’s list might have included this work already, but I thought of this effort, which I have tested, and didn’t see the GitHub link in prior emails. In my test a few months ago, it didn’t encode the headers correctly to work with Slim, but the software effort seems well done. Maybe it can be built upon for the solution to this project: imi-bigpicture (github.com).

FabianHoerst commented 10 months ago

Thank you sincerely for engaging in this discussion and for providing the tools!

I already tested some of the tools a while ago. While all of them seem to generate somewhat valid DICOM files, I've encountered challenges in integrating the output into different frameworks, such as OHIF/SLIM/QuPath/OpenSlide. Moreover, some tools have limitations with specific file formats or are not optimized for handling larger files exceeding 2GB. I haven't had the opportunity to thoroughly evaluate all the tools yet. However, I believe it would be highly beneficial to carefully examine the integration issues, identify any missing tags, and consider strengthening the specified requirements.
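
One way to make "identify any missing tags" concrete in a test suite is to compare each converter's output against a list of attributes that the target viewers and libraries rely on. The following is a minimal sketch under stated assumptions: the keyword list is an illustrative subset chosen for this example, not an authoritative list of mandatory attributes, and the input is a plain mapping from attribute keyword to value, however it was extracted from the file. The names `EXPECTED_KEYWORDS`, `find_missing`, and `report` are hypothetical.

```python
# Illustrative subset only (an assumption of this sketch), not a complete
# or authoritative list of what the standard or any viewer requires.
EXPECTED_KEYWORDS = (
    "ImageType",
    "NumberOfFrames",
    "Rows",
    "Columns",
    "TotalPixelMatrixColumns",
    "TotalPixelMatrixRows",
    "ImagedVolumeWidth",
    "ImagedVolumeHeight",
    "DimensionOrganizationType",
)


def find_missing(attributes, expected=EXPECTED_KEYWORDS):
    """Return the expected keywords that are absent or empty in `attributes`."""
    missing = []
    for keyword in expected:
        value = attributes.get(keyword)
        if value is None or value == "" or value == []:
            missing.append(keyword)
    return missing


def report(outputs_by_converter):
    """Map each converter name to the keywords missing from its output."""
    return {name: find_missing(attrs) for name, attrs in outputs_by_converter.items()}
```

A per-converter report like this would complement, not replace, a full validator; it only answers whether the attributes a given consumer depends on are present.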

What are your thoughts on this?

dclunie commented 10 months ago

You may be interested in my experience in IDC creating DICOM WSI from SVS that is as close to the standard as possible, in a lossless manner, with as many mandatory and optional data elements and values populated as possible, using out-of-band metadata sources (e.g., for specimen identification and description); see https://github.com/ImagingDataCommons/idc-wsi-conversion (and also the results of applying my dciodvfy validation tool to those images).

I have used my own PixelMed conversion tool for IDC conversions up until now because I wanted to create dual-personality TIFF files and no other tool used to do this AFAIK, but we had the BioFormats people add this capability more recently.

That tool also handles > 2GB files (e.g., our RMS collection images are over 20GB in some cases). One of those SVS source images would be a good test case for pushing the limits of the other converters (including not only the large size but also the need for lossless conversion of what were raw (never lossy compressed) pixels).

Also, the current QuPath (0.5) does include both OpenSlide and BioFormats DICOM WSI support, since IDC funded both groups to develop those extensions, and if there are any deficiencies in reading DICOM images in either of those libraries, they should be addressed with issues reported and samples demonstrating any problems.

Another consideration is the form of the overall layout of the DICOM WSI in the converted result, and whether or not, e.g., they are TILED_FULL and omit the Per-Frame Functional Groups Sequence, etc., and what viewers support in this regard, versus what various different scanner vendors' DICOM output looks like (e.g., from Leica, 3DHISTECH, Hamamatsu, etc.). See also the test images from the ECDP 2023 Connectathon at ftp://medical.nema.org/MEDICAL/Dicom/DataSets/WG26/WG26Connectathon2023_ECDP/.
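
To illustrate why a TILED_FULL object can omit per-frame position information: the frames are stored in a fixed order, left to right and then top to bottom across the total pixel matrix, so a reader can compute each tile's position from its frame index. The sketch below shows that calculation for the simple case of a single optical path and a single focal plane, which is an assumption of this example. The function name `tiled_full_position` is hypothetical.

```python
def tiled_full_position(frame_index, total_columns, total_rows, tile_columns, tile_rows):
    """Return the 1-based (column, row) of a tile's top-left pixel.

    `frame_index` is 0-based. Assumes one optical path and one focal plane,
    with frames ordered left to right, then top to bottom.
    """
    # Integer ceiling division: edge tiles may extend past the matrix
    tiles_per_row = (total_columns + tile_columns - 1) // tile_columns
    tiles_per_column = (total_rows + tile_rows - 1) // tile_rows
    if not 0 <= frame_index < tiles_per_row * tiles_per_column:
        raise IndexError("frame_index outside the tile grid")
    tile_row, tile_column = divmod(frame_index, tiles_per_row)
    return tile_column * tile_columns + 1, tile_row * tile_rows + 1
```

For example, with a 1000 x 600 total pixel matrix and 256 x 256 tiles, there are 4 tiles per row, so frame index 5 lies in the second row and second column of tiles. A viewer that only understands explicit per-frame positions would need them to be written out instead, which is one of the layout differences a test suite could cover.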

fedorov commented 10 months ago

To add to what David said, here's the direct link that selects DICOM slide microscopy images in IDC Portal: https://portal.imaging.datacommons.cancer.gov/explore/filters/?Modality_op=OR&Modality=SM. We currently have over 23 TB of DICOM SM, and all of those were created using the workflow in https://github.com/ImagingDataCommons/idc-wsi-conversion. All of the images are available for download without login or any special permissions. Let me know if you need help.

Also, here's the query that selects the top 10 DICOM SM series by size.

SELECT
  SeriesInstanceUID,
  ANY_VALUE(collection_id) AS collection_id,
  ROUND(SUM(instance_size)/POW(10,9)) AS size_GB,
  ANY_VALUE(CONCAT('s3://',aws_bucket,'/',crdc_series_uuid)) AS aws_url
FROM
  `bigquery-public-data.idc_current.dicom_all`
WHERE
  Modality = "SM"
GROUP BY
  SeriesInstanceUID
ORDER BY
  size_GB DESC
LIMIT
  10

The largest SM series (>100GB) are those from the HTAN-HMS collection, which contain multichannel fluorescence images.

The next query is slightly modified to consider only the RMS-Mutation-Prediction collection, which consists of uncompressed H&E slides (largest is ~28GB).

SELECT
  SeriesInstanceUID,
  ANY_VALUE(collection_id) AS collection_id,
  ROUND(SUM(instance_size)/POW(10,9)) AS size_GB,
  ANY_VALUE(CONCAT('s3://',aws_bucket,'/',crdc_series_uuid)) AS aws_url
FROM
  `bigquery-public-data.idc_current.dicom_all`
WHERE
  Modality = "SM" AND collection_id = "rms_mutation_prediction"
GROUP BY
  SeriesInstanceUID
ORDER BY
  size_GB DESC
LIMIT
  10
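
The two queries above differ only in their WHERE clause, so a test suite that needs large sample series from several collections could generate them from one template. This is a minimal sketch: the helper name `top_sm_series_query` is hypothetical, and it only builds the SQL text. Submitting it to BigQuery is left out, because the client setup depends on the environment.

```python
import re


def top_sm_series_query(collection_id=None, limit=10):
    """Build the 'largest SM series' query, optionally for one collection."""
    where = 'Modality = "SM"'
    if collection_id is not None:
        # Accept only simple identifiers so the value can be embedded safely
        if not re.fullmatch(r"[A-Za-z0-9_]+", collection_id):
            raise ValueError(f"unexpected collection_id: {collection_id!r}")
        where += f' AND collection_id = "{collection_id}"'
    return f"""SELECT
  SeriesInstanceUID,
  ANY_VALUE(collection_id) AS collection_id,
  ROUND(SUM(instance_size)/POW(10,9)) AS size_GB,
  ANY_VALUE(CONCAT('s3://', aws_bucket, '/', crdc_series_uuid)) AS aws_url
FROM
  `bigquery-public-data.idc_current.dicom_all`
WHERE
  {where}
GROUP BY
  SeriesInstanceUID
ORDER BY
  size_GB DESC
LIMIT
  {int(limit)}"""
```

Calling `top_sm_series_query()` reproduces the first query and `top_sm_series_query("rms_mutation_prediction")` the second, apart from whitespace inside `CONCAT`.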

To get started, see this tutorial series, which covers using BigQuery to search IDC data as in the queries above, downloading images, and other common operations: https://github.com/ImagingDataCommons/IDC-Tutorials/tree/master/notebooks/getting_started.

Happy to help if anything is unclear!

pieper commented 10 months ago

Great to see the discussion on this - Fabian, I hope you can join in person or online at Project Week. Be sure to sign up: https://projectweek.na-mic.org/PW40_2024_GranCanaria/

FabianHoerst commented 10 months ago

Thanks all for your feedback! I will create a project page in the next few days and take your discussion into account. I am still open to feedback to outline the project ASAP.