ImagingDataCommons / TCIA-IDC-Coordination


MAASTRO SEG datasets review #3

Open fedorov opened 4 years ago

fedorov commented 4 years ago

The MAASTRO group is about to release several new datasets with segmentations created using dcmqi, and we want to review them as a precaution. As a first step, we need TCIA to give the IDC team access, either via Box (or similar) for a quick check, or privately via NBIA before the data is released publicly. Assigning to @kirbyju until that first step is done.

kirbyju commented 4 years ago

@fedorov 3 of these collections were released late last year and you can access them right now using this URL: https://nbia.cancerimagingarchive.net/nbia-search/?ImageModalityCriteria=SEG&ModalityAll=true&MinNumberOfStudiesCriteria=1&CollectionCriteria=NSCLC-Radiomics,NSCLC-RADIOMICS-INTEROBSERVER1,HEAD-NECK-RADIOMICS-HN1

The 4th one is SEG data for the RIDER Lung CT collection. I'm working with our curation team to give you early access to that before we make it public.

fedorov commented 4 years ago

NSCLC-RADIOMICS-INTEROBSERVER1

The segmentations were done for each CT volume by 5 observers: once completely manually, and a second time using in-house auto-segmentation software followed by manual adjustments.

I looked at introbs05 as an example, and confirmed that:

Potential concerns/observations:

fedorov commented 4 years ago

We will be taking a rather careful look at those collections, since we are using them (starting from NSCLC-Radiomics) for the development of the radiomics use case(s) as part of IDC. We will add observations here as we encounter them.

fedorov commented 4 years ago

An issue of inconsistent assignment of RT/SEG series to the CT study has been identified and reported to TCIA. Summary below; the TCIA ticket is TH-45975.


There is an issue with at least some of the UIDs for the NSCLC-Radiomics collection.

Subject LUNG1-001 has 3 series: CT, RTSTRUCT and SEG.

For whatever reason, CT is assigned to a different study than RTSTRUCT and SEG, but the suffix of the StudyInstanceUID is the same:

CT is in study 1.3.6.1.4.1.32722.99.99.239341353911714368772597187099978969331

RTSTRUCT and SEG are in 1.3.6.1.4.1.40744.29.239341353911714368772597187099978969331.
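The shared suffix can be checked mechanically. Below is a minimal sketch using only the two UIDs quoted above; the helper name `split_uid` is made up for illustration, and treating the last dot-separated component as the "suffix" is an assumption about how these UIDs were generated.

```python
def split_uid(uid: str) -> tuple[str, str]:
    """Split a DICOM UID into (root, last component)."""
    # rpartition splits on the LAST dot, so the root keeps all earlier components.
    root, _, suffix = uid.strip().rpartition(".")
    return root, suffix


# StudyInstanceUIDs quoted in this comment for subject LUNG1-001.
ct_study = "1.3.6.1.4.1.32722.99.99.239341353911714368772597187099978969331"
rt_seg_study = "1.3.6.1.4.1.40744.29.239341353911714368772597187099978969331"

ct_root, ct_suffix = split_uid(ct_study)
rt_root, rt_suffix = split_uid(rt_seg_study)

same_suffix = ct_suffix == rt_suffix    # last component matches
same_study = ct_study == rt_seg_study   # full UID does not match

print("CT root:        ", ct_root)
print("RTSTRUCT/SEG root:", rt_root)
print("same suffix:", same_suffix, "| same study:", same_study)
```

Only the full UID identifies a study, so a matching suffix under a different root still means a different study to any DICOM-aware tool.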

This is not just inconvenient and bad practice: the StudyInstanceUID also appears to be inconsistent with how the series were organized for this collection in the past.

I checked the previous version of the collection, which is still available from the Google copy, and in that collection RTSTRUCT is in the same study as the CT image.

Can you explain what is happening? Is it intentional that StudyInstanceUID was changed, and RTSTRUCT and SEG are assigned to a study different from the CT series?

For your convenience, here's a zip file for LUNG1-001, both the current and previous version: https://app.box.com/s/3v73ot5pq7kza140yoi10qhqia4j7pqt

fedorov commented 4 years ago

Issue concerning inconsistent assignment of StudyInstanceUID here: https://help.cancerimagingarchive.net/servicedesk/customer/portal/1/TH-45975

fedorov commented 4 years ago

There is an artifact/inconsistency in the contour in a couple of slices between SEG and RTSTRUCT.

According to Len, this is due to inconsistent ordering of the RTSTRUCT points.

missing structures

Only GTV structures were included intentionally.

Agreed that the best way to handle this is to add details to the collection wiki pages (conversion was done with Plastimatch; only GTV structures were included).

inconsistent StudyInstanceUID

Leo is standing by to hear from TCIA about their approach to remedying this. According to Zhenwei, only the LUNG1 collection has the issue of inconsistent StudyInstanceUID.
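To confirm which subjects are affected, one could group series by patient and flag any patient whose RTSTRUCT or SEG series sits in a study that contains no CT. This is a minimal sketch, not the check that was actually run: the record layout, the function name `patients_with_split_studies`, and the `EXAMPLE-OK` patient are assumptions for illustration. Only the LUNG1-001 UIDs come from this thread.

```python
from collections import defaultdict

# Hypothetical per-series records; in practice these would come from DICOM
# headers or a metadata table. Only the LUNG1-001 UIDs are from this issue.
CT_UID = "1.3.6.1.4.1.32722.99.99.239341353911714368772597187099978969331"
RT_UID = "1.3.6.1.4.1.40744.29.239341353911714368772597187099978969331"
series = [
    {"PatientID": "LUNG1-001", "Modality": "CT", "StudyInstanceUID": CT_UID},
    {"PatientID": "LUNG1-001", "Modality": "RTSTRUCT", "StudyInstanceUID": RT_UID},
    {"PatientID": "LUNG1-001", "Modality": "SEG", "StudyInstanceUID": RT_UID},
    {"PatientID": "EXAMPLE-OK", "Modality": "CT", "StudyInstanceUID": "1.2.3.4"},
    {"PatientID": "EXAMPLE-OK", "Modality": "SEG", "StudyInstanceUID": "1.2.3.4"},
]


def patients_with_split_studies(records):
    """Return sorted PatientIDs whose RTSTRUCT/SEG study is not a CT study."""
    ct_studies = defaultdict(set)       # PatientID -> studies holding a CT
    derived_studies = defaultdict(set)  # PatientID -> studies holding RTSTRUCT/SEG
    for rec in records:
        if rec["Modality"] == "CT":
            ct_studies[rec["PatientID"]].add(rec["StudyInstanceUID"])
        elif rec["Modality"] in ("RTSTRUCT", "SEG"):
            derived_studies[rec["PatientID"]].add(rec["StudyInstanceUID"])
    # Flag a patient if any derived-object study is missing from their CT studies.
    return sorted(
        pid
        for pid, studies in derived_studies.items()
        if not studies <= ct_studies.get(pid, set())
    )


print(patients_with_split_studies(series))
```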

fedorov commented 4 years ago

Follow-up from Len confirming the identified inconsistencies for the interobserver dataset (from email):

[two images attached to the original comment]

fedorov commented 4 years ago

NSCLC-Radiomics

The RTSTRUCT series contain a number of anatomic structures (e.g., lung, heart) in addition to the GTV, but the SEG series contain only the neoplasm (= GTV). It is not clear whether this is expected.

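-- For each LUNG1 patient, list the distinct RTSTRUCT ROI names and count them.
WITH
  temp AS (
  SELECT
    PatientID,
    STRING_AGG(DISTINCT structureSetROISequence.ROIName) AS distinctRTSTRUCTStructures,
    -- COUNT(DISTINCT ...) so a repeated ROI name is counted once, matching the column name
    COUNT(DISTINCT structureSetROISequence.ROIName) AS distinct_count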
  FROM
    `idc-dev-etl.idc_tcia.idc_tcia`
  CROSS JOIN
    UNNEST (StructureSetROISequence) AS structureSetROISequence
  WHERE
    Modality = "RTSTRUCT"
    AND PatientID LIKE "LUNG1%"
  GROUP BY
    PatientID)
SELECT
  *
FROM
  temp
WHERE
  distinctRTSTRUCTStructures LIKE "%GTV-1%"
ORDER BY
  distinct_count DESC
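For readers without access to the BigQuery table, the aggregation in the query above can be sketched in plain Python. The rows below are invented for illustration (the patient IDs and ROI names are assumptions, not data from the collection). The sketch reports both the raw row count and the distinct count, since the two differ whenever an ROI name repeats for a patient.

```python
from collections import defaultdict

# Hypothetical (PatientID, ROIName) rows, one per StructureSetROISequence item.
rows = [
    ("LUNG1-A", "GTV-1"),
    ("LUNG1-A", "Lung-Left"),
    ("LUNG1-A", "Heart"),
    ("LUNG1-A", "GTV-1"),  # repeated name, e.g. from a second RTSTRUCT instance
    ("LUNG1-B", "GTV-1"),
    ("LUNG1-C", "Heart"),  # no GTV-1, so the final filter drops this patient
]


def summarize(rows):
    """Return (PatientID, distinct names, row count, distinct count) tuples."""
    names = defaultdict(list)
    for patient_id, roi_name in rows:
        names[patient_id].append(roi_name)

    result = []
    for patient_id, rois in names.items():
        distinct = sorted(set(rois))
        joined = ",".join(distinct)  # stands in for STRING_AGG(DISTINCT ...)
        if "GTV-1" in joined:        # stands in for LIKE "%GTV-1%"
            result.append((patient_id, joined, len(rois), len(distinct)))
    # Stands in for ORDER BY distinct_count DESC.
    return sorted(result, key=lambda r: r[3], reverse=True)


for row in summarize(rows):
    print(row)
```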