fedorov opened this issue 4 years ago
@fedorov Three of these collections were released late last year, and you can access them now using this URL: https://nbia.cancerimagingarchive.net/nbia-search/?ImageModalityCriteria=SEG&ModalityAll=true&MinNumberOfStudiesCriteria=1&CollectionCriteria=NSCLC-Radiomics,NSCLC-RADIOMICS-INTEROBSERVER1,HEAD-NECK-RADIOMICS-HN1
The fourth one is SEG data for the RIDER Lung CT collection. I'm working with our curation team to give you early access to it before we make it public.
The segmentations correspond to those done for each CT volume by 5 observers: once completely manually, and a second time using in-house auto-segmentation software followed by manual adjustment.
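The link above is an ordinary query string, so it can be regenerated for other collection names (for example, once the RIDER Lung CT SEG data is public). A minimal Python sketch, assuming the parameter names in that link are all NBIA needs; nbia_search_url is a hypothetical helper, and the meaning of parameters such as ModalityAll is inferred from the link rather than from NBIA documentation:

```python
from urllib.parse import urlencode

NBIA_SEARCH = "https://nbia.cancerimagingarchive.net/nbia-search/"


def nbia_search_url(collections, modality="SEG"):
    """Build an NBIA search link in the same shape as the one above.

    The query parameter names are copied from that link; what each one
    does (e.g. ModalityAll) is inferred, not taken from NBIA documentation.
    """
    params = {
        "ImageModalityCriteria": modality,
        "ModalityAll": "true",
        "MinNumberOfStudiesCriteria": "1",
        "CollectionCriteria": ",".join(collections),
    }
    # safe="," keeps the comma-separated collection list unescaped,
    # matching the format of the original link.
    return NBIA_SEARCH + "?" + urlencode(params, safe=",")


print(nbia_search_url(
    ["NSCLC-Radiomics", "NSCLC-RADIOMICS-INTEROBSERVER1", "HEAD-NECK-RADIOMICS-HN1"]
))
```

With the three collections above, the function reproduces the original link character for character.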
I looked at introbs05 as an example, and confirmed that:
- dciodvfy reports no errors
- dcmqi, and the codes selected for describing segments are appropriate. Example:
(0008,2218) SQ (Sequence with undefined length #=1) # u/l, 1 AnatomicRegionSequence
(fffe,e000) na (Item with undefined length #=3) # u/l, 1 Item
(0008,0100) SH [T-28000] # 8, 1 CodeValue
(0008,0102) SH [SRT] # 4, 1 CodingSchemeDesignator
(0008,0104) LO [Lung] # 4, 1 CodeMeaning
(fffe,e00d) na (ItemDelimitationItem) # 0, 0 ItemDelimitationItem
(fffe,e0dd) na (SequenceDelimitationItem) # 0, 0 SequenceDelimitationItem
(0062,0003) SQ (Sequence with undefined length #=1) # u/l, 1 SegmentedPropertyCategoryCodeSequence
(fffe,e000) na (Item with undefined length #=3) # u/l, 1 Item
(0008,0100) SH [M-01000] # 8, 1 CodeValue
(0008,0102) SH [SRT] # 4, 1 CodingSchemeDesignator
(0008,0104) LO [Morphologically Altered Structure] # 34, 1 CodeMeaning
(fffe,e00d) na (ItemDelimitationItem) # 0, 0 ItemDelimitationItem
(fffe,e0dd) na (SequenceDelimitationItem) # 0, 0 SequenceDelimitationItem
(0062,0004) US 10 # 2, 1 SegmentNumber
(0062,0005) LO [Neoplasm, Primary] # 18, 1 SegmentLabel
(0062,0006) ST [GTV-1vis-5] # 10, 1 SegmentDescription
(0062,0008) CS [MANUAL] # 6, 1 SegmentAlgorithmType
(0062,000d) US 14684\45603\25027 # 6, 3 RecommendedDisplayCIELabValue
(0062,000f) SQ (Sequence with undefined length #=1) # u/l, 1 SegmentedPropertyTypeCodeSequence
(fffe,e000) na (Item with undefined length #=3) # u/l, 1 Item
(0008,0100) SH [M-80003] # 8, 1 CodeValue
(0008,0102) SH [SRT] # 4, 1 CodingSchemeDesignator
(0008,0104) LO [Neoplasm, Primary] # 18, 1 CodeMeaning
(fffe,e00d) na (ItemDelimitationItem) # 0, 0 ItemDelimitationItem
SegmentAlgorithmType is set to SEMIAUTOMATIC for the segments whose SegmentDescription contains "auto" (consistent with the naming conventions for the structures in RTSTRUCT), and to MANUAL for those containing "vis".
Potential concerns/observations:
GTV-1auto-1 in SEG appears to overlap with the contour GTV-1vis-5 in RTSTRUCT. If I understand the naming conventions correctly, this implies both a different observer and a different segmentation approach between the two representations (see the corresponding structure names in the structure-list screenshots above; the structures shown have the open-eye icon next to their names). In the screenshot below, purple corresponds to the RTSTRUCT representation and yellow to the SEG representation.
We will be taking a careful look at these collections, since we are using them (starting with NSCLC-Radiomics) to develop the radiomics use case(s) in IDC. We will add observations here as we encounter them.
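The SEG/RTSTRUCT comparison above rests on reading the structure names. A minimal Python sketch of that reasoning, assuming every name follows the target, method, observer pattern seen in GTV-1vis-5 and GTV-1auto-1 (an inference from those examples, not a documented convention); parse_name and compare are hypothetical helpers:

```python
import re

# Assumed naming pattern, inferred from names such as "GTV-1vis-5" and
# "GTV-1auto-1": <target><method>-<observer>, with method "vis" or "auto".
NAME_PATTERN = re.compile(
    r"^(?P<target>[A-Za-z]+-\d+)(?P<method>vis|auto)-(?P<observer>\d+)$"
)

# SegmentAlgorithmType convention observed in the SEG files.
ALGORITHM_TYPE = {"vis": "MANUAL", "auto": "SEMIAUTOMATIC"}


def parse_name(name):
    """Split a structure name into (target, method, observer number)."""
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized structure name: {name!r}")
    return match["target"], match["method"], int(match["observer"])


def compare(seg_description, seg_algorithm_type, rtstruct_name):
    """List inconsistencies between a SEG segment and the RTSTRUCT
    contour it overlaps, judged only from their names."""
    seg_target, seg_method, seg_observer = parse_name(seg_description)
    rt_target, rt_method, rt_observer = parse_name(rtstruct_name)
    problems = []
    if ALGORITHM_TYPE[seg_method] != seg_algorithm_type:
        problems.append("SegmentAlgorithmType does not match the name")
    if seg_target != rt_target:
        problems.append("different target structure")
    if seg_method != rt_method:
        problems.append("different segmentation approach")
    if seg_observer != rt_observer:
        problems.append("different observer")
    return problems


# The reported case: SEG segment GTV-1auto-1 overlaps RTSTRUCT GTV-1vis-5.
print(compare("GTV-1auto-1", "SEMIAUTOMATIC", "GTV-1vis-5"))
# → ['different segmentation approach', 'different observer']
```

Deciding which SEG segment overlaps which RTSTRUCT contour still requires comparing the geometry; the sketch only flags what the names imply once that pairing is known.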
An issue with inconsistent assignment of the RT/SEG series to the CT study has been identified and reported to TCIA (ticket TH-45975). Summary below:
There is an issue with at least some of the UIDs in the NSCLC-Radiomics collection.
Subject LUNG1-001 has 3 series: CT, RTSTRUCT and SEG.
For some reason, the CT series is assigned to a different study than the RTSTRUCT and SEG series, even though both StudyInstanceUIDs end with the same suffix:
CT is in study 1.3.6.1.4.1.32722.99.99.239341353911714368772597187099978969331
RTSTRUCT and SEG are in 1.3.6.1.4.1.40744.29.239341353911714368772597187099978969331.
This is not only inconvenient and bad practice; it also appears inconsistent with how the series of this collection were organized in the past.
I checked the previous version of the collection, which is still available from the Google copy, and in that version RTSTRUCT is in the same study as the CT images.
Can you explain what is happening? Is it intentional that StudyInstanceUID was changed, and RTSTRUCT and SEG are assigned to a study different from the CT series?
For your convenience, here's a zip file for LUNG1-001, both the current and previous version: https://app.box.com/s/3v73ot5pq7kza140yoi10qhqia4j7pqt
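The pattern described above (one patient, several studies whose StudyInstanceUIDs differ only in the root prefix) can be screened for across a whole collection. A minimal Python sketch using the three LUNG1-001 series quoted above; in practice the rows could come from a metadata query like the BigQuery one used in this thread, assuming a StudyInstanceUID column is available. Treating a shared last UID component as the signal is our heuristic, not a TCIA rule, and split_study_patients is a hypothetical helper:

```python
from collections import defaultdict


def uid_suffix(uid):
    """Last dot-separated component of a DICOM UID."""
    return uid.rsplit(".", 1)[-1]


def split_study_patients(series):
    """Flag patients whose series are spread over several studies whose
    StudyInstanceUIDs share the same final component.

    series: iterable of (PatientID, Modality, StudyInstanceUID) tuples.
    Returns {PatientID: {StudyInstanceUID: sorted modalities}}.
    """
    by_patient = defaultdict(lambda: defaultdict(set))
    for patient_id, modality, study_uid in series:
        by_patient[patient_id][study_uid].add(modality)

    flagged = {}
    for patient_id, studies in by_patient.items():
        suffixes = {uid_suffix(uid) for uid in studies}
        # Several studies but one shared suffix: possibly one study under two UID roots.
        if len(studies) > 1 and len(suffixes) == 1:
            flagged[patient_id] = {uid: sorted(mods) for uid, mods in studies.items()}
    return flagged


series = [
    ("LUNG1-001", "CT", "1.3.6.1.4.1.32722.99.99.239341353911714368772597187099978969331"),
    ("LUNG1-001", "RTSTRUCT", "1.3.6.1.4.1.40744.29.239341353911714368772597187099978969331"),
    ("LUNG1-001", "SEG", "1.3.6.1.4.1.40744.29.239341353911714368772597187099978969331"),
]
for patient_id, studies in split_study_patients(series).items():
    print(patient_id, studies)
```

A patient with genuinely separate studies (different suffixes) is not flagged, which keeps the check focused on the re-rooted UIDs reported here.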
TCIA ticket concerning the inconsistent assignment of series to studies (StudyInstanceUID): https://help.cancerimagingarchive.net/servicedesk/customer/portal/1/TH-45975
- Artifact/inconsistency in the contour on a couple of slices between SEG and RTSTRUCT: according to Len, this is due to inconsistent ordering of the RTSTRUCT contour points.
- Missing structures: only GTV structures were included, intentionally. Agreed that the best way to handle this is to add details to the collection wiki pages (conversion done with Plastimatch, only GTV structures included).
- Inconsistent StudyInstanceUID: Leo is standing by to hear from TCIA about their approach to remedying this. According to Zhenwei, only the LUNG1 collection has the inconsistent StudyInstanceUID issue.
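If the contour artifact comes from mis-ordered points, as Len suggests, one symptom to screen for is a planar contour that crosses itself: a simple outline whose points are visited out of order can become a self-intersecting polygon, and rasterizers using even-odd versus nonzero-winding fill rules can then disagree about which pixels are inside. That mechanism is our assumption, not a description of what Plastimatch or dcmqi do internally. A minimal Python sketch operating on ContourData-style coordinate lists (all helper names are hypothetical):

```python
from itertools import combinations


def contour_points(contour_data):
    """Turn a flat ContourData-style list [x1, y1, z1, x2, y2, z2, ...]
    into (x, y) pairs, assuming an axial contour (constant z)."""
    return [(contour_data[i], contour_data[i + 1]) for i in range(0, len(contour_data), 3)]


def _orientation(a, b, c):
    """Sign of the cross product (b - a) x (c - a): 1 left turn, -1 right turn, 0 collinear."""
    cross = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (cross > 0) - (cross < 0)


def _edges_cross(p1, p2, q1, q2):
    """True if segments p1-p2 and q1-q2 cross at an interior point
    (touching and collinear overlaps are ignored for simplicity)."""
    return (_orientation(p1, p2, q1) * _orientation(p1, p2, q2) < 0
            and _orientation(q1, q2, p1) * _orientation(q1, q2, p2) < 0)


def is_self_intersecting(points):
    """O(n^2) check whether the closed polygon through `points` crosses itself."""
    n = len(points)
    edges = [(points[i], points[(i + 1) % n]) for i in range(n)]
    for i, j in combinations(range(n), 2):
        # Adjacent edges share a vertex (including the closing edge n-1 -> 0).
        if j == i + 1 or (i == 0 and j == n - 1):
            continue
        if _edges_cross(*edges[i], *edges[j]):
            return True
    return False


square = contour_points([0, 0, 5, 1, 0, 5, 1, 1, 5, 0, 1, 5])
bow_tie = contour_points([0, 0, 5, 1, 1, 5, 1, 0, 5, 0, 1, 5])
print(is_self_intersecting(square), is_self_intersecting(bow_tie))
# → False True
```

The same four corners produce a clean square or a crossing "bow tie" depending only on the order in which they are listed, which is the kind of ordering problem this check is meant to surface.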
Follow-up from Len confirming the inconsistencies identified for the interobserver dataset (from email here):
The RTSTRUCT series contain a number of anatomical structures (e.g., lung, heart) in addition to the GTV, but SEG contains only the neoplasm (i.e., the GTV). It is not clear whether this is expected.
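-- For each LUNG1 patient, list the distinct RTSTRUCT ROI names and count them.
WITH
  temp AS (
  SELECT
    PatientID,
    STRING_AGG(DISTINCT structureSetROISequence.ROIName) AS distinctRTSTRUCTStructures,
    -- COUNT(DISTINCT ...) so the count matches the alias; plain COUNT also counts duplicates
    COUNT(DISTINCT structureSetROISequence.ROIName) AS distinct_count
  FROM
    `idc-dev-etl.idc_tcia.idc_tcia`
  CROSS JOIN
    -- one row per item of StructureSetROISequence
    UNNEST(StructureSetROISequence) AS structureSetROISequence
  WHERE
    Modality = "RTSTRUCT"
    AND PatientID LIKE "LUNG1%"
  GROUP BY
    PatientID)
-- Keep patients whose RTSTRUCT includes GTV-1, most structures first.
SELECT
  *
FROM
  temp
WHERE
  distinctRTSTRUCTStructures LIKE "%GTV-1%"
ORDER BY
  distinct_count DESC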
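To separate intentional omissions (non-GTV structures such as lung or heart) from GTV structures that are actually missing from SEG, the RTSTRUCT ROI names (for example, from the query above) can be compared with the SEG SegmentDescription values, which in the dump above carry the RTSTRUCT structure name (GTV-1vis-5). A minimal Python sketch; treating any name that starts with "GTV" as a GTV structure is an assumption, and the structure names in the usage line are hypothetical:

```python
def gtv_names(names):
    """Names treated as GTV structures (assumption: prefix "GTV", any case)."""
    return {name for name in names if name.upper().startswith("GTV")}


def structure_set_diff(rtstruct_rois, seg_descriptions):
    """Compare RTSTRUCT ROI names with SEG SegmentDescription values."""
    rt, seg = set(rtstruct_rois), set(seg_descriptions)
    return {
        # Expected to be absent from SEG if only GTVs were converted.
        "non_gtv_only_in_rtstruct": sorted(rt - seg - gtv_names(rt)),
        # These would be real gaps in the SEG conversion.
        "gtv_missing_from_seg": sorted(gtv_names(rt) - seg),
        "only_in_seg": sorted(seg - rt),
    }


# Hypothetical names, for illustration only.
print(structure_set_diff(["GTV-1vis-5", "GTV-1auto-1", "Lung", "Heart"], ["GTV-1vis-5"]))
```

Only the gtv_missing_from_seg list would need follow-up; non-GTV structures are expected to be absent given that only GTVs were converted.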
The MAASTRO group is about to release several new datasets with segmentations created using dcmqi. We want to review those just in case. As a first step, we need TCIA to give the IDC team access (via Box or similar) for a quick check, or private access via NBIA, before the data is released publicly. Assigning to @kirbyju until that first step is done.