QIICR / lidc2dicom

Scripts for converting TCIA LIDC-IDRI collection derived data into standard DICOM representation from project-specific XML format.
25 stars 11 forks source link

DICOM SEG outside DICOM CT field of view #9

Open Thibescobar opened 5 months ago

Thibescobar commented 5 months ago

Hello,

By using the TCIA-downloaded data of LIDC with the DICOM SEG for algo development purposes, I discovered that some DICOM SEG are outside the field of view of the DICOM CT volume. Both in Slicer (eg, attached image 1) and LIFEx viewers.

It is for example the case for the sample 291, for which manually adding a minus sign to the x and y in the orientation tuple in ITK in Python ((1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0) to (-1.0, 0.0, 0.0, 0.0, -1.0, 0.0, 0.0, 0.0, 1.0)) fixed the problem (attached image 2).

Where do you think it comes from please? DICOM SEG or viewers? Do you have some workarounds in mind to fix this for all the concerned cases for the whole dataset without breaking the samples that are ok (majority)?

Thank you very much.

image 1 ########## MicrosoftTeams-image (1) ##########

image 2 ########## 2024-06-17 11_19_54-192 168 2 77 - Connexion Bureau à distance ##########

Thibescobar commented 5 months ago

For the moment I use these functions. It seems to work when using SimpleITK even if a bit dirty...

def check_if_seg_outside_vol(seg, img, verbose=False, round=None):

    img_origin = img.GetOrigin()
    seg_origin = seg.GetOrigin()

    img_orientation = img.GetDirection()
    seg_orientation = seg.GetDirection()

    img_size = img.GetSize()
    seg_size = seg.GetSize()

    img_spacing = img.GetSpacing()
    seg_spacing = seg.GetSpacing()

    img_bounds_x = sorted([img_origin[0], img_origin[0]+img_size[0]*img_spacing[0]*img_orientation[0]])
    img_bounds_y = sorted([img_origin[1], img_origin[1]+img_size[1]*img_spacing[1]*img_orientation[4]])
    img_bounds_z = sorted([img_origin[2], img_origin[2]+img_size[2]*img_spacing[2]*img_orientation[8]])
    seg_bounds_x = sorted([seg_origin[0], seg_origin[0]+seg_size[0]*seg_spacing[0]*seg_orientation[0]])
    seg_bounds_y = sorted([seg_origin[1], seg_origin[1]+seg_size[1]*seg_spacing[1]*seg_orientation[4]])
    seg_bounds_z = sorted([seg_origin[2], seg_origin[2]+seg_size[2]*seg_spacing[2]*seg_orientation[8]])
    img_bounds = np.array([img_bounds_x[0], img_bounds_y[0], img_bounds_z[0], img_bounds_x[1], img_bounds_y[1], img_bounds_z[1]])
    seg_bounds = np.array([seg_bounds_x[0], seg_bounds_y[0], seg_bounds_z[0], seg_bounds_x[1], seg_bounds_y[1], seg_bounds_z[1]])

    if round is not None:
        img_bounds = np.round(img_bounds, round)
        seg_bounds = np.round(seg_bounds, round)

    #Check if the segmentation lies within the image bounds
    if verbose:
        print("Image bounds\n", img_bounds.reshape(2, 3))
        print("Segmentation bounds\n", seg_bounds.reshape(2, 3))

    #With proportion of overlap for x and y axis
    proportion_x = (min(seg_bounds[3], img_bounds[3]) - max(seg_bounds[0], img_bounds[0])) / (max(seg_bounds[3], img_bounds[3]) - min(seg_bounds[0], img_bounds[0]))
    proportion_y = (min(seg_bounds[4], img_bounds[4]) - max(seg_bounds[1], img_bounds[1])) / (max(seg_bounds[4], img_bounds[4]) - min(seg_bounds[1], img_bounds[1]))
    #For the z axis as the segments are cropped we just check binary overlap (full or none with no tolerance)
    proportion_z = 1.0
    if seg_bounds[2] > img_bounds[5] or seg_bounds[5] < img_bounds[2]:
        proportion_z = 0.0

    answer = False
    if verbose:
        print("Proportion of overlap in x, y, z: {}, {}, {}".format(proportion_x, proportion_y, proportion_z))
    if proportion_x < 0.9 or proportion_y < 0.9 or proportion_z < 0.9:
        answer = True

    return answer
def change_x_y_sign_ITK_orientation(seg):
    #Change the sign of the x and y components of the orientation matrix
    orientation = np.array(seg.GetDirection())
    orientation[0] = -orientation[0]
    orientation[4] = -orientation[4]
    seg.SetDirection(orientation)
    return seg
fedorov commented 5 months ago

You are right, I can confirm this is a problem.

In the check_if_seg_outside_vol function you provide above, how do you load seg, where is it coming from?

I think first it would be good to understand whether the problem was introduced in the process of conversion into DICOM SEG, or in the process of loading original XML segmentations by pylidc.

Thibescobar commented 5 months ago

Thank you very much for your quick answer.

seg is just opened with SimpleITK as an ITK image object with seg = sitk.ReadImage("seg/path.dcm"), the path pointing to the DICOM SEG downloaded on TCIA.

Simply as well, img is opened with

reader = sitk.ImageSeriesReader()
dcm = sitk.GetGDCMSeriesFileNames("dicom/ct/series/dir/path") #downloaded series from TCIA
reader.SetFileNames(dcm)
img = reader.Execute()

I have not looked at the XML. But trying to understand by looking in the viewers (eg, image 1). It may be due to the fact that some segmentations are RAS instead of LPS but the DICOM fields are as if it was LPS (vice versa)?

Best regards.

fedorov commented 5 months ago

seg is just opened with SimpleITK as an ITK image object with seg = sitk.ReadImage("seg/path.dcm")

This is very interesting. I wouldn't expect SimpleITK to be able to read DICOM SEG (since those are bit-packed), but maybe that feature was added.

The reason I asked is to understand where things went wrong - in XML interpretation or while writing to DICOM SEG. The background for the dataset is that original data was collected and shared in project-specific XML format. pylidc package parsed those XML and DICOM images to make access to those segmentations easier. Then in this paper we used pylidc to encode the project-specific segmentations into standard DICOM representation, which in turn made it possible to visualize segmentations using existing tools (e.g., you can view them in Imaging Data Commons: https://portal.imaging.datacommons.cancer.gov/explore/filters/?collection_id=lidc_idri.

Yes, something went wrong during conversion. Thank you for pointing this out, we should understand the problem and fix it.

Thibescobar commented 5 months ago

Concerned cases are 262, 291, 311, 313, 415, 423, 671, 697, 785, 796, 850, 938.

I just add something else the case 733 have nodules 3 and 4 that are the same.

Best

fedorov commented 5 months ago

Wow, very cool! I was able to independently confirm your list by selecting all segmentations that have inconsistent first component of ImageOrientationPatient compared to the segmented CT - the result I got from the query below was identical to the list you provided!

WITH
  temp_q AS (
  SELECT
    orig.PatientID,
    SAFE_CAST(orig.SharedFunctionalGroupsSequence[SAFE_OFFSET(0)].PlaneOrientationSequence[SAFE_OFFSET(0)].ImageOrientationPatient[SAFE_OFFSET(0)] AS FLOAT64) AS SEG_IOP,
    --array_to_string(orig.SharedFunctionalGroupsSequence[SAFE_OFFSET(0)].PlaneOrientationSequence[SAFE_OFFSET(0)].ImageOrientationPatient,"/") AS SEG_IOP,
    --array_to_string(ct.ImageOrientationPatient,"/") AS CT_IOP
    SAFE_CAST(ct.ImageOrientationPatient[SAFE_OFFSET(0)] AS FLOAT64) AS CT_IOP
  FROM
    `bigquery-public-data.idc_current.dicom_all` AS orig
  JOIN
    `bigquery-public-data.idc_current.dicom_all` AS ct
  ON
    orig.ReferencedSeriesSequence[SAFE_OFFSET(0)].ReferencedInstanceSequence[SAFE_OFFSET(0)].ReferencedSOPInstanceUID = ct.SOPInstanceUID
  WHERE
    orig.Modality = "SEG" AND
    LOWER(orig.Source_DOI) = LOWER("10.7937/TCIA.2018.h7umfurq") )
SELECT
  DISTINCT(PatientID)
FROM
  temp_q
WHERE
  temp_q.SEG_IOP<>temp_q.CT_IOP
ORDER BY
  PatientID asc

image

(if you are interested to learn how to run this query in BigQuery, you can check out IDC getting started tutorials - part 3 is about BigQuery!)

Thibescobar commented 5 months ago

Nice ! Things end well ! If you plan and manage to fix these data on TCIA please let me know so I redownload the last data version.