Open Thibescobar opened 5 months ago
For the moment I use these functions. It seems to work when using SimpleITK even if a bit dirty...
def check_if_seg_outside_vol(seg, img, verbose=False, round=None):
img_origin = img.GetOrigin()
seg_origin = seg.GetOrigin()
img_orientation = img.GetDirection()
seg_orientation = seg.GetDirection()
img_size = img.GetSize()
seg_size = seg.GetSize()
img_spacing = img.GetSpacing()
seg_spacing = seg.GetSpacing()
img_bounds_x = sorted([img_origin[0], img_origin[0]+img_size[0]*img_spacing[0]*img_orientation[0]])
img_bounds_y = sorted([img_origin[1], img_origin[1]+img_size[1]*img_spacing[1]*img_orientation[4]])
img_bounds_z = sorted([img_origin[2], img_origin[2]+img_size[2]*img_spacing[2]*img_orientation[8]])
seg_bounds_x = sorted([seg_origin[0], seg_origin[0]+seg_size[0]*seg_spacing[0]*seg_orientation[0]])
seg_bounds_y = sorted([seg_origin[1], seg_origin[1]+seg_size[1]*seg_spacing[1]*seg_orientation[4]])
seg_bounds_z = sorted([seg_origin[2], seg_origin[2]+seg_size[2]*seg_spacing[2]*seg_orientation[8]])
img_bounds = np.array([img_bounds_x[0], img_bounds_y[0], img_bounds_z[0], img_bounds_x[1], img_bounds_y[1], img_bounds_z[1]])
seg_bounds = np.array([seg_bounds_x[0], seg_bounds_y[0], seg_bounds_z[0], seg_bounds_x[1], seg_bounds_y[1], seg_bounds_z[1]])
if round is not None:
img_bounds = np.round(img_bounds, round)
seg_bounds = np.round(seg_bounds, round)
#Check if the segmentation lies within the image bounds
if verbose:
print("Image bounds\n", img_bounds.reshape(2, 3))
print("Segmentation bounds\n", seg_bounds.reshape(2, 3))
#With proportion of overlap for x and y axis
proportion_x = (min(seg_bounds[3], img_bounds[3]) - max(seg_bounds[0], img_bounds[0])) / (max(seg_bounds[3], img_bounds[3]) - min(seg_bounds[0], img_bounds[0]))
proportion_y = (min(seg_bounds[4], img_bounds[4]) - max(seg_bounds[1], img_bounds[1])) / (max(seg_bounds[4], img_bounds[4]) - min(seg_bounds[1], img_bounds[1]))
#For the z axis as the segments are cropped we just check binary overlap (full or none with no tolerance)
proportion_z = 1.0
if seg_bounds[2] > img_bounds[5] or seg_bounds[5] < img_bounds[2]:
proportion_z = 0.0
answer = False
if verbose:
print("Proportion of overlap in x, y, z: {}, {}, {}".format(proportion_x, proportion_y, proportion_z))
if proportion_x < 0.9 or proportion_y < 0.9 or proportion_z < 0.9:
answer = True
return answer
def change_x_y_sign_ITK_orientation(seg):
#Change the sign of the x and y components of the orientation matrix
orientation = np.array(seg.GetDirection())
orientation[0] = -orientation[0]
orientation[4] = -orientation[4]
seg.SetDirection(orientation)
return seg
You are right, I can confirm this is a problem.
In the check_if_seg_outside_vol
function you provide above, how do you load seg
, where is it coming from?
I think first it would be good to understand whether the problem was introduced in the process of conversion into DICOM SEG, or in the process of loading original XML segmentations by pylidc
.
Thank you very much for your quick answer.
seg
is just opened with SimpleITK as an ITK image object with
seg = sitk.ReadImage("seg/path.dcm")
,
the path pointing to the DICOM SEG downloaded on TCIA.
Simply as well, img
is opened with
reader = sitk.ImageSeriesReader()
dcm = sitk.GetGDCMSeriesFileNames("dicom/ct/series/dir/path") #downloaded series from TCIA
reader.SetFileNames(dcm)
img = reader.Execute()
I have not looked at the XML. But trying to understand by looking in the viewers (eg, image 1). It may be due to the fact that some segmentations are RAS instead of LPS but the DICOM fields are as if it was LPS (vice versa)?
Best regards.
seg is just opened with SimpleITK as an ITK image object with seg = sitk.ReadImage("seg/path.dcm")
This is very interesting. I wouldn't expect SimpleITK to be able to read DICOM SEG (since those are bit-packed), but maybe that feature was added.
The reason I asked is to understand where things went wrong - in XML interpretation or while writing to DICOM SEG. The background for the dataset is that original data was collected and shared in project-specific XML format. pylidc
package parsed those XML and DICOM images to make access to those segmentations easier. Then in this paper we used pylidc
to encode the project-specific segmentations into standard DICOM representation, which in turn made it possible to visualize segmentations using existing tools (e.g., you can view them in Imaging Data Commons: https://portal.imaging.datacommons.cancer.gov/explore/filters/?collection_id=lidc_idri.
Yes, something went wrong during conversion. Thank you for pointing this out, we should understand the problem and fix it.
Concerned cases are 262, 291, 311, 313, 415, 423, 671, 697, 785, 796, 850, 938.
I just add something else the case 733 have nodules 3 and 4 that are the same.
Best
Wow, very cool! I was able to independently confirm your list by selecting all segmentations that have inconsistent first component of ImageOrientationPatient
compared to the segmented CT - the result I got from the query below was identical to the list you provided!
WITH
temp_q AS (
SELECT
orig.PatientID,
SAFE_CAST(orig.SharedFunctionalGroupsSequence[SAFE_OFFSET(0)].PlaneOrientationSequence[SAFE_OFFSET(0)].ImageOrientationPatient[SAFE_OFFSET(0)] AS FLOAT64) AS SEG_IOP,
--array_to_string(orig.SharedFunctionalGroupsSequence[SAFE_OFFSET(0)].PlaneOrientationSequence[SAFE_OFFSET(0)].ImageOrientationPatient,"/") AS SEG_IOP,
--array_to_string(ct.ImageOrientationPatient,"/") AS CT_IOP
SAFE_CAST(ct.ImageOrientationPatient[SAFE_OFFSET(0)] AS FLOAT64) AS CT_IOP
FROM
`bigquery-public-data.idc_current.dicom_all` AS orig
JOIN
`bigquery-public-data.idc_current.dicom_all` AS ct
ON
orig.ReferencedSeriesSequence[SAFE_OFFSET(0)].ReferencedInstanceSequence[SAFE_OFFSET(0)].ReferencedSOPInstanceUID = ct.SOPInstanceUID
WHERE
orig.Modality = "SEG" AND
LOWER(orig.Source_DOI) = LOWER("10.7937/TCIA.2018.h7umfurq") )
SELECT
DISTINCT(PatientID)
FROM
temp_q
WHERE
temp_q.SEG_IOP<>temp_q.CT_IOP
ORDER BY
PatientID asc
(if you are interested to learn how to run this query in BigQuery, you can check out IDC getting started tutorials - part 3 is about BigQuery!)
Nice ! Things end well ! If you plan and manage to fix these data on TCIA please let me know so I redownload the last data version.
Hello,
By using the TCIA-downloaded data of LIDC with the DICOM SEG for algo development purposes, I discovered that some DICOM SEG are outside the field of view of the DICOM CT volume. Both in Slicer (eg, attached image 1) and LIFEx viewers.
It is for example the case for the sample 291, for which manually adding a minus sign to the x and y in the orientation tuple in ITK in Python ((1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0) to (-1.0, 0.0, 0.0, 0.0, -1.0, 0.0, 0.0, 0.0, 1.0)) fixed the problem (attached image 2).
Where do you think it comes from please? DICOM SEG or viewers? Do you have some workarounds in mind to fix this for all the concerned cases for the whole dataset without breaking the samples that are ok (majority)?
Thank you very much.
image 1 ########## ##########
image 2 ########## ##########