ImagingDataCommons / IDC-WebApp

Web Application front end for IDC (CORE REPO)
Apache License 2.0

Patient counts for analysis results collections are incorrect #682

Closed fedorov closed 2 years ago

fedorov commented 3 years ago

As reported by @ulrikew, all analysis results collections show 31 as the number of patients, which is incorrect. As an example, LIDC-IDRI analysis collection with DOI 10.7937/TCIA.2018.h7umfurq has 875 patients.

s-paquette commented 3 years ago

This is truly odd and I have to admit I'm not even sure how it happened, as these numbers come straight out of BQ. Fortunately, an easy fix, and done.

s-paquette commented 3 years ago

@fedorov @pgundluru This needs to be tested still.

pgundluru commented 3 years ago

Thank you, Suzanne. Please see my steps below for testing this case.

Steps to test

  1. Pick an analysis results collection, export it to CSV, and note the number of distinct Patient IDs in the export.
  2. Then run the SQL query in GCP (example below) to check whether BigQuery returns the same Patient ID count.

SELECT COUNT(DISTINCT(PatientID)) FROM canceridc-data.idc_current.dicom_all WHERE source_DOI = "10.7937/TCIA.2019.DEG7ZG1U"

LIDC_IDRI - 1010 (Matches)

DICOM-SR-Breast-Clinical - 513 (Matches)

PROSTATEx-Seg-HiRes - 66 (doesn't correspond to patient ID count in CSV format)

PROSTATEx-Seg-Zones - 98 (doesn't correspond to patient ID count in CSV format)

QIN-LungCT-Seg - 31 (doesn't correspond to patient ID count in CSV format)

RIDER-LungCT-Seg - 31 (doesn't correspond to patient ID count in CSV format)
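
The CSV side of the comparison in the steps above can be sketched in a few lines of Python. This is only an illustration: the column name `PatientID`, the helper `count_distinct_patients`, and the sample rows are assumptions, and the real export may use a different header.

```python
import csv
import io


def count_distinct_patients(csv_file, column="PatientID"):
    """Count distinct non-empty values in one column of a CSV export.

    `column` is an assumed header name; adjust it to match the real export.
    """
    reader = csv.DictReader(csv_file)
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"column {column!r} not found in CSV header")
    # A set keeps each patient once, however many rows mention them.
    return len({
        row[column].strip()
        for row in reader
        if row[column] and row[column].strip()
    })


# Hypothetical export content, standing in for a real downloaded file.
sample = io.StringIO(
    "PatientID,StudyInstanceUID\n"
    "P-001,1.2.3\n"
    "P-001,1.2.4\n"
    "P-002,1.2.5\n"
)
csv_count = count_distinct_patients(sample)

# Placeholder for the value returned by the COUNT(DISTINCT(PatientID)) query.
bq_count = 2
print("match" if csv_count == bq_count else "mismatch")
```

For a real export, open the file with `open(path, newline="")` and pass the file object in place of `sample`.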

s-paquette commented 3 years ago

@pgundluru I'm not sure what you mean in step 1, as we don't support that functionality yet. Right now, if you click on the name of an analysis results set in the Collections list, it selects the source collections, which will contain cases and studies that aren't part of the result set. Thus, the numbers will be off if that's the counting method you're using, until we have properly implemented selection via the DOI. For analysis results, the only way to validate the counts in that table is through BigQuery.

s-paquette commented 2 years ago

@pgundluru @fedorov Can this be confirmed and closed? The query at the top is how to confirm the counts expected in this table.

s-paquette commented 2 years ago

@pgundluru @fedorov I'm going to remove this one from any Release Milestones until we can work out how this should be handled.

pgundluru commented 2 years ago

Thank you. I will reach out to Andrey to validate this ticket, as I am not seeing results consistent enough to mark this issue as passed.

s-paquette commented 2 years ago

Now that we have analysis result ID filtering, can we test this one formally? Clicking on the analysis result name in the collections table now actually selects those specific items--though, note, we select and count at the study level, which may produce slightly discordant numbers.
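
To illustrate how study-level selection could produce discordant numbers, here is a minimal sketch under one possible reading of it: a study is selected if any of its series belongs to the analysis result, and everything in that study is then counted. That reading, the `in_result` flag, and the toy records are all assumptions for illustration, not the webapp's actual logic or data.

```python
def cohort_counts(rows):
    """Return (cases, studies, series) as counts of distinct identifiers."""
    patients = {r["PatientID"] for r in rows}
    studies = {r["StudyInstanceUID"] for r in rows}
    series = {r["SeriesInstanceUID"] for r in rows}
    return len(patients), len(studies), len(series)


# Hypothetical records: one row per series; in_result marks series that
# belong to the analysis result itself.
rows = [
    {"PatientID": "P1", "StudyInstanceUID": "S1", "SeriesInstanceUID": "SE1", "in_result": True},
    {"PatientID": "P1", "StudyInstanceUID": "S1", "SeriesInstanceUID": "SE2", "in_result": False},
    {"PatientID": "P1", "StudyInstanceUID": "S2", "SeriesInstanceUID": "SE3", "in_result": False},
    {"PatientID": "P2", "StudyInstanceUID": "S3", "SeriesInstanceUID": "SE4", "in_result": True},
]

# Series-level selection: keep only the analysis result series.
series_level = [r for r in rows if r["in_result"]]

# Study-level selection (assumed reading): keep every series of any study
# that contains at least one analysis result series.
selected_studies = {r["StudyInstanceUID"] for r in rows if r["in_result"]}
study_level = [r for r in rows if r["StudyInstanceUID"] in selected_studies]

print(cohort_counts(series_level))  # (2, 2, 2)
print(cohort_counts(study_level))   # (2, 2, 3)
```

In this toy data the case and study counts agree, while the series count differs because study-level selection also pulls in the non-result series SE2.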

pgundluru commented 2 years ago

Tested the analysis results collections at the study level on the test tier. Query results for each analysis results collection are below.

Glioma SEG - 167 Cases, 168 Studies, and 3535 Series in this cohort

DICOM-LIDC-IDRI-Nodules - 875 Cases, 883 Studies, and 14691 Series in this cohort

DICOM-SR-Breast-Clinical - 474 Cases, 1286 Studies, and 15046 Series in this cohort

PROSTATEx-Seg-HiRes - 66 Cases, 66 Studies, and 3617 Series in this cohort

PROSTATEx-Seg-Zones - 98 Cases, 98 Studies, and 5308 Series in this cohort

QIN-LungCT-Seg - 31 Cases, 31 Studies, and 533 Series in this cohort

RIDER-LungCT-Seg - 31 Cases, 43 Studies, and 268 Series in this cohort