ImagingDataCommons / IDC-WebApp

Web Application front end for IDC (CORE REPO)
Apache License 2.0

Support filtering by patient age #950

Open fedorov opened 2 years ago

fedorov commented 2 years ago

Feature suggested during the 2022 AAPM webinar:

[image]

s-paquette commented 2 years ago

@fedorov This was in reference to DICOM age, IIRC, which means we'd need to add it to the curation table for conversion from a string into an integer.

bcli4d commented 2 years ago

So, the following for dicom_metadata_curated?

SELECT
SOPInstanceUID,
SAFE_CAST(SliceThickness AS FLOAT64) AS SliceThickness,
SAFE_CAST(PatientAge AS INT64) AS PatientAge
FROM
idc-dev-etl.idc_v9_pub.dicom_metadata AS dcm

bcli4d commented 2 years ago

I guess it is not as simple as above: that value has type Age String. There are a few xxxD and xxxM values. To what units should these all be converted? Should we map xxxD and xxxM to 0 years (assuming xxxD < 365 and xxxM < 12, which they all are)?

bcli4d commented 2 years ago

There are also some values that are just integers without a Y, M, W or D. Assume these are years?
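
One possible reading of the conversion discussed so far, as a minimal Python sketch rather than the curation code actually used. A DICOM Age String is a number followed by D, W, M, or Y. The function name `age_string_to_years` is hypothetical, the per-year divisors are approximations, and treating a bare integer as years is an assumption that is still an open question in this thread.

```python
import re
from typing import Optional

# DICOM Age String (AS): digits followed by a unit: D, W, M, or Y.
# The unit is made optional here so bare integers are accepted too (an assumption).
_AGE_RE = re.compile(r"^\s*(\d{1,3})\s*([DWMY]?)\s*$", re.IGNORECASE)

# Approximate number of each unit per year, used only to floor to whole years.
_UNITS_PER_YEAR = {"D": 365, "W": 52, "M": 12, "Y": 1}


def age_string_to_years(value: Optional[str]) -> Optional[int]:
    """Return the age in whole years, or None when the value cannot be parsed."""
    if value is None:
        return None
    match = _AGE_RE.match(value)
    if match is None:
        return None
    number = int(match.group(1))
    unit = match.group(2).upper() or "Y"  # assumption: no unit means years
    return number // _UNITS_PER_YEAR[unit]
```

Under these assumptions `010M` and `200D` both map to 0, `045Y` maps to 45, and a bare `63` maps to 63. Unparseable values return `None` instead of raising, which mirrors what SAFE_CAST does in the query suggested earlier.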

fedorov commented 2 years ago

"Assume these are years?"

Assume it's a mess we are dealing with! :-D

Adding to the agenda for tomorrow's meeting to do a quick check of the strategy with David C!

fedorov commented 2 years ago

Per discussion with @dclunie, we should first check which PatientAge values occur and how many series carry each:

SELECT
PatientAge, COUNT(DISTINCT SeriesInstanceUID) AS num_distinct_series
FROM
idc-dev-etl.idc_v9_pub.dicom_metadata AS dcm
GROUP BY PatientAge
ORDER BY num_distinct_series DESC

[image]

We should also see how to treat days/months.
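
To decide how days and months should be treated, it may help to first see how many series fall under each unit. A rough Python sketch over hypothetical (PatientAge, SeriesInstanceUID) rows, such as could be exported from the table queried above. The function name and the category labels are illustrative only.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple


def series_per_age_unit(rows: Iterable[Tuple[Optional[str], str]]) -> Dict[str, int]:
    """Count distinct series per PatientAge format category."""
    series_by_category = defaultdict(set)
    for age, series_uid in rows:
        text = (age or "").strip().upper()
        if not text:
            category = "missing"
        elif text.isdigit():
            category = "no unit"  # bare integer, unit unknown
        elif text[-1] in "DWMY" and text[:-1].isdigit():
            category = text[-1]  # well-formed value, keyed by its unit letter
        else:
            category = "other"  # anything that fits none of the patterns
        series_by_category[category].add(series_uid)
    return {category: len(uids) for category, uids in series_by_category.items()}
```

Sets are used so that a series contributing many instances is counted once, matching the COUNT(DISTINCT ...) in the query.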

We should aggregate PatientAge over the items within the same study, since it is expected to be the same for all items in a study. This should help clean up nulls and address the lack of propagation of PatientAge into SRs and other derived items.
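
The study-level aggregation could look roughly like the following Python sketch. It assumes each row is a dict with the keys `StudyInstanceUID` and `PatientAge`, and the function name is hypothetical. Missing ages are filled only when the study has exactly one distinct non-empty value. Studies with conflicting values are left untouched; that conflict policy is an assumption, not something decided in this thread.

```python
from collections import defaultdict
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]


def propagate_age_within_study(rows: List[Row]) -> List[Row]:
    """Fill missing PatientAge from the single value seen in the same study."""
    # First pass: collect the distinct non-empty ages seen in each study.
    ages_by_study = defaultdict(set)
    for row in rows:
        if row.get("PatientAge"):
            ages_by_study[row["StudyInstanceUID"]].add(row["PatientAge"])

    # Second pass: copy each row, filling the age where it is unambiguous.
    filled = []
    for row in rows:
        ages = ages_by_study.get(row["StudyInstanceUID"], set())
        new_row = dict(row)
        # Assumption: studies with conflicting ages are left as-is for review.
        if not row.get("PatientAge") and len(ages) == 1:
            new_row["PatientAge"] = next(iter(ages))
        filled.append(new_row)
    return filled
```

The input rows are not modified, so the filled output can be compared against the original to see how many nulls were resolved.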