broadinstitute / ml4h

cross-reference: find patients who have 1+ ECG in pre-event window AND 1+ ECG in post-event window #283

Closed erikr closed 4 years ago

erikr commented 4 years ago

What Enhance cross_reference to find patients who have 1+ ECG in a pre-event window and 1+ ECG in a post-event window, i.e. find patients with "paired" data.

Why We are often interested only in patients who have 1+ ECG before some event as well as 1+ ECG after it.

Examples:

How

New arguments --reference_start_time_tensor_paired and --reference_end_time_tensor_paired would enable a user to call cross_reference to find ECGs from patients who have 1+ ECG prior to a surgery, as well as 1+ ECG after the surgery:

./scripts/tf.sh -c -t \
    ${HOME}/ml/ml4cvd/recipes.py \
    --mode cross_reference \
    --tensors_name ecg \
    --tensors /data/partners_ecg/mgh/explore/tensors_all_union.csv \
    --time_tensor partners_ecg_datetime \
    --reference_tensors /data/sts-afib/mgh-afib-after-avr-metadata.csv \
    --reference_name sts-afib-after-avr \
    --reference_join_tensors partners_ecg_patientid_clean \
    --reference_join_tensors mrn \
    --reference_start_time_tensor surgery_date -180 \
    --reference_end_time_tensor surgery_date \
    --reference_start_time_tensor_paired surgery_date \
    --reference_end_time_tensor_paired surgery_date 180 \
    --output_folder $HOME \
    --id sts-afib-ecg-crossref-180-days-preop

Acceptance Criteria The above command runs cross_reference to find patients who have 1+ ECG in the pre-event window and 1+ ECG in the post-event window, and quantifies ECG coverage.
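
As a rough illustration of the acceptance criteria (not the ml4h implementation), the sketch below finds paired patients and counts coverage in plain Python. The function name `paired_coverage`, the input dictionaries, the patient IDs, and the dates are all hypothetical. Treating an ECG taken on the event day as belonging to neither window is also an assumption.

```python
from datetime import date, timedelta


def paired_coverage(ecgs, events, days=180):
    """Find patients with 1+ ECG before and 1+ ECG after their event.

    ecgs:   {patient_id: [ECG dates]}   (hypothetical input shape)
    events: {patient_id: event date}    (hypothetical input shape)
    """
    window = timedelta(days=days)
    counts = {"pre": 0, "post": 0, "paired": 0}
    paired_ids = []
    for pid, event in events.items():
        dates = ecgs.get(pid, [])
        # Assumption: the event day itself belongs to neither window.
        has_pre = any(event - window <= d < event for d in dates)
        has_post = any(event < d <= event + window for d in dates)
        counts["pre"] += int(has_pre)
        counts["post"] += int(has_post)
        if has_pre and has_post:
            counts["paired"] += 1
            paired_ids.append(pid)
    return paired_ids, counts


# Made-up data: "a" has ECGs on both sides, "b" only before, "c" none.
events = {pid: date(2020, 6, 1) for pid in ("a", "b", "c")}
ecgs = {
    "a": [date(2020, 5, 1), date(2020, 7, 1)],
    "b": [date(2020, 4, 1)],
}
paired_ids, counts = paired_coverage(ecgs, events)
print(paired_ids, counts)
```

The per-window counts next to the paired count are one possible reading of "quantify ECG coverage"; the real report may look different.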

StevenSong commented 4 years ago

This is really a desire to find cross-referenced data in multiple time windows. Instead of only allowing 2, allow any number of time windows by specifying reference_start/end_time_tensor multiple times.

An additional augmentation will be to allow users to specify the number of data points needed in each time window and which events in the time series to keep (newest/oldest/random).

arguments will probably look like this:

--mode cross_reference
--output_folder $HOME
--id sts-afib-ecg-crossref-180-days-preop

# Source Tensors
--tensors_name ecg
--tensors /data/partners_ecg/mgh/explore/tensors_all_union.csv
--join_tensor partners_ecg_patientid_clean
--time_tensor partners_ecg_datetime

# Reference Tensors
--reference_tensors /data/sts-afib/mgh-afib-after-avr-metadata.csv
--reference_name sts-afib-after-avr
--reference_join_tensors mrn

# Time Window 1
--reference_start_time_tensor  surgery_date -180
--reference_end_time_tensor    surgery_date
--number_in_window             1
--which_in_window              newest
--window_name                  pre-op

# Time Window 2
--reference_start_time_tensor  surgery_date
--reference_end_time_tensor    surgery_date  180
--number_in_window             1
--which_in_window              oldest
--window_name                  post-op

Output will likely change, details to follow during implementation
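
One way the repeated flags could be grouped into windows, sketched with argparse's `append` action. This is an assumption about the parsing, not the parser in ml4h. `parse_windows` and `split_offset` are hypothetical helper names, and pairing the Nth occurrence of each flag into the Nth window is an assumed convention.

```python
import argparse

parser = argparse.ArgumentParser()
# Each occurrence of a flag appends one entry, so N windows give N entries.
parser.add_argument("--reference_start_time_tensor", nargs="+", action="append", default=[])
parser.add_argument("--reference_end_time_tensor", nargs="+", action="append", default=[])
parser.add_argument("--number_in_window", type=int, action="append", default=[])
parser.add_argument("--which_in_window", choices=["newest", "oldest", "random"],
                    action="append", default=[])
parser.add_argument("--window_name", action="append", default=[])


def split_offset(tokens):
    """Turn ['surgery_date', '-180'] into ('surgery_date', -180); offset defaults to 0."""
    name = tokens[0]
    offset = int(tokens[1]) if len(tokens) > 1 else 0
    return name, offset


def parse_windows(argv):
    args = parser.parse_args(argv)
    groups = [
        args.reference_start_time_tensor,
        args.reference_end_time_tensor,
        args.number_in_window,
        args.which_in_window,
        args.window_name,
    ]
    # Assumed rule: every window must supply all five arguments.
    if len({len(group) for group in groups}) != 1:
        parser.error("each time window needs all five window arguments")
    return [
        {
            "name": name,
            "start": split_offset(start),
            "end": split_offset(end),
            "number": number,
            "which": which,
        }
        for start, end, number, which, name in zip(*groups)
    ]


windows = parse_windows([
    "--reference_start_time_tensor", "surgery_date", "-180",
    "--reference_end_time_tensor", "surgery_date",
    "--number_in_window", "1",
    "--which_in_window", "newest",
    "--window_name", "pre-op",
    "--reference_start_time_tensor", "surgery_date",
    "--reference_end_time_tensor", "surgery_date", "180",
    "--number_in_window", "1",
    "--which_in_window", "oldest",
    "--window_name", "post-op",
])
print(windows)
```

Because argparse only treats `-180` as a value when no registered option looks like a negative number, this sketch would break if a flag such as `-1` were added to the parser.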

erikr commented 4 years ago

Can you clarify what these args do?

--number_in_window             1
--which_in_window              newest

If they serve a key purpose, don't waste time explaining in a comment; better to just explain it in a docstring and point me to that line in the code :)

StevenSong commented 4 years ago

Can you clarify what these args do?

--number_in_window             1
--which_in_window              newest

let's say for a patient 123, you had these data:

ecg 5/12
ecg 5/13
ecg 5/14
surgery 5/15
ecg 5/16
ecg 5/17
ecg 5/18

and you wanted to get the 1 newest pre-op ECG and the 2 oldest post-op ECGs, so:

ecg 5/14
surgery 5/15
ecg 5/16
ecg 5/17

you can use args

# pre-op window
--window_name      pre-op
--number_in_window 1 
--which_in_window  newest

# post-op window
--window_name      post-op
--number_in_window 2
--which_in_window  oldest
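
The selection described above can be sketched in a few lines of Python. This is a minimal illustration, not the ml4h code: `select_in_window` is a hypothetical helper, the year 2020 is arbitrary (the example only gives month/day), inclusive window bounds are an assumption, and returning nothing when a window holds too few ECGs is an assumed policy. The `random` option is left out.

```python
from datetime import date, timedelta


def select_in_window(ecg_dates, start, end, number, which):
    """Keep `number` ECG dates from [start, end], taking the newest or oldest."""
    if which not in ("newest", "oldest"):
        raise ValueError(f"unknown which_in_window: {which!r}")
    in_window = sorted(d for d in ecg_dates if start <= d <= end)
    # Assumed policy: a window with too few ECGs yields nothing.
    if number < 1 or len(in_window) < number:
        return []
    return in_window[-number:] if which == "newest" else in_window[:number]


# Patient 123 from the example; the year is arbitrary.
surgery = date(2020, 5, 15)
ecg_dates = [date(2020, 5, day) for day in (12, 13, 14, 16, 17, 18)]
half_year = timedelta(days=180)

pre_op = select_in_window(ecg_dates, surgery - half_year, surgery, 1, "newest")
post_op = select_in_window(ecg_dates, surgery, surgery + half_year, 2, "oldest")
print(pre_op)   # the 5/14 ECG
print(post_op)  # the 5/16 and 5/17 ECGs
```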