Closed: tschaffter closed this issue 5 years ago
@trberg @yy6linda Do we know when the above required elements will be ready for us to test the IT infrastructure?
@tschaffter So @yy6linda has submitted quite a few models into the evaluation pipeline that could be used. The "gold standard" data, I thought, was in the synthetic dataset. I'm confused about what this means. Do you need an "answer file"? Like the patient list with 0 and 1 for mortality status? By scoring script, do you just mean taking the predictions and comparing them to the gold standard answers?
Do I need to resubmit docker images for IT infrastructure test?
@tschaffter I've uploaded a newer version of the synpuf dataset. I've split this data into a training and validation set and created the "gold standard" file with patient ids and mortality status within 6 months after the end date of the validation set.
@tschaffter The scoring script is just going to be a comparison between the output of the model from @yy6linda, which is being put into /data/predictions/ in the docker, and the /evaluate/evaluation_patient_status.csv file. Do you need that written?
@trberg Where is the gold standard for the new synpuf validation set? Will this file have two columns: 1) person_id and 2) a 0/1 flag indicating whether the person died within 6 months?
Here is the structure of the new Synpuf data that I see:
Thomass-MacBook-Pro:data tschaffter$ unzip -e synpuf_train_validate_evaluate.zip
Archive: synpuf_train_validate_evaluate.zip
creating: synpuf_clean/
inflating: synpuf_clean/.DS_Store
creating: __MACOSX/
creating: __MACOSX/synpuf_clean/
inflating: __MACOSX/synpuf_clean/._.DS_Store
creating: synpuf_clean/evaluate/
inflating: synpuf_clean/evaluate/evaluation_patient_status.csv
creating: synpuf_clean/train/
inflating: synpuf_clean/train/observation_period.csv
inflating: synpuf_clean/train/drug_exposure.csv
inflating: synpuf_clean/train/death.csv
inflating: synpuf_clean/train/measurement.csv
inflating: synpuf_clean/train/condition_occurrence.csv
inflating: synpuf_clean/train/visit_occurrence.csv
inflating: synpuf_clean/train/person.csv
inflating: synpuf_clean/train/observation.csv
inflating: synpuf_clean/train/procedure_occurrence.csv
inflating: synpuf_clean/visit_occurrence.csv
creating: synpuf_clean/validation/
inflating: synpuf_clean/validation/observation_period.csv
inflating: synpuf_clean/validation/drug_exposure.csv
inflating: synpuf_clean/validation/death.csv
inflating: synpuf_clean/validation/measurement.csv
inflating: synpuf_clean/validation/condition_occurrence.csv
inflating: synpuf_clean/validation/visit_occurrence.csv
inflating: synpuf_clean/validation/person.csv
inflating: synpuf_clean/validation/observation.csv
inflating: synpuf_clean/validation/procedure_occurrence.csv
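For reference, assuming the two-column layout asked about above (a person_id column plus a 0/1 six-month mortality status), a toy gold-standard file could be generated like this. The column names and values here are hypothetical placeholders, not confirmed against the actual evaluation_patient_status.csv:

```python
import csv

# Toy gold-standard file: person_id plus a 0/1 mortality status.
# Column names and values are hypothetical placeholders.
rows = [
    ("person_id", "status"),
    ("1001", "0"),
    ("1002", "1"),
    ("1003", "0"),
]

with open("evaluation_patient_status.csv", "w", newline="") as f:
    csv.writer(f).writerows(rows)
```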
> @tschaffter The scoring script is just going to be a comparison between the output of the model from @yy6linda, which is being put into /data/predictions/ in the docker, and the /evaluate/evaluation_patient_status.csv file. Do you need that written?
Yes. Please also describe how to run this script and what the expected output is.
@yy6linda Could you give Tom and me:
- a Docker image that takes the train set as input and outputs a trained model
- a Docker image that takes the validation set as input and outputs a prediction file
If you already have such an image, please provide the docker command required to run the containers.
@tschaffter the gold standard is synpuf_clean/evaluate/evaluation_patient_status.csv
@yy6linda can you create a simple script to generate an auc from that file and the file your model outputs? I'm not in a position to do that at the moment
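A minimal sketch of such a scoring script, written in pure Python so it has no dependencies (in practice sklearn.metrics.roc_auc_score would be the usual choice). The column layout of the two CSVs is an assumption; the real evaluation_patient_status.csv and prediction file may use different headers:

```python
# Hypothetical sketch of the requested scoring script.
# Assumes the gold standard and prediction files are two-column CSVs:
# (person_id, status) and (person_id, score) respectively -- the actual
# column names may differ.
import csv

def read_two_column_csv(path):
    """Return {person_id: value} from a header-bearing two-column CSV."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row
        return {row[0]: float(row[1]) for row in reader}

def auc(labels, scores):
    """AUC via the Mann-Whitney U statistic (ties count as 0.5)."""
    pairs = list(zip(scores, labels))
    pos = [s for s, y in pairs if y == 1]
    neg = [s for s, y in pairs if y == 0]
    if not pos or not neg:
        raise ValueError("need both positive and negative labels")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def score(goldstandard_path, predictions_path):
    """Compare predictions against the gold standard, aligned on person_id."""
    gold = read_two_column_csv(goldstandard_path)
    pred = read_two_column_csv(predictions_path)
    ids = sorted(gold)
    labels = [int(gold[i]) for i in ids]
    scores = [pred[i] for i in ids]
    return auc(labels, scores)
```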
@trberg No worries! I will take care of it.
@tschaffter I just uploaded a new docker image to the EHR staging platform. The name of the image is keras_0325:v0.1. This image contains two python scripts:
train.py: extracts features from the omop train folder and trains a neural network model based on the selected features.
infer.py: applies the model to the validation set and outputs 3-month mortality risk for patients in the validation set.
To run the image in a container, first mount four folders into the container (omop, prediction, model, data), then run train.sh and infer.sh using the docker commands below:
Step 1.
docker run \
  --mount type=bind,source="$(pwd)"/omop,target=/app/omop \
  --mount type=bind,source="$(pwd)"/data,target=/app/data \
  --mount type=bind,source="$(pwd)"/prediction,target=/app/prediction \
  --mount type=bind,source="$(pwd)"/model,target=/app/model \
  keras_0325:v0.1 bash "/app/train.sh"
Step 2.
docker run \
  --mount type=bind,source="$(pwd)"/omop,target=/app/omop \
  --mount type=bind,source="$(pwd)"/data,target=/app/data \
  --mount type=bind,source="$(pwd)"/prediction,target=/app/prediction \
  --mount type=bind,source="$(pwd)"/model,target=/app/model \
  keras_0325:v0.1 bash "/app/infer.sh"
The model (.h5 file) can be found in the model folder and the output csv file is in the prediction folder.
Please let me know if you have any questions.
@tschaffter Please use the updated commands:
Step 1.
docker run \
  --mount type=bind,source="$(pwd)"/omop,target=/app/omop \
  --mount type=bind,source="$(pwd)"/data,target=/app/data \
  --mount type=bind,source="$(pwd)"/prediction,target=/app/prediction \
  --mount type=bind,source="$(pwd)"/model,target=/app/model \
  docker.synapse.org/syn18405992/keras_0325:v0.1 bash "/app/train.sh"
Step 2.
docker run \
  --mount type=bind,source="$(pwd)"/omop,target=/app/omop \
  --mount type=bind,source="$(pwd)"/data,target=/app/data \
  --mount type=bind,source="$(pwd)"/prediction,target=/app/prediction \
  --mount type=bind,source="$(pwd)"/model,target=/app/model \
  docker.synapse.org/syn18405992/keras_0325:v0.1 bash "/app/infer.sh"
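The two-step invocation above could be wrapped in a small driver script. This is just a convenience sketch around the exact commands posted in this thread (image name and mount layout copied from them); it assumes the four folders live under a common working directory:

```python
# Hypothetical driver reproducing the two docker invocations above.
import os
import subprocess

IMAGE = "docker.synapse.org/syn18405992/keras_0325:v0.1"
FOLDERS = ["omop", "data", "prediction", "model"]

def docker_cmd(script, workdir):
    """Build the docker run command for one step (train.sh or infer.sh)."""
    cmd = ["docker", "run"]
    for name in FOLDERS:
        src = os.path.join(workdir, name)
        cmd += ["--mount", f"type=bind,source={src},target=/app/{name}"]
    cmd += [IMAGE, "bash", f"/app/{script}"]
    return cmd

def run_pipeline(workdir):
    """Step 1: train; Step 2: infer. Raises if either container fails."""
    subprocess.run(docker_cmd("train.sh", workdir), check=True)
    subprocess.run(docker_cmd("infer.sh", workdir), check=True)
```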
Note: Waiting for review from Yao
Download synpuf_train_validate_evaluate.zip and extract it
synapse get syn18460049
Run the train image
docker run -v /synpuf_clean/train:/train:ro \
  -v /scratch:/scratch:rw \
  -v /model:/model:rw \
  docker.synapse.org/syn18405992/keras_0326:v0.1 bash "/app/train.sh"
Run the inference image
docker run -v /synpuf_clean/validation:/infer:ro \
  -v /scratch:/scratch:rw \
  -v /output:/output:rw \
  -v /model:/model:ro \
  docker.synapse.org/syn18405992/keras_0326:v0.1 bash "/app/infer.sh"
Score the predictions
synapse get syn18475613
python ehr_scoring.py --goldstandard <file> --predictions <file>
Upload score to Synapse
Good practice: Processes In Containers Should Not Run As Root
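For the "Upload score to Synapse" step above, a hedged sketch: write the AUC to a small JSON file and store it with the synapseclient package. The parent Synapse ID below is a placeholder, and the upload function requires synapseclient to be installed with cached credentials:

```python
import json

def write_score_file(auc_value, path="score.json"):
    """Serialize the score so it can be stored as a Synapse file."""
    with open(path, "w") as f:
        json.dump({"auc": auc_value}, f)
    return path

def upload_score(score_path, parent_syn_id="syn00000000"):
    """Store the score file under a Synapse project or folder.

    parent_syn_id is a placeholder -- replace it with the real target.
    Requires `pip install synapseclient` and a configured login.
    """
    import synapseclient
    syn = synapseclient.login()
    return syn.store(synapseclient.File(score_path, parent=parent_syn_id))
```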
@tschaffter I modified the docker image according to the steps you mentioned above and just uploaded the modified image (docker.synapse.org/syn18405992/keras_0326:v0.1) to EHR Challenge - staging.
@thomasyu888 My previous post compiles the different components required for us to start putting the challenge workflow hook in place. Do you have bandwidth to start on it? Thanks!
@tschaffter can we close this issue?
Background: Sage is taking care of developing the IT infrastructure responsible for:
Task: Provide Sage with the following components to enable the development and testing of the IT infrastructure for the EHR Challenge:
According to Tom, we could deploy and test an initial version of the IT infrastructure on Sage AWS instances in 1-2 days once we have received the above components.