LINCS MCF10A Molecular Deep Dive (MDD)
The data_model repo submodule
git submodule init
git submodule update
Setup python env
python3 -m venv venv ; source venv/bin/activate
pip install -r requirements.txt
pip install -e .
Create your own .env file with a Synapse personal access token generated at https://www.synapse.org/#!PersonalAccessTokens:
cat Secrets/.env
SYNAPSE_AUTH_TOKEN=ey...
Load your environment:
export $(cat Secrets/.env | xargs)
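For Python scripts that cannot rely on the shell export, the same file can be read directly. This is a minimal sketch, assuming the file holds only simple KEY=VALUE lines with no quoting, as in the example above; the `load_env` helper is hypothetical and not part of this repo.

```python
import os
from pathlib import Path


def load_env(path="Secrets/.env"):
    """Parse simple KEY=VALUE lines and copy them into os.environ.

    Hypothetical helper: assumes no quoting and no multi-line values.
    """
    loaded = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything without '='.
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only, so values may contain '='.
        key, value = line.split("=", 1)
        loaded[key.strip()] = value.strip()
    os.environ.update(loaded)
    return loaded
```

After calling `load_env()`, checking `"SYNAPSE_AUTH_TOKEN" in os.environ` confirms the token is available before any Synapse download starts.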
Install Go (at least Go 1.19): https://go.dev/dl/
go install github.com/bmeg/sifter@latest
go install github.com/bmeg/lathe@latest
export PATH=$PATH:$HOME/go/bin
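To confirm that the Go toolchain on the PATH meets the 1.19 minimum, the output of `go version` can be parsed. This is a sketch only: the function names are hypothetical, and it assumes the usual output shape, for example `go version go1.21.3 linux/amd64`.

```python
import re
import subprocess


def parse_go_version(text):
    """Extract (major, minor) from `go version` output."""
    match = re.search(r"\bgo(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognised go version output: {text!r}")
    return int(match.group(1)), int(match.group(2))


def go_is_new_enough(minimum=(1, 19)):
    """Run `go version` and compare the result against the minimum."""
    result = subprocess.run(
        ["go", "version"], capture_output=True, text=True, check=True
    )
    return parse_go_version(result.stdout) >= minimum
```

Keeping the parsing separate from the subprocess call makes the version comparison checkable without Go installed.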
Build Snakefile
lathe plan transform -C .
Run build
snakemake -j 4
Generated data should be under output and include:
transform.documentReference.DocumentReference.json.gz
transform.patient.Patient.json.gz
transform.researchStudy.ResearchStudy.json.gz
transform.researchSubject.ResearchSubject.json.gz
transform.specimens.Specimen.json.gz
transform.task.Task.json.gz
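A quick way to check the build result is to compare the output directory against the list above. The sketch below assumes the files sit directly under `output`; the `missing_outputs` helper is hypothetical and not part of this repo.

```python
from pathlib import Path

# File names copied from the list of expected outputs.
EXPECTED = [
    "transform.documentReference.DocumentReference.json.gz",
    "transform.patient.Patient.json.gz",
    "transform.researchStudy.ResearchStudy.json.gz",
    "transform.researchSubject.ResearchSubject.json.gz",
    "transform.specimens.Specimen.json.gz",
    "transform.task.Task.json.gz",
]


def missing_outputs(output_dir="output"):
    """Return the expected file names that are absent from output_dir."""
    out = Path(output_dir)
    return [name for name in EXPECTED if not (out / name).is_file()]
```

An empty list from `missing_outputs()` means every expected file was written; it does not check the contents.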
Refer to DocumentReference.ndjson for the already translated working data.
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
Create a new project in Gen3 titled "MCF10A" for this example.
Upload metadata into peregrine:
nice -10 scripts/gen3_emitter.py data load --db_host localhost --sheepdog_creds_path ../compose-services-training/Secrets/sheepdog_creds.json --project_code MCF10A
Load metadata into elastic:
nice -10 python3 scripts/load.py load flat --project_id aced-MCF10A --index file --path studies/MCF10A/extractions/DocumentReference.ndjson
nice -10 python3 scripts/load.py load flat --project_id aced-MCF10A --index patient --path studies/MCF10A/extractions/Patient.ndjson
The two attached files, DocumentReference.ndjson and Patient.ndjson, should be placed in studies/MCF10A/extractions. The remaining .ndjson files can be transformed into Python objects from the sifter outputs with transformer.py.
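For inspecting these files by hand, newline-delimited JSON can be read one record at a time. This is a generic sketch, not the repo's transformer.py: it assumes one JSON object per line and treats any path ending in `.gz` as gzip-compressed. The `read_ndjson` name is hypothetical.

```python
import gzip
import json


def read_ndjson(path):
    """Yield one dict per non-empty line of an NDJSON file.

    Paths ending in .gz are opened with gzip; others as plain text.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield json.loads(line)
```

For example, `next(read_ndjson("studies/MCF10A/extractions/Patient.ndjson"))` returns the first record as a dict, which is enough to see which fields are present before loading.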
Note: No transformer was written to convert the patient from sifter output into something that can be uploaded into Gen3. The patient is hardcoded in the Patient.ndjson file.
Note: The transform.py modifications that made DocumentReference fit the Gen3 gitops have been lost. Some of the fields in scripts/transform.py must be changed, mostly in lines 409-430, starting at the code block "resource.resource_type == 'DocumentReference':". attachment.url does not exist in this MCF10A data, and a couple of the other properties might also be incompatible with the MCF10A data structure.
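Since the lost changes are not recoverable from this README, the following is not a reconstruction of them. It is only a sketch of how a missing attachment.url could be tolerated, assuming records are plain dicts laid out like a FHIR DocumentReference (`content[].attachment.url`). The helper name is hypothetical, and scripts/transform.py may work on resource objects instead of dicts.

```python
def first_attachment_url(document_reference):
    """Return the first content[].attachment.url, or None if absent.

    Missing or null 'content' and 'attachment' entries are tolerated
    rather than raising KeyError or AttributeError.
    """
    for content in document_reference.get("content") or []:
        url = (content.get("attachment") or {}).get("url")
        if url:
            return url
    return None
```

Returning `None` lets the caller decide whether to skip the field, substitute a placeholder, or drop the record.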