bmeg / bmeg-etl

ETL configuration for BMEG

Reworking cellline Case / Sample / Aliquot relationships #324

Closed by adamstruck 5 years ago

adamstruck commented 5 years ago

Still working on the GDSC transform...

adamstruck commented 5 years ago

Partial fix for #314 and #315.

kellrott commented 5 years ago

Add xlrd to the Python requirements.

kellrott commented 5 years ago

Lots of null values in the GDSC DrugResponse output:

{"_id": "DrugResponse:gdsc:ACH-002137:Erlotinib", "gid": "DrugResponse:gdsc:ACH-002137:Erlotinib", "label": "DrugResponse", "data": {"source": "gdsc", "submitter_id": "ACH-002137", "submitter_compound_id": "Erlotinib", "ic50": NaN, "act_area": NaN}}
{"_id": "DrugResponse:gdsc:ACH-000474:Erlotinib", "gid": "DrugResponse:gdsc:ACH-000474:Erlotinib", "label": "DrugResponse", "data": {"source": "gdsc", "submitter_id": "ACH-000474", "submitter_compound_id": "Erlotinib", "ic50": NaN, "act_area": NaN}}
{"_id": "DrugResponse:gdsc:ACH-002089:Erlotinib", "gid": "DrugResponse:gdsc:ACH-002089:Erlotinib", "label": "DrugResponse", "data": {"source": "gdsc", "submitter_id": "ACH-002089", "submitter_compound_id": "Erlotinib", "ic50": NaN, "act_area": NaN}}

adamstruck commented 5 years ago

@kellrott those null values are in the source data I am pulling from DepMap.
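
For anyone wanting to quantify how many DrugResponse records carry NaN values, a minimal Python sketch follows. It assumes the output is one JSON object per line with the `data.ic50` and `data.act_area` fields shown above. The function name `count_nan_fields` is hypothetical, the first two sample records are trimmed from the ones quoted in this thread, and the third is made up for contrast. Note that the bare `NaN` token is not strict JSON; Python's `json.loads` accepts it by default, but other parsers may not.

```python
import json
import math
from collections import Counter


def count_nan_fields(lines, fields=("ic50", "act_area")):
    """Return (total records, {field: number of records where it is NaN or null})."""
    counts = Counter()
    total = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # json.loads parses the bare NaN token as float('nan') by default.
        record = json.loads(line)
        total += 1
        data = record.get("data", {})
        for field in fields:
            value = data.get(field)
            if value is None or (isinstance(value, float) and math.isnan(value)):
                counts[field] += 1
    return total, dict(counts)


sample = [
    # Trimmed from the records quoted in this thread.
    '{"gid": "DrugResponse:gdsc:ACH-002137:Erlotinib", "data": {"ic50": NaN, "act_area": NaN}}',
    '{"gid": "DrugResponse:gdsc:ACH-000474:Erlotinib", "data": {"ic50": NaN, "act_area": NaN}}',
    # Hypothetical record with real values, for contrast only.
    '{"gid": "DrugResponse:gdsc:EXAMPLE:Erlotinib", "data": {"ic50": 1.5, "act_area": 0.7}}',
]

total, counts = count_nan_fields(sample)
print(total, counts)
```

Running the same function over the real output file (for example `count_nan_fields(open(path))`, with `path` pointing at the DrugResponse vertex file) would show whether the NaN values are limited to a few cell lines or spread across the dataset.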

adamstruck commented 5 years ago

@bwalsh any idea why the phenotype unit test is failing?

bwalsh commented 5 years ago

Bio ontology seems to be partially down.

adamstruck commented 5 years ago

Current report of missing vertices from check-graph:

{"Project":1,"Transcript":129,"Gene":1124,"Aliquot":502,"GO":26,"Protein":15195,"PFAMFamily":3,"PFAMClan":4}

Missing Aliquots are from GDSC:

$ cat check-graph.out | grep Aliquot | jq -r .To | cut -d : -f 3 | sort | uniq -c
    266 ACH-000474
    266 ACH-002137

These cell line IDs have been deprecated in DepMap.
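
The shell pipeline above can also be sketched in Python, which may be handy inside a test or notebook. This is an approximation under stated assumptions: each line of `check-graph.out` is a JSON object with a `To` key (as the `jq -r .To` call implies), and the third colon-separated field of that gid is the cell line ID. The function name `count_missing` and the sample gids are hypothetical.

```python
import json
from collections import Counter


def count_missing(lines, label="Aliquot"):
    """Count the third ':'-separated field of the 'To' gid on lines mentioning `label`."""
    counts = Counter()
    for line in lines:
        # Equivalent of `grep Aliquot`: a plain substring match on the raw line.
        if label not in line:
            continue
        # Equivalent of `jq -r .To`.
        to_gid = json.loads(line)["To"]
        # Equivalent of `cut -d : -f 3`; gids with fewer than three fields are
        # skipped here, which differs slightly from cut's behavior.
        parts = to_gid.split(":")
        if len(parts) >= 3:
            counts[parts[2]] += 1
    return counts


# Hypothetical sample lines; the real gid layout may differ.
sample = [
    '{"To": "Aliquot:gdsc:ACH-000474"}',
    '{"To": "Aliquot:gdsc:ACH-000474"}',
    '{"To": "Aliquot:gdsc:ACH-002137"}',
    '{"To": "Gene:ENSG00000000001"}',
]

for cell_line, n in sorted(count_missing(sample).items()):
    print(n, cell_line)
```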

adamstruck commented 5 years ago

The most recent commit should fix the missing Aliquots for GDSC. It also adds an 'auc' property to the DrugResponse vertex to address #325.

adamstruck commented 5 years ago

@kellrott @bwalsh this PR is ready for review.