adamstruck closed 5 years ago
partial fix for: #314, #315
Add xlrd to the Python requirements
Lots of null values in the GDSC DrugResponse output:
{"_id": "DrugResponse:gdsc:ACH-002137:Erlotinib", "gid": "DrugResponse:gdsc:ACH-002137:Erlotinib", "label": "DrugResponse", "data": {"source": "gdsc", "submitter_id": "ACH-002137", "submitter_compound_id": "Erlotinib", "ic50": NaN, "act_area": NaN}}
{"_id": "DrugResponse:gdsc:ACH-000474:Erlotinib", "gid": "DrugResponse:gdsc:ACH-000474:Erlotinib", "label": "DrugResponse", "data": {"source": "gdsc", "submitter_id": "ACH-000474", "submitter_compound_id": "Erlotinib", "ic50": NaN, "act_area": NaN}}
{"_id": "DrugResponse:gdsc:ACH-002089:Erlotinib", "gid": "DrugResponse:gdsc:ACH-002089:Erlotinib", "label": "DrugResponse", "data": {"source": "gdsc", "submitter_id": "ACH-002089", "submitter_compound_id": "Erlotinib", "ic50": NaN, "act_area": NaN}}
@kellrott those null values are in the source data I am pulling from DepMap.
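For reference, bare NaN tokens like the ones in the records above are not valid JSON, so strict parsers may reject those lines. A minimal sketch of one way to emit null instead, assuming the transform builds plain Python dicts before serializing; `nan_to_none` is a hypothetical helper, not something in this repo, and the record is trimmed from the output above:

```python
import json
import math


def nan_to_none(value):
    """Recursively replace float NaN with None so json.dumps emits null."""
    if isinstance(value, float) and math.isnan(value):
        return None
    if isinstance(value, dict):
        return {k: nan_to_none(v) for k, v in value.items()}
    if isinstance(value, list):
        return [nan_to_none(v) for v in value]
    return value


# Trimmed copy of one of the DrugResponse records shown above.
record = {
    "gid": "DrugResponse:gdsc:ACH-002137:Erlotinib",
    "label": "DrugResponse",
    "data": {
        "source": "gdsc",
        "submitter_id": "ACH-002137",
        "submitter_compound_id": "Erlotinib",
        "ic50": float("nan"),
        "act_area": float("nan"),
    },
}

clean = nan_to_none(record)
# allow_nan=False makes json.dumps raise ValueError if a NaN slipped through.
line = json.dumps(clean, allow_nan=False)
print(line)
```

Whether to keep these records with null values or drop them entirely is a separate decision; this only makes the output parseable.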
@bwalsh any idea why the phenotype unit test is failing?
bio ontology seems to be partially down.
Current report of missing vertices from check-graph:
{"Project":1,"Transcript":129,"Gene":1124,"Aliquot":502,"GO":26,"Protein":15195,"PFAMFamily":3,"PFAMClan":4}
Missing Aliquots are from GDSC:
$ cat check-graph.out | grep Aliquot | jq -r .To | cut -d : -f 3 | sort | uniq -c
266 ACH-000474
266 ACH-002137
These cell line ids have been deprecated in DepMap.
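A rough Python equivalent of the grep/jq/cut pipeline above, in case it is useful inside a test. It assumes check-graph.out holds one JSON object per line with a `To` field whose third colon-separated part is the cell line id; `count_missing` and the sample lines are hypothetical, not real check-graph output:

```python
import json
from collections import Counter


def count_missing(lines, label="Aliquot"):
    """Count occurrences of the third ':'-separated field of each record's 'To' gid."""
    counts = Counter()
    for line in lines:
        # Same coarse filter as `grep Aliquot`: substring match on the raw line.
        if label not in line:
            continue
        to_gid = json.loads(line)["To"]
        parts = to_gid.split(":")
        # Unlike `cut`, skip gids that have fewer than three fields.
        if len(parts) >= 3:
            counts[parts[2]] += 1
    return counts


# Hypothetical sample lines; the real gid layout may differ.
sample = [
    '{"To": "Aliquot:gdsc:ACH-000474"}',
    '{"To": "Aliquot:gdsc:ACH-002137"}',
    '{"To": "Aliquot:gdsc:ACH-000474"}',
    '{"To": "Gene:ENSG00000141510"}',
]

counts = count_missing(sample)
for cell_line, n in sorted(counts.items()):
    print(n, cell_line)
```

To run it against the real file, pass an open file handle instead of `sample`, e.g. `count_missing(open("check-graph.out"))`.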
The most recent commit should fix the missing aliquots for GDSC and add an 'auc' property to the DrugResponse vertex to address #325.
@kellrott @bwalsh this PR is ready for review.
Still working on the GDSC transform...