bmeg / bmeg-etl

ETL configuration for BMEG

CCLE Transform Process #4

Closed kellrott closed 6 years ago

kellrott commented 7 years ago
adamstruck commented 6 years ago

Missing drug response data in latest build

adamstruck commented 6 years ago

We are producing DrugResponse vertices and DrugResponseIn edges: https://github.com/bmeg/bmeg-etl/blob/master/transform/gdsc/response.py

@kellrott can we close this?

adamstruck commented 6 years ago

GDSC has overlap with CCLE, but each project has its own drug response data.

bwalsh commented 6 years ago

^^ Where is the CCLE-specific drug response data? (https://portals.broadinstitute.org/ccle/data)

adamstruck commented 6 years ago

https://data.broadinstitute.org/ccle_legacy_data/pharmacological_profiling/CCLE_NP24.2009_profiling_2012.02.20.csv

bwalsh commented 6 years ago

@kellrott @adamstruck

The current gdsc.DrugResponse maintains a single value as the description of the response between an Aliquot and a Drug, e.g.:

bmeg_test=# select * from vertex where label = 'DrugResponse' limit 1 ;
                  gid                   |    label     |                                                 data
----------------------------------------+--------------+------------------------------------------------------------------------------------------------------
 DrugResponse:5-Fluorouracil:ACH-000055 | DrugResponse | {"value": 0.91618468, "metric": "AUC", "sample_id": "ACH-000055", "compound_name": "5-Fluorouracil"}
(1 row)

bmeg_test=# select * from edge where label = 'ResponseTo' and "from" = 'DrugResponse:5-Fluorouracil:ACH-000055' ;
                                   gid                                    |   label    |                  from                  |        to        | data
--------------------------------------------------------------------------+------------+----------------------------------------+------------------+------
 (DrugResponse:5-Fluorouracil:ACH-000055)--ResponseTo->(Compound:CID3385) | ResponseTo | DrugResponse:5-Fluorouracil:ACH-000055 | Compound:CID3385 | {}
(1 row)

bmeg_test=# select * from edge where label = 'DrugResponseIn' and "from" = 'DrugResponse:5-Fluorouracil:ACH-000055' ;
                                      gid                                       |     label      |                  from                  |         to         | data
--------------------------------------------------------------------------------+----------------+----------------------------------------+--------------------+------
 (DrugResponse:5-Fluorouracil:ACH-000055)--DrugResponseIn->(Aliquot:ACH-000055) | DrugResponseIn | DrugResponse:5-Fluorouracil:ACH-000055 | Aliquot:ACH-000055 | {}
(1 row)
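
For reference, the three rows above can be reproduced with a few lines of Python. This is only a sketch of the shape of the data, not the code in transform/gdsc/response.py: the function name `drug_response_elements` is hypothetical, the gid formats are copied from the query output, and the compound gid (`Compound:CID3385`) is passed in because the name-to-CID lookup is not shown here.

```python
def drug_response_elements(compound_name, compound_gid, sample_id, metric, value):
    """Build one DrugResponse vertex and its two edges (hypothetical helper)."""
    response_gid = f"DrugResponse:{compound_name}:{sample_id}"
    aliquot_gid = f"Aliquot:{sample_id}"

    # A single value/metric pair is all the current vertex can hold.
    vertex = {
        "gid": response_gid,
        "label": "DrugResponse",
        "data": {
            "value": value,
            "metric": metric,
            "sample_id": sample_id,
            "compound_name": compound_name,
        },
    }

    def edge(label, to_gid):
        # Edge gids follow the "(from)--label->(to)" pattern seen in the edge table.
        return {
            "gid": f"({response_gid})--{label}->({to_gid})",
            "label": label,
            "from": response_gid,
            "to": to_gid,
            "data": {},
        }

    edges = [edge("ResponseTo", compound_gid), edge("DrugResponseIn", aliquot_gid)]
    return vertex, edges


vertex, edges = drug_response_elements(
    "5-Fluorouracil", "Compound:CID3385", "ACH-000055", "AUC", 0.91618468
)
```

Note that the vertex gid is built from the compound and sample only, so a second metric for the same pair would map to the same gid.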

The pharmacological profiling data above maintains several values with different metrics, plus a dose profile, as the description of the response between an Aliquot and a Drug:

namespace(a_max=-22.0538826, act_area=0.03998, activity_data_median=[-23.0, 9.04, 8.59, 26.0, 17.9, 12.1, -11.0, -22.0], activity_sd=[15.7, 1.31, 1.58, 22.9, 26.8, 23.5, 32.8, 19.9], ccle_cell_line_name='ZR7530_BREAST', compound='Erlotinib', doses_um=[0.0025, 0.008, 0.025, 0.08, 0.25, 0.8, 2.53, 8.0], ec50_um=None, fit_type='Linear', ic50_um=8.0, num_data=8.0, primary_cell_line_name='ZR-75-30', target='EGFR')
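
To make the mismatch concrete, here is a rough sketch that splits the record above into its summary metrics and its dose curve. The grouping in `SUMMARY_METRICS` and the function name `split_profile` are my assumptions for illustration, not an agreed model; the field names and values are taken from the record as printed.

```python
from types import SimpleNamespace

# The CCLE record shown above, reproduced as-is.
record = SimpleNamespace(
    a_max=-22.0538826,
    act_area=0.03998,
    activity_data_median=[-23.0, 9.04, 8.59, 26.0, 17.9, 12.1, -11.0, -22.0],
    activity_sd=[15.7, 1.31, 1.58, 22.9, 26.8, 23.5, 32.8, 19.9],
    ccle_cell_line_name="ZR7530_BREAST",
    compound="Erlotinib",
    doses_um=[0.0025, 0.008, 0.025, 0.08, 0.25, 0.8, 2.53, 8.0],
    ec50_um=None,
    fit_type="Linear",
    ic50_um=8.0,
    num_data=8.0,
    primary_cell_line_name="ZR-75-30",
    target="EGFR",
)

# Assumption: these fields are the per-pair summary metrics.
SUMMARY_METRICS = ("a_max", "act_area", "ec50_um", "ic50_um")


def split_profile(rec):
    """Return (summary metrics with a value, dose curve) for one record."""
    metrics = {
        name: getattr(rec, name)
        for name in SUMMARY_METRICS
        if getattr(rec, name) is not None
    }
    # One (dose, median activity, standard deviation) triple per tested dose.
    curve = list(zip(rec.doses_um, rec.activity_data_median, rec.activity_sd))
    return metrics, curve


metrics, curve = split_profile(record)
```

For this one cell line and compound pair we get three non-null summary metrics and an eight-point dose curve, none of which fits into a single `value`/`metric` pair.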

How would we like to model this?

Choices: