AstraZeneca / chemicalx

A PyTorch and TorchDrug based deep learning library for drug pair scoring. (KDD 2022)
https://chemicalx.readthedocs.io
Apache License 2.0

Create data loader #3

Closed benedekrozemberczki closed 2 years ago

benedekrozemberczki commented 2 years ago
cthoyt commented 2 years ago

I have 5 papers with datasets worth looking into, suggested by @debplana:

Mathews Griner LA, Guha R, Shinn P, Young RM, Keller JM, Liu D, Goldlust IS, Yasgar A, McKnight C, Boxer MB, Duveau DY, Jiang JK, Michael S, Mierzwa T, Huang W, Walsh MJ, Mott BT, Patel P, Leister W, Maloney DJ, et al. 2014. High-throughput combinatorial screening identifies drugs that cooperate with ibrutinib to kill activated B-cell-like diffuse large B-cell lymphoma cells. PNAS 111:2349–2354. DOI: https://doi.org/10.1073/pnas.1311846111, PMID: 24469833

This dataset has only a handful of drug synergy pairs. It could be manually curated for evaluation, but it is not enough for training.

O’Neil J, Benita Y, Feldman I, Chenard M, Roberts B, Liu Y, Li J, Kral A, Lejnine S, Loboda A, Arthur W. An unbiased oncology compound screen to identify novel combination strategies. Molecular Cancer Therapeutics. 2016;15(6):1155–1162. https://doi.org/10.1158/1535-7163.MCT-15-0843

This is the OncoPolyPharmacology dataset in TDC.

Borisy AA, Elliott PJ, Hurst NW, Lee MS, Lehar J, Price ER, Serbedzija G, Zimmermann GR, Foley MA, Stockwell BR, Keith CT. 2003. Systematic discovery of multicomponent therapeutics. PNAS 100:7977–7982. DOI: https://doi.org/10.1073/pnas.1337088100, PMID: 12799470

I could not find the supplementary information.

DREAM Challenge

Bansal M, Yang J, Karan C, Menden MP, Costello JC, Tang H, Xiao G, Li Y, Allen J, Zhong R, Chen B, Kim M, Wang T, Heiser LM, Realubit R, Mattioli M, Alvarez MJ, Shen Y, Gallahan D, Singer D, et al. 2014. A community computational challenge to predict the activity of pairs of compounds. Nature Biotechnology 32:1213–1222. DOI: https://doi.org/10.1038/nbt.3052, PMID: 25419740

It appears the website linked by this paper, http://www.the-dream-project.org/challenges/nci-dream-drug-sensitivity-prediction-challenge, is down.

AstraZeneca-Sanger Drug Combination DREAM Consortium, Menden MP, Wang D, Mason MJ, Szalai B, Bulusu KC, Guan Y, Yu T, Kang J, Jeon M, Wolfinger R, Nguyen T, Zaslavskiy M, Jang IS, Ghazoui Z, Ahsen ME, Vogel R, Neto EC, Norman T, Tang EKY, Garnett MJ, et al. 2019. Community assessment to advance computational prediction of Cancer drug combinations in a pharmacogenomic screen. Nature Communications 10:2674. DOI: https://doi.org/10.1038/s41467-019-09799-2, PMID: 3120923

Therapeutics Data Commons


After chatting with Deb, it's clear we should be really careful to only compare within cell lines. This also opens us up to an interesting evaluation where you train on data from one cell line and test on another. The second important thing we need to be careful about is concentration. Last, we also need to provide some meaningful baselines, because it's a good bet the ML people are way off base compared to what's actually useful in the field.
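
A minimal sketch of the cross-cell-line evaluation in plain Python. It assumes each record is a hypothetical `(drug_a, drug_b, cell_line, synergy_score)` tuple. The record layout and the function names are illustrative assumptions, not the ChemicalX API, and concentration is not handled here.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

# Hypothetical record layout (an assumption for illustration):
# (drug_a, drug_b, cell_line, synergy_score)
Record = Tuple[str, str, str, float]


def group_by_cell_line(records: List[Record]) -> Dict[str, List[Record]]:
    """Bucket records by cell line so comparisons stay within one cell line."""
    groups: Dict[str, List[Record]] = defaultdict(list)
    for record in records:
        groups[record[2]].append(record)
    return dict(groups)


def leave_one_cell_line_out(
    records: List[Record],
) -> Iterator[Tuple[str, List[Record], List[Record]]]:
    """Yield (held_out_cell_line, train, test) with no cell line in both sets."""
    groups = group_by_cell_line(records)
    for held_out in sorted(groups):
        # Train on every other cell line, test on the held-out one.
        train = [
            record
            for name, rows in groups.items()
            if name != held_out
            for record in rows
        ]
        test = groups[held_out]
        yield held_out, train, test


# Toy records with made-up names and scores, only to show the call pattern.
records: List[Record] = [
    ("drug_1", "drug_2", "cell_line_A", 0.8),
    ("drug_1", "drug_3", "cell_line_A", -0.1),
    ("drug_1", "drug_2", "cell_line_B", 0.2),
]

for held_out, train, test in leave_one_cell_line_out(records):
    print(held_out, len(train), len(test))
```

Holding out a whole cell line at a time keeps any single cell line from leaking between train and test, which a random split over drug pairs would not guarantee.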

benedekrozemberczki commented 2 years ago

Added basic loaders for two datasets. I will close this for now, but these comments are extremely good. The person who worked on the AZ-Sanger dataset (Krishna Bulusu) works closely with us.