SmartDataAnalytics / BioKEEN

A computational library for learning and evaluating biological knowledge graph embeddings - please see the main PyKEEN repo at https://github.com/pykeen/pykeen/
MIT License

Applications to chemogenomics data #17

Closed cthoyt closed 2 years ago

cthoyt commented 5 years ago

The EXCAPE-DB (manuscript, data download) is the easiest database to use for chemogenomic data; it's actually the pinnacle of curation and preprocessing.

Until now, I've asked students to work on this, but they never realized how important it was. I will finish the corresponding bio2bel repository myself, and then we will have the best available data set for this.

The thing is, it's very important to consider the IC50 values associated with each edge. How would that fit into the available models, if at all? Assigning a hard cutoff is not a good idea, since it would throw away a large amount of information. Maybe we could bin the values, but then we would also have to introduce some notion of ordering among edges into the model.
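
To make the binning idea concrete, here is a rough sketch of what it might look like. Everything in it is an assumption for illustration: the bin edges, the relation names (`inhibits_bin_*`, `inhibits_at_least_*`), and the input format of (compound, target, IC50 in nM) rows are all hypothetical and not taken from EXCAPE-DB or from any existing model. The second function shows one possible way to carry ordering through to a model that only sees triples: emit one edge per potency threshold met, so a stronger interaction implies all the weaker ones.

```python
import math
from bisect import bisect_right

# Hypothetical bin edges on the pIC50 scale, chosen only for illustration.
PIC50_EDGES = [5.0, 6.0, 7.0, 8.0]


def pic50_from_nanomolar(ic50_nm: float) -> float:
    """Convert an IC50 in nanomolar to pIC50 = -log10(IC50 in molar)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    return 9.0 - math.log10(ic50_nm)


def potency_bin(ic50_nm: float) -> int:
    """Return an ordinal bin index (0..len(PIC50_EDGES)); higher is more potent."""
    return bisect_right(PIC50_EDGES, pic50_from_nanomolar(ic50_nm))


def to_triples(measurements):
    """Turn (compound, target, IC50 in nM) rows into one binned triple each."""
    return [
        (compound, f"inhibits_bin_{potency_bin(ic50_nm)}", target)
        for compound, target, ic50_nm in measurements
    ]


def cumulative_triples(measurements):
    """Emit one triple per threshold met, so a high bin implies all lower ones.

    Measurements that fall in bin 0 (below the first edge) yield no triples.
    """
    triples = []
    for compound, target, ic50_nm in measurements:
        for level in range(1, potency_bin(ic50_nm) + 1):
            triples.append((compound, f"inhibits_at_least_{level}", target))
    return triples


if __name__ == "__main__":
    rows = [("compound_a", "target_x", 500.0), ("compound_b", "target_x", 3.0)]
    print(to_triples(rows))
    print(cumulative_triples(rows))
```

The plain binned version treats each bin as an unrelated relation type, which is exactly the ordering problem mentioned above. The cumulative version is only one workaround, and it inflates the number of edges, so whether it actually helps would need to be tested.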