Open realmarcin opened 4 years ago
IMO, the easiest reusable dataset for that study would be the official drug-PDB mapping: https://www.rcsb.org/pdb/ligand/drugMapping.do
It lists the structures of targets alone and of targets bound to a drug.
Most structures are only fragments and a number are not human proteins. In terms of quality, the general criteria are:
We could check whether the side chains are entirely resolved, but I don't recommend going down that route unless there is a reason.
If we want more human proteins (and longer fragments/better coverage), I recommend using SWISS-MODEL (17k human proteins modeled): https://swissmodel.expasy.org/repository . I also used to work with them, so we have a few contacts there.
If you want more details on structures, I still have my code somewhere (https://academic.oup.com/nar/article/39/1/30/2409207) that details all entity types and all types of interactions in each structure.
All metadata shown on the PDB site (including the mapping to UniProt) is also available in their Data API: https://www.rcsb.org/pages/webservices. I suppose that data is also available on their FTP, or we could contact them.
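For what it's worth, here is a rough sketch of pulling the UniProt mapping for one PDB entity through the Data API. It assumes the REST endpoint `https://data.rcsb.org/rest/v1/core/polymer_entity/{entry}/{entity}` and the JSON field names `rcsb_polymer_entity_container_identifiers` / `reference_sequence_identifiers`, which should be checked against the current RCSB docs before relying on them. The function names are made up for illustration.

```python
import json
from urllib.request import urlopen

# Assumed base URL of the RCSB Data API (verify against the RCSB docs).
BASE = "https://data.rcsb.org/rest/v1/core"


def polymer_entity_url(entry_id, entity_id):
    """Build the (assumed) REST URL for one polymer entity of a PDB entry."""
    return f"{BASE}/polymer_entity/{entry_id.upper()}/{entity_id}"


def uniprot_accessions(entity_json):
    """Extract UniProt accessions from a polymer-entity JSON document.

    The field names below are assumptions about the response layout;
    missing fields simply yield an empty list.
    """
    ids = entity_json.get("rcsb_polymer_entity_container_identifiers") or {}
    refs = ids.get("reference_sequence_identifiers") or []
    return [
        ref["database_accession"]
        for ref in refs
        if ref.get("database_name") == "UniProt" and "database_accession" in ref
    ]


def fetch_uniprot_accessions(entry_id, entity_id, timeout=30):
    """Fetch one polymer entity over HTTP and return its UniProt accessions."""
    with urlopen(polymer_entity_url(entry_id, entity_id), timeout=timeout) as resp:
        return uniprot_accessions(json.load(resp))
```

Calling `fetch_uniprot_accessions("4HHB", 1)` needs network access; the parsing step is kept separate so it can be run on JSON already downloaded in bulk.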
This ticket might be a duplicate of https://github.com/Knowledge-Graph-Hub/kg-covid-19/issues/188
Other interesting databases:
I also always use SIDER and DrugBank in conjunction.
TTD (Therapeutic Target Database)
FWIW we have TTD ingested already here; the others look interesting.
STITCH is possibly lower priority since we have STRING already.
From: https://covid19.bioreproducibility.org/
Some key pieces would be: