ingest SARS-CoV-2 structural data table

realmarcin commented 4 years ago

From: https://covid19.bioreproducibility.org/

Some key pieces would be:

structure determination method (homology models would be from different source)
species
major ligand
structural ligands, e.g. metal ions
actual sequence corresponding to the solved structure (e.g. N/C terminus, AA mutations, modifications)
cofactors
pathogen-host interaction
resolution
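
To make the list above concrete, here is a minimal sketch of what one row of such a table could look like. Every field name here is hypothetical (nothing in the source site or in kg-covid-19 prescribes them), and the example values are placeholders, not a real entry.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StructureRecord:
    """One row of the structural data table (all field names are hypothetical)."""

    structure_id: str                               # identifier of the structure
    method: str                                     # structure determination method
    species: str
    resolution_angstrom: Optional[float] = None     # None when the method has no resolution
    major_ligand: Optional[str] = None
    structural_ligands: List[str] = field(default_factory=list)  # e.g. metal ions
    cofactors: List[str] = field(default_factory=list)
    solved_sequence: Optional[str] = None           # sequence actually present in the structure
    sequence_notes: List[str] = field(default_factory=list)      # termini, mutations, modifications
    host_interaction: Optional[str] = None          # pathogen-host interaction partner


# Placeholder values for illustration only.
example = StructureRecord(
    structure_id="EXAMPLE1",
    method="X-RAY DIFFRACTION",
    species="SARS-CoV-2",
    resolution_angstrom=2.1,
    structural_ligands=["ZN"],
)
```

Keeping resolution optional leaves room for homology models and other sources that do not report one.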

lpalbou commented 4 years ago

IMO, the easiest reusable dataset for that study would be the official drug-PDB mapping: https://www.rcsb.org/pdb/ligand/drugMapping.do

It details the structures of targets and targets + drug.

Most structures are only fragments and a number are not human proteins. In terms of quality, the general criteria are:

X-ray > NMR
X-ray with a resolution better than 3 Å (< 3 Å)
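
A rough sketch of how these two criteria could be applied to a list of structures. The dictionary keys, the method strings, and the choice to let non-X-ray structures through unfiltered are all assumptions made for illustration; the sample entries are made up.

```python
from typing import Dict, List, Optional

# Lower rank is better. This only encodes "X-ray > NMR"; methods not
# listed here sort last (an assumption, not part of the criteria above).
METHOD_RANK = {"X-RAY DIFFRACTION": 0, "SOLUTION NMR": 1}
MAX_RESOLUTION = 3.0  # angstroms


def passes_quality(method: str, resolution: Optional[float]) -> bool:
    """X-ray structures must be better than 3 A; other methods are not filtered."""
    if method == "X-RAY DIFFRACTION":
        return resolution is not None and resolution < MAX_RESOLUTION
    return True


def sort_key(structure: Dict) -> tuple:
    """Sort by method rank first, then by resolution (missing values last)."""
    resolution = structure.get("resolution")
    return (
        METHOD_RANK.get(structure["method"], 99),
        resolution if resolution is not None else float("inf"),
    )


def rank_structures(structures: List[Dict]) -> List[Dict]:
    """Drop low-resolution X-ray structures, then order the rest best-first."""
    kept = [s for s in structures if passes_quality(s["method"], s.get("resolution"))]
    return sorted(kept, key=sort_key)


# Made-up entries for illustration only.
candidates = [
    {"id": "A", "method": "SOLUTION NMR", "resolution": None},
    {"id": "B", "method": "X-RAY DIFFRACTION", "resolution": 3.4},
    {"id": "C", "method": "X-RAY DIFFRACTION", "resolution": 1.8},
    {"id": "D", "method": "X-RAY DIFFRACTION", "resolution": 2.5},
]
ranked_ids = [s["id"] for s in rank_structures(candidates)]
```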

We could check whether the side chains are entirely resolved, but I don't recommend going down that route unless there is a reason.

If we want more human proteins (and longer fragments/better coverage), I recommend using SWISS-MODEL (17k human proteins modeled): https://swissmodel.expasy.org/repository . I also used to work with them, so we have a few contacts there.

If you want more details on structures, I still have my code somewhere (https://academic.oup.com/nar/article/39/1/30/2409207); it details all entity types and all types of interactions in each structure.

All metadata shown on the PDB site (including the mapping to UniProt) is also available in their Data API: https://www.rcsb.org/pages/webservices. I suppose that data is also available on their FTP, or we could contact them.
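
For reference, a small sketch of pulling method and resolution for one entry through the Data API. The endpoint URL and the JSON field names (`exptl`, `rcsb_entry_info`, `resolution_combined`) are assumptions based on my understanding of the RCSB REST service and should be checked against the web services page above. The sample payload is hand-written, not real API output.

```python
import json
from typing import Any, Dict, Optional
from urllib.request import urlopen

# Assumed endpoint; verify against the RCSB web services documentation.
BASE_URL = "https://data.rcsb.org/rest/v1/core/entry/"


def entry_url(pdb_id: str) -> str:
    """Build the (assumed) Data API URL for a single PDB entry."""
    return BASE_URL + pdb_id.strip().upper()


def summarize_entry(payload: Dict[str, Any]) -> Dict[str, Optional[Any]]:
    """Extract method and best resolution from an entry payload (assumed field names)."""
    exptl = payload.get("exptl") or [{}]
    info = payload.get("rcsb_entry_info") or {}
    resolutions = info.get("resolution_combined") or []
    return {
        "method": exptl[0].get("method"),
        "resolution": min(resolutions) if resolutions else None,
    }


def fetch_entry(pdb_id: str) -> Dict[str, Any]:
    """Download and decode one entry; requires network access."""
    with urlopen(entry_url(pdb_id), timeout=30) as response:
        return json.load(response)


# Hand-written payload with illustrative values, so this runs offline.
sample = {
    "exptl": [{"method": "X-RAY DIFFRACTION"}],
    "rcsb_entry_info": {"resolution_combined": [2.5]},
}
summary = summarize_entry(sample)
```

With network access, `summarize_entry(fetch_entry(pdb_id))` would give the same kind of summary for a real entry, assuming the field names hold.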

justaddcoffee commented 4 years ago

This ticket might be a duplicate of https://github.com/Knowledge-Graph-Hub/kg-covid-19/issues/188

lpalbou commented 4 years ago

Other interesting databases:

PQS (get the quaternary structure)
DrugBinding (to get Kd)
TTD (Therapeutic Target Database)
STITCH (chemical interactions; a little bit like STRING)

I also always use SIDER and DrugBank in conjunction with these.

justaddcoffee commented 4 years ago

> TTD (Therapeutic Target Database)

FWIW, we already have TTD ingested here; the others look interesting.

STITCH is possibly lower priority since we already have STRING.

Knowledge-Graph-Hub / kg-covid-19

ingest SARS-CoV-2 structural data table #186