biothings / mygeneset.info

Apache License 2.0

Add SMPDB data source #33

Closed: kevinxin90 closed this issue 3 years ago

kevinxin90 commented 3 years ago

Download: https://smpdb.ca/downloads

Should include:

Pathway - Gene: https://smpdb.ca/downloads/smpdb_proteins.csv.zip
Pathway - Metabolites: https://smpdb.ca/downloads/smpdb_metabolites.csv.zip
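A minimal sketch of reading one of these archives and grouping genes by pathway, using only the Python standard library. It assumes the CSV header contains columns named `SMPDB ID` and `Gene Name`; those names are assumptions and should be checked against the actual download. Because the archive may hold one CSV per pathway rather than a single file, the sketch reads every `.csv` member it finds. `read_smpdb_zip` and `group_by_pathway` are hypothetical helper names, not part of any existing biothings API.

```python
import csv
import io
import zipfile
from collections import defaultdict


def read_smpdb_zip(source):
    """Yield one dict per CSV row from every .csv member of an SMPDB zip.

    `source` may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as zf:
        for member in zf.namelist():
            if not member.lower().endswith(".csv"):
                continue  # skip anything that is not a CSV file
            with zf.open(member) as raw:
                # "utf-8-sig" also strips a byte-order mark if one is present
                text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
                yield from csv.DictReader(text)


def group_by_pathway(rows, id_field="SMPDB ID", value_field="Gene Name"):
    """Map each pathway ID to the sorted, de-duplicated non-empty values.

    Pathways whose rows all have an empty `value_field` are left out.
    Column names are assumptions about the SMPDB CSV header.
    """
    pathways = defaultdict(set)
    for row in rows:
        value = (row.get(value_field) or "").strip()
        if value:
            pathways[row[id_field]].add(value)
    return {pid: sorted(values) for pid, values in pathways.items()}
```

Usage would look like `genes = group_by_pathway(read_smpdb_zip("smpdb_proteins.csv.zip"))`; the same reader works for the metabolites archive with a different `value_field`.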

ravila4 commented 3 years ago

This is what I'm thinking for the schema:

{
  "_id": <smpdb.smpdb_id>,
  "name": <smpdb.pathway_name>,
  "description": ,
  "genes": [
    /* Same as other genesets */
  ],
  "metabolites": [
    {
      /* Values between brackets '<>' are corresponding mychem.info fields */
      "mychem_id": <_id>,
      "name": <unichem.chebi>,
      "iupac": <drugbank.iupac OR pubchem.iupac OR chebi.iupac>,
      "inchi": <chebi.inchi OR chembl.inchi OR pubchem.inchi>,
      "inchikey": <chebi.inchikey OR chembl.inchi_key OR pubchem.inchi_key>,
      "chebi": <chebi.id OR unichem.chebi>,
      "hmdb": <unichem.hmdb>,
      "kegg_cid": <chebi.xrefs.kegg_compound OR drugbank.kegg.cid>,
      "chembl": <chembl.molecule_chembl_id OR unichem.chembl>,
      "pubchem": <pubchem.cid>,
      "drugbank": <drugbank.id>,
      "cas": <chebi.xrefs.cas>,
      "smiles": <chebi.smiles OR chembl.smiles OR pubchem.smiles OR drugbank.smiles>,
      "smpdb_metabolite": /* metabolite_id from smpdb database */
    }
  ],
  "smpdb": {
    "smpdb_id": ,
    "pathway_name": ,
    "pathway_subject":
  }
}
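The `A OR B OR C` entries in the schema amount to "take the first mychem.info field that has a value". A minimal sketch of that fallback, assuming a mychem.info document has already been fetched as a Python dict. The dotted paths are copied from the draft schema and have not been verified against mychem.info; `name` is left out because its source field is still open. When a list is met along a path, the sketch simply takes its first element, which is a simplifying assumption. `get_path`, `first_of`, `build_metabolite` and `FIELD_SOURCES` are hypothetical names.

```python
from typing import Any, Optional

# Schema field -> mychem.info dotted paths, tried left to right (from the draft schema).
FIELD_SOURCES = {
    "iupac": ["drugbank.iupac", "pubchem.iupac", "chebi.iupac"],
    "inchi": ["chebi.inchi", "chembl.inchi", "pubchem.inchi"],
    "inchikey": ["chebi.inchikey", "chembl.inchi_key", "pubchem.inchi_key"],
    "chebi": ["chebi.id", "unichem.chebi"],
    "hmdb": ["unichem.hmdb"],
    "kegg_cid": ["chebi.xrefs.kegg_compound", "drugbank.kegg.cid"],
    "chembl": ["chembl.molecule_chembl_id", "unichem.chembl"],
    "pubchem": ["pubchem.cid"],
    "drugbank": ["drugbank.id"],
    "cas": ["chebi.xrefs.cas"],
    "smiles": ["chebi.smiles", "chembl.smiles", "pubchem.smiles", "drugbank.smiles"],
}


def get_path(doc: dict, dotted: str) -> Optional[Any]:
    """Follow a path like 'chebi.xrefs.cas'; lists are reduced to their first item."""
    current: Any = doc
    for key in dotted.split("."):
        if isinstance(current, list):
            current = current[0] if current else None
        if not isinstance(current, dict):
            return None
        current = current.get(key)
    if isinstance(current, list):
        current = current[0] if current else None
    return current


def first_of(doc: dict, paths: list) -> Optional[Any]:
    """Return the first non-empty value among `paths`, or None."""
    for path in paths:
        value = get_path(doc, path)
        if value not in (None, ""):
            return value
    return None


def build_metabolite(mychem_doc: dict, smpdb_metabolite_id: str) -> dict:
    """Build one "metabolites" entry; optional fields are omitted when absent."""
    entry = {"mychem_id": mychem_doc["_id"], "smpdb_metabolite": smpdb_metabolite_id}
    for field, paths in FIELD_SOURCES.items():
        value = first_of(mychem_doc, paths)
        if value is not None:
            entry[field] = value
    return entry
```

Leaving absent fields out, rather than storing nulls, keeps only `mychem_id` and `smpdb_metabolite` guaranteed on every entry.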

The only required identifiers for each metabolite would be mychem_id and the source smpdb_metabolite ID. Most of the other IDs can be obtained from the download file, but it would be nice to cross-check them against mychem.info.
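One way the required-ID check and the cross-check could fit together, sketched under assumptions: the SMPDB metabolites CSV is assumed to have columns such as `ChEBI ID`, `HMDB ID`, `KEGG ID`, `DrugBank ID`, `CAS` and `InChI Key` (hypothetical names to verify against the file), and `entry` is a metabolite object in the schema's shape whose optional IDs were filled from mychem.info. `cross_check` and `CSV_COLUMNS` are hypothetical. The normalization only drops a `CHEBI:` prefix and ignores case; other format differences are reported as mismatches.

```python
# Hypothetical SMPDB CSV column -> schema field mapping (column names are assumptions).
CSV_COLUMNS = {
    "ChEBI ID": "chebi",
    "HMDB ID": "hmdb",
    "KEGG ID": "kegg_cid",
    "DrugBank ID": "drugbank",
    "CAS": "cas",
    "InChI Key": "inchikey",
}

REQUIRED = ("mychem_id", "smpdb_metabolite")


def normalize(field: str, value) -> str:
    """Compare IDs case-insensitively, ignoring an optional 'CHEBI:' prefix."""
    text = str(value).strip()
    if field == "chebi" and text.upper().startswith("CHEBI:"):
        text = text[len("CHEBI:"):]
    return text.upper()


def cross_check(smpdb_row: dict, entry: dict) -> dict:
    """Return {field: (csv_value, mychem_value)} for every disagreeing ID.

    Raises ValueError if a required identifier is missing from `entry`.
    """
    missing = [f for f in REQUIRED if not entry.get(f)]
    if missing:
        raise ValueError(f"missing required identifiers: {missing}")
    mismatches = {}
    for column, field in CSV_COLUMNS.items():
        csv_value = (smpdb_row.get(column) or "").strip()
        if not csv_value or field not in entry:
            continue  # nothing to compare on one side
        if normalize(field, csv_value) != normalize(field, entry[field]):
            mismatches[field] = (csv_value, entry[field])
    return mismatches
```

For example, an HMDB ID in the older 5-digit form (`HMDB02111`) will be flagged against the 7-digit form (`HMDB0002111`) unless `normalize` is extended to pad the numeric part.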