Closed kevinxin90 closed 3 years ago
This is what I'm thinking for the schema:
{
  "_id": <smpdb.smpdb_id>,
  "name": <smpdb.pathway_name>,
  "description": ,
  "genes": [
    /* Same as other genesets */
  ],
  "metabolites": [
    {
      /* Values between brackets '<>' are corresponding mychem.info fields */
      "mychem_id": <_id>,
      "name": <unichem.chebi>,
      "iupac": <drugbank.iupac OR pubchem.iupac OR chebi.iupac>,
      "inchi": <chebi.inchi OR chembl.inchi OR pubchem.inchi>,
      "inchikey": <chebi.inchikey OR chembl.inchi_key OR pubchem.inchi_key>,
      "chebi": <chebi.id OR unichem.chebi>,
      "hmdb": <unichem.hmdb>,
      "kegg_cid": <chebi.xrefs.kegg_compound OR drugbank.kegg.cid>,
      "chembl": <chembl.molecule_chembl_id OR unichem.chembl>,
      "pubchem": <pubchem.cid>,
      "drugbank": <drugbank.id>,
      "cas": <chebi.xrefs.cas>,
      "smiles": <chebi.smiles OR chembl.smiles OR pubchem.smiles OR drugbank.smiles>,
      "smpdb_metabolite": /* metabolite_id from smpdb database */
    }
  ],
  "smpdb": {
    "smpdb_id": ,
    "pathway_name": ,
    "pathway_subject":
  }
}
The only required identifiers for each metabolite would be `mychem_id` and the source `smpdb_metabolite`. Most of the other IDs can be obtained from the download file, but it would be nice to cross-check them against mychem.info.
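The `OR` chains in the schema can be read as "take the first source field that is present". Here is a minimal sketch of that lookup, assuming the mychem.info record is already available as a nested dict. The names `FIELD_SOURCES`, `get_path`, and `build_metabolite` are hypothetical, and only the identifier and structure fields are covered.

```python
# Output field -> candidate mychem.info fields, tried in the order given in the schema.
FIELD_SOURCES = {
    "iupac": ["drugbank.iupac", "pubchem.iupac", "chebi.iupac"],
    "inchi": ["chebi.inchi", "chembl.inchi", "pubchem.inchi"],
    "inchikey": ["chebi.inchikey", "chembl.inchi_key", "pubchem.inchi_key"],
    "chebi": ["chebi.id", "unichem.chebi"],
    "hmdb": ["unichem.hmdb"],
    "kegg_cid": ["chebi.xrefs.kegg_compound", "drugbank.kegg.cid"],
    "chembl": ["chembl.molecule_chembl_id", "unichem.chembl"],
    "pubchem": ["pubchem.cid"],
    "drugbank": ["drugbank.id"],
    "cas": ["chebi.xrefs.cas"],
    "smiles": ["chebi.smiles", "chembl.smiles", "pubchem.smiles", "drugbank.smiles"],
}


def get_path(doc, dotted):
    """Follow a dotted path through nested dicts; return None if any key is missing."""
    node = doc
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def build_metabolite(mychem_doc, smpdb_metabolite):
    """Build one metabolite entry; only the two required identifiers are always set."""
    entry = {"mychem_id": mychem_doc["_id"], "smpdb_metabolite": smpdb_metabolite}
    for field, paths in FIELD_SOURCES.items():
        for path in paths:
            value = get_path(mychem_doc, path)
            if value not in (None, "", []):
                entry[field] = value
                break  # first available source wins
    return entry
```

Fields with no available source are simply left out of the entry, which matches the idea that only `mychem_id` and `smpdb_metabolite` are required. This sketch does not handle mychem.info fields that come back as lists of sub-records; that case would need its own rule.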
Download: https://smpdb.ca/downloads
Should include:
Pathway - Gene: https://smpdb.ca/downloads/smpdb_proteins.csv.zip
Pathway - Metabolites: https://smpdb.ca/downloads/smpdb_metabolites.csv.zip
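As a rough sketch of how rows from the metabolites download could be grouped into one document per pathway: the column names used as defaults below (`SMPDB ID`, `Pathway Name`, `Pathway Subject`, `Metabolite ID`) are assumptions about the CSV header and should be checked against the actual files, and `group_by_pathway` is a hypothetical helper. It takes already-extracted CSV text, so unzipping and downloading are left out.

```python
import csv
import io


def group_by_pathway(csv_text,
                     id_col="SMPDB ID",             # assumed header name
                     name_col="Pathway Name",       # assumed header name
                     subject_col="Pathway Subject", # assumed header name
                     metabolite_col="Metabolite ID"):  # assumed header name
    """Group metabolite rows into one skeleton document per pathway."""
    docs = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        smpdb_id = row[id_col]
        # Create the pathway document the first time its ID is seen.
        doc = docs.setdefault(smpdb_id, {
            "_id": smpdb_id,
            "name": row[name_col],
            "smpdb": {
                "smpdb_id": smpdb_id,
                "pathway_name": row[name_col],
                "pathway_subject": row[subject_col],
            },
            "metabolites": [],
        })
        # mychem_id and the other identifiers would be filled in by a later step.
        doc["metabolites"].append({"smpdb_metabolite": row[metabolite_col]})
    return list(docs.values())
```

The gene file could be handled the same way with a different member column, filling `genes` instead of `metabolites`.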