Refers to issue #33.
The metabolites parser queries MyChem.info by InChIKey to verify that each compound exists in the database. It also fetches two additional IDs, pubchem.cid and chembl.molecule_chembl_id, because these are commonly used and were not present in the downloaded data.
import os
from glob import glob

import biothings_client
import pandas as pd


def parse_metabolites(data_folder):
    all_compounds = set()
    fields = ['InChI Key']
    # Collect the unique InChIKeys across all metabolite files
    for f in glob(os.path.join(data_folder, "*_metabolites.csv")):
        tmp_df = pd.read_csv(f, usecols=fields).fillna("")
        # Skip empty files
        if len(tmp_df) == 0:
            continue
        all_compounds = all_compounds | set(tmp_df['InChI Key'])
    # Query MyChem.info
    mc = biothings_client.MyChemInfo()
    resp = mc.getchems(all_compounds, fields='pubchem.cid, chembl.molecule_chembl_id', dotfield=True)
    ...
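The rest of the parser is elided above. As a rough illustration of how the batch response could be consumed, here is a minimal sketch, not the code in this PR. It assumes each entry in `resp` is a dict with a `query` key, that unmatched queries are flagged with `notfound: True`, and that `dotfield=True` flattens the requested fields into dotted keys. The helper name `index_mychem_hits` and the sample data are hypothetical.

```python
def index_mychem_hits(resp):
    """Map each matched InChIKey to the extra IDs found for it.

    Hypothetical helper: assumes `resp` is a list of dicts shaped like
    a getchems(..., dotfield=True) response.
    """
    hits = {}
    for hit in resp:
        # Assumption: unmatched queries are marked with "notfound": True.
        if hit.get("notfound"):
            continue
        hits[hit["query"]] = {
            # Either ID may be missing for a given compound.
            "pubchem_cid": hit.get("pubchem.cid"),
            "chembl_id": hit.get("chembl.molecule_chembl_id"),
        }
    return hits


# Hand-written sample for illustration only, not real API output.
sample_resp = [
    {
        "query": "BSYNRYMUTXBXSQ-UHFFFAOYSA-N",
        "pubchem.cid": 2244,
        "chembl.molecule_chembl_id": "CHEMBL25",
    },
    {"query": "HYPOTHETICAL-MISSING-KEY", "notfound": True},
]

found = index_mychem_hits(sample_resp)
```

Compounds absent from `found` were not matched in MyChem.info, so the parser can skip or log them as needed.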
Final ES mapping: