Closed DeniseSl22 closed 5 years ago
This is a good use case for recording more provenance about the history of that mapping.
@DeniseSl22, can you ask Irene to verify this locally with only the latest ID mapping file? Having more than one loaded is bound to cause issues, and may be the only cause. At this moment we do not know whether the problem is in the code, in the data, or in the webservice.
Yes, I'll check together with her; we could use the R-script tutorial you added to the TeSS portal of ELIXIR.
I checked it for all HMDB IDs that reported a second InChIKey in the webservice, using the metabolites_20190509 file, and only one InChIKey is reported back: the one that is also stated on the HMDB and ChEBI websites.
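For anyone repeating this check, here is a minimal sketch of how IDs with more than one InChIKey could be flagged. It assumes the mappings have already been exported as (HMDB ID, InChIKey) pairs; the function name `ids_with_multiple_inchikeys` is hypothetical and not part of BridgeDb.

```python
from collections import defaultdict


def ids_with_multiple_inchikeys(mappings):
    """Group (hmdb_id, inchikey) pairs and keep IDs with more than one distinct key."""
    keys_by_id = defaultdict(set)
    for hmdb_id, inchikey in mappings:
        keys_by_id[hmdb_id].add(inchikey)
    # Only IDs that map to two or more different InChIKeys are suspicious.
    return {
        hmdb_id: sorted(keys)
        for hmdb_id, keys in keys_by_id.items()
        if len(keys) > 1
    }


# Pairs taken from the Thymine example reported in this issue.
pairs = [
    ("HMDB0000262", "RWQNBRDOKXIBIV-UHFFFAOYSA-N"),
    ("HMDB0000262", "YQHWOOLBIREPRR-VZUYHUTRSA-N"),
]
print(ids_with_multiple_inchikeys(pairs))
```

With a single, up-to-date mapping file the returned dictionary would be expected to be empty.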
Okay, thank you for checking, @IreneHemel! Then it's probably finding old InChIKeys because old mapping files are still loaded. @nunogit, what's the status of removing old versions of the metabolomics mapping files :)?
Okay, @nunogit removed the old mapping files, and here is the result:
I checked the three examples above, and they all now give one InChIKey. Also, the "metadata" query indicates that only one metabolite.bridge file is loaded :D.
One of my interns (@IreneHemel) is working on identifier mappings between metabolites. She checked whether the HMDB IDs given in IEMBase (as biomarkers for diseases) have any mappings to ChEBI (and if so, which ones). She needs the ChEBI IDs to map correctly, since these are represented in the pathways (PWs) she is working on. While checking these mappings, she found some compounds to have two InChIKeys, for example Thymine (HMDB0000262):
RWQNBRDOKXIBIV-UHFFFAOYSA-N InChIKey (Correct)
YQHWOOLBIREPRR-VZUYHUTRSA-N InChIKey (Wrong)
However, one of the InChIKeys is completely wrong (I looked them up through ChemSpider, see above).
I've tried to track down where this mapping originates from (HMDB, ChEBI or Wikidata); however, the wrong InChIKey is not present in any of them, so I'm wondering why it is even queryable in the BridgeDb webservice. According to the properties query, there are several metabolite mapping files loaded (even originating from 2013!):
So, I think we need to make sure that:
@IreneHemel has several other examples, if needed. Some more are below: HMDB0000300 (2 InChIKeys; PRFVPHBJWNBZBM-GGCSAXROSA-N is not retrievable through Scholia (= Wikidata), ChemSpider or ChEBI). HMDB0000273 (2 InChIKeys; KHUMAHPJNTVTEQ-DXQCBLCSSA-N is not retrievable through Scholia (= Wikidata), ChemSpider or ChEBI).
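The examples above could be re-checked against the webservice with a short script. This is only a sketch under several assumptions: that the webservice lives at `https://webservice.bridgedb.org`, that xrefs are served at `/{organism}/xrefs/{systemCode}/{id}` with `Ch` as the HMDB system code, and that the response is tab-separated with the identifier in the first column and the data source name (here assumed to be `InChIKey`) in the second. The helper names `fetch_xrefs` and `inchikeys_from_xrefs` are hypothetical.

```python
from urllib.request import urlopen

# Assumed base URL and path layout; adjust if your deployment differs.
BASE_URL = "https://webservice.bridgedb.org"


def fetch_xrefs(hmdb_id, organism="Human", system_code="Ch"):
    """Download the raw xrefs response for one HMDB ID (needs network access)."""
    url = f"{BASE_URL}/{organism}/xrefs/{system_code}/{hmdb_id}"
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")


def inchikeys_from_xrefs(tsv_text, source_name="InChIKey"):
    """Return the distinct InChIKeys found in an 'identifier<TAB>source' listing."""
    keys = []
    for line in tsv_text.splitlines():
        parts = line.strip().split("\t")
        # Keep only rows whose second column names the InChIKey data source.
        if len(parts) >= 2 and parts[1] == source_name and parts[0] not in keys:
            keys.append(parts[0])
    return keys
```

If those assumptions hold, `inchikeys_from_xrefs(fetch_xrefs("HMDB0000262"))` returning more than one key would reproduce the problem described here.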