bridgedb / create-bridgedb-metabolites

Create BridgeDb identity mapping files from HMDB, ChEBI, and Wikidata

DrugBank identifiers wrong in Derby file #19

Closed: egonw closed this issue 5 years ago

egonw commented 5 years ago

The QC tool reports:

ERROR: 9212/9212 (100%) ids do not match expected pattern for DrugBank
ERROR: expected pattern is '^DB\d{5}$'
ERROR: aberrant ids are e.g. '00008', '00009', '00010', '00002', '00003', '00001', '00006', '00007', '00004', '00005'
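The failing check can be sketched in a few lines of Java. This is a minimal illustration, assuming only the pattern `^DB\d{5}$` shown in the QC output. The class `DrugBankIdCheck` and its methods are hypothetical and are not the QC tool's actual code.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of the QC check, not the QC tool's implementation.
public class DrugBankIdCheck {
    // Expected DrugBank identifier format, as reported by the QC tool.
    static final Pattern DRUGBANK = Pattern.compile("^DB\\d{5}$");

    // True if the identifier matches the expected DrugBank pattern.
    static boolean isValid(String id) {
        return DRUGBANK.matcher(id).matches();
    }

    // Counts identifiers that do not match the expected pattern.
    static long countAberrant(List<String> ids) {
        return ids.stream().filter(id -> !isValid(id)).count();
    }

    public static void main(String[] args) {
        List<String> ids = List.of("00008", "00009", "DB00010");
        // Bare numbers such as "00008" fail; "DB00010" passes.
        System.out.println(countAberrant(ids) + "/" + ids.size()
                + " ids do not match " + DRUGBANK.pattern());
    }
}
```

With every DrugBank ID imported as a bare number, this count equals the total, which is the 100% failure rate reported above.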

I think these come from Wikidata, which leaves out the "DB" prefix.

DeniseSl22 commented 5 years ago

Hmm, that is interesting. So you want me to fix that in Wikidata, right? I can check whether Wikidata already runs a test for DrugBank IDs that do not follow the required format (and add one if it doesn't; I believe they have something similar for ChEBI).

egonw commented 5 years ago

No, it needs to be fixed when importing the data. Wikidata stores these IDs without the "DB" prefix.
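One way to apply the fix at import time is to normalize each ID before it is written. The following is a minimal Java sketch, assuming that Wikidata values are bare five-digit numbers (as in the QC output) and that already-prefixed IDs should pass through unchanged. The class `DrugBankIdNormalizer` is hypothetical and not part of the createDerby script or the BridgeDb API.

```java
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical import-time normalizer, not the actual createDerby code.
public class DrugBankIdNormalizer {
    private static final Pattern PREFIXED = Pattern.compile("^DB\\d{5}$");
    private static final Pattern BARE = Pattern.compile("^\\d{5}$");

    // Returns the ID in "DB" + five digits form, or empty if it is unrecognized.
    static Optional<String> normalize(String raw) {
        String id = raw.trim();
        if (PREFIXED.matcher(id).matches()) {
            return Optional.of(id);        // already correct; do not prefix twice
        }
        if (BARE.matcher(id).matches()) {
            return Optional.of("DB" + id); // bare Wikidata-style number
        }
        return Optional.empty();           // neither form; leave for manual review
    }

    public static void main(String[] args) {
        for (String raw : new String[] {"00008", "DB00010", "abc"}) {
            System.out.println(raw + " -> " + normalize(raw).orElse("(skipped)"));
        }
    }
}
```

Because already-prefixed IDs are returned as-is, the same function is safe to run on sources that do include "DB".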

DeniseSl22 commented 5 years ago

Ah, okay. And we only take DrugBank IDs from Wikidata, right? (HMDB also has DrugBank IDs, and those still start with "DB".)

egonw commented 5 years ago

We just have to make sure they start with "DB". But yes, I think we only take them from Wikidata.

DeniseSl22 commented 5 years ago

Yes, I just checked. Line 209 in createDerby, which would add the DrugBank IDs from HMDB, is commented out: // addXRef(database, ref, rootNode.drugbank_id.toString(), drugbankDS);

So we only take DrugBank IDs from Wikidata. I'll try to fix this issue tomorrow or Wednesday, before I release a new mapping file with the latest ChEBI release.