add ChEMBL<>Wikidata mappings?

bridgedb / create-bridgedb-metabolites

Create BridgeDb identity mapping files from HMDB, ChEBI, and Wikidata

add ChEMBL<>Wikidata mappings? #10

Closed egonw closed 5 years ago

egonw commented 5 years ago

Denise, should we add ChEMBL<>Wikidata mappings?

DeniseSl22 commented 5 years ago

Mhh..... ChEMBL is really huge, right? Don't you think that will blow up the size of the local mapping file? Then again, I did see lots of people at ICCS use ChEMBL (instead of DrugBank), so it could be worth adding them..... mhh, tough question... how large is the CyTargetLinker file for ChEMBL (or are there two, one with approved drugs and one with all drugs)?

egonw commented 5 years ago

Yes, ChEMBL is huge, but only a subset is in Wikidata.

Chris-Evelo commented 5 years ago

We will probably want subsets for specific purposes too. Something like:

1) anything that actually is an endogenous compound in species relevant on WikiPathways (and thus occurs in a pathway, or should...) (which we could use for WikiPathways)
2) anything that is a drug or a pharmaceutically active compound or relevant for tox projects (which we could use for a lot of purposes in Cytoscape and such)

egonw commented 5 years ago

The first step I had in mind was just the current mappings in Wikidata...

DeniseSl22 commented 5 years ago

I think the subset idea of @Chris-Evelo would work well for Cytoscape (but since we already have a linkset for this, we don't need BridgeDb or Wikidata for that). So if we query the ChEMBL compounds that are now in Wikidata and add them for mapping purposes, it would work for WikiPathways.... It's going to be hard to determine from Wikidata which compounds are endogenous, since most compounds use the "instance of" property, tied to "chemical compound".

I just checked: ChEBI does include linkouts to ChEMBL, HMDB does not.... Perhaps including ChEMBL from Wikidata would also create indirect mappings to ChEBI, which we do not have directly now?
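The indirect-mapping idea above can be sketched as a join on the shared Wikidata item: if one item carries both a ChEBI and a ChEMBL identifier, the two identifiers become linked through it. This is only a minimal illustration, not the code in this repository. The function name `indirect_mappings`, the dictionary layout, and all identifiers below are hypothetical placeholders.

```python
from collections import defaultdict


def indirect_mappings(wikidata_to_chebi, wikidata_to_chembl):
    """Derive ChEBI -> ChEMBL links for Wikidata items that carry both IDs."""
    links = defaultdict(set)
    for qid, chebi_ids in wikidata_to_chebi.items():
        # Items without a ChEMBL ID contribute no link.
        for chembl_id in wikidata_to_chembl.get(qid, ()):
            for chebi_id in chebi_ids:
                links[chebi_id].add(chembl_id)
    return dict(links)


# Placeholder identifiers, made up for illustration only.
wikidata_to_chebi = {"Q_A": {"CHEBI:0001"}, "Q_B": {"CHEBI:0002"}}
wikidata_to_chembl = {"Q_A": {"CHEMBL_X"}, "Q_C": {"CHEMBL_Y"}}

print(indirect_mappings(wikidata_to_chebi, wikidata_to_chembl))
```

Only the item present in both inputs (`Q_A`) yields a link. Items with just one of the two identifiers are skipped.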

Chris-Evelo commented 5 years ago

Hmmm, OK. If we think endogenous for human, then a simple (perhaps overly simple?) approach would be to select all metabolites that are in RECON2 (which assumes we have interoperable IDs for these, which is something I am not sure of either).

egonw commented 5 years ago

I think the current ChEMBL identifiers in Wikidata make a reasonable subset. Fewer than 50 thousand: http://tinyurl.com/yd668rt8
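For reference, a rough sketch of how the ChEMBL identifiers currently in Wikidata could be retrieved. It assumes the public Wikidata SPARQL endpoint and the ChEMBL ID property `P592`. It is not necessarily the query behind the shortened link above, and the helper names `parse_bindings` and `fetch_chembl_mappings` are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Public Wikidata query service endpoint.
WIKIDATA_SPARQL = "https://query.wikidata.org/sparql"

# Assumption: P592 is the Wikidata property for the ChEMBL ID.
QUERY = """
SELECT ?item ?chembl WHERE {
  ?item wdt:P592 ?chembl .
}
"""


def parse_bindings(result):
    """Turn a SPARQL JSON result into (Wikidata QID, ChEMBL ID) pairs."""
    pairs = []
    for row in result["results"]["bindings"]:
        # Item values are entity URIs; keep only the trailing QID.
        qid = row["item"]["value"].rsplit("/", 1)[-1]
        pairs.append((qid, row["chembl"]["value"]))
    return pairs


def fetch_chembl_mappings(endpoint=WIKIDATA_SPARQL, query=QUERY):
    """Run the query against the endpoint (needs network access)."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    request = urllib.request.Request(
        url,
        headers={
            "Accept": "application/sparql-results+json",
            "User-Agent": "chembl-wikidata-sketch/0.1 (example)",
        },
    )
    with urllib.request.urlopen(request) as response:
        return parse_bindings(json.load(response))
```

Calling `fetch_chembl_mappings()` would return the pairs that could then be written to the mapping file. The parsing step is kept separate so it can be checked without network access.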

Including that would make ten compounds in pathways interoperable and may link out to ChEMBL on more pathways. ChEMBL is an ELIXIR Core Resource, and I'd like to see at least the RDF link out to as many core resources as possible.

DeniseSl22 commented 5 years ago

Well, okay, if it is an ELIXIR Core Resource, then it would be great to add them... Can we then also ask them to give their content as CC0 for addition to Wikidata :p ?

Chris-Evelo commented 5 years ago

They might indeed agree to that for the mappings (if that is even needed); for their actual data that is less likely.

DeniseSl22 commented 5 years ago

Closed with b44529666e62b0ad3c0d2065a8594495d330445b