FAIRplus / the-fair-cookbook

The FAIR cookbook, containing recipes to make your data more FAIR. Find the rendered version at:
https://faircookbook.elixir-europe.org/

How to generate GUPRIs for chemical compounds, starting from an InChIKey #396

Open ghost opened 2 years ago

ghost commented 2 years ago

Taken from the Deliverable D3.4:

A typical problem with chemical compounds in collaborative contexts is identifying the compound under consideration unambiguously. In a typical scenario, two parties need to “speak” about a specific compound while keeping part of the information secret from the other party. Examples of secret properties are the exact chemical formula or certain physicochemical properties, e.g. solubility or toxicity.

In collaboration with Maastricht University, Bayer explored how references to chemical compounds can be implemented according to the FAIR guiding principles. As a demonstrator, more than 20,000 entries about chemical compounds were created in the public resource Wikidata. Where available in public resources, the IUPAC name, the chemical formula, the canonical SMILES, the isomeric SMILES, the canonical InChI, the InChIKey, the PubChem CID, and the melting point were made openly available under Wikidata’s CC0 licence.

An example entry is http://www.wikidata.org/entity/Q107968523. This Globally Unique Persistent Resolvable Identifier (GUPRI) makes it possible to reference the underlying compound (current IUPAC name: 2,2,2-trifluoro-N-[4-(5-oxo-3,4-diazabicyclo[4.1.0]hept-2-en-2-yl)phenyl]acetamide) and to look up further information in the desired format. When a human resolves the GUPRI as a URL by entering it into a browser, it leads to an HTML page about the compound on Wikidata. When a machine resolves the GUPRI, the formats .json, .rdf, .ttl, .nt, or .jsonld can be returned, as agreed between client and server through content negotiation. Additionally, the compound and all other entries in Wikidata can be queried via SPARQL.
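Since this issue starts from an InChIKey, the path from InChIKey to GUPRI to machine-readable data can be sketched in Python. This is a minimal sketch, assuming the public Wikidata Query Service endpoint (https://query.wikidata.org/sparql) and Wikidata's InChIKey property P235; the function names, the User-Agent string and the format-to-media-type table are illustrative assumptions, not part of the deliverable. Only `lookup_gupris` touches the network.

```python
import json
import re
import urllib.parse
import urllib.request

# Assumption: the public Wikidata Query Service endpoint.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# InChIKey layout: 14 letters, hyphen, 10 letters, hyphen, 1 letter.
INCHIKEY_RE = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")

# Media types a client can send in the Accept header when dereferencing a GUPRI.
MEDIA_TYPES = {
    "json": "application/json",
    "rdf": "application/rdf+xml",
    "ttl": "text/turtle",
    "nt": "application/n-triples",
    "jsonld": "application/ld+json",
}


def build_inchikey_query(inchikey: str) -> str:
    """Return a SPARQL query for Wikidata items whose InChIKey (P235) matches."""
    if not INCHIKEY_RE.match(inchikey):
        raise ValueError(f"not a well-formed InChIKey: {inchikey!r}")
    # Full property IRI, so the query does not depend on predefined prefixes.
    return (
        "SELECT ?compound WHERE { "
        f'?compound <http://www.wikidata.org/prop/direct/P235> "{inchikey}" . '
        "}"
    )


def gupris_from_results(results: dict) -> list[str]:
    """Extract entity URIs (the GUPRIs) from a SPARQL JSON results document."""
    return [row["compound"]["value"] for row in results["results"]["bindings"]]


def lookup_gupris(inchikey: str) -> list[str]:
    """Run the query against the live endpoint (requires network access)."""
    params = urllib.parse.urlencode({"query": build_inchikey_query(inchikey)})
    request = urllib.request.Request(
        f"{WDQS_ENDPOINT}?{params}",
        headers={
            "Accept": "application/sparql-results+json",
            "User-Agent": "fair-cookbook-sketch/0.1",  # hypothetical client name
        },
    )
    with urllib.request.urlopen(request) as response:
        return gupris_from_results(json.load(response))


def dereference_request(gupri: str, fmt: str) -> urllib.request.Request:
    """Build a request that asks for `fmt` via content negotiation (Accept header)."""
    return urllib.request.Request(gupri, headers={"Accept": MEDIA_TYPES[fmt]})
```

The lookup returns a list rather than a single URI, because nothing in the query itself guarantees that exactly one item carries a given InChIKey. Passing a returned URI to `dereference_request(uri, "ttl")` and opening it with `urllib.request.urlopen` should yield Turtle for that entity.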

Now, more than 20,000 compounds can be referenced by Bayer or other parties, which allows them to store and/or share additional properties. Properties from different parties can then be brought together on demand, either by using the common identifier from the start or by first mapping internal identifiers to the common identifier. This makes collaboration and data sharing more efficient.
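The join step described above can be sketched as follows, assuming each party keeps a table keyed by its own identifiers plus a mapping from internal identifiers to Wikidata GUPRIs. The function name, party labels and property names are hypothetical. Records are kept per party under each GUPRI, so one party's values never overwrite another's, and identifiers without a mapping are skipped rather than guessed.

```python
# Prefix of Wikidata entity URIs; IDs that start with it are already GUPRIs.
WIKIDATA_ENTITY = "http://www.wikidata.org/entity/"


def merge_by_gupri(
    mapping: dict[str, str],
    parties: dict[str, dict[str, dict]],
) -> dict[str, dict[str, dict]]:
    """Bring per-party property records together under a common GUPRI.

    mapping: hypothetical table of internal identifier -> GUPRI
    parties: party name -> {identifier: {property: value}}
    Returns: GUPRI -> {party name: {property: value}}
    """
    merged: dict[str, dict[str, dict]] = {}
    for party, records in parties.items():
        for local_id, props in records.items():
            if local_id.startswith(WIKIDATA_ENTITY):
                gupri = local_id  # common identifier used from the start
            else:
                gupri = mapping.get(local_id)  # map internal ID first
            if gupri is None:
                continue  # unmapped identifiers cannot be joined
            merged.setdefault(gupri, {})[party] = props
    return merged
```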

ghost commented 2 years ago

Hi @egonw, I created this meta-issue to track progress on writing the recipe, and also added it to your big meta container issue (https://github.com/FAIRplus/the-fair-cookbook/issues/332).

What do you think? Is the "abstract" above (including the conclusion section, obviously) fair?

egonw commented 2 years ago

I'll comment in more detail ASAP. But basically this is what Lucas' BridgeDb recipe is about... though @DeniseSl22 already pointed out that metabolite ID mapping is missing there...

egonw commented 2 years ago

Author: Egon