ammar257ammar / SWAT4HCLS2022-ChEMBL-bioschemas-mapping

Hackathon project that aims to map the ChEMBL RDF small molecules, proteins and taxa onto Bioschemas.org entities and produce the corresponding JSON-LD
GNU General Public License v3.0

SWAT4HCLS Hackathon Bioschemas Project


During the SWAT4HCLS hackathon (January 10-13, 2022), I worked on a project that aims to provide the ChEMBL database in JSON-LD format according to the bioschemas.org vocabulary.

Researchers who participated in this project:

The project focused on mapping ChEMBL data onto three entity types from the Bioschemas vocabulary:

  1. MolecularEntity
  2. Protein
  3. Taxon

The approach uses the ChEMBL mirror SPARQL endpoint (v28), hosted by the Department of Bioinformatics at Maastricht University (BiGCaT), to construct new RDF that follows the Bioschemas vocabulary from the ChEMBL RDF. ChEMBL entities and predicates were mapped to their Bioschemas counterparts using SPARQL queries, according to the following figures.
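As a rough illustration of this approach, the Python sketch below builds a SPARQL CONSTRUCT query that re-expresses ChEMBL small molecules as `schema:MolecularEntity` nodes. The predicate pairs used here (`skos:prefLabel` to `schema:name`, `cco:chemblId` to `schema:identifier`) are assumptions chosen for illustration and may differ from the project's actual mapping, and `build_molecule_construct_query` is a hypothetical helper name. The query is only assembled as a string; sending it to the endpoint is left out.

```python
# Namespaces: cco is the ChEMBL core ontology; schema is schema.org,
# which the Bioschemas vocabulary builds on.
PREFIXES = """\
PREFIX cco: <http://rdf.ebi.ac.uk/terms/chembl#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX schema: <https://schema.org/>
"""


def build_molecule_construct_query(limit=10):
    """Return a CONSTRUCT query string (hypothetical helper).

    The predicate pairs below are illustrative assumptions, not
    necessarily the mapping used in the project.
    """
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    return (
        PREFIXES
        + "CONSTRUCT {\n"
        + "  ?molecule a schema:MolecularEntity ;\n"
        + "    schema:name ?label ;\n"
        + "    schema:identifier ?chemblId .\n"
        + "}\n"
        + "WHERE {\n"
        + "  ?molecule a cco:SmallMolecule ;\n"
        + "    skos:prefLabel ?label ;\n"
        + "    cco:chemblId ?chemblId .\n"
        + "}\n"
        + f"LIMIT {int(limit)}\n"
    )


if __name__ == "__main__":
    # Print the query so it can be pasted into a SPARQL client.
    print(build_molecule_construct_query(limit=5))
```

The same pattern would apply to the Protein and Taxon mappings by swapping the source class and the predicate pairs. A small `LIMIT` is convenient while testing a mapping before running it over the full dataset.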


Mapping ChEMBL "SmallMolecule" to Bioschemas "MolecularEntity"




Mapping ChEMBL "SingleProtein" to Bioschemas "Protein"




Implementation




NOTE: you can download the JSON-LD produced by this project from the releases tab
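Once a JSON-LD file is downloaded, a quick sanity check is to count how many nodes of each type it contains. The sketch below is a minimal example under stated assumptions: the layout of the released files is not described here, so the code accepts a top-level list, an object with `@graph`, or a single node, and `count_types` is a hypothetical helper. The sample data is made up and stands in for a real release file.

```python
import json
from collections import Counter


def count_types(document):
    """Count nodes per @type in a parsed JSON-LD document (hypothetical helper)."""
    # Accept a top-level list, an object with "@graph", or a single node.
    if isinstance(document, list):
        nodes = document
    elif isinstance(document, dict) and "@graph" in document:
        nodes = document["@graph"]
    else:
        nodes = [document]

    counts = Counter()
    for node in nodes:
        if not isinstance(node, dict):
            continue
        types = node.get("@type", [])
        # "@type" may be a single string or a list of strings.
        if isinstance(types, str):
            types = [types]
        counts.update(types)
    return counts


if __name__ == "__main__":
    # Tiny made-up sample standing in for a downloaded release file.
    sample = json.loads("""
    {
      "@context": "https://schema.org/",
      "@graph": [
        {"@id": "ex:m1", "@type": "MolecularEntity", "name": "example molecule"},
        {"@id": "ex:p1", "@type": "Protein", "name": "example protein"},
        {"@id": "ex:t1", "@type": "Taxon", "name": "example taxon"}
      ]
    }
    """)
    print(count_types(sample))
```

For a real file, replace the inline sample with `json.load` on the opened file; very large releases may call for a streaming parser instead of loading everything into memory.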