Knowledge-Graph-Hub / kg-microbe

https://knowledge-graph-hub.github.io/kg-microbe/index.html
BSD 3-Clause "New" or "Revised" License

Ingest Weissman et al human microbiome taxa microbial trait data #27

Open realmarcin opened 3 years ago

realmarcin commented 3 years ago

The data is from this paper: Exploring the functional composition of the human microbiome using a hand-curated microbial trait database https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-021-04216-2

The dataset itself is Additional File 1: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-021-04216-2#MOESM1

To start, we could perform named-entity recognition (NER) with the same dictionaries used for Madin et al: NCBI Taxonomy, ENVO, ECOCORE, and ChEBI.
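As a rough illustration of what dictionary-based NER over the trait values could look like, here is a minimal sketch in plain Python. It is not the tooling used for Madin et al; the `TERM_DICT` labels and the `EX:` identifiers are placeholders, and the real dictionaries would be derived from the ontologies listed above.

```python
import re

# Hypothetical term dictionary: label -> placeholder CURIE.
# The EX: identifiers are NOT real ontology IDs.
TERM_DICT = {
    "aerobic": "EX:0000001",
    "gram negative": "EX:0000002",
}


def annotate(text, term_dict):
    """Return sorted (start, end, matched_text, curie) tuples for dictionary hits."""
    hits = []
    lowered = text.lower()
    for label, curie in term_dict.items():
        # Match the label only when it is not embedded in a longer word,
        # so "aerobic" does not fire inside "anaerobic".
        pattern = r"(?<!\w)" + re.escape(label.lower()) + r"(?!\w)"
        for match in re.finditer(pattern, lowered):
            start, end = match.start(), match.end()
            hits.append((start, end, text[start:end], curie))
    return sorted(hits)
```

Calling `annotate(cell_value, TERM_DICT)` on each free-text cell would give candidate term mappings that still need manual review.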

There are additional numerical columns of interest here beyond what Madin et al provided:

cmungall commented 3 years ago

OK, here is the drill:

  1. add step to fetch this in the Makefile
  2. run through linkml-model-enrichment to get the first pass of the schema (https://github.com/linkml/linkml-model-enrichment)
  3. this should also give you enums with ontology term IDs that will need to be manually checked, though I'm not sure whether @turbomam's code has been integrated yet
  4. refine the schema manually, including adding mappings to biolink
  5. write lightweight Python to produce KGX files
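The last step could be sketched roughly as follows, using only the standard library to turn trait rows into KGX-style node and edge TSVs. The column arguments, the `trait_curies` mapping, and the choice of `biolink:OrganismTaxon`, `biolink:PhenotypicFeature`, and `biolink:has_phenotype` are assumptions for illustration; the actual column names and Biolink mappings would come from the refined schema in step 4.

```python
import csv
import io

# Minimal column sets for KGX-style TSVs (a real export may carry more columns).
NODE_COLUMNS = ["id", "category", "name"]
EDGE_COLUMNS = ["subject", "predicate", "object"]


def traits_to_kgx(rows, taxon_col, trait_col, trait_curies, name_col=None):
    """Build node and edge records from an iterable of dict rows.

    trait_curies maps a trait label to an ontology CURIE (hypothetical mapping).
    Rows whose trait value has no mapping are skipped.
    """
    nodes = {}
    edges = []
    for row in rows:
        trait_label = row[trait_col].strip()
        trait_id = trait_curies.get(trait_label)
        if trait_id is None:
            continue  # no ontology mapping yet for this value
        taxon_id = f"NCBITaxon:{row[taxon_col]}"
        taxon_name = row.get(name_col, "") if name_col else ""
        # Keying by id de-duplicates nodes seen in several rows.
        nodes[taxon_id] = {"id": taxon_id, "category": "biolink:OrganismTaxon", "name": taxon_name}
        nodes[trait_id] = {"id": trait_id, "category": "biolink:PhenotypicFeature", "name": trait_label}
        edges.append({"subject": taxon_id, "predicate": "biolink:has_phenotype", "object": trait_id})
    return list(nodes.values()), edges


def write_tsv(records, columns, handle):
    """Write records as a tab-separated table with a header row."""
    writer = csv.DictWriter(handle, fieldnames=columns, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
```

In practice `rows` would come from `csv.DictReader` over the downloaded file, and `write_tsv` would be called once for the nodes file and once for the edges file.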

See https://docs.google.com/document/d/1iEsLp9pDvjGjgWMSLArtNf6Jwan-wjMl6_viQGyTWG8/edit# for the approach