Knowledge-Graph-Hub / kg-microbe

https://knowledge-graph-hub.github.io/kg-microbe/index.html
BSD 3-Clause "New" or "Revised" License

map pathways column in big trait table #5

Open cmungall opened 3 years ago

cmungall commented 3 years ago

Split from #2.

1 trithionate_oxidation
1 carbonmonoxide_oxidation
1 tetrathionate_oxidation, iron_reduction
1 pyrrhotite_oxidation
1 galena_oxidation
1 thiocyanate_oxidation
1 carbonylsulfide_oxidation
...
182 sulfur_reduction
186 thiosulfate_reduction
353 aerobic_chemo_heterotrophy
366 fermentation
371 denitrification
400 nitrite_reduction
983 NA
1420 nitrate_reduction

These should all map to GO
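
The tally above includes at least one cell that combines two pathways with a comma, plus a large NA bucket. As a rough sketch only, assuming cells are comma-separated and that NA means "no value" (neither is confirmed in this thread), the labels could be split and counted like this before mapping. The function name and the sample cells are hypothetical:

```python
from collections import Counter


def count_pathways(cells):
    """Count individual pathway labels, splitting comma-separated cells.

    Assumption: 'NA' marks a missing value and is skipped.
    """
    counts = Counter()
    for cell in cells:
        for label in cell.split(","):
            label = label.strip()
            if label and label != "NA":
                counts[label] += 1
    return counts


# Illustrative cells only, not the real table contents.
cells = [
    "nitrate_reduction",
    "tetrathionate_oxidation, iron_reduction",
    "NA",
    "nitrate_reduction",
]
counts = count_pathways(cells)
print(counts.most_common())
```

Splitting first means each individual label needs only one row in the mapping file, rather than one row per combination.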

cmungall commented 3 years ago

To work on this ticket, make PRs on pathways.sssom.tsv in schemas/

realmarcin commented 3 years ago

Transferring earlier discussions here:

realmarcin commented 3 years ago

Placeholder for ongoing work, edit as needed.

Merging the NER results into the SSSOM mapping file is the next step.

Then update the KG build workflow to load mappings from the SSSOM file instead of hard-coding them in the ingest script.
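
A minimal sketch of what loading the mappings from the file could look like, using only the standard library. It assumes the file follows the usual SSSOM TSV layout (a `#`-prefixed metadata block followed by a tab-separated table with `subject_id` and `object_id` columns). The function name, the `trait:` prefix, and the `GO:PLACEHOLDER` identifier are made up for illustration and are not taken from the repository:

```python
import csv
import io


def load_sssom_mappings(handle):
    """Return {subject_id: object_id} from an SSSOM TSV stream.

    Lines starting with '#' hold the metadata block and are skipped.
    """
    rows = (line for line in handle if not line.startswith("#"))
    reader = csv.DictReader(rows, delimiter="\t")
    return {row["subject_id"]: row["object_id"] for row in reader}


# Hypothetical content; the real pathways.sssom.tsv may use different IDs.
example = io.StringIO(
    "# mapping_set_id: https://example.org/pathways\n"
    "subject_id\tpredicate_id\tobject_id\tmapping_justification\n"
    "trait:nitrate_reduction\tskos:exactMatch\tGO:PLACEHOLDER\t"
    "semapv:ManualMappingCuration\n"
)
mappings = load_sssom_mappings(example)
print(mappings)
```

In the ingest script, a dictionary like this could stand in for the hard-coded lookup, so that mapping changes only require editing the TSV.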

cmungall commented 3 years ago

Let's start with

Once this is done, we will assess the situation. If KEGG has higher coverage than GO, that is useful information I can take back to GO to figure out why GO has poorer coverage here.

If neither gives us the coverage we need, we need to look a bit deeper. Is this a matter of insufficient precomposed terms in the ontology? Is there something systematic we can address, e.g. by adding synonyms to GO?

hrshdhgd commented 3 years ago

I did a quick OGER run over the KEGG pathways, using GO as the dictionary (output file).

Total number of KEGG pathways: 542
Total number of KEGG pathways tagged by GO: 352
Total number of KEGG pathways absent in GO: 190
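
For reference, the coverage these numbers imply can be worked out directly. This is just arithmetic on the figures reported above, not a rerun of OGER, and the variable names are illustrative:

```python
total_kegg = 542      # KEGG pathways in the run
tagged_by_go = 352    # pathways that received at least one GO tag

untagged = total_kegg - tagged_by_go
coverage = tagged_by_go / total_kegg

print(f"untagged: {untagged}, coverage: {coverage:.1%}")
# prints "untagged: 190, coverage: 64.9%"
```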