Open cmungall opened 3 years ago
To work on this ticket, make PRs on pathways.sssom.tsv in schemas/
Transferring earlier discussions here:
Placeholder for ongoing work, edit as needed.
Merging the NER results into the SSSOM mapping file is the next step.
Then update the KG build workflow to load mappings from the SSSOM file instead of hard-coding them in the ingest script.
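A rough sketch of what loading the mappings from the SSSOM file could look like, using only the standard library. The function name `load_sssom_mappings` and the sample rows (including the `EX:` subjects and `GO:EXAMPLE…` objects) are made up for illustration; only the column names `subject_id`, `predicate_id`, and `object_id` and the `#`-prefixed metadata header come from the SSSOM TSV format. The real ingest script may need more columns than this.

```python
import csv
import io


def load_sssom_mappings(handle):
    """Return {subject_id: object_id} from an SSSOM TSV stream."""
    # SSSOM TSV files may start with a '#'-prefixed YAML metadata block; skip it.
    rows = (line for line in handle if not line.startswith("#"))
    reader = csv.DictReader(rows, delimiter="\t")
    return {row["subject_id"]: row["object_id"] for row in reader}


# Illustrative sample only: these identifiers are placeholders, not real mappings.
SAMPLE = (
    "# curie_map:\n"
    "#   GO: http://purl.obolibrary.org/obo/GO_\n"
    "subject_id\tsubject_label\tpredicate_id\tobject_id\n"
    "EX:denitrification\tdenitrification\tskos:exactMatch\tGO:EXAMPLE1\n"
    "EX:fermentation\tfermentation\tskos:exactMatch\tGO:EXAMPLE2\n"
)

mappings = load_sssom_mappings(io.StringIO(SAMPLE))
print(mappings)
```

In the workflow, the `io.StringIO(SAMPLE)` handle would be replaced by an open handle on pathways.sssom.tsv, and the ingest script would look terms up in the returned dict rather than in a hard-coded table.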
Let's start with
Once this is done, we will assess the situation. If KEGG has higher coverage than GO, this is useful information I can take back to GO to figure out why GO has poorer coverage here.
If neither gives us the coverage we need, we will have to look a bit deeper. Is this a matter of insufficient precomposed terms in the ontology? Is there something systematic we can address, e.g. by adding synonyms to GO?
I did a quick OGER run over the KEGG pathways with GO as the dictionary (output file).
Total number of KEGG pathways: 542
Total number of KEGG pathways tagged by GO: 352
Total number of KEGG pathways absent in GO: 190
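For reference, the same numbers expressed as a coverage fraction. This is just arithmetic on the counts reported above; the variable names are arbitrary.

```python
# Counts taken from the OGER run reported in this thread.
total_kegg = 542
tagged_by_go = 352

missing = total_kegg - tagged_by_go
coverage = tagged_by_go / total_kegg

print(f"absent in GO: {missing}")       # absent in GO: 190
print(f"GO coverage: {coverage:.1%}")   # GO coverage: 64.9%
```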
Split from #2.
1 trithionate_oxidation
1 carbonmonoxide_oxidation
1 tetrathionate_oxidation, iron_reduction
1 pyrrhotite_oxidation
1 galena_oxidation
1 thiocyanate_oxidation
1 carbonylsulfide_oxidation
...
182 sulfur_reduction
186 thiosulfate_reduction
353 aerobic_chemo_heterotrophy
366 fermentation
371 denitrification
400 nitrite_reduction
983 NA
1420 nitrate_reduction
These should all map to GO.