Closed by cmkobel 4 months ago
Idea:
One plan could be to use GO biological process (BP) instead: https://genomespot.blogspot.com/2024/02/dont-use-kegg.html
So download uniref and mapping_selected (https://www.uniprot.org/help/downloads). Map with diamond or mmseqs2. Download GO BP, perform the hierarchical mapping, and compute enrichment.
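A minimal sketch of the last two steps (hierarchical mapping and enrichment), assuming the gene-to-GO annotations and the GO parent relations have already been loaded into plain dicts. The function names and the toy inputs are hypothetical, and the test shown is a plain hypergeometric over-representation test without multiple-testing correction, not a rank-based GSEA.

```python
from math import comb


def propagate(terms, parents):
    """Return the given GO terms plus all of their ancestors."""
    seen = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term not in seen:
            seen.add(term)
            stack.extend(parents.get(term, ()))
    return seen


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K annotated, n drawn)."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def enrichment(gene2terms, parents, foreground):
    """Map each GO term seen in the foreground to its one-sided p-value.

    gene2terms: dict gene -> set of directly annotated GO terms (background)
    parents:    dict GO term -> iterable of parent GO terms
    foreground: iterable of genes of interest
    """
    background = set(gene2terms)
    fg = set(foreground) & background

    # Hierarchical mapping: a gene annotated to a term also counts
    # towards every ancestor of that term.
    term2genes = {}
    for gene, terms in gene2terms.items():
        for term in propagate(terms, parents):
            term2genes.setdefault(term, set()).add(gene)

    N, n = len(background), len(fg)
    pvalues = {}
    for term, genes in term2genes.items():
        k = len(genes & fg)
        if k:
            pvalues[term] = hypergeom_sf(k, N, len(genes), n)
    return pvalues
```

The p-values would still need an adjustment such as Benjamini-Hochberg before reporting, since one test is run per GO term.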
Ref https://github.com/chklovski/CheckM2/issues/99
Looks like KEGG is not a viable option for the future, and it is not possible to continue reusing the checkm2 database. Will have to seriously consider implementing GO BP GSEA.
As I just closed #90 to merge it into this issue, I should mention that in any case, for licensing reasons (I think), the user (pipeline instance) must manually download the file:
Downloaded from https://www.kegg.jp/kegg-bin/download_htext?htext=ko00001.keg&format=json&filedir= (used in kegg_pathway.R).
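For reference, a rough sketch of how the downloaded JSON could be flattened into a pathway-to-KO table. It assumes the file is a nested tree of objects with `name` and `children` keys, with KO entries as leaves whose name starts with the K number; that layout is an assumption about the download, and `pathway_to_kos` is a hypothetical helper, not what kegg_pathway.R does.

```python
import json
import re

# Leaf names are assumed to look like "K00844  HK; hexokinase [EC:2.7.1.1]".
KO_RE = re.compile(r"^(K\d{5})\b")


def pathway_to_kos(node, parent_name=None, out=None):
    """Walk the tree and collect KO ids under the name of their direct parent."""
    if out is None:
        out = {}
    name = node.get("name", "")
    match = KO_RE.match(name)
    if match and parent_name is not None:
        out.setdefault(parent_name, set()).add(match.group(1))
    for child in node.get("children", []):
        pathway_to_kos(child, name, out)
    return out


def load_pathways(path):
    """Read the manually downloaded JSON file (path supplied by the user)."""
    with open(path) as handle:
        return pathway_to_kos(json.load(handle))
```

The resulting dict (pathway name to set of KO ids) is the shape a downstream hypergeometric test would need as its gene-set definition.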
I still don't have a good plan for this. Is going back to kofam_scan the best option? How does eggnog do it, and can its output be used for downstream hypergeometric tests?
The solution was to use eggnog to map to KO. It is now used in rule kegg_pathway as well.
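A sketch of pulling the KO assignments out of an eggnog annotation table, for illustration only. It assumes a tab-separated file whose header line starts with `#query`, `##` comment lines, and a `KEGG_ko` column holding comma-separated values like `ko:K00844` with `-` for no hit; check these against the actual output before relying on it. `read_kos` is a hypothetical name, not the code used in rule kegg_pathway.

```python
def read_kos(lines):
    """Return dict gene -> set of KO ids from eggnog-style annotation lines."""
    ko_col = None
    gene2kos = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and run metadata
        fields = line.split("\t")
        if ko_col is None:
            # First remaining line is the header, e.g. "#query\t...\tKEGG_ko".
            header = [field.lstrip("#") for field in fields]
            ko_col = header.index("KEGG_ko")
            continue
        if len(fields) <= ko_col or fields[ko_col] in ("", "-"):
            continue  # no KO assigned to this gene
        kos = set()
        for value in fields[ko_col].split(","):
            kos.add(value[3:] if value.startswith("ko:") else value)
        gene2kos[fields[0]] = kos
    return gene2kos
```

Usage would be along the lines of `read_kos(open(path))`, with the returned gene-to-KO dict feeding the pathway tests.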
According to the checkm2 paper, the current version may date back as far as 2018?