cmkobel / CompareM2

🦠📇 Microbial genomes-to-report pipeline
https://CompareM2.readthedocs.io
GNU General Public License v3.0

Feature request #62

Open · magnusarntzen opened this issue 11 months ago

magnusarntzen commented 11 months ago

Since you asked for feedback...

What about implementing calculation of module completion fractions (MCFs)? These are values between 0 and 1 indicating to what extent a bin has the genes required to complete a given reaction pathway, e.g. 'denitrification' or 'methanogenesis'.

This can be done with the MetQy package in R (I have code if you want), and it would complement your output nicely. I've attached example output for some of my samples (150 bins): MetQy_mcf.pdf

cmkobel commented 11 months ago

Thanks! Great idea! MCFs are definitely easier to interpret than p-values for GSEA. That R package looks neat, but the KO calls are already made in the kegg_diamond rule, so we just need the table and the algorithm that link the KOs to pathways and compute the MCF, and then we're there! I'll look into a way of integrating that.
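A minimal sketch of that last step in Python, under a deliberately simplified model: each module is a list of steps, each step is a set of alternative KOs (any one of them satisfies the step), and the MCF is the fraction of steps covered by at least one KO present in the bin. The module names and memberships in `TOY_MODULES` are hypothetical. Real KEGG module definitions use a richer syntax (protein complexes, optional subunits) that MetQy handles and this sketch ignores, so treat it as an illustration of the idea, not as MetQy's algorithm.

```python
# Simplified module completion fraction (MCF) sketch.
# ASSUMPTION: a module is a list of steps, each step a set of alternative KOs.
# Real KEGG module definitions (complexes, optional subunits) are NOT modelled.

from typing import Dict, List, Set

# Hypothetical toy modules; names and KO memberships are illustrative only.
TOY_MODULES: Dict[str, List[Set[str]]] = {
    "toy_module_A": [{"K00001"}, {"K00032", "K24233"}, {"K22001"}],
    "toy_module_B": [{"K32231"}, {"K00002"}],
}


def module_completion_fraction(steps: List[Set[str]], kos: Set[str]) -> float:
    """Fraction of steps for which at least one alternative KO is present."""
    if not steps:
        return 0.0
    satisfied = sum(1 for alternatives in steps if alternatives & kos)
    return satisfied / len(steps)


def mcf_table(
    bins: Dict[str, Set[str]], modules: Dict[str, List[Set[str]]]
) -> Dict[str, Dict[str, float]]:
    """Return {bin: {module: MCF}} for every bin and every module."""
    return {
        bin_name: {
            module: module_completion_fraction(steps, kos)
            for module, steps in modules.items()
        }
        for bin_name, kos in bins.items()
    }


if __name__ == "__main__":
    # KO sets per bin, reusing the example KOs from this thread.
    bins = {
        "Bin1": {"K00001", "K00032", "K24233"},
        "Bin2": {"K22001", "K32231"},
    }
    for bin_name, row in mcf_table(bins, TOY_MODULES).items():
        print(bin_name, row)
```

With these toy inputs, Bin1 covers 2 of the 3 steps of `toy_module_A` (MCF 2/3) and none of `toy_module_B`, while Bin2 covers 1 of 3 and 1 of 2 (MCF 0.5). Counting steps rather than genes is what keeps a module with several interchangeable enzymes from being penalised for missing the alternatives.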

magnusarntzen commented 11 months ago

Hey, the R package MetQy does not do the KO calling, so it's good you have another program that does that for you. I use KofamScan in my pipelines, but I'm sure kegg_diamond does the trick too.

MetQy takes a data frame with semicolon-separated KOs per bin:

Bin1  K00001;K00032;K24233
Bin2  K22001;K32231
etc.

NB: these are lists of gene-level K numbers (e.g. K00001), not pathway numbers (e.g. ko00010).
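One way the per-bin table above could be assembled from per-gene KO calls, sketched in Python. The `(bin, gene, KO)` tuples are a hypothetical input layout, not the documented output of kegg_diamond or KofamScan, and storing the table as tab-separated `bin<TAB>K1;K2;...` lines before loading it into R is likewise an assumption.

```python
# Sketch: collapse per-gene KO calls into MetQy-style rows (one row per bin,
# KOs joined with ";") and parse such rows back into per-bin KO sets.
# ASSUMPTION: the (bin, gene, KO) input layout is a hypothetical stand-in,
# not the actual output format of kegg_diamond or KofamScan.

import csv
import io
import re
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Gene-level KEGG Orthology IDs look like K00001 ("K" plus five digits).
KO_PATTERN = re.compile(r"K\d{5}")


def to_metqy_rows(calls: Iterable[Tuple[str, str, str]]) -> Dict[str, str]:
    """Map each bin to a sorted, de-duplicated, semicolon-separated KO string."""
    per_bin: Dict[str, Set[str]] = defaultdict(set)
    for bin_name, _gene_id, ko in calls:
        # Skip unassigned genes ("-", "") and anything that is not a K number,
        # such as pathway IDs like ko00010.
        if KO_PATTERN.fullmatch(ko):
            per_bin[bin_name].add(ko)
    return {b: ";".join(sorted(kos)) for b, kos in sorted(per_bin.items())}


def to_tsv(rows: Dict[str, str]) -> str:
    """Render rows as tab-separated 'bin<TAB>K1;K2;...' lines."""
    return "".join(f"{bin_name}\t{kos}\n" for bin_name, kos in rows.items())


def parse_metqy_rows(text: str) -> Dict[str, Set[str]]:
    """Parse 'bin<TAB>K1;K2;...' lines back into per-bin KO sets."""
    result: Dict[str, Set[str]] = {}
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row:  # blank line
            continue
        bin_name = row[0]
        ko_field = row[1] if len(row) > 1 else ""
        result[bin_name] = {ko for ko in ko_field.split(";") if ko}
    return result


if __name__ == "__main__":
    calls = [
        ("Bin1", "gene_001", "K00032"),
        ("Bin1", "gene_002", "K00001"),
        ("Bin1", "gene_003", "K00001"),  # duplicate KO within the same bin
        ("Bin2", "gene_004", "K22001"),
        ("Bin2", "gene_005", "-"),       # no KO assigned
        ("Bin2", "gene_006", "K32231"),
    ]
    tsv = to_tsv(to_metqy_rows(calls))
    print(tsv, end="")
    print(parse_metqy_rows(tsv))
```

Filtering on the "K plus five digits" pattern also enforces the NB above: pathway identifiers never end up in the per-bin KO lists, and duplicate hits within a bin collapse to a single entry.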

It takes about 10-15 minutes for 150 bins on my laptop, but I suppose it will be fast on the Threadripper.

-M

From: Carl Mathias Kobel; Sent: Wednesday, 11 October 2023 16:29; To: cmkobel/assemblycomparator2; Cc: Magnus Øverlie Arntzen; Subject: Re: [cmkobel/assemblycomparator2] Feature request (Issue #62)



cmkobel commented 5 months ago

This will be solved by adding gapseq, which calculates pathway completion fractions. It is well maintained and very powerful. We are currently waiting for r-chnosz to be published on conda-forge; once it is, gapseq can be published on bioconda, and then we can finally add gapseq to asscom2.