ChiLiubio / microeco

An R package for data analysis in microbial community ecology
GNU General Public License v3.0

How do CCA calculations use the first 10 Phylum/Genus? #386

Open makerer5 opened 1 month ago

makerer5 commented 1 month ago

Hi, I am using CCA to examine the relationship between environmental factors and the top 10 Phylum/Genus taxa. With RDA I can select the top 10 Phylum/Genus, but CCA does not seem to let me select them freely. This is my RDA code:

# use the Phylum level
t1$cal_ordination(method = "RDA", taxa_level = "Phylum")
# select 10 features and adjust the arrow length
t1$trans_ordination(show_taxa = 10, adjust_arrow_length = TRUE, max_perc_env = 1.5, max_perc_tax = 1.5, min_perc_env = 0.2, min_perc_tax = 0.2)
# t1$res_rda_trans is the transformed result for plot
t1$plot_ordination(plot_color = "Group", color_values = venn_colors)

This is my CCA code:

# CCA, canonical correspondence analysis
t1$cal_ordination(method = "CCA", taxa_level = "Genus")
t1$trans_ordination(adjust_arrow_length = TRUE, show_taxa = 10, max_perc_env = 1.5, max_perc_tax = 1.5, min_perc_env = 0.2, min_perc_tax = 0.2)
t1$plot_ordination(plot_color = "Group", plot_shape = "Group", color_values = venn_colors)

How does the CCA analysis select the top 10 Phylum/Genus?

ChiLiubio commented 1 month ago

Hi. The default show_taxa = 10 in the trans_ordination function applies to both RDA and CCA. I have tested it using the dataset inside the package. If it fails to select that number of taxa for you, could you please attach an example that I can reproduce?
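
A check of this kind might look like the sketch below. It assumes the example objects `dataset` and `env_data_16S` shipped with microeco, and that columns `4:11` of `env_data_16S` hold the environmental variables; neither appears in this thread, so treat them as placeholders for your own data.

```r
library(microeco)

# Example data bundled with the package (an assumption; not taken from this thread)
data(dataset)
data(env_data_16S)

# Build the trans_env object; columns 4:11 are assumed to be the environmental variables
t1 <- trans_env$new(dataset = dataset, add_data = env_data_16S[, 4:11])

# Run the same calls as in the question, once per ordination method
for (m in c("RDA", "CCA")) {
  t1$cal_ordination(method = m, taxa_level = "Genus")
  t1$trans_ordination(show_taxa = 10, adjust_arrow_length = TRUE,
                      max_perc_env = 1.5, max_perc_tax = 1.5,
                      min_perc_env = 0.2, min_perc_tax = 0.2)
  # plot_ordination returns a plot object; print it explicitly inside the loop
  print(t1$plot_ordination(plot_color = "Group"))
}
```

If both plots show 10 taxa arrows, show_taxa is being applied for both methods, and any difference lies in which taxa are picked rather than in how many.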

makerer5 commented 1 month ago

Maybe I didn't express myself clearly. The results I get from RDA differ significantly from those from CCA, mainly because the top 10 Phylum/Genus shown are different. Fig. 1 is the RDA result and Fig. 2 is the CCA result. The top 10 Phylum/Genus from RDA match the real top 10 phyla, while the 10 selected by CCA are not the top 10 taxa that I wanted.

CCA.pdf RDA.pdf dataset.zip
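
The difference described here can be explored outside microeco with a small sketch in vegan. It rests on two assumptions that this thread does not confirm: that "real top 10" means the most abundant taxa, and that the taxa drawn in the plot are chosen by the length of their species-score vectors on the first two axes. The `dune` data and the helper `top_by_score` are purely illustrative. The general point is that RDA on raw abundances tends to give high-variance, often abundant, taxa the longest arrows, while CCA works with chi-square distances, so less abundant taxa can receive extreme scores.

```r
library(vegan)

# Illustrative data shipped with vegan (not the data from this issue)
data(dune)
data(dune.env)

# Fit both constrained ordinations with the same formula
rda_fit <- rda(dune ~ A1 + Moisture + Management, data = dune.env)
cca_fit <- cca(dune ~ A1 + Moisture + Management, data = dune.env)

# Hypothetical helper: rank taxa by the length of their score vector on axes 1-2
top_by_score <- function(fit, n = 10) {
  sp <- scores(fit, display = "species", choices = 1:2)
  len <- sqrt(rowSums(sp^2))
  names(sort(len, decreasing = TRUE))[seq_len(n)]
}

top_rda   <- top_by_score(rda_fit)
top_cca   <- top_by_score(cca_fit)
top_abund <- names(sort(colSums(dune), decreasing = TRUE))[1:10]

# How many of the 10 most abundant taxa does each method rank in its own top 10?
print(length(intersect(top_rda, top_abund)))
print(length(intersect(top_cca, top_abund)))
```

Comparing the two overlap counts shows how closely each method's top-ranked taxa follow abundance in a given data set.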

ChiLiubio commented 1 month ago

Hi. Comparing the results of these two methods is not of practical value, as they often differ significantly. It is similar to correlation analysis, where choosing between Pearson and Spearman can sometimes make a big difference. Which model to choose depends on your assumptions about the data: for example, if you believe a linear model is more suitable, simply select RDA.
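
The Pearson versus Spearman analogy can be seen in a minimal base R sketch. The data are made up for illustration: a relationship that is perfectly monotone but far from linear.

```r
# Made-up data: y increases with x, but exponentially rather than linearly
x <- 1:20
y <- exp(x / 2)

# Pearson measures linear association; Spearman measures monotone (rank) association
pearson  <- cor(x, y, method = "pearson")
spearman <- cor(x, y, method = "spearman")

print(c(pearson = pearson, spearman = spearman))
```

Spearman reports a perfect association because the ranks of x and y agree exactly, while Pearson is clearly lower because the relationship is not linear. Neither value is wrong; they answer different questions, which is the same reason RDA (linear response) and CCA (unimodal response) can highlight different taxa.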