ChiLiubio / microeco

An R package for data analysis in microbial community ecology
GNU General Public License v3.0
181 stars 55 forks source link

Chao1 index is the same with observed. #349

Closed WangWaud closed 2 months ago

WangWaud commented 2 months ago

Hi, there! These days I'm trying to visualize the α diversity of different microbiome data, the problem is whatever the data is, the Chao1 index and observed index were always the same, and thus the figures were the same. I'm wondering whether it is abnormal or not and why. The code is as follows:

library(microeco)
library(magrittr)
library(ggplot2)
library(phyloseq)
library(ggsci)
library(dplyr)
library(patchwork)
library(ggsignif)

theme_set(theme_bw())

# 读取数据
sample_info = read.csv("metadata.tsv", sep="\t",header = TRUE,row.names = 1)
otu_table = read.csv("16S/ASV_table.txt", sep = "\t", header = TRUE, row.names = 1)
taxonomy_table =read.csv("16S/tax.txt", sep = "\t", header = TRUE, row.names = 1)
phylo_tree_16S = read_tree("16S/rootedtree.nwk")

# 将phylo_tree_16S的id转换为ASV
# 假设你已经有一个名为id_to_asvid的数据框(data.frame),它有两列:原ID和ASV ID
id_to_asvid = read.csv("16S/id_to_asvid.csv", header = TRUE)
# 更新phylo_tree的tip.label
phylo_tree_16S$tip.label <- id_to_asvid[match(phylo_tree_16S$tip.label, id_to_asvid$FeatureID), ]$ASVID

# 检查更新后的标签
head(phylo_tree_16S$tip.label)

dataset <- microtable$new(sample_table = sample_info,
                          otu_table = otu_table, 
                          tax_table = taxonomy_table,
                          phylo_tree = phylo_tree_16S)
dataset$tax_table %<>% subset(Kingdom == "Bacteria")
dataset$filter_pollution(taxa = c("mitochondria", "chloroplast"))
dataset$tidy_dataset()

# 使用sample_sums检查每个样品的测序数量,返回最小值和最大值
dataset$sample_sums()%>% range

## 比如,稀释到10000条reads
# dataset$rarefy_samples(sample.size = 10000)
# dataset$sample_sums() %>% range

# 计算每个分类水平物种丰度
dataset$cal_abund()
# return dataset$taxa_abund
class(dataset$taxa_abund)

dir.create("taxa_abund")
dataset$save_abund(dirpath = "taxa_abund")

# 计算Alpha多样性 --------------------------------------------------------------
# 如想增加系统发育多样性,使用PD =TRUE,速度略慢
dataset$cal_alphadiv(PD = TRUE)
# 返回物种多样性
class(dataset$alpha_diversity)

## 保存结果
dir.create("alpha_diversity")
dataset$save_alphadiv(dirpath = "alpha_diversity")

# 计算Beta多样性 ---------------------------------------------------------------
dataset$cal_betadiv(unifrac = TRUE)
# 返回结果
class(dataset$beta_diversity)
# 保存
dir.create("beta_diversity")
dataset$save_betadiv(dirpath = "beta_diversity")

# 筛选出部分组(B/M E C/I)进行分析 --------------------------------------------------------------

selected_samples <- dataset$sample_table[dataset$sample_table$Group %in% c("BEC", "BEI","MEC", "MEI"), ]
otu_table_filtered <- dataset$otu_table[ , rownames(selected_samples)]

# 根据更新后的sample_table和otu_table重新创建microtable对象
dataset_filtered <- microtable$new(sample_table = selected_samples,
                                   otu_table = otu_table_filtered, 
                                   tax_table = dataset$tax_table,
                                   phylo_tree = dataset$phylo_tree)
dataset_filtered$tidy_dataset()
dataset_filtered$tax_table[dataset_filtered$tax_table == ""] <- "unassigned"

t1 <- trans_abund$new(dataset = dataset_filtered, taxrank = "Phylum", ntaxa = 10, groupmean = "Group")
t1$plot_bar(color_values = RColorBrewer::brewer.pal(8, "Dark2"),bar_type = "full",
            others_color = "grey70", legend_text_italic = FALSE)+
  theme(axis.text = element_text(size = 12),
        axis.text.y = element_text(size = 12),
        panel.background = element_rect(fill = NA)) +labs(x = NULL)
ggsave("Genus.PDF", width = 8, height = 6,dpi=300)

# α多样性 --------------------------------------------------------------------
t1 <- trans_alpha$new(dataset = dataset_filtered, group  = "Group")

t1$cal_diff(method = "anova",p_adjust_method = "fdr")
t1$res_diff

# 手动绘制α多样性 ----------------------------------------------------------------

ggplot(data=t1$data_alpha[t1$data_alpha$Measure=="Chao1",],
       aes(x=Group,y=Value,fill=Group))+
  geom_boxplot(outlier.color = "#D55740", outlier.shape = 1) +
  geom_jitter(alpha = 0.7) +
  theme_classic() +
  geom_signif(comparisons = list(c("BEC","BEI"),
                                 c("MEC","MEI")),
              test = t.test, map_signif_level=T)+
  scale_fill_npg() +
  labs(x = NULL, y = "Chao1") +
  theme(axis.title = element_text(size = 16),
        axis.text = element_text(color = "black", size = 16),
        axis.text.y = element_text(size = 14),
        legend.text = element_text(color = "black", size = 9),
        strip.text = element_text(size = 14),
        legend.title = element_text(size = 12),
        legend.background = element_rect(fill = NA))
ggsave("Chao1.PDF", width = 8, height = 6, dpi = 300)

If you need more information, plz let me know. Thanks! Looking forward to your reply.

ChiLiubio commented 2 months ago

Hi. Could you please provide an example that I can reproduce? The example should also show the issue that two figures of two different data have the same results. If you need to attach your dataset, please follow the tutorial (https://chiliubio.github.io/microeco_tutorial/notes.html#save-function). The microtable object (dataset) is enough for the example, please donot provide the txt/tsv files.

WangWaud commented 2 months ago

R package

Problem

image

As you can see, the Chao1 and Observed index of the same samples are the same.

Code

library(microeco)
library(magrittr)
library(ggplot2)
library(phyloseq)
library(ggsci)
library(dplyr)
library(ggsignif)

load("example.RData")
dataset_filtered$cal_abund()

dataset_filtered$tidy_dataset()
dataset_filtered$tax_table[dataset_filtered$tax_table == ""] <- "unassigned"

t1 <- trans_abund$new(dataset = dataset_filtered, taxrank = "Phylum", ntaxa = 10, groupmean = "Group")

# α多样性 --------------------------------------------------------------------
t1 <- trans_alpha$new(dataset = dataset_filtered, group  = "Group")

t1$cal_diff(method = "anova",p_adjust_method = "fdr")
t1$res_diff

# 手动绘制α多样性 ----------------------------------------------------------------

ggplot(data=t1$data_alpha[t1$data_alpha$Measure=="Chao1",],
       aes(x=Group,y=Value,fill=Group))+
  geom_boxplot(outlier.color = "#D55740", outlier.shape = 1) +
  geom_jitter(alpha = 0.7) +
  theme_classic() +
  geom_signif(comparisons = list(c("BEC","BEI"),
                                 c("MEC","MEI")),
              test = t.test, map_signif_level=T)+
  scale_fill_npg() +
  labs(x = NULL, y = "Chao1") +
  theme(axis.title = element_text(size = 16),
        axis.text = element_text(color = "black", size = 16),
        axis.text.y = element_text(size = 14),
        legend.text = element_text(color = "black", size = 9),
        strip.text = element_text(size = 14),
        legend.title = element_text(size = 12),
        legend.background = element_rect(fill = NA))
ggsave("Chao1.PDF", width = 8, height = 6, dpi = 300)

ggplot(data=t1$data_alpha[t1$data_alpha$Measure=="Observed",],
       aes(x=Group,y=Value,fill=Group))+
  geom_boxplot(outlier.color = "#D55740", outlier.shape = 1) +
  geom_jitter(alpha = 0.7) +
  theme_classic() +
  geom_signif(comparisons = list(c("BEC","BEI"),
                                 c("MEC","MEI")),
              test = t.test, map_signif_level=T)+
  scale_fill_npg() +
  labs(x = NULL, y = "Chao1") +
  theme(axis.title = element_text(size = 16),
        axis.text = element_text(color = "black", size = 16),
        axis.text.y = element_text(size = 14),
        legend.text = element_text(color = "black", size = 9),
        strip.text = element_text(size = 14),
        legend.title = element_text(size = 12),
        legend.background = element_rect(fill = NA))
ggsave("Chao1.PDF", width = 8, height = 6, dpi = 300)

The dataset was updated. By running the code that I uploaded, you may repeat my problem. Thanks! example.RData.zip

ChiLiubio commented 2 months ago

Hi. To verify the result, please run the following steps. We can find it is same with the results of vegan package. The reason that obs and chao1 have very similar results is the rare taxa are very low in your data. For example, sum(dataset_filtered$otu_table[, 1] == 1) shows 0 feature with abundance 1. Chao1 index highly depends on the rare taxa (1 and 2), which is nessary to estimate the possible species number.

library(microeco)
library(magrittr)

load("example.RData")

dataset_filtered$cal_abund()

dataset_filtered$cal_alphadiv()
View(dataset_filtered$alpha_diversity)

tmp <- vegan::estimateR(t(dataset_filtered$otu_table))
View(tmp)
WangWaud commented 2 months ago

I got it. Thanks very much!