Open wangzhichao1990 opened 2 years ago
Hi @wangzhichao1990
This bug comes from inconsistent taxonomic lineages that contain "Incertae_Sedis". Such unresolved names can scramble the parent-child relationships, because many different taxa may share the same name "Incertae_Sedis". The internal filtering step did not catch this. I will fix it. As a temporary workaround, you can add a step that filters those taxa manually, like this:
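library(microeco)
library(magrittr)  # provides %<>% and %>%
t <- trans_diff$new(
  dataset = dataset,
  method = "lefse",
  group = "group",
  taxa_level = "all",
  alpha = 0.05
)
# add this step: drop every taxon whose lineage contains "Incertae_Sedis"
t$abund_table %<>% .[!grepl("Incertae_Sedis", rownames(.)), ]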
t$plot_diff_cladogram(
use_taxa_num = 200,
use_feature_num = 30,
group_order = t$res_diff$Group %>% unique()
)
This is a long-standing issue in the taxonomy names of SILVA and some other databases. I have tried to filter out as much of the inconsistent taxonomy as possible in the plot_diff_cladogram function, because this operation requires a highly consistent taxonomy. Thanks very much for reporting this.
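To illustrate why a shared name such as "Incertae_Sedis" breaks the parent-child structure, here is a minimal base-R sketch that lists child names appearing under more than one parent. The toy table, its column names, and the helper find_ambiguous_names are assumptions made for illustration. They are not part of microeco and not the filtering used inside plot_diff_cladogram.

```r
# Toy taxonomy table; column names and values are illustrative assumptions.
tax <- data.frame(
  Family = c("f__Lachnospiraceae", "f__Ruminococcaceae", "f__Lachnospiraceae"),
  Genus  = c("g__Incertae_Sedis", "g__Incertae_Sedis", "g__Blautia"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return child names that occur under more than one parent.
find_ambiguous_names <- function(tax, parent_col, child_col) {
  # Keep each distinct parent/child combination once.
  pairs <- unique(tax[, c(parent_col, child_col)])
  # Count how many distinct parents each child name has.
  counts <- table(pairs[[child_col]])
  names(counts)[counts > 1]
}

ambiguous <- find_ambiguous_names(tax, "Family", "Genus")
print(ambiguous)
```

In this toy table the genus name "g__Incertae_Sedis" sits under two different families, so a tree built from names alone cannot tell which parent it belongs to. Running a check like this on your own taxonomy table may show whether other names need the same manual filtering.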
Best,
Chi
@ChiLiubio Thank you for your answer.
Hi, when using the SILVA database, the plot_diff_cladogram function runs for a long time without returning any result or error.
My code is shown below.
When using the Greengenes database, it works well.
This compressed file is test data: datasets.zip
I compared the two databases. In the SILVA database, the classification names at some levels seem to contain more than one word. That may be the reason, but I'm not sure.
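One way to inspect the multi-word hypothesis is to list the taxon names that contain whitespace. The sketch below uses base R on a toy table. The column name and the example names are assumptions for illustration, not values taken from the attached datasets.zip.

```r
# Toy taxonomy column; the names are illustrative assumptions.
tax <- data.frame(
  Genus = c("g__Bacteroides",
            "g__Clostridium sensu stricto 1",
            "g__Escherichia-Shigella"),
  stringsAsFactors = FALSE
)

# Flag names that contain any whitespace character.
has_space <- grepl("[[:space:]]", tax$Genus)
multi_word <- tax$Genus[has_space]
print(multi_word)
```

This only shows which names have several words. It does not by itself show that they cause the hang, so it is best treated as a first diagnostic step.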