FrederickHuangLin / ANCOMBC

Differential abundance (DA) and correlation analyses for microbial absolute abundance data
https://www.nature.com/articles/s41467-020-17041-7

structural zeros interpretation #295

Open simone-anza opened 2 weeks ago

simone-anza commented 2 weeks ago

Hi,

thank you for constantly updating ANCOMBC, it's a great tool! I am currently running the following model:

library(ANCOMBC)

out <- ancombc2(
  data = physeq_HIGHLOW,
  fix_formula = "prenatal_SD_group + breast_milk_cat + age_at_sampling_month + delivery + Child_sex",
  rand_formula = "(1|dyad_id)",
  p_adj_method = "BH",
  pseudo_sens = TRUE,
  prv_cut = 0.1,
  lib_cut = 0,  # no library-size filtering because the data are grouped
  group = "prenatal_SD_group",
  struc_zero = TRUE,
  neg_lb = FALSE,
  iter_control = list(tol = 1e-05, max_iter = 100, verbose = TRUE),
  em_control = list(tol = 1e-05, max_iter = 100),
  alpha = 0.05
)

My variable prenatal_SD_group contains two groups (HIGH and LOW). When I run

tab_zero = out$zero_ind
tab_zero %>%
  datatable(caption = "The detection of structural zeros")

which, according to the tutorial, will produce the table of structural zeros, described there as follows:

A clarification regarding Structural zeros: A taxon is considered to have structural zeros in some (>=1) groups if it is completely (or nearly completely) missing in these groups. For instance, suppose there are three groups: g1, g2, and g3. If the counts of taxon A in g1 are 0, but they are nonzero in g2 and g3, then taxon A will be considered to contain structural zeros in g1. In this example, taxon A is declared to be differentially abundant between g1 and g2, g1 and g3, and consequently, it is globally differentially abundant with respect to this group variable. Such taxa are not further analyzed using ANCOM-BC2, but the results are summarized in a separate table.

The table should thus contain only TRUE or FALSE values in each column. I get a table with two columns, HIGH and LOW, and I see the following combinations for each taxon:

HIGH with TRUE vs LOW with TRUE
HIGH with FALSE vs LOW with TRUE
HIGH with FALSE vs LOW with FALSE
HIGH with TRUE vs LOW with FALSE

But my question is: how is it possible that I see taxa which are structural zeros in both of my categories? That would mean the taxon is 0 in all observations in both groups, right? If so, it shouldn't be analyzed by ANCOMBC at all, right? Am I missing something here? I'm not sure.
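
One way to check this is to list the taxa flagged in every group and count their nonzero observations per group. The following is a minimal sketch on toy data, not output from the model above: `zero_ind`, `counts`, and `group` are hypothetical stand-ins, and the flag columns are assumed to be named HIGH and LOW as described in this post (the real `out$zero_ind` may use longer column names).

```r
# Toy stand-in for out$zero_ind; names and values are hypothetical.
zero_ind <- data.frame(
  taxon = c("taxA", "taxB", "taxC", "taxD"),
  HIGH  = c(TRUE, FALSE, FALSE, TRUE),
  LOW   = c(TRUE, TRUE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# Toy count matrix (taxa x samples) and the group label of each sample.
counts <- matrix(
  c(0, 0, 0, 0,
    3, 5, 0, 0,
    2, 1, 4, 6,
    0, 0, 7, 9),
  nrow = 4, byrow = TRUE,
  dimnames = list(zero_ind$taxon, paste0("s", 1:4))
)
group <- c("HIGH", "HIGH", "LOW", "LOW")

# Treat every logical column as a per-group structural-zero flag.
flag_cols <- names(zero_ind)[vapply(zero_ind, is.logical, logical(1))]

# Taxa flagged as structural zeros in all groups.
n_flagged <- rowSums(zero_ind[, flag_cols, drop = FALSE])
both <- zero_ind$taxon[n_flagged == length(flag_cols)]

# Number of samples with a nonzero count, per taxon and group (taxa x groups).
nonzero_by_group <- t(rowsum(t(counts > 0) * 1, group))

print(both)
print(nonzero_by_group[both, , drop = FALSE])
```

With real data, `counts` would presumably come from the phyloseq object (for example via `phyloseq::otu_table()`) and `group` from its sample data. If a taxon flagged in both groups shows nonzero counts there, the flags cannot simply mean "all zeros in every sample".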

Thank you!