FrederickHuangLin / ANCOMBC

Differential abundance (DA) and correlation analyses for microbial absolute abundance data
https://www.nature.com/articles/s41467-020-17041-7
104 stars 27 forks source link

No samples remain under current cutoff error #235

Closed kimcandel closed 8 months ago

kimcandel commented 8 months ago

I'm newer to data analysis in general and still learning how to use ancombc. I keep getting this error when trying the run the analysis: "No samples remain under current cutoff". I understand that means that there is maybe either nothing significant or my parameters are too strict? My data input is a large phyloseq object and the sample data consists of three categorical variables (1) called "meadow.bare" which consists two levels (2) called "site" which consists of three levels (3) called "depth" which consists of eight levels. Depth is a new categorical variable that I have recently added into my phyloseq object from my metadata. The strange thing is that I have run the analysis before adding depth as a variable with the meadow.bare and site variables which were successful. But now, with depth added as a third variable, when I try to make the meadow.bare comparison, it no longer works. Here's the code I'm using:

output = ancombc2(data = tse, assay_name = "counts", tax_level = "Family", fix_formula = "Site + Meadow.Bare + Depth", rand_formula = NULL, p_adj_method = "holm", pseudo = 0, pseudo_sens = FALSE, prv_cut = 0.10, lib_cut = 1000, s0_perc = 0.05, group = "Meadow.Bare", struc_zero = FALSE, neg_lb = FALSE, alpha = 0.05, n_cl = 1, verbose = TRUE, global = FALSE, pairwise = TRUE, dunnet = FALSE, trend = FALSE)

I have tried subsetting my data and loosening the parameters (the code above shows the stricter parameters), either didn't work. I also tried taking depth out of fix_formula which also did not change anything. When I remove depth completely from my phyloseq object, the analysis works as it did before. However, I would like to run the analysis on the depth variable as I'm interested in how taxa change with depth.

agrier-wcm commented 8 months ago

This sounds like an issue with the Depth variable or the phyloseq object itself. Can you double check the phyloseq object to see if you're somehow ending up with a bunch of NAs in the metadata? Ex:

sum(complete.cases(meta(my_phyloseq_object))) #Returns number of samples without NAs in metadata

kimcandel commented 8 months ago

I checked, and I didn't find any NA's in the metadata. My complete cases number was equal to how many samples I have.

agrier-wcm commented 8 months ago

Hmm. What is the output of this:

df <- meta(tse) ; table(df$Depth) ;

And just to be clear, you can remove Depth completely from the ps object and from the ancombc2 command and it works fine, then if you add Depth to the ps object, without adding it anywhere in the ancombc2 command, it gives the error about no samples remaining?

kimcandel commented 8 months ago

The output looks like: 0-2 10-12 12-14 14-16 2-4 4-6 6-8 8-10 34 28 24 18 28 30 32 29

in the past, I have tried subsetting the samples before to take out the 14-16 level as well as taking out both 12-14 and 14-16 levels since they have a bit less samples than the rest. I'm not too confident if the uneven numbers is what is causing the issue as there are also not equal numbers in either the Meadow.Bare and Site variable levels.

To answer your clarification, you understood correctly. Removing depth from the ps object completely allows it to work correctly but when depth is added to the ps even if it isn't anywhere in the command, I get the error.

agrier-wcm commented 8 months ago

Ah. I suspect it's the names of the Depth values themselves. Can you try renaming the Depth values so that they start with a character? Ex (assuming metadata is the dataframe you're going to use as metadata for your ps object):

metadata$Depth <- paste("Depth_", metadata$Depth, sep="")

And then remake the ps object with these renamed values for Depth (don't just make a new/separate variable with these names, replace/get rid of the old ones that start with numbers - this command will do that assuming the original variable is called Depth).

Edit to add:

I'm pretty sure we're on the right track here, but you might also need to replace the dashes (I'm not positive of this). You could do that like this, after the above command:

metadata$Depth <- gsub('-', '_',metadata$Depth)

kimcandel commented 8 months ago

Ok I've renamed them using your example and also replaced the dashes so values in Depth now look like Depth_0_2 but it's still not working. Something I forgot to mention, before opening this thread I had noticed that the Depth variable was a "factor" instead of a "character" and had just used the as.character() function on it, which also did not work.

agrier-wcm commented 8 months ago

1) Still the same error message?

2) It is appropriate for all grouping variables to be factors. I'm pretty sure characters are simply converted to factors under the hood anyway, but just to be sure, and to makes sure the names of the values are okay, could you try this:

metadata$Site <- factor(make.names(metadata$Site))
metadata$Meadow.Bare <- factor(make.names(metadata$Meadow.Bare))
metadata$Depth <- factor(make.names(metadata$Depth))

Then, remake the ps object using metadata as the metadata, and do not use the as.character() function - keep them as factors.

If you have any other grouping variables in your metadata table aside from the three listed above (even if you're not using them), you will want to do that same thing as above where you convert it to a factor and use make.names .

3) I don't think this is the issue, but since Meadow.Bare is your group of interest and it only has two levels, can you set group = NULL and pairwise = FALSE (you will still get comparison between the two levels of Meadow.Bare in the main results output$res dataframe, and pairwise only acts on your group variable anyway)? When the main group variable you care about only has two levels, you should generally set group = NULL .

4) If these things don't work, would you be willing to share your data, or at least a subset of it for which you still get the same error? You should be able to upload your metadata, taxonomy, and OTU table to your reply here if they are of a reasonable size. I can construct the ps object on my end.

kimcandel commented 8 months ago

I was getting the same error message but I took your latest suggestions, the issue resolved, and the function is working now! Making everything a factor appeared to work. Thank you so much!