FrederickHuangLin / ANCOMBC

Differential abundance (DA) and correlation analyses for microbial absolute abundance data
https://www.nature.com/articles/s41467-020-17041-7
104 stars 27 forks source link

"longer object length is not a multiple of shorter object length" when running ancombc2 #231

Open fpusan opened 9 months ago

fpusan commented 9 months ago

First, thanks for developing and maintaining this amazing tool!

I got warnings while running ancombc2 with the following command. These happened with my dataset, but not with the example datasets in https://bioconductor.org/packages/release/bioc/vignettes/ANCOMBC/inst/doc/ANCOMBC2.html.

ancombc2_out = ancombc2(data = tse,
                        assay.type = 'counts',
                        fix_formula = 'disease',
                        p_adj_method = 'fdr',
                        tax_level = NULL,
                        prv_cut = 0.10,
                        group = 'disease',
                        struc_zero = FALSE,
                        neg_lb = FALSE,
                        global = FALSE)

Warnings look like the following:

1: The group variable has < 3 categories 
The multi-group comparisons (global/pairwise/dunnet/trend) will be deactivated
2: In r0i * beta/nu0 :
  longer object length is not a multiple of shorter object length
3: In r1i * (beta - l1)/(nu0 + kappa1) :
  longer object length is not a multiple of shorter object length
4: In r2i * (beta - l2)/(nu0 + kappa2) :
  longer object length is not a multiple of shorter object length
5: In r0i/nu0 : longer object length is not a multiple of shorter object length
6: In r1i/(nu0 + kappa1) :
  longer object length is not a multiple of shorter object length
7: In r2i/(nu0 + kappa2) :
  longer object length is not a multiple of shorter object length
8: In r1i * (beta - delta)/(nu0 + kappa1) :
  longer object length is not a multiple of shorter object length
9: In r1i/(nu0 + kappa1) :
  longer object length is not a multiple of shorter object length
10: In r2i * (beta - delta)/(nu0 + kappa2) :
  longer object length is not a multiple of shorter object length
11: In r2i/(nu0 + kappa2) :
  longer object length is not a multiple of shorter object length

As far as I can tell there is nothing wrong with my data, and other methods (e.g. NMDS, DESeq2...) are working fine. A summary of my data would be:

class: TreeSummarizedExperiment 
dim: 2661 55 
metadata(0):
assays(1): counts
rownames(2661): Otu0002 Otu0005 ... Otu3034 Otu3035
rowData names(0):
colnames(55): www xxx ... yyy zzz
colData names(221): disease jjj ...
  kkk lll
reducedDimNames(0):
mainExpName: NULL
altExpNames(0):
rowLinks: NULL
rowTree: NULL
colLinks: NULL
colTree: NULL

This happens regardless of whether I input my data as a TreeSummarizedExperiment or a phyloseq object.

cyklee commented 7 months ago

I am encountering similar warning messages as well. I've traced it to the E-M algorithm.

npittoors commented 4 months ago

I am having the same issue with my data. Everything looks fine with my data and phyloseq object. However, I keep receiving the repeated message "Warning: longer object length is not a multiple of shorter object lengthWarning: longer object length is not a multiple of shorter object lengthWarning: longer object length is not a multiple of shorter object length". @fpusan did you find a solution?

fpusan commented 4 months ago

So far the solution has been going back to DESeq2

pbradz commented 3 months ago

Hi all, I recently started having this problem (or at least, something with the same error messages) after an R update. I'm attaching the dataset that I get these warnings on. Here is a minimal example using this dataset:

library(tidyverse)
library(ANCOMBC)
test_ds <- read_csv("test-dataset1.csv")
# convert to matrix
test_ds_mtx <- as.matrix(test_ds[,-1])
rownames(test_ds_mtx) <- test_ds[[1]]
# make metadata table
test_smp_md <- rbind(data.frame(sample=paste0("B", 1:50),
                                condition="g1ctrl"),
                     data.frame(sample=paste0("R", 1:50),
                                condition="g2case")) %>%
  (function(x) {
    y <- x[, -1, FALSE];
    rownames(y) <- x[, 1];
    y
  })
# convert to TSE
test_tse <- TreeSummarizedExperiment::TreeSummarizedExperiment(
  assays=S4Vectors::SimpleList(counts=test_ds_mtx),
  colData=test_smp_md)
# run ANCOMBC2
test_results <- ancombc2(test_tse,
                          assay_name="counts",
                          fix_formula="condition")

test-dataset1.csv

This yields:

Warning messages:
1: In r0i * beta/nu0 :
  longer object length is not a multiple of shorter object length
2: In r1i * (beta - l1)/(nu0 + kappa1) :
  longer object length is not a multiple of shorter object length
3: In r2i * (beta - l2)/(nu0 + kappa2) :
  longer object length is not a multiple of shorter object length
4: In r0i/nu0 : longer object length is not a multiple of shorter object length
5: In r1i/(nu0 + kappa1) :
  longer object length is not a multiple of shorter object length
6: In r2i/(nu0 + kappa2) :
  longer object length is not a multiple of shorter object length
7: In r1i * (beta - delta)/(nu0 + kappa1) :
  longer object length is not a multiple of shorter object length
8: In r1i/(nu0 + kappa1) :
  longer object length is not a multiple of shorter object length
9: In r2i * (beta - delta)/(nu0 + kappa2) :
  longer object length is not a multiple of shorter object length
10: In r2i/(nu0 + kappa2) :
  longer object length is not a multiple of shorter object length
[...]

Any ideas?

ajcardenasb commented 3 months ago

I am experiencing the same issue across different datasets. Any updates would be appreciated. Thank you!

npittoors commented 3 months ago

Since this is just a warning messege, I decided to move past it. You can see the results once the code stops running by running print('ancombc'$res) where 'ancombc' is whatever you called your output

pbradz commented 3 months ago

I stepped through the beginning of .bias_em and it looks like the problem is here:

beta = beta[!is.na(beta)]
nu0 = var_hat
nu0 = nu0[!is.na(nu0)]

In my test data, there are some betas that are NA where nu0 is not also NA -- so in my specific case, the length of beta is 469 and the length of nu0 is 470. This could potentially cause those vectors to be misaligned.

(R will go ahead and do the calculation anyway with the misaligned vectors, automatically "recycling" values from the shorter vector to pad it out to the same length, which is why it's a warning instead of an error. However, those misaligned and padded numbers aren't necessarily meaningful in this context.)

If I manually fix beta in the debugger to have the same length as nu0, or if I use a patched version of the function, I don't get those 50+ warnings:

.bias_em_patched <- function(beta, var_hat, tol, max_iter) {
  nu0 = var_hat
  neither_na = !(is.na(beta) | is.na(nu0))
  beta = beta[neither_na]
  nu0 = nu0[neither_na]
[...rest as is...]
rlang::env_unlock(env = asNamespace('ANCOMBC'))
rlang::env_binding_unlock(env = asNamespace('ANCOMBC'))
assign('.bias_em', .bias_em_patched, envir = asNamespace('ANCOMBC'))
rlang::env_binding_lock(env = asNamespace('ANCOMBC'))
rlang::env_lock(asNamespace('ANCOMBC'))

> test_results <- ancombc2(test_tse,
+                           assay_name="counts",
+                           fix_formula="condition",
+                          struc_zero = TRUE, s0_perc=0.05, pseudo_sens=TRUE,
+                          group="condition"
+                          )
Warning messages:
1: The group variable has < 3 categories 
The multi-group comparisons (global/pairwise/dunnet/trend) will be deactivated 
2: In sqrt(sweep(var_hat, 2, var_delta, "*")) : NaNs produced

As far as I can tell, the actual estimates of log-foldchange and standard error are unchanged, but the p-values and q-values are quite different (see attached), so I think this may need to be patched. (See attached; first 8 pages compare p- and q-values, either patched vs. non-patched or two runs of non-patched. Next 4 pages compare lfc and se estimates between patched and non-patched.) Putting together a pull request now.

ancombc-comparison.pdf

cladigar commented 2 months ago

I am also reporting the same issue, arising from a re-analysis of data that worked perfectly fine some months ago. The warnings have disappeared thanks to the solution provided by @pbradz , which I support after testing in my data and after having had a look at the comparison with the patched version of the function that he provided. Thanks!

ArdenEngel commented 3 weeks ago

I'm so excited to see that @pbradz seems to have patched this issue. I used the ancombc2() function to successfully perform differential abundance analyses on ITS sequence data last year with no warnings returned, but have received these same "longer object length is not a multiple of shorter object length" warnings when using this function to re-analyze those same data since the package was updated. I see the commit from @pbradz on GitHub with the proposed fix, but I am unsure how to incorporate the code therein into my analysis code in RMarkdown. I assume that the patch code cannot simply be pasted into a code chunk in RMarkdown prior to use of the ancombc2() function. I would massively appreciate any help/clarification! Thank you!

cladigar commented 3 weeks ago

My approach was to reinstall the package from the bugfix branch that @pbradz created. It seems that @Maggie8888 is working on some updates to the package, so we might have some news soon!

FrederickHuangLin commented 2 weeks ago

Thank you all for reporting the issue and for your collaborative effort in debugging.

My apologies for the delayed response on the GitHub repository. I started a new position late last year and have since had limited support for maintaining the software. I’m currently working to secure more resources and am collaborating with @Maggie8888 to deliver a comprehensive bug fix and functionality improvements later this year.

A special thanks to @pbradz for putting together a pull request. I’ll be merging it into the bug-fix branch soon.

Stay tuned for the upcoming update!

Best regards,