adw96 / DivNet

diversity estimation under ecological networks
83 stars 18 forks source link

default X is specified differently for phyloseq W and matrix W #61

Closed pooranis closed 3 years ago

pooranis commented 3 years ago

I noticed a discrepancy in the results of divnet when I use a phyloseq object vs just a plain matrix.

Using the data from your vignette:

library(phyloseq)
library(breakaway)
library(DivNet)
library(magrittr)

data(Lee)
lee_phylum <- Lee %>%
  tax_glom(taxrank = "Phylum") %>%
  subset_samples(temp != "<NA>")

So, lee_phylum is a phyloseq object. If we use that as input to divnet with X = NULL (default):

divnet_phyloseq <- divnet(W= lee_phylum, base = 1)
summary(divnet_phyloseq$shannon)
# A tibble: 14 x 7
   estimate   error lower upper sample_names name   model    
      <dbl>   <dbl> <dbl> <dbl> <chr>        <chr>  <chr>    
 1    1.82  0.0177  1.79  1.86  BW1          DivNet Aitchison
 2    1.08  0.0328  1.02  1.15  BW2          DivNet Aitchison
 3    1.12  0.0235  1.07  1.16  R10          DivNet Aitchison
 4    1.01  0.0145  0.978 1.04  R11          DivNet Aitchison
 5    0.435 0.0233  0.389 0.482 R11BF        DivNet Aitchison
 6    1.41  0.0171  1.38  1.44  R1A          DivNet Aitchison
 7    1.38  0.0142  1.35  1.41  R1B          DivNet Aitchison
 8    1.29  0.00983 1.27  1.31  R2           DivNet Aitchison
 9    1.29  0.00654 1.27  1.30  R3           DivNet Aitchison
10    1.37  0.0124  1.34  1.39  R4           DivNet Aitchison
11    1.32  0.00981 1.31  1.34  R5           DivNet Aitchison
12    1.43  0.00805 1.41  1.45  R6           DivNet Aitchison
13    1.43  0.0152  1.40  1.46  R8           DivNet Aitchison
14    1.11  0.0155  1.08  1.14  R9           DivNet Aitchison

However, if we use the same otu table as a matrix:

mat <- t(otu_table(lee_phylum)@.Data)
divnet_matrix <- divnet(W=mat, base = 1)
summary(divnet_matrix$shannon)
# A tibble: 14 x 7
   estimate  error lower upper sample_names name   model    
      <dbl>  <dbl> <dbl> <dbl> <chr>        <chr>  <chr>    
 1     1.07 0.0761 0.913  1.22 BW1          DivNet Aitchison
 2     1.07 0.0761 0.913  1.22 BW2          DivNet Aitchison
 3     1.07 0.0761 0.913  1.22 R10          DivNet Aitchison
 4     1.07 0.0761 0.913  1.22 R11          DivNet Aitchison
 5     1.07 0.0761 0.913  1.22 R11BF        DivNet Aitchison
 6     1.07 0.0761 0.913  1.22 R1A          DivNet Aitchison
 7     1.07 0.0761 0.913  1.22 R1B          DivNet Aitchison
 8     1.07 0.0761 0.913  1.22 R2           DivNet Aitchison
 9     1.07 0.0761 0.913  1.22 R3           DivNet Aitchison
10     1.07 0.0761 0.913  1.22 R4           DivNet Aitchison
11     1.07 0.0761 0.913  1.22 R5           DivNet Aitchison
12     1.07 0.0761 0.913  1.22 R6           DivNet Aitchison
13     1.07 0.0761 0.913  1.22 R8           DivNet Aitchison
14     1.07 0.0761 0.913  1.22 R9           DivNet Aitchison

^ This result appears to be incorrect, as all the estimates are the same (?). I traced the discrepancy back to how X is specified differently for a phyloseq object: https://github.com/adw96/DivNet/blob/master/R/divnet_main.R#L72 and how it is specified for a generic matrix: https://github.com/adw96/DivNet/blob/master/R/divnet_main.R#L91

If I use the phyloseq specification for X with the matrix, then I get the same result as with a phyloseq object:

xx <- row.names(mat)
X <- model.matrix(~xx)
divnet_matrix_X <- divnet(W=mat, X = X, base = 1)
summary(divnet_matrix_X$shannon)
# A tibble: 14 x 7
   estimate   error lower upper sample_names name   model    
      <dbl>   <dbl> <dbl> <dbl> <chr>        <chr>  <chr>    
 1    1.82  0.0170  1.79  1.86  BW1          DivNet Aitchison
 2    1.08  0.0466  0.988 1.17  BW2          DivNet Aitchison
 3    1.12  0.00693 1.10  1.13  R10          DivNet Aitchison
 4    1.01  0.0152  0.977 1.04  R11          DivNet Aitchison
 5    0.435 0.00909 0.417 0.453 R11BF        DivNet Aitchison
 6    1.41  0.0197  1.37  1.45  R1A          DivNet Aitchison
 7    1.38  0.0147  1.35  1.41  R1B          DivNet Aitchison
 8    1.29  0.00968 1.27  1.31  R2           DivNet Aitchison
 9    1.29  0.00927 1.27  1.31  R3           DivNet Aitchison
10    1.37  0.00703 1.36  1.38  R4           DivNet Aitchison
11    1.32  0.0103  1.30  1.35  R5           DivNet Aitchison
12    1.43  0.0205  1.39  1.47  R6           DivNet Aitchison
13    1.43  0.0129  1.40  1.45  R8           DivNet Aitchison
14    1.11  0.0202  1.07  1.15  R9           DivNet Aitchison

I assume, in practice, we would want to use the latter, and the current implementation is just a bug?

Thanks very much for your very useful research and R packages.

bryandmartin commented 3 years ago

Hi @pooranis,

Thank you for pointing out this issue and the detailed bug report. This has been fixed in 1dbc3c2!

Bryan