Some errors applying ALASCA to longitudinal (repeat measures) microbial counts

samasafish commented 2 years ago

Hi Anders!

This is an excellent package! And one of the few to accept repeat measures intelligently! (talking to you vegan) I commend you on making it so user friendly. I have a special use case (or hopefully not so special) where I want to model microbial community change in a time series. This data seem to fit with the general requirements: values = microbial abundance (~500 spp.), time = days (x5, including day 0), group = individual or biological replicate (x4). sub_group = technical replicates (3/replicate/day)

My experimental setup is 4 biological replicates (individuals) sampled across Days (0 baseline, 1, 3, 7, and 14). Each biological replicate has 3 technical replicates at each time point (pseudo-replication).

Question 1. I may need a slightly different model structure than your examples define since I have technical replicates. (4 individuals) (5 days) (3 tech. replicates) = 60 observations. I'm only interested in the change over time (fixed effect), not in each individuals contribution (individuals are my blocks) and I'd like to explicitly model the technical replicates (i.e, sub_group) nested in each individual (i.e., group) instead of averaging them outside the model. Should I set the random effect to be (group|sub_group)?

model.formula2 <- value ~ time*group + (group|sub_group) Some results: 16s_time_series_ALASCA_PC1 output from validate = F

Question 2. I ran my model and I get a usable object (awesome!) but when I validate it I get interesting differences between the methods. With bootstrap validation it runs fine. With permutations (see error below). With "loo" ... R crashes. In general, are these bootstraps (or optionally permutations) aware of the model formula? I know some analyses packages use the permute package which requires a call to how() to set blocks, plots etc. so that permutations are not "free" but constrained to only the independent blocks of your study design. How is this handled in ALASCA?

Some issues I found with my unorthodox dataset:

summary(mod$regr.model[[1]]) #returns NULL in every case Length Class Mode 0 NULL NULL

Also this warning message when I use permutations instead of bootstrap:

      PE.mod <- ALASCA(df_long, model.formula2, separateTimeAndGroup = T,  useRfast = T, forceEqualBaseline = T, validate = T, validateRegression = F, validationMethod = "permutation", nValRuns = 100)

====== ALASCA ======

0.0.0.106 (2022-01-22)

Will use linear mixed models! Using group for stratification. Scaling data... Calculating LMM coefficients... Finished calculating regression coefficients! Calculating predictions from regression models... Finished calculating predictions from regression models! Calculating effect matrix Finished calculating effect matrix! Running validation...

Run 1 of 1000 Error in prepareValidationRun(object) : object 'temp_object' not found

Thanks for your time Anders,

Sam

sessionInfo() R version 4.1.2 (2021-11-01) Platform: x86_64-pc-linux-gnu (64-bit) Running under: CentOS Linux 7 (Core)

Matrix products: default BLAS/LAPACK: /usr/lib64/libopenblasp-r0.3.3.so

locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages: [1] stats4 stats graphics grDevices utils datasets methods base

other attached packages: [1] ALASCA_0.0.0.105 data.table_1.14.2 ggvegan_0.1-0 pairwiseAdonis_0.4
[5] cluster_2.1.2 patchwork_1.1.1 MicrobiotaProcess_1.6.3 weathermetrics_1.2.2
[9] tidyquant_1.0.3 quantmod_0.4.18 TTR_0.24.3 PerformanceAnalytics_2.0.4 [13] xts_0.12.1 zoo_1.8-9 ggtext_0.1.1 lubridate_1.8.0
[17] wesanderson_0.3.6 viridis_0.6.2 viridisLite_0.4.0 Cairo_1.5-14
[21] cowplot_1.1.1 ggthemes_4.2.4 magrittr_2.0.1 reshape_0.8.8
[25] reshape2_1.4.4 forcats_0.5.1 stringr_1.4.0 dplyr_1.0.7
[29] purrr_0.3.4 readr_2.1.1 tidyr_1.1.4 tibble_3.1.6
[33] tidyverse_1.3.1 DivNet_0.4.0 breakaway_4.7.6 DESeq2_1.34.0
[37] SummarizedExperiment_1.24.0 Biobase_2.54.0 MatrixGenerics_1.6.0 matrixStats_0.61.0
[41] GenomicRanges_1.46.1 GenomeInfoDb_1.30.0 IRanges_2.28.0 S4Vectors_0.32.3
[45] BiocGenerics_0.40.0 metagMisc_0.0.4 microbiome_1.16.0 ggplot2_3.3.5
[49] phyloseq_1.38.0 vegan_2.5-7 lattice_0.20-45 permute_0.9-5
[53] ANCOMBC_1.4.0 corrplot_0.92 pvclust_2.2-0 dendextend_1.15.2

andjar commented 2 years ago

Hi Sam,

Thank you for your kind words. It is exciting to hear that you are testing the package! As you may have seen, there is still some work going on, but I have a goal to publish version 1.0.0 in the beginning of March where things will be a bit more polished.

ALASCA offers two options for the LMMs; Rfast, which is incredibly fast but only accepts one random intercept; and lme4::lmer(), which is slower but much more flexible. So if you set useRfast = FALSE in ALASCA(), it will use lme4::lmer() which accepts nested groupings such as (1|ID/replicate). The notation (a|b) is also valid for lme4::lmer(), but will introduce a random slope rather than nested effects

I am sorry that the code didn't work. I have to look further into question 2; at the moment, we have focused on bootstrapping as it is most commonly used. The general validation strategy is similar for all methods; ALASCA will generate a sample, call a new ALASCA model with that sample, and repeat. For bootstrap it should resample the stratification groups with replacemant; for loo, it should delete a subset from each stratification group; for permutation it should swap ID's around, and optionally also time points (within individual). But the two latter need some more care before they are ready for general use

samasafish commented 2 years ago

Very good, thanks for the response Anders. Yes I probably want (1|ID/replicate) as you suggested if it doesn't overfit the model. Also useRfast= T explains why I was getting the same results with different models :)

I unfortunately updated the package and now I can't play with the data anymore. I'm getting a DT error (even with your Intro dataset example) found here: https://andjar.github.io/ALASCA/articles/ALASCA.html I'm sure its temporary as you're still actively coding this, so feel free to ignore the errors below. I'll be patient and follow any updates. Very excited to see where ALASCA goes.

Best,

Sam

first error:

df$time <- as.numeric(as.character(df$time)) res.simple <- ALASCA(df = df, formula = model.formula, doDebug = T)

====== ALASCA ======

0.0.0.108 (2022-01-29)

.. Has initialized the ALASCA model. Next step is to clean it and check input Will use Rfast! Using group for stratification. .... Making factors for time, group and variable .... Checking for missing information Scaling data... .... Group term in formula? TRUE .... Interaction term in formula? TRUE .... Identified the following covariates in addition to time and troup: 1 | partid .. sanitizeObject: 0.02086782 s .. Has cleaned the ALASCA model. Next step is building it Calculating LMM coefficients... Error in contrasts<-(`tmp*`, value = contr.funs[1 + isOF[nn]]) : contrasts can be applied only to factors with 2 or more levels In addition: Warning message: In sanitizeObject(object) :

!!! -> Oh dear, at least on of your rows is missing either time or group. I have removed it/them for now, but you should check if this is a problem...

had to code time as a factor to make the example run to this error:

df$time <- as.factor(df$time) # must be a factor to run model.formula <- value ~ time*group + (1|partid) res.simple <- ALASCA(df = df, formula = model.formula, doDebug = T) ====== ALASCA ======

0.0.0.108 (2022-01-29)

.. Has initialized the ALASCA model. Next step is to clean it and check input Will use Rfast! Using group for stratification. .... Making factors for time, group and variable .... Checking for missing information Scaling data... .... Group term in formula? TRUE .... Interaction term in formula? TRUE .... Identified the following covariates in addition to time and troup: 1 | partid ..* sanitizeObject: 0.02222037 s .. Has cleaned the ALASCA model. Next step is building it Calculating LMM coefficients...

runRegression: 0.452275 s Finished calculating regression coefficients! Calculating predictions from regression models... Finished calculating predictions from regression models!
removeCovars: 0.002360582 s
separateLMECoefficients: 0.001047134 s Calculating effect matrix Finished calculating effect matrix!
getEffectMatrix: 0.03888226 s
doPCA: 0.006612539 s
cleanPCA: 0.00381732 s Error in :=(PC, as.numeric(gsub("PC", "", PC))) : Check that is.data.table(DT) == TRUE. Otherwise, := and :=(...) are defined for use in j, once only and in particular ways. See help(":=").

github-actions[bot] commented 3 months ago

This issue is stale because it has been open for 30 days with no activity.

github-actions[bot] commented 2 months ago

This issue was closed because it has been inactive for 14 days since being marked as stale.

andjar / ALASCA

Some errors applying ALASCA to longitudinal (repeat measures) microbial counts #3

first error:

had to code time as a factor to make the example run to this error: