Closed evaesquinas closed 2 years ago
Can you send a reproducible example of the first issue
From: evaesquinas @.>
Sent: Monday, August 8, 2022 7:01:56 AM
To: GabrielHoffman/variancePartition @.>
Cc: Subscribed @.***>
Subject: [GabrielHoffman/variancePartition] Error in .rowNamesDF<-
(x, value = value) : invalid 'row.names' length (Issue #64)
USE CAUTION: External Message.
Dear Dr. Hoffman,
I am trying to run variancePartition and fitExtractVarPartModel() to get the fraction of variance explained by each variable added in the formula. In addition, I have previously adjusted my data by 1 variable using removeBatchEffect from limma. In order to see the fraction of variance by this variable after the adjustment, I run fitExtractVarPartModel... but I got this error:
Error in .rowNamesDF<-
(x, value = value) : invalid 'row.names' length
Calls: fitExtractVarPartModel ... row.names<- -> row.names<-.data.frame -> .rowNamesDF<-
In addition: Warning message:
In matrix(unlist(varPart), nrow = length(varPart), byrow = TRUE) :
data length [3828] is not a sub-multiple or multiple of the number of rows [1915]
Execution halted
If I run the function again with this variable + another variable, I get the same error but not the warning message.
Error in .rowNamesDF<-
(x, value = value) : invalid 'row.names' length
Calls: fitExtractVarPartModel ... row.names<- -> row.names<-.data.frame -> .rowNamesDF<-
Execution halted
On the other hand, if I run fitExtractVarPartModel with a DIFFERENT VARIABLE (and NOT adding the variable that I adjusted for), the run works and I don't get any error.
Note 1: The columns (samples) from exprObj (log2(normalized counts + 1)) follows the same order as you can find in the column where the samples are in data. The data has the samples as rownames and exprObj is a matrix.
Note 2: I have tried to run variancePartitition after having adjusted by the same variable using ComBat-Seq and I have run the same models and it works. I don't get any errors/warnings.
Do you know what could be the reason of the error?
Any help would be very appreciated.
Thanks very much in advance
My R session:
R version 4.1.2 (2021-11-01) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Ubuntu 20.04.2 LTS
Matrix products: default BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0 LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
locale: [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8 [5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8 [7] LC_PAPER=en_GB.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
attached base packages: [1] parallel stats graphics grDevices utils datasets methods [8] base
other attached packages: [1] tidyr_1.2.0 biomaRt_2.48.3 targetidwp4_0.1.0 [4] dplyr_1.0.8 variancePartition_1.22.0 Biobase_2.52.0 [7] BiocGenerics_0.38.0 scales_1.1.1 BiocParallel_1.26.0 [10] limma_3.48.3 ggplot2_3.3.5
loaded via a namespace (and not attached): [1] nlme_3.1-152 bitops_1.0-7 matrixStats_0.59.0 [4] pbkrtest_0.5.1 bit64_4.0.5 filelock_1.0.2 [7] doParallel_1.0.17 progress_1.2.2 httr_1.4.2 [10] GenomeInfoDb_1.28.0 tools_4.1.2 backports_1.4.1 [13] utf8_1.2.2 R6_2.5.1 KernSmooth_2.23-20 [16] DBI_1.1.2 colorspace_2.0-2 withr_2.4.3 [19] tidyselect_1.1.1 prettyunits_1.1.1 curl_4.3.2 [22] bit_4.0.4 compiler_4.1.2 cli_3.1.1 [25] xml2_1.3.3 labeling_0.4.2 caTools_1.18.2 [28] readr_2.1.2 rappdirs_0.3.3 stringr_1.4.0 [31] digest_0.6.29 minqa_1.2.4 colorRamps_2.3 [34] XVector_0.32.0 pkgconfig_2.0.3 lme4_1.1-28 [37] dbplyr_2.1.1 fastmap_1.1.0 rlang_1.0.1 [40] RSQLite_2.2.7 farver_2.1.0 generics_0.1.2 [43] gtools_3.9.2 RCurl_1.98-1.3 magrittr_2.0.2 [46] GenomeInfoDbData_1.2.6 Matrix_1.3-2 Rcpp_1.0.8 [49] munsell_0.5.0 S4Vectors_0.30.0 fansi_1.0.2 [52] lifecycle_1.0.1 stringi_1.7.6 MASS_7.3-54 [55] zlibbioc_1.38.0 gplots_3.1.1 plyr_1.8.6 [58] BiocFileCache_2.0.0 grid_4.1.2 blob_1.2.2 [61] crayon_1.4.2 lattice_0.20-44 Biostrings_2.60.1 [64] splines_4.1.2 hms_1.1.1 KEGGREST_1.32.0 [67] pillar_1.7.0 boot_1.3-28 reshape2_1.4.4 [70] codetools_0.2-18 stats4_4.1.2 XML_3.99-0.6 [73] glue_1.6.1 vctrs_0.3.8 png_0.1-7 [76] nloptr_2.0.0 tzdb_0.2.0 foreach_1.5.2 [79] gtable_0.3.0 purrr_0.3.4 assertthat_0.2.1 [82] cachem_1.0.6 broom_0.7.12 tibble_3.1.6 [85] iterators_1.0.14 AnnotationDbi_1.54.1 memoise_2.0.0 [88] IRanges_2.26.0 ellipsis_0.3.2
— Reply to this email directly, view it on GitHubhttps://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_GabrielHoffman_variancePartition_issues_64&d=DwMCaQ&c=shNJtf5dKgNcPZ6Yh64b-ALLUrcfR-4CCQkZVKC8w3o&r=KdYcmw5SdXylMrTGSuNVkNJulowod64k0PTDC5BHZkk&m=L1LgY5AaIlEMttYRgvt4_pXoERNruA0PYbVDcsRJ5MKUQ23f5S-Anbdqfab2TrvL&s=_ojU5X_AfWOILOP0UpS-gF7t4i3cw8piQx1tYcIMlXU&e=, or unsubscribehttps://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_notifications_unsubscribe-2Dauth_ADFAXTOW4XCXJBBJRPCSSVLVYDSKJANCNFSM554WESUQ&d=DwMCaQ&c=shNJtf5dKgNcPZ6Yh64b-ALLUrcfR-4CCQkZVKC8w3o&r=KdYcmw5SdXylMrTGSuNVkNJulowod64k0PTDC5BHZkk&m=L1LgY5AaIlEMttYRgvt4_pXoERNruA0PYbVDcsRJ5MKUQ23f5S-Anbdqfab2TrvL&s=Vu_grmyGaL1xbzLajXJqTPoyqoA6vy4sfU5TxPeWokk&e=. You are receiving this because you are subscribed to this thread.Message ID: @.***>
First of all, thanks very much for your reply. Due I am working with sensitive data, I have produced a similar example that gives me the error. Note that I tried to reduce the example as much as I could to get the error and to avoid pasting a huge example. Find attached the data, since it is quite big and I cannot upload it as a comment.
library(variancePartition)
library(dplyr)
library(tidyr)
cts <- read.csv("./cts.txt", sep="")
pheno <- read.csv("./pheno.txt", sep="")
categorical_variables <- c("Sample", "Var1", "Var2")
pheno[,categorical_variables] <- lapply(pheno[,categorical_variables],as.factor)
str(pheno)
cts.mat <- as.matrix(cts)
set.seed(11344)
model1 = ~ (1|Var1)
varPart <- fitExtractVarPartModel(cts.mat, model1, pheno)
set.seed(NULL)
It works without any problem.
set.seed(11344)
model2 = ~ (1|Var2)
varPart <- fitExtractVarPartModel(cts.mat, model2, pheno)
set.seed(NULL)
I get this error:
Dividing work into 100 chunks...
Total:4 s
Error in `.rowNamesDF<-`(x, value = value) : invalid 'row.names' length
Calls: fitExtractVarPartModel ... row.names<- -> row.names<-.data.frame -> .rowNamesDF<-
In addition: Warning message:
In matrix(unlist(varPart), nrow = length(varPart), byrow = TRUE) :
data length [30] is not a sub-multiple or multiple of the number of rows [16]
Execution halted
model3 = ~ (1|Var1) + (1|Var2)
set.seed(11344)
varPart <- fitExtractVarPartModel(cts.mat, model3, pheno)
set.seed(NULL)
I get this error:
Error in if (inherits(possibleError, "error") && grep("the fixed-effects model matrix is column rank deficient", :
missing value where TRUE/FALSE needed
Calls: fitExtractVarPartModel -> fitExtractVarPartModel -> .fitExtractVarPartModel
Execution halted
Note that those errors were obtained using slurm (I just launched my R script in the cluster that I am using, writing this piece of code at the beginning of my R script #!/usr/bin/Rscript --vanilla
).
If I run each model in my personal computer (same version of the package) I get slightly different errors.
Dividing work into 100 chunks...
Total:4 s
There were 50 or more warnings (use warnings() to see the first 50)
> set.seed(NULL)
> warnings()
Warning messages:
1: In serialize(data, node$con, xdr = FALSE) :
'package:stats' may not be available when loading
2: In serialize(data, node$con, xdr = FALSE) :
'package:stats' may not be available when loading
(the rest of the warnings are the same)
In matrix(unlist(varPart), nrow = length(varPart), byrow = TRUE) : data length [30] is not a sub-multiple or multiple of the number of rows [16]
Dividing work into 100 chunks...
Total:1 s
Error in `.rowNamesDF<-`(x, value = value) : invalid 'row.names' length
In addition: Warning messages:
1: In serialize(data, node$con, xdr = FALSE) :
'package:stats' may not be available when loading
(9 more warnings as the first one)
Dividing work into 100 chunks...
Total:1 s
Error in `.rowNamesDF<-`(x, value = value) : invalid 'row.names' length
In addition: There were 12 warnings (use warnings() to see them)
> set.seed(NULL)
> warnings()
Warning messages:
1: In optwrap(optimizer, devfun, getStart(start, rho$pp), ... :
convergence code -4 from nloptwrap: NLOPT_ROUNDOFF_LIMITED: Roundoff errors led to a breakdown of the optimization algorithm. In this case, the returned minimum may still be useful. (e.g. this error occurs in NEWUOA if one tries to achieve a tolerance too close to machine precision.)
2: In optwrap(optimizer, devfun, getStart(start, rho$pp), ... :
convergence code -4 from nloptwrap: NLOPT_ROUNDOFF_LIMITED: Roundoff errors led to a breakdown of the optimization algorithm. In this case, the returned minimum may still be useful. (e.g. this error occurs in NEWUOA if one tries to achieve a tolerance too close to machine precision.)
3: In serialize(data, node$con, xdr = FALSE) :
'package:stats' may not be available when loading
(remaining warnings are the same as the 3rd one)
Note that I use seeds just in case to have the same outputs (although it doesn't usually help and I am getting different errors...)
I hope this can help you a bit more to understand my problem.
Thanks very much in advance
First, some errors are fixed others give more informative messages in the latest version v1.27.3 (https://diseaseneurogenomics.github.io/variancePartition/). I have made substantial fixes since v.1.22.0.
Second, lets look at model 2, but focus on the 5th gene:
set.seed(11344)
model2 = ~ (1|Var2)
varPart <- fitExtractVarPartModel(cts.mat[5,,drop=FALSE], model2, pheno)
# Warning messages:
# 1: In optwrap(optimizer, devfun, getStart(start, rho$pp), lower = rho$lower, :
# convergence code -4 from nloptwrap: NLOPT_ROUNDOFF_LIMITED: Roundoff errors led
# to a breakdown of the optimization algorithm. In this case, the returned minimum
# may still be useful. (e.g. this error occurs in NEWUOA if one tries to achieve
# a tolerance too close to machine precision.)
This is more an issue with the data than with the software. Looking at the expression of gene 5 versus Var2
:
df = data.frame(Var2 = pheno$Var2, expr = cts.mat[5,])
df[order(df$Var2),]
We see that there is no variation within category. This means that there is no residual variance, and the model fit fails.
> df[order(df$Var2),]
Var2 expr
S116 1 -1.981364e-03
S117 1 -1.981364e-03
S118 1 -1.981364e-03
S1 2 4.489186e-05
S2 2 4.489186e-05
S3 3 9.136233e-06
S4 3 9.136233e-06
S5 3 9.136233e-06
S6 3 9.136233e-06
S7 3 9.136233e-06
S8 3 9.136233e-06
S9 3 9.136233e-06
S13 3 9.136233e-06
S17 3 9.136233e-06
S18 3 9.136233e-06
S19 3 9.136233e-06
S20 3 9.136233e-06
S21 3 9.136233e-06
S22 3 9.136233e-06
S10 4 2.412923e-05
S11 4 2.412923e-05
S12 4 2.412923e-05
S14 4 2.412923e-05
S15 4 2.412923e-05
S16 4 2.412923e-05
S24 4 2.412923e-05
S25 4 2.412923e-05
S26 4 2.412923e-05
S28 5 9.884335e-06
S31 5 9.884335e-06
S32 5 9.884335e-06
S34 5 9.884335e-06
S36 5 9.884335e-06
S41 5 9.884335e-06
S42 5 9.884335e-06
S46 5 9.884335e-06
S47 5 9.884335e-06
S48 5 9.884335e-06
S51 5 9.884335e-06
S52 5 9.884335e-06
S54 5 9.884335e-06
S29 6 9.815644e-06
S30 6 9.815644e-06
S33 6 9.815644e-06
S35 6 9.815644e-06
S37 6 9.815644e-06
S38 6 9.815644e-06
S39 6 9.815644e-06
S40 6 9.815644e-06
S43 6 9.815644e-06
S44 6 9.815644e-06
S45 6 9.815644e-06
S49 7 -8.161759e-05
S50 7 -8.161759e-05
S56 7 -8.161759e-05
S57 7 -8.161759e-05
S58 7 -8.161759e-05
S59 7 -8.161759e-05
S60 7 -8.161759e-05
S61 7 -8.161759e-05
S62 7 -8.161759e-05
S63 7 -8.161759e-05
S64 7 -8.161759e-05
S71 7 -8.161759e-05
S55 8 2.274638e-05
S65 8 2.274638e-05
S66 8 2.274638e-05
S67 8 2.274638e-05
S68 8 2.274638e-05
S69 8 2.274638e-05
S70 8 2.274638e-05
S74 8 2.274638e-05
S23 9 9.736041e-06
S27 9 9.736041e-06
S53 9 9.736041e-06
S72 9 9.736041e-06
S73 9 9.736041e-06
S75 9 9.736041e-06
S76 9 9.736041e-06
S77 9 9.736041e-06
S78 9 9.736041e-06
S80 9 9.736041e-06
S83 9 9.736041e-06
S85 9 9.736041e-06
S93 9 9.736041e-06
S97 9 9.736041e-06
S102 9 9.736041e-06
S79 10 4.058674e-05
S81 10 4.058674e-05
S82 10 4.058674e-05
S86 10 4.058674e-05
S89 10 4.058674e-05
S91 10 4.058674e-05
S92 10 4.058674e-05
S94 10 4.058674e-05
S95 10 4.058674e-05
S96 10 4.058674e-05
S98 10 4.058674e-05
S101 10 4.058674e-05
S84 11 2.747669e-05
S87 11 2.747669e-05
S88 11 2.747669e-05
S90 11 2.747669e-05
S99 11 2.747669e-05
S100 11 2.747669e-05
S103 12 2.413696e-05
S104 12 2.413696e-05
S105 12 2.413696e-05
S106 12 2.413696e-05
S107 12 2.413696e-05
S109 12 2.413696e-05
S110 12 2.413696e-05
S111 12 2.413696e-05
S113 12 2.413696e-05
S114 12 2.413696e-05
S115 12 2.413696e-05
S108 13 2.782057e-05
S112 13 2.782057e-05
In real data, this would be extremely unlikely since it means there is no statistical noise in the data. So a regression model is
Gabriel
Dear Dr. Hoffman,
Thanks very much for your help. I updated to v.1.26 (the stable release in Bioconductor) and the errors are quite different.
In fact, when I run model 1, it appears something that I haven't seen: Error in eval_f(x, ...): Downdated VtV is not positive definite
as well as the warning that you posted.
In fact, I have seen here that the origin of this error is from glmer
, where there is complete separation in the model.
I would need to ask you some questions more if you don't mind:
Q1: In relation to the following warning.. If the model doesn't fail (using all the genes, not taking only one as you did) but I get this warning, the conclusion will be the same, right? That there is no variation within category. Or should I understand it in another way?
# Warning messages:
# 1: In optwrap(optimizer, devfun, getStart(start, rho$pp), lower = rho$lower, :
# convergence code -4 from nloptwrap: NLOPT_ROUNDOFF_LIMITED: Roundoff errors led
# to a breakdown of the optimization algorithm. In this case, the returned minimum
# may still be useful. (e.g. this error occurs in NEWUOA if one tries to achieve
# a tolerance too close to machine precision.)
Q2: What I don't understand is why the model fails when it has all the genes and this doesn't happen (although it gives you a warning) when you have 1 gene (as you tried). How is this possible? Is it because when it finds a gene without any variance, the model stops and it gives you the error?
Q3: Just out of curiosity, how did you choose gene 5? (because if I choose gene 4, gene 6... I don't get the same warning and therefore, there is variance within category). The reason of my question is that I am trying to find a similar case in my real data to see the same effect, but I don't know how to find a gene without variation within category.
Thanks in advance again
Q1: This tells you that at least one gene failed. But it unfortunately doesn't tell you which one
Q2: This failure for one gene causes a different error when it tries to assemble the results for multiple genes together. That's why you see the error about the number of rows.
This is not ideal, but in practice it almost never comes up. Further filtering your data should remove the problematic genes. With your simulated data, it means the simulations weren't very realistic because there was no random noise. I don't know about your real data, but it likely has to do with including genes that should be excluded for some simple reason
Q3: I did the analysis one gene at a time and found when it failed
for(i in 1:nrow(cts.mat)){
message(i)
varPart <- fitExtractVarPartModel(cts.mat[i,,drop=FALSE], model2, pheno, quiet=TRUE)
}
I bet that the same issue is going to affect most of your problematic genes.
Dear Dr. Hoffman,
Thank very much again for your reply and your patience answering me.
I tried the for loop to do the analysis one gene at a time and it failed in the 15th gene (not the 5th).
If I do the analysis with the 5th, the model doesn't fail but I get this warning
> varPart <- fitExtractVarPartModel(cts.mat[5,,drop=FALSE], model2, pheno)
Dividing work into 1 chunks...
Total:0.05 s
Warning messages:
1: In optwrap(optimizer, devfun, getStart(start, rho$pp), lower = rho$lower, :
convergence code -4 from nloptwrap: NLOPT_ROUNDOFF_LIMITED: Roundoff errors led to a breakdown of the optimization algorithm. In this case, the returned minimum may still be useful. (e.g. this error occurs in NEWUOA if one tries to achieve a tolerance too close to machine precision.)
However, with the 15th gene, I get the following error.
varPart2 <- fitExtractVarPartModel(cts.mat[15,,drop=FALSE], model2, pheno)
Error in if (inherits(possibleError, "error") && grep("the fixed-effects model matrix is column rank deficient", :
missing value where TRUE/FALSE needed
Q1: What does it mean (or what is the difference comparing to the previous warning)? I saw what you answered about the error here but in this case, all the genes have expression in each category of Var2.
I am trying to understand what you replied before: We see that there is no variation within category. This means that there is no residual variance, and the model fit fails.
I thought I understood it, but I have compared the df of gene 1, gene 5 and gene 15 and they follow the same pattern. I don't see any variation within category in any of the genes, but gene 1 didn't fail. Maybe I misunderstood what you said.
Q2: It may be obvious, but what do you exactly mean with "there is no variation within category"? Because I am checking the expression of each gene and they have the same expression in each level of Var2.
(E.g. gene 1)
Here I attach the results for gene 1, gene 5 and gene 15.
df = data.frame(Var2 = pheno$Var2, expr = cts.mat[5,])
df[order(df$Var2),]
Var2 expr
S116 1 -1.981364e-03
S117 1 -1.981364e-03
S118 1 -1.981364e-03
S1 2 4.489186e-05
S2 2 4.489186e-05
S3 3 9.136233e-06
S4 3 9.136233e-06
S5 3 9.136233e-06
S6 3 9.136233e-06
S7 3 9.136233e-06
S8 3 9.136233e-06
S9 3 9.136233e-06
S13 3 9.136233e-06
S17 3 9.136233e-06
S18 3 9.136233e-06
S19 3 9.136233e-06
S20 3 9.136233e-06
S21 3 9.136233e-06
S22 3 9.136233e-06
S10 4 2.412923e-05
S11 4 2.412923e-05
S12 4 2.412923e-05
S14 4 2.412923e-05
S15 4 2.412923e-05
S16 4 2.412923e-05
S24 4 2.412923e-05
S25 4 2.412923e-05
S26 4 2.412923e-05
S28 5 9.884335e-06
S31 5 9.884335e-06
S32 5 9.884335e-06
S34 5 9.884335e-06
S36 5 9.884335e-06
S41 5 9.884335e-06
S42 5 9.884335e-06
S46 5 9.884335e-06
S47 5 9.884335e-06
S48 5 9.884335e-06
S51 5 9.884335e-06
S52 5 9.884335e-06
S54 5 9.884335e-06
S29 6 9.815644e-06
S30 6 9.815644e-06
S33 6 9.815644e-06
S35 6 9.815644e-06
S37 6 9.815644e-06
S38 6 9.815644e-06
S39 6 9.815644e-06
S40 6 9.815644e-06
S43 6 9.815644e-06
S44 6 9.815644e-06
S45 6 9.815644e-06
S49 7 -8.161759e-05
S50 7 -8.161759e-05
S56 7 -8.161759e-05
S57 7 -8.161759e-05
S58 7 -8.161759e-05
S59 7 -8.161759e-05
S60 7 -8.161759e-05
S61 7 -8.161759e-05
S62 7 -8.161759e-05
S63 7 -8.161759e-05
S64 7 -8.161759e-05
S71 7 -8.161759e-05
S55 8 2.274638e-05
S65 8 2.274638e-05
S66 8 2.274638e-05
S67 8 2.274638e-05
S68 8 2.274638e-05
S69 8 2.274638e-05
S70 8 2.274638e-05
S74 8 2.274638e-05
S23 9 9.736041e-06
S27 9 9.736041e-06
S53 9 9.736041e-06
S72 9 9.736041e-06
S73 9 9.736041e-06
S75 9 9.736041e-06
S76 9 9.736041e-06
S77 9 9.736041e-06
S78 9 9.736041e-06
S80 9 9.736041e-06
S83 9 9.736041e-06
S85 9 9.736041e-06
S93 9 9.736041e-06
S97 9 9.736041e-06
S102 9 9.736041e-06
S79 10 4.058674e-05
S81 10 4.058674e-05
S82 10 4.058674e-05
S86 10 4.058674e-05
S89 10 4.058674e-05
S91 10 4.058674e-05
S92 10 4.058674e-05
S94 10 4.058674e-05
S95 10 4.058674e-05
S96 10 4.058674e-05
S98 10 4.058674e-05
S101 10 4.058674e-05
S84 11 2.747669e-05
S87 11 2.747669e-05
S88 11 2.747669e-05
S90 11 2.747669e-05
S99 11 2.747669e-05
S100 11 2.747669e-05
S103 12 2.413696e-05
S104 12 2.413696e-05
S105 12 2.413696e-05
S106 12 2.413696e-05
S107 12 2.413696e-05
S109 12 2.413696e-05
S110 12 2.413696e-05
S111 12 2.413696e-05
S113 12 2.413696e-05
S114 12 2.413696e-05
S115 12 2.413696e-05
S108 13 2.782057e-05
S112 13 2.782057e-05
df2 = data.frame(Var2 = pheno$Var2, expr = cts.mat[15,])
df2[order(df2$Var2),]
Var2 expr
S116 1 1.320086e-05
S117 1 1.320086e-05
S118 1 1.320086e-05
S1 2 3.198804e-05
S2 2 3.198804e-05
S3 3 2.656940e-05
S4 3 2.656940e-05
S5 3 2.656940e-05
S6 3 2.656940e-05
S7 3 2.656940e-05
S8 3 2.656940e-05
S9 3 2.656940e-05
S13 3 2.656940e-05
S17 3 2.656940e-05
S18 3 2.656940e-05
S19 3 2.656940e-05
S20 3 2.656940e-05
S21 3 2.656940e-05
S22 3 2.656940e-05
S10 4 2.049358e-05
S11 4 2.049358e-05
S12 4 2.049358e-05
S14 4 2.049358e-05
S15 4 2.049358e-05
S16 4 2.049358e-05
S24 4 2.049358e-05
S25 4 2.049358e-05
S26 4 2.049358e-05
S28 5 -3.622413e-05
S31 5 -3.622413e-05
S32 5 -3.622413e-05
S34 5 -3.622413e-05
S36 5 -3.622413e-05
S41 5 -3.622413e-05
S42 5 -3.622413e-05
S46 5 -3.622413e-05
S47 5 -3.622413e-05
S48 5 -3.622413e-05
S51 5 -3.622413e-05
S52 5 -3.622413e-05
S54 5 -3.622413e-05
S29 6 2.443886e-05
S30 6 2.443886e-05
S33 6 2.443886e-05
S35 6 2.443886e-05
S37 6 2.443886e-05
S38 6 2.443886e-05
S39 6 2.443886e-05
S40 6 2.443886e-05
S43 6 2.443886e-05
S44 6 2.443886e-05
S45 6 2.443886e-05
S49 7 1.521231e-05
S50 7 1.521231e-05
S56 7 1.521231e-05
S57 7 1.521231e-05
S58 7 1.521231e-05
S59 7 1.521231e-05
S60 7 1.521231e-05
S61 7 1.521231e-05
S62 7 1.521231e-05
S63 7 1.521231e-05
S64 7 1.521231e-05
S71 7 1.521231e-05
S55 8 -1.395212e-04
S65 8 -1.395212e-04
S66 8 -1.395212e-04
S67 8 -1.395212e-04
S68 8 -1.395212e-04
S69 8 -1.395212e-04
S70 8 -1.395212e-04
S74 8 -1.395212e-04
S23 9 2.100696e-05
S27 9 2.100696e-05
S53 9 2.100696e-05
S72 9 2.100696e-05
S73 9 2.100696e-05
S75 9 2.100696e-05
S76 9 2.100696e-05
S77 9 2.100696e-05
S78 9 2.100696e-05
S80 9 2.100696e-05
S83 9 2.100696e-05
S85 9 2.100696e-05
S93 9 2.100696e-05
S97 9 2.100696e-05
S102 9 2.100696e-05
S79 10 2.221810e-05
S81 10 2.221810e-05
S82 10 2.221810e-05
S86 10 2.221810e-05
S89 10 2.221810e-05
S91 10 2.221810e-05
S92 10 2.221810e-05
S94 10 2.221810e-05
S95 10 2.221810e-05
S96 10 2.221810e-05
S98 10 2.221810e-05
S101 10 2.221810e-05
S84 11 1.946738e-05
S87 11 1.946738e-05
S88 11 1.946738e-05
S90 11 1.946738e-05
S99 11 1.946738e-05
S100 11 1.946738e-05
S103 12 -9.118168e-06
S104 12 -9.118168e-06
S105 12 -9.118168e-06
S106 12 -9.118168e-06
S107 12 -9.118168e-06
S109 12 -9.118168e-06
S110 12 -9.118168e-06
S111 12 -9.118168e-06
S113 12 -9.118168e-06
S114 12 -9.118168e-06
S115 12 -9.118168e-06
S108 13 -1.636078e-05
S112 13 -1.636078e-05
df3 = data.frame(Var2 = pheno$Var2, expr = cts.mat[1,])
df3[order(df3$Var2),]
Var2 expr
S116 1 2.866586e-05
S117 1 2.866586e-05
S118 1 2.866586e-05
S1 2 -5.470136e-05
S2 2 -5.470136e-05
S3 3 -3.663836e-04
S4 3 -3.663836e-04
S5 3 -3.663836e-04
S6 3 -3.663836e-04
S7 3 -3.663836e-04
S8 3 -3.663836e-04
S9 3 -3.663836e-04
S13 3 -3.663836e-04
S17 3 -3.663836e-04
S18 3 -3.663836e-04
S19 3 -3.663836e-04
S20 3 -3.663836e-04
S21 3 -3.663836e-04
S22 3 -3.663836e-04
S10 4 -4.320430e-05
S11 4 -4.320430e-05
S12 4 -4.320430e-05
S14 4 -4.320430e-05
S15 4 -4.320430e-05
S16 4 -4.320430e-05
S24 4 -4.320430e-05
S25 4 -4.320430e-05
S26 4 -4.320430e-05
S28 5 -1.727127e-05
S31 5 -1.727127e-05
S32 5 -1.727127e-05
S34 5 -1.727127e-05
S36 5 -1.727127e-05
S41 5 -1.727127e-05
S42 5 -1.727127e-05
S46 5 -1.727127e-05
S47 5 -1.727127e-05
S48 5 -1.727127e-05
S51 5 -1.727127e-05
S52 5 -1.727127e-05
S54 5 -1.727127e-05
S29 6 -1.375190e-04
S30 6 -1.375190e-04
S33 6 -1.375190e-04
S35 6 -1.375190e-04
S37 6 -1.375190e-04
S38 6 -1.375190e-04
S39 6 -1.375190e-04
S40 6 -1.375190e-04
S43 6 -1.375190e-04
S44 6 -1.375190e-04
S45 6 -1.375190e-04
S49 7 -1.271191e-04
S50 7 -1.271191e-04
S56 7 -1.271191e-04
S57 7 -1.271191e-04
S58 7 -1.271191e-04
S59 7 -1.271191e-04
S60 7 -1.271191e-04
S61 7 -1.271191e-04
S62 7 -1.271191e-04
S63 7 -1.271191e-04
S64 7 -1.271191e-04
S71 7 -1.271191e-04
S55 8 -6.060821e-05
S65 8 -6.060821e-05
S66 8 -6.060821e-05
S67 8 -6.060821e-05
S68 8 -6.060821e-05
S69 8 -6.060821e-05
S70 8 -6.060821e-05
S74 8 -6.060821e-05
S23 9 -1.168303e-05
S27 9 -1.168303e-05
S53 9 -1.168303e-05
S72 9 -1.168303e-05
S73 9 -1.168303e-05
S75 9 -1.168303e-05
S76 9 -1.168303e-05
S77 9 -1.168303e-05
S78 9 -1.168303e-05
S80 9 -1.168303e-05
S83 9 -1.168303e-05
S85 9 -1.168303e-05
S93 9 -1.168303e-05
S97 9 -1.168303e-05
S102 9 -1.168303e-05
S79 10 2.549976e-05
S81 10 2.549976e-05
S82 10 2.549976e-05
S86 10 2.549976e-05
S89 10 2.549976e-05
S91 10 2.549976e-05
S92 10 2.549976e-05
S94 10 2.549976e-05
S95 10 2.549976e-05
S96 10 2.549976e-05
S98 10 2.549976e-05
S101 10 2.549976e-05
S84 11 -2.167725e-04
S87 11 -2.167725e-04
S88 11 -2.167725e-04
S90 11 -2.167725e-04
S99 11 -2.167725e-04
S100 11 -2.167725e-04
S103 12 -4.168729e-04
S104 12 -4.168729e-04
S105 12 -4.168729e-04
S106 12 -4.168729e-04
S107 12 -4.168729e-04
S109 12 -4.168729e-04
S110 12 -4.168729e-04
S111 12 -4.168729e-04
S113 12 -4.168729e-04
S114 12 -4.168729e-04
S115 12 -4.168729e-04
S108 13 8.109313e-05
S112 13 8.109313e-05
Maybe I am overthinking with this example since is not real, but I would like to understand all these errors in case I find them in real data and therefore, I will know what I should do next.
Thanks very much again and sorry for disturbing.
You will all most never see these errors in real data...
Q2: 'no variation within category'. For gene 1, all the expression values for category 1 are -1.981364e-03
, and all the values for category 2 are 4.489186e-05
. This means that there is no variance within category. This almost never comes up in real data because there is always statistical and measurement noise. This is just an issue with your simulations.
S116 1 -1.981364e-03
S117 1 -1.981364e-03
S118 1 -1.981364e-03
S1 2 4.489186e-05
S2 2 4.489186e-05
S3 3 9.136233e-06
S4 3 9.136233e-06
S5 3 9.136233e-06
Q1: dream()
and the underlying lmer()
functions use iterative numerical methods to estimate parameters. These methods can suffer from rounding errors when values are very small. In this case, there is no variation within categories, so the residual variance is zero, or at least a very, very small number. The exact value of the residual variance is different for each gene due to rounding. So a value of 0 vs 1e-16 can cause different internal checks to fail so you get different warnings or errors.
Overall, I'm not concerned since it is extremely rare to encounter these issue in real data. But I am looking into improving the error messages
Gabriel
Thanks very much for your help and your explanations, I really appreciate it. After all these doubts, I think I understand better what it is going on with my real data, thanks.
Dear Dr. Hoffman,
I am trying to run
variancePartition
andfitExtractVarPartModel()
to get the fraction of variance explained by each variable added in the formula. In addition, I have previously adjusted my data by 1 variable usingremoveBatchEffect
fromlimma
. In order to see the fraction of variance by this variable after the adjustment, I runfitExtractVarPartModel
... but I got this error:If I run the function again with this variable + another variable, I get the same error but not the warning message.
On the other hand, if I run
fitExtractVarPartModel
with a DIFFERENT VARIABLE (and NOT adding the variable that I adjusted for), the run works and I don't get any error.Note 1: The columns (samples) from
exprObj
(log2(normalized counts + 1)) follows the same order as you can find in the column where the samples are indata
. Thedata
has the samples as rownames andexprObj
is a matrix. Although I saw that the example that it appears in the vignette has negative values and I have confirmed that my data works when I have a different variable in my formula, it is important to say (just in case) that the adjusted normalized counts contain negative values for some genes.Note 2: I have tried to run
variancePartition
after having adjusted by the same variable usingComBat-Seq
and I have run the same models and it works. I don't get any errors/warnings.Do you know what could be the reason of the error?
Any help would be very appreciated.
Thanks very much in advance
My R session: