Closed ktroule closed 6 years ago
@koenvandenberge have you seen this before?
@ktroule unrelated, but it may save you a lot of time: it looks like you are running both zinbwave() and zinbFit(). This is not needed as zinbwave() calls zinbFit() internally. Also, the latest version of zinbwave() automatically computes the weights for you (stored in the weights assay) so there's no need for computeObservationalWeights() either.
Yes, I know that latest version does it automatically, but I've not been able to install the latest version in my computer (I'm using R inside conda).
@ktroule this is probably because some weights are set exactly to zero in R (while in theory they should be some very very small number), and zero is not a positive number. Can you retry with e.g.
zinb.weights[zinb.weights<1e-15] <- 1e-15
dge$weights <- zinb.weights
This should not make any practical difference in the results, and hopefully that should resolve the issue.
Ok. I'll modify the code so.
Thanks
@ktroule By the way, is there a specific reason why your design matrix is different in the ZINB-WaVE model as compared to the edgeR model? Typically, you would want these to be the same, unless you would expect that some variables only have an effect on the zero inflation or negative binomial component.
@drisso maybe we should implement this by default when returning the weights to prevent this in the future?
Sorry It's just wrong, I'm testing. I've two cell types from 6 different patients. I want to plot the t-sne and perform DEG for the two cell types while adjusting for the patients.
summ.exp_norm <- zinbwave(summ.exp, K=2, X="~Patient", epsilon=1000, normalizedValues=TRUE, residuals = TRUE) # For t-sne and downstream analysis
# DEG analysis
zinb.weights <- computeObservationalWeights(summ.exp_norm, assay(summ.exp_norm))
dge <- DGEList(assay(summ.exp_norm))
dge <- calcNormFactors(dge)
design <- model.matrix(~0+Cell.type+Patient, data = colData(summ.exp_norm))
zinb.weights[zinb.weights<1e-15] <- 1e-15
dge$weights <- zinb.weights
dge <- estimateDisp(dge, design)
Then perform DEG with the right coefficient
I think now its ok
@koenvandenberge I think it makes sense to implement this in the function. I will do it.
Didn't mean to close it. I will close it when implemented.
Please, check the version of the package in the develop branch, which implements @koenvandenberge 's solution within the computeObservationalWeights
function.
Please, let me know if it works.
Great, thanks @drisso !
@ktroule , for that analysis I would also add the cell type effect in the ZINB-WaVE model, i.e. the design would be "~0+Cell.type+Patient" as it is in your edgeR model.
@koenvandenberge
I managed to move to the latest ZINB-WaVE version, also added your solution and it works ok.
I do not get why should add the full design formula to the zinbwave
function.
At the example you provide, you do so:
fluidigm_cov <- zinbwave(fluidigm, K=2, X="~Coverage_Type", epsilon=1000)
design <- model.matrix(~Biological_Condition, data = colData(fluidigm))
You only add the batch effect for the plotting X="~Coverage_Type"
(batch effect, my patients in my case). Then perform a DEG without taking into account the batch effect (which I think is not correct).
Thanks for your help.
@ktroule You make a good point that in the vignette we indeed do not have the same design matrix in the ZINB-WaVE model as in the edgeR model, and we probably should reconsider this. Thank you for bringing this up.
I was only talking about DE analysis in my previous comments. If you also use ZINB-WaVE for visualization, then it can indeed make sense to not include all observed variables to estimate the unobserved variables and visualize them in reduced dimension, as you are doing. You could for instance make a visualization with and one without the patient effect to see how big the between-patient variability is as compared to the variability between cell types, or whether all patients have sequenced cells from both cell types.
My bad. In the first comment the model is wrong, there is a post below where the models is ok.
Thanks for your time.
I'm running
zinbwave
to perform DEG withedgeR
. For one of the comparison I get this error that won't let me perform a DEG analysisWhen I check
dge$weights
there are some weights that are 0, as far I've seen no condition seems to have for a given gene all counts with 0 counts.Not sure what might be the source of this error.
Thanks for your help.