weights must be finite positive values

ktroule commented 6 years ago

I'm running zinbwave to perform DEG with edgeR. For one of the comparison I get this error that won't let me perform a DEG analysis

BiocParallel::register(BiocParallel::MulticoreParam(workers = 4))

summ.exp_norm <- zinbwave(summ.exp, K=2, X="~Patient", epsilon=1000, normalizedValues=TRUE, residuals = TRUE) # For t-sne and downstream analysis

# DEG analysis
zinb <- zinbFit(summ.exp_norm, K=2, epsilon=1000)
zinb.weights <- computeObservationalWeights(zinb, assay(summ.exp_norm))

dge <- DGEList(assay(summ.exp_norm))
dge <- calcNormFactors(dge)

design <- model.matrix(~Cell.type, data = colData(summ.exp_norm))
dge$weights <- zinb.weights

dge <- estimateDisp(dge, design) 
Error in .compressWeights(y, weigths) : weights must be finite positive values

When I check dge$weights there are some weights that are 0, as far I've seen no condition seems to have for a given gene all counts with 0 counts.

Not sure what might be the source of this error.

Thanks for your help.

drisso commented 6 years ago

@koenvandenberge have you seen this before?

@ktroule unrelated, but it may save you a lot of time: it looks like you are running both zinbwave() and zinbFit(). This is not needed as zinbwave() calls zinbFit() internally. Also, the latest version of zinbwave() automatically computes the weights for you (stored in the weights assay) so there's no need for computeObservationalWeights() either.

ktroule commented 6 years ago

Yes, I know that latest version does it automatically, but I've not been able to install the latest version in my computer (I'm using R inside conda).

koenvandenberge commented 6 years ago

@ktroule this is probably because some weights are set exactly to zero in R (while in theory they should be some very very small number), and zero is not a positive number. Can you retry with e.g.

zinb.weights[zinb.weights<1e-15] <- 1e-15 dge$weights <- zinb.weights

This should not make any practical difference in the results, and hopefully that should resolve the issue.

ktroule commented 6 years ago

Ok. I'll modify the code so.

Thanks

koenvandenberge commented 6 years ago

@ktroule By the way, is there a specific reason why your design matrix is different in the ZINB-WaVE model as compared to the edgeR model? Typically, you would want these to be the same, unless you would expect that some variables only have an effect on the zero inflation or negative binomial component.

@drisso maybe we should implement this by default when returning the weights to prevent this in the future?

ktroule commented 6 years ago

Sorry It's just wrong, I'm testing. I've two cell types from 6 different patients. I want to plot the t-sne and perform DEG for the two cell types while adjusting for the patients.

summ.exp_norm <- zinbwave(summ.exp, K=2,  X="~Patient", epsilon=1000, normalizedValues=TRUE, residuals = TRUE) # For t-sne and downstream analysis

# DEG analysis
zinb.weights <- computeObservationalWeights(summ.exp_norm, assay(summ.exp_norm))

dge <- DGEList(assay(summ.exp_norm))
dge <- calcNormFactors(dge)

design <- model.matrix(~0+Cell.type+Patient, data = colData(summ.exp_norm))
zinb.weights[zinb.weights<1e-15] <- 1e-15
dge$weights <- zinb.weights

dge <- estimateDisp(dge, design) 

Then perform DEG with the right coefficient

I think now its ok

drisso commented 6 years ago

@koenvandenberge I think it makes sense to implement this in the function. I will do it.

drisso commented 6 years ago

Didn't mean to close it. I will close it when implemented.

drisso commented 6 years ago

Please, check the version of the package in the develop branch, which implements @koenvandenberge 's solution within the computeObservationalWeights function.

Please, let me know if it works.

koenvandenberge commented 6 years ago

Great, thanks @drisso !

@ktroule , for that analysis I would also add the cell type effect in the ZINB-WaVE model, i.e. the design would be "~0+Cell.type+Patient" as it is in your edgeR model.

ktroule commented 6 years ago

@koenvandenberge I managed to move to the latest ZINB-WaVE version, also added your solution and it works ok. I do not get why should add the full design formula to the zinbwave function.

At the example you provide, you do so:

fluidigm_cov <- zinbwave(fluidigm, K=2, X="~Coverage_Type", epsilon=1000)
design <- model.matrix(~Biological_Condition, data = colData(fluidigm))

You only add the batch effect for the plotting X="~Coverage_Type" (batch effect, my patients in my case). Then perform a DEG without taking into account the batch effect (which I think is not correct).

Thanks for your help.

koenvandenberge commented 6 years ago

@ktroule You make a good point that in the vignette we indeed do not have the same design matrix in the ZINB-WaVE model as in the edgeR model, and we probably should reconsider this. Thank you for bringing this up.

I was only talking about DE analysis in my previous comments. If you also use ZINB-WaVE for visualization, then it can indeed make sense to not include all observed variables to estimate the unobserved variables and visualize them in reduced dimension, as you are doing. You could for instance make a visualization with and one without the patient effect to see how big the between-patient variability is as compared to the variability between cell types, or whether all patients have sequenced cells from both cell types.

ktroule commented 6 years ago

My bad. In the first comment the model is wrong, there is a post below where the models is ok.

Thanks for your time.

drisso / zinbwave

weights must be finite positive values #33