drisso / zinbwave

Clone of the Bioconductor repository for the zinbwave package, see https://bioconductor.org/packages/zinbwave
43 stars 10 forks source link

Decreasing log-likelihood with commondispersion = FALSE #54

Closed fedeago closed 4 years ago

fedeago commented 4 years ago

Hi, I encountered a problem using zinbwave. If I set commodispersion parameters equal to FALSE the log-likelihood will decrese during the optimization of the dispersion parameters. In this little reproducible example there is a little effect but in real case application this effect is much bigger.

Example:

I am going to reproduce a problem with zinbwave model. At first I create a dataset sampling from negative binomial.

res <- NULL
conte <- for(i in 1:200){
  res_temp <- rnbinom(600,mu = 10, size = rchisq(1,100))
  res <-rbind(res,res_temp)
}

In order to be close to my utilization experience I artificially create a batch effect by adding a poisson distributed noise to some observation.

first_batch <- matrix(rpois(40000,5), nrow = 200)
secon_batch <-matrix(rpois(40000,10), nrow = 200)

res[,c(201:400)] <- res[,c(201:400)]+first_batch
res[,c(401:600)] <- res[,c(401:600)]+secon_batch

conte <- res
batch <- append(rep("type_1",200),rep("type_2",200))
batch <- append(batch ,rep("type_3",200))
conte_sce <- SingleCellExperiment(assays = list(counts = conte))
conte_sce$batch<- batch

Now I use zinbwave:

res_commondisp <- zinbwave::zinbwave(conte_sce,
                X = "~batch",BPPARAM = SerialParam(), verbose = T)

Create model: ok Initialize parameters: ok Optimize parameters: Iteration 1 penalized log-likelihood = -444478.223994673 After dispersion optimization = -334822.451814143 user system elapsed 1.282 0.004 1.285 After right optimization = -333462.91717295 After orthogonalization = -333462.91717295 user system elapsed 2.398 0.019 2.418 After left optimization = -333443.084022314 After orthogonalization = -333443.084022314 Iteration 2 penalized log-likelihood = -333443.084022314 After dispersion optimization = -333431.229280787 user system elapsed 0.378 0.000 0.378 After right optimization = -333430.601113283 After orthogonalization = -333430.601113283 user system elapsed 0.759 0.056 0.815 After left optimization = -333430.564456441 After orthogonalization = -333430.564456441 Iteration 3 penalized log-likelihood = -333430.564456441 ok

res_tagwisedisp <- zinbwave::zinbwave(conte_sce,
                X = "~batch",BPPARAM = SerialParam(), verbose = T, commondispersion = F)

Create model: ok Initialize parameters: ok Optimize parameters: Iteration 1 penalized log-likelihood = -444478.223994673 After dispersion optimization = -334864.84438229 user system elapsed 1.224 0.003 1.227 After right optimization = -333512.064447135 After orthogonalization = -333512.064447135 user system elapsed 2.371 0.010 2.380 After left optimization = -333492.694979865 After orthogonalization = -333492.694979865 Iteration 2 penalized log-likelihood = -333399.257494682 After dispersion optimization = -333467.350788734 user system elapsed 0.319 0.000 0.319 After right optimization = -333466.727984708 After orthogonalization = -333466.727984708 user system elapsed 0.922 0.003 0.924 After left optimization = -333466.269073984 After orthogonalization = -333466.269073984 Iteration 3 penalized log-likelihood = -333387.305793593 ok

We can see that with the commondispersion option set to FALSE in the second iteration the loglikelihood after the dispersion optimization is higher than the previous likelihood.

drisso commented 4 years ago

Hi @fedeago,

while I was trying to understand what's going on I stumbled across this old issue, the discussion there could be relevant: https://github.com/statOmics/zinbwaveZinger/issues/2

In an effort to fix the above issue, I may have introduced a different error in the verbose mode. It might be possible that it's only a problem with the wrong value being computed in the verbose mode. I will try to investigate this more and get back to you.

Note that the actual penalized log-likelihood at each iteration increases as expected, it's only the intermediate values that decrease. Can you confirm that this is the case also in your real case and not just in this small example?

fedeago commented 4 years ago

Yes in the real case application only the intermediate values decrease.

drisso commented 4 years ago

@fedeago

I found the bug. I was using the wrong value of the dispersion parameter to compute the likelihood in the verbose computations. Thanks for catching this!

Note that this did not affect the results, but only the printed values in verbose mode.

Commit 7dc0cbc8911ba14ff9f7c8b25a5b44a4c954a73c fixes the problem. Please update to version 1.9.2 which is available via github and will be propagated to bioc-devel in the next couple of days.