Big variation of the 'loglik' after one additional VE step

scj-robin commented 2 years ago

The ELBO (named 'loglik' in the output of the PLN function) seem to vary dramatically when performing one additional VE step, while convergence is supposed to be reached.

Considering the Barents data with no offset, Y = counts (n=59, p=30)and X = intercept + covariates (d=5), we get:

fit <- PLN(Y ~ -1+X, control=list(trace=0))
VE <- fit$VEstep(covariates=X, offsets=matrix(0, n, p), responses=Y, weights = rep(1,n))
print(c(fit$loglik, sum(VE$log.lik)))
> [1] -8989.666 -6655.017

Using a homemade function to calculate the ELBO:

ELBO <- function(Y, X, M, S2, Theta, Sigma){
   n <- nrow(Y); p <- ncol(Y); Omega <- solve(Sigma); Beta <- t(Theta)
   EqLogPZ <- -n/2 * log(det(Sigma)) -0.5 * sum(diag(M %*% Omega %*% t(M))) - 
     0.5 * sum(sapply(1:n, function(i){diag(Omega%*%diag(S2[i, ]))}))
   EqLogPY.Z <- -sum(exp(X%*%Beta + M + S2/2)) + sum(Y * (X %*% Beta + M)) - sum(lgamma(Y+1))
   EqLogQZ <- -0.5 * sum(log(S2)) -n*p/2
   return(list(EqLogPZ=EqLogPZ, EqLogPY.Z=EqLogPY.Z, EqLogQZ=EqLogQZ, 
               elbo=EqLogPZ + EqLogPY.Z - EqLogQZ))
 }

we get different but similar results:

print(c(ELBO(Y, X, fit$var_par$M, fit$var_par$S2, fit$model_par$Theta, fit$model_par$Sigma)$elbo, 
        ELBO(Y, X, VE$M, VE$S2, fit$model_par$Theta, fit$model_par$Sigma)$elbo))
> [1] -8989.132 -5630.619

It seems that the variational parameters M and S2 vary quite a lot during the additional VE step:

print(c(max(abs(fit$var_par$M - VE$M)), max(abs(fit$var_par$S2 - VE$S2))))
> [1] 1.964532 1.083126

mahendra-mariadassou commented 2 years ago

2 remarks:

There was indeed a bug when computing the log-likelihood at the end of VEstep() (a S which should have been a S2)
I'm completely puzzled by the lack of convergence. We can even reproduce the problem by calling optimize() twice (once implicitly during the first call to PLN and then once explicitly)

library(PLNmodels)
load("../Data/BarentsFish/BarentsFish.Rdata")
n <- nrow(Data$count)
p <- ncol(Data$count)
X <- Data$covariates
Y <- Data$count
O <- matrix(0, n, p)
w <- rep(1, n)
fit <- PLN(count ~ -1 + covariates, control=list(trace=1), data = Data)
fit_new <- fit$clone(deep = TRUE)
fit_new$optimize(covariates=X, offsets=O, responses=Y, weights = w, control = PLNmodels:::PLN_param(NULL, n, p))
print(c(fit$loglik, fit_new$loglik))
[1] -9035.610 -8755.656

And produce an even greater jump by manually updating the variational parameters (with VEstep()) before calling optimize()

VE <- fit$VEstep(covariates= X, offsets= O, responses= Y, weights = w)
fit_new$update(M = VE$M, S2 = VE$S2)
fit_new$optimize(covariates=X, offsets=O, responses=Y, weights = w, control = PLNmodels:::PLN_param(NULL, n, p))
print(c(fit$loglik, fit_new$loglik))
[1] -9035.61 -5343.67

And repeat the process

VE <- fit_new$VEstep(covariates= X, offsets= O, responses= Y, weights = w)
fit_new$update(M = VE$M, S2 = VE$S2)
fit_new$optimize(covariates=X, offsets=O, responses=Y, weights = w, control = PLNmodels:::PLN_param(NULL, n, p))
print(c(fit$loglik, fit_new$loglik))
[1] -9035.610 -5077.922

This looks (at least for the first block) like a problem with the optimizer although for all model we have

fit$optim_par$message
[1] "xtol_rel or xtol_abs was reached"
fit_new$optim_par$message
[1] "xtol_rel or xtol_abs was reached"

mahendra-mariadassou commented 2 years ago

After a bit of investigation, it seems like convergence is incomplete with the default parameters. If I change xtol_rel to 1e-6 (instead of the default 1e-4)

convergence is achieved because the ELBO doesn't increase anymore ("ftol_rel or ftol_abs was reached") although the parameters may still change slightly
the ELBO is much better (-4400) using the smaller xtol_rel than the default one (-9035.610)

library(PLNmodels)
#> This is packages 'PLNmodels' version 0.11.6-9200
#> Use future::plan(multicore/multisession) to speed up PLNPCA/PLNmixture/stability_selection.
load("~/Documents/PLN/Data/BarentsFish/BarentsFish.Rdata")
n <- nrow(Data$count)
p <- ncol(Data$count)
X <- Data$covariates
Y <- Data$count
O <- matrix(0, n, p)
w <- rep(1, n)
fit <- PLN(count ~ -1 + covariates, data = Data, control = list(xtol_rel = 1e-6, trace = 1))
#> 
#>  Initialization...
#>  Adjusting a PLN model with full covariance model
#>  Post-treatments...
#>  DONE!
fit$optim_par
#> $iterations
#> [1] 3533
#> 
#> $message
#> [1] "ftol_rel or ftol_abs was reached"

Calling optimize() once again doesn't drastically affect the value of the ELBO and the values of the parameters (model and variational) don't change ("xtol_rel or xtol_abs was reached")

fit_new <- fit$clone(deep = TRUE)
fit_new$optimize(covariates=X, offsets=O, responses=Y, weights = w, control = PLNmodels:::PLN_param(list(xtol_rel = 1e-6), n, p))
fit_new$optim_par
#> $iterations
#> [1] 11
#> 
#> $message
#> [1] "xtol_rel or xtol_abs was reached"
print(c(fit$loglik, fit_new$loglik))
#> [1] -4401.694 -4401.691

Doing one VE step and calling optimize() with the model parameters from fit and the variational parameters from VE doesn't affect the results ("xtol_rel or xtol_abs was reached")

VE <- fit$VEstep(covariates= X, offsets= O, responses= Y, weights = w, control = PLNmodels:::PLN_param(list(xtol_rel = 1e-6), n, p))
fit_new$update(Theta = fit$model_par$Theta, Sigma = fit$model_par$Sigma, M = VE$M, S2 = VE$S2)
fit_new$optimize(covariates=X, offsets=O, responses=Y, weights = w, control = PLNmodels:::PLN_param(list(xtol_rel = 1e-6), n, p))
fit_new$optim_par
#> $iterations
#> [1] 11
#> 
#> $message
#> [1] "xtol_rel or xtol_abs was reached"
print(c(fit$loglik, fit_new$loglik))
#> [1] -4401.694 -4401.673

We can check visually that the parameters are qualitatively comparable in fit and fit_new.

par(mfrow = c(2, 2))
plot(fit$var_par$M ~ fit_new$var_par$M);abline(a = 0,b = 1, col = "red")
plot(fit$var_par$S2 ~ fit_new$var_par$S2);abline(a = 0,b = 1, col = "red")
plot(fit$model_par$Theta ~ fit_new$model_par$Theta);abline(a = 0,b = 1, col = "red")
par(mfrow = c(1, 1))

In summary, the main problem with the code appears to be the default value of xtol_rel which leads to faster but incomplete optimization (+ the typo that will be fixed)

^{Created on 2022-08-09 by the reprex package (v2.0.1)}

scj-robin commented 2 years ago

Thank you for your superfast answer. I actually do not get the same numerical value when running your code on my laptop (I wonder if some difference may exist between our respective datafiles). In conclusion, do you suggest that, when mostly interested in the value of the elbo and/or the variational parameters, we always run the code with the control option xtol_rel = 1e-6, or even less?

mahendra-mariadassou commented 2 years ago

The differences are probably due to my computer having a slightly different version of R/C++ than yours. The question about the elbo is a tricky one ! I don't have general suggestions apart from checking convergence like you did:

if I call optimize() again, does it change the likelihood ?
if I do a VEstep, does it change the likelihood ?
was convergence achieved because the parameters don't change too much or because the objective reached a plateau ? Setting a lower value for xtol_rel = 1e-6 is good in principle but will result in longer computation times and may not always be the best trade off between time and accuracy.

scj-robin commented 2 years ago

Using homemade functions for both the VE and the M step, it seems that the biggest changes occur in S2. Could the problem be related to some boundary imposed to S2 in the VEM algorithm?

jchiquet commented 2 years ago

Hi,

Thank Stéphane for raising this issue and to Mahendra for his responsiveness.

Some comments afterwards:

When we set the parameters of the optimization methods in PLNmodels, we did our best to reach a good compromise between good optimization and performance. We have checked on a number of problems that the optimum was reached. This is obviously not the case for barent fish data. I am not too much in favor of lowering the tolerance threshold of the convergence of parameters (xtol). We use the default of the NLOpt optimization library and I think it's fine like that.

On the other hand, I'm considering integrating Bastien's code as an alternative solution for the optimization, which uses the Rprop optimizer from torch, now well interfaced with R... It works very well, but it probably means more effort on the user's side to check that the model has been properly adjusted.

Finally, concerning the remark about the optimization in S²: in fact, we optimize in S such that S*S = S² by not putting any constraint on S, so the problem pointed out by Stéphane is not directly linked to the big variations of the ELBO, but maybe indirectly.

jchiquet commented 2 years ago

Ok, just to let you know that, after a couple of tests,

we now set xtol_rel = 1e-6 by default
I started the implementation of an alternative backend for optimization using torch + rprop, which seems to work for PLN and its variants (diagonal, spherical and full matrix), to be continued...

PLN-team / PLNmodels

Big variation of the 'loglik' after one additional VE step #91