Closed scj-robin closed 1 year ago
2 remarks:
VEstep()
(a S
which should have been a S2
)optimize()
twice (once implicitly during the first call to PLN
and then once explicitly)library(PLNmodels)
load("../Data/BarentsFish/BarentsFish.Rdata")
n <- nrow(Data$count)
p <- ncol(Data$count)
X <- Data$covariates
Y <- Data$count
O <- matrix(0, n, p)
w <- rep(1, n)
fit <- PLN(count ~ -1 + covariates, control=list(trace=1), data = Data)
fit_new <- fit$clone(deep = TRUE)
fit_new$optimize(covariates=X, offsets=O, responses=Y, weights = w, control = PLNmodels:::PLN_param(NULL, n, p))
print(c(fit$loglik, fit_new$loglik))
[1] -9035.610 -8755.656
And produce an even greater jump by manually updating the variational parameters (with VEstep()
) before calling optimize()
VE <- fit$VEstep(covariates= X, offsets= O, responses= Y, weights = w)
fit_new$update(M = VE$M, S2 = VE$S2)
fit_new$optimize(covariates=X, offsets=O, responses=Y, weights = w, control = PLNmodels:::PLN_param(NULL, n, p))
print(c(fit$loglik, fit_new$loglik))
[1] -9035.61 -5343.67
And repeat the process
VE <- fit_new$VEstep(covariates= X, offsets= O, responses= Y, weights = w)
fit_new$update(M = VE$M, S2 = VE$S2)
fit_new$optimize(covariates=X, offsets=O, responses=Y, weights = w, control = PLNmodels:::PLN_param(NULL, n, p))
print(c(fit$loglik, fit_new$loglik))
[1] -9035.610 -5077.922
This looks (at least for the first block) like a problem with the optimizer although for all model we have
fit$optim_par$message
[1] "xtol_rel or xtol_abs was reached"
fit_new$optim_par$message
[1] "xtol_rel or xtol_abs was reached"
After a bit of investigation, it seems like convergence is incomplete with the default parameters. If I change xtol_rel
to 1e-6 (instead of the default 1e-4)
xtol_rel
than the default one (-9035.610)library(PLNmodels)
#> This is packages 'PLNmodels' version 0.11.6-9200
#> Use future::plan(multicore/multisession) to speed up PLNPCA/PLNmixture/stability_selection.
load("~/Documents/PLN/Data/BarentsFish/BarentsFish.Rdata")
n <- nrow(Data$count)
p <- ncol(Data$count)
X <- Data$covariates
Y <- Data$count
O <- matrix(0, n, p)
w <- rep(1, n)
fit <- PLN(count ~ -1 + covariates, data = Data, control = list(xtol_rel = 1e-6, trace = 1))
#>
#> Initialization...
#> Adjusting a PLN model with full covariance model
#> Post-treatments...
#> DONE!
fit$optim_par
#> $iterations
#> [1] 3533
#>
#> $message
#> [1] "ftol_rel or ftol_abs was reached"
optimize()
once again doesn't drastically affect the value of the ELBO and the values of the parameters (model and variational) don't change ("xtol_rel or xtol_abs was reached")fit_new <- fit$clone(deep = TRUE)
fit_new$optimize(covariates=X, offsets=O, responses=Y, weights = w, control = PLNmodels:::PLN_param(list(xtol_rel = 1e-6), n, p))
fit_new$optim_par
#> $iterations
#> [1] 11
#>
#> $message
#> [1] "xtol_rel or xtol_abs was reached"
print(c(fit$loglik, fit_new$loglik))
#> [1] -4401.694 -4401.691
optimize()
with the model parameters from fit
and the variational parameters from VE
doesn't affect the results ("xtol_rel or xtol_abs was reached")VE <- fit$VEstep(covariates= X, offsets= O, responses= Y, weights = w, control = PLNmodels:::PLN_param(list(xtol_rel = 1e-6), n, p))
fit_new$update(Theta = fit$model_par$Theta, Sigma = fit$model_par$Sigma, M = VE$M, S2 = VE$S2)
fit_new$optimize(covariates=X, offsets=O, responses=Y, weights = w, control = PLNmodels:::PLN_param(list(xtol_rel = 1e-6), n, p))
fit_new$optim_par
#> $iterations
#> [1] 11
#>
#> $message
#> [1] "xtol_rel or xtol_abs was reached"
print(c(fit$loglik, fit_new$loglik))
#> [1] -4401.694 -4401.673
We can check visually that the parameters are qualitatively comparable in fit
and fit_new
.
par(mfrow = c(2, 2))
plot(fit$var_par$M ~ fit_new$var_par$M);abline(a = 0,b = 1, col = "red")
plot(fit$var_par$S2 ~ fit_new$var_par$S2);abline(a = 0,b = 1, col = "red")
plot(fit$model_par$Theta ~ fit_new$model_par$Theta);abline(a = 0,b = 1, col = "red")
par(mfrow = c(1, 1))
In summary, the main problem with the code appears to be the default value of xtol_rel
which leads to faster but incomplete optimization (+ the typo that will be fixed)
Created on 2022-08-09 by the reprex package (v2.0.1)
Thank you for your superfast answer. I actually do not get the same numerical value when running your code on my laptop (I wonder if some difference may exist between our respective datafiles). In conclusion, do you suggest that, when mostly interested in the value of the elbo and/or the variational parameters, we always run the code with the control option xtol_rel = 1e-6, or even less?
The differences are probably due to my computer having a slightly different version of R/C++ than yours. The question about the elbo is a tricky one ! I don't have general suggestions apart from checking convergence like you did:
optimize()
again, does it change the likelihood ?xtol_rel = 1e-6
is good in principle but will result in longer computation times and may not always be the best trade off between time and accuracy.Using homemade functions for both the VE and the M step, it seems that the biggest changes occur in S2. Could the problem be related to some boundary imposed to S2 in the VEM algorithm?
Hi,
Thank Stéphane for raising this issue and to Mahendra for his responsiveness.
Some comments afterwards:
When we set the parameters of the optimization methods in PLNmodels, we did our best to reach a good compromise between good optimization and performance. We have checked on a number of problems that the optimum was reached. This is obviously not the case for barent fish data. I am not too much in favor of lowering the tolerance threshold of the convergence of parameters (xtol). We use the default of the NLOpt optimization library and I think it's fine like that.
On the other hand, I'm considering integrating Bastien's code as an alternative solution for the optimization, which uses the Rprop optimizer from torch, now well interfaced with R... It works very well, but it probably means more effort on the user's side to check that the model has been properly adjusted.
Finally, concerning the remark about the optimization in S²: in fact, we optimize in S such that S*S = S² by not putting any constraint on S, so the problem pointed out by Stéphane is not directly linked to the big variations of the ELBO, but maybe indirectly.
Ok, just to let you know that, after a couple of tests,
xtol_rel = 1e-6
by default
The ELBO (named 'loglik' in the output of the PLN function) seem to vary dramatically when performing one additional VE step, while convergence is supposed to be reached.
Considering the Barents data with no offset, Y = counts (n=59, p=30)and X = intercept + covariates (d=5), we get:
Using a homemade function to calculate the ELBO:
we get different but similar results:
It seems that the variational parameters M and S2 vary quite a lot during the additional VE step: