Different results using the cran and the dev version

scj-robin commented 4 years ago

I consider a PLNnetwork model with 10 different sets of covariates (line type and color in the plots). I use the same code to fit the models and compute de BIC criterion to chosoe the covariates and the penalty (lambda)? I got very differents results (see the attached plots) when using the cran version (-PLNcran) and the dev branch version (-PLNdev), the BIC obtained with the dev version being sometimes much lower.

BarentsFishPLNpca-net-PLNcran.pdf BarentsFishPLNpca-net-PLNdev.pdf

I copy the code below:

# Factor model (PCA) + PLNnet
# Dirty version

rm(list=ls()); 
library(PLNmodels)

# Dir & data
dataDir <- '../../Data/BBVA/'
dataName <- 'BarentsFish'
load(paste0(dataDir, dataName, '.Rdata'))
Y <- Data$count; X <- Data$covariates; O <- Data$offset
n <- nrow(Y); p <- ncol(Y); d <- ncol(X); Xall1 <- cbind(rep(1, n), X)
model <- 'Y ~ X + offset(log(O))' # pcanet chooses q = 0
model <- 'Y ~ offset(log(O))' # pcanet chooses q = 3
# model <- 'Y ~ 1' # pcanet chooses q = 2, 3 ?

# Import
version <- 'dev' # 'dev', 'cran'
load(paste0(dataName, 'PLNpca-net-PLN', version, '.Rdata'))
lambdaNb <- length(lambda); Qmax <- length(PLNpca$criteria$BIC)

# Comparing joint and marginal prediction
PLNall <- PLN(model)
# PLNall$criteria$loglik - PLNall$criteria$BIC; PLNall$nb_param * log(n)/2
# PLNmarg <- list(); par(mfrow=c(ceiling(sqrt(p)), round(sqrt(p))), cex=.3)
# for(j in 1:p){
#    PLNmarg[[j]] <- PLN(Y[, j, drop=FALSE] ~ X + offset(log(O[, j, drop=FALSE])))
#    plot(Xall1%*%as.vector(PLNall$model_par$Theta[j, ]), Xall1%*%as.vector(PLNmarg[[j]]$model_par$Theta), pch=20, 
#         main=paste(j, ':', round(mean(Y[, j]), 1)), xlab='joint', ylab='marg')
#    abline(0, 1)
#    }

# PLNpca
Qmax <- 10;
PLNpca <- PLNPCA(model, ranks=1:Qmax)
PLNpca$plot()

# # PLNpca null : correlation between components and covariates
# q <- which.max(PLNpca$criteria$BIC)
# image(1:(d+q), 1:(d+q), abs(cor(cbind(X, PLNpca$getBestModel()$scores))))
# abline(h=.5+d, v=.5+d, lwd=2, col=4)
# abs(cor(cbind(X, PLNpca$getBestModel()$scores)))[1:d, d+(1:d)]

# PLNnet
lambda <- exp(seq(-3, 4, length.out=50)); lambdaNb <- length(lambda);
PLNnet <- PLNnetwork(model, penalties=lambda)

# PLNpca.net
PLNpca.net <- list()
PLNpca.net[[1]] <- PLNnet
for(q in 1:Qmax){
   print(q)
   PLNpca.net[[1+q]] <- PLNnetwork(Y ~ X + PLNpca$getModel(q)$scores + offset(log(O)), penalties=lambda)
}

# Export
save(lambda, model, PLNall, PLNpca, PLNnet, PLNpca.net, file=paste0(dataName, 'PLNpca-net-PLN', version, '.Rdata'))

# Corrected BIC
pdf(paste0(dataName, 'PLNpca-net-PLN', version, '.pdf'))
par(mfrow=c(1, 1), pch=20); 
dfpca <- cumsum(c(0, (p-1):(p-Qmax)))
plot(lambda, PLNnet$criteria$BIC, type='l', pch=21, log='x', lwd=2 ,lty=2, 
     ylim=c(-8000, -5000))
BIC <- corBIC <- matrix(0, 1+Qmax, lambdaNb)
for(q in 0:Qmax){
   BIC[1+q, ] <- PLNpca.net[[1+q]]$criteria$BIC
   # points(lambda, BIC[1+q, ], type='l', pch=21, col=1+q, lwd=2, lty=2)
   corBIC[1+q, ] <- BIC[1+q, ] - dfpca[1+q]*log(n)/2
   points(lambda, corBIC[1+q, ], type='l', pch=20, col=1+q, lty=1+floor(q/8), lwd=2)
}
dev.off()

mahendra-mariadassou commented 4 years ago

The dev version uses a different initialization which seems to lead to better results in general but not always. The optimization steps is also quite sensitive to the initialization. You can use control = list(ftol_out = 1e-6) when calling PLN* to ensure more reproductible behavior (or stick with the release version until everything has been ironed out...)

jchiquet commented 4 years ago

Good to know.

On top of Mahendra answer, note that I added all the constant terms in the variational lower bounds in the dev version, while some are missing in the CRAN version.

scj-robin commented 4 years ago

By the way, with the dev version, fitting first a PLN with diagonal Sigma, then a PLN with full Sigma, using the first results as inception yields almost identical regressions coefficients. The same with the cran version yields (not that much but still) different regression coefficients. As far as I know, the theory does not guarantee that they should be equal.

jchiquet commented 4 years ago

An additional comment : at the moment, I have essentially been working on optimisation alternative for PLN (not PLNnetwork) in the dev branch, so results on network inference should not be trusted.

jchiquet commented 4 years ago

This should be fixed in last commit of dev https://github.com/jchiquet/PLNmodels/commit/c6dda121ac3a4a20c693ff1f6ebfb3d5c8a4bb5f

PLN-team / PLNmodels

Different results using the cran and the dev version #55