AEBilgrau / GMCM

Unsupervised Clustering and Meta-analysis using Gaussian Mixture Copula Models
http://AEBilgrau.github.io/GMCM

Nelder-Mead, L-BFGS, and L-BFGS-B crashing on certain datasets #63

Open eliotshekhtman opened 3 years ago

eliotshekhtman commented 3 years ago

Reproducible example:

library(MASS)
library(GMCM)

generate_sigma <- function(dims) {
  # Generates a random correlation matrix (a covariance matrix rescaled to unit diagonal)
  W = matrix(rnorm(dims * dims), nrow=dims)
  covariance = W %*% t(W)
  D = diag(covariance)
  D_neg_half = diag(1.0 / sqrt(D))
  return((D_neg_half %*% covariance) %*% D_neg_half)
}

gmcm_classify <- function(data, num_means = 2) {
  # Returns GMCM predicted labels on data
  uhat = Uhat(data)  # rank-transform the data to pseudo-observations
  theta = fit.full.GMCM(u = uhat, m = num_means, method = "NM")  # use the argument, not a hard-coded 2
  return(classify(uhat, theta))
}

# Settings
n = 1000
dims = 2
num_means = 2
scaling = 10

set.seed(2)
d = NULL
lbls = NULL

for (i in 1:num_means) {
  # Generate data from each Gaussian and concatenate
  sigma = generate_sigma(dims)
  mean = runif(dims) * scaling
  mixture = mvrnorm(n = n / num_means, mean, sigma)
  d = rbind(d, mixture)
  lbls = c(lbls, rep(i, n / num_means))
}

# Fit and plot
plot(d, col=lbls, main="True Labels")
gmcm = gmcm_classify(d, num_means = num_means)
plot(d, col = gmcm, main="Predicted")

Error thrown:

Error in seq.default(m.ij - spread * sd.ij, m.ij + spread * sd.ij, l = n.samples[j]) : 
  'from' must be a finite number

Through inspection, the error is thrown in dgmcm.loglik while calculating the marginal distributions, because the parameters contain NaN values. These in turn come from the calculation of sqrt(1 - colSums(tmp.U^2)) in vector2theta, which returns NaN whenever a column's sum of squares exceeds 1. This dataset also seems to crash with method = "L-BFGS" or method = "L-BFGS-B", but not with "SANN" or "PEM".
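
The chain described here (negative argument to sqrt, NaN parameter, failing seq call) can be reproduced in base R without GMCM. This is only a minimal sketch under assumptions: the values in tmp.U are made up so that one column has a sum of squares above 1, and m.ij, spread, sd.ij and n.samples are illustrative stand-ins named after the error message. None of it is the package's actual code.

```r
# Made-up values (assumption): column 1 has sum of squares 1.45 > 1.
tmp.U <- matrix(c(0.9, 0.8,
                  0.1, 0.2), nrow = 2)

# Same expression as quoted from vector2theta; sqrt() of a negative
# number returns NaN (with a warning, silenced here).
remainder <- suppressWarnings(sqrt(1 - colSums(tmp.U^2)))
print(remainder)

# Illustrative stand-ins (assumption) for the quantities in the error message.
m.ij <- 0
spread <- 5
sd.ij <- remainder[1]   # the NaN propagates into the standard deviation
n.samples <- 10

# seq() rejects a non-finite 'from'; the error is captured instead of thrown.
res <- tryCatch(
  seq(m.ij - spread * sd.ij, m.ij + spread * sd.ij, length.out = n.samples),
  error = function(e) e
)
print(conditionMessage(res))

# One possible guard (an assumption, not the package's fix): flag the
# invalid point before it reaches the likelihood.
is.valid <- all(colSums(tmp.U^2) <= 1)
print(is.valid)
```

If an optimizer proposes a point like this, a check along the lines of is.valid could return a large penalty instead of passing NaN parameters on. Whether that is the right fix for the NM and L-BFGS paths is for the maintainer to judge.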

AEBilgrau commented 3 years ago

Thanks for the very nice bug report --- I'll hopefully have a look in the coming days and get back to you.