SteveBronder / pathfinder_testing

Collection of stan-dev repo branches needed to run pathfinder
BSD 3-Clause "New" or "Revised" License

Pathfinder for SEM diverges on all iterations #3

Open emilijapur opened 2 years ago

emilijapur commented 2 years ago

Hello again. I was trying out the Pathfinder algorithm for a SEM model. My code works with other methods (HMC, variational inference), but it always fails with Pathfinder.

My code with generated data:

# library(cmdstanr)
library(rstan)
#tryCatch({cmdstanr::set_cmdstan_path('/home/cmdstanr/cmdstan-2.28.2')})
library(cmdstanr, lib.loc ='/home/pathfinder/pathfinder_testing/cmdstanr')
set_cmdstan_path('/home/pathfinder/pathfinder_testing/cmdstan')

y <- rnorm(n = 100, mean = 0, sd = 2)
x <- rnorm(n = 100, mean = y, sd = 2)
z1 <- rnorm(n = 100, mean = x, sd = 0.5)
z2 <- rnorm(n = 100, mean = x, sd = 0.5)
z3 <- rnorm(n = 100, mean = x, sd = 0.5)
d <- data.frame(y = y, x = x, z1 = z1, z2 = z2, z3 = z3)
rm(x, y, z1, z2, z3)

# add empty latent variable to data frame
d$X <- as.numeric(NA)

bf1 <- bf(z1 ~ 0 + mi(X)) # factor loading 1 (constant)
bf2 <- bf(z2 ~ 0 + mi(X)) # factor loading 2
bf3 <- bf(z3 ~ 0 + mi(X)) # factor loading 3
bf4 <- bf(X | mi() ~ 0)   # latent variable
bf5 <- bf(y ~ 0 + mi(X))  # added regression

fit3 <- brm(bf1 + bf2 + bf3 + bf4 + bf5 + set_rescor(FALSE), data = d, 
            prior = c(prior(constant(1), coef = miX, resp = z1),
                      prior(normal(1, 1), coef = miX, resp = z2),
                      prior(normal(1, 1), coef = miX, resp = z3),
                      prior(normal(0, 1), coef = miX, resp = y)),
            backend = 'cmdstanr',
            iter = 2000, 
            warmup = 1000, 
            chains = 3, 
            cores = 3,
            threads = threading(threads = 4),
            # if it diverges, raising adapt_delta seems to help
            #control = list(adapt_delta = 0.999, max_treedepth = 15)
            control = list(adapt_delta = 0.99, max_treedepth = 15))
summary(fit3)

# cmdstanr way

bf0 <- bf1 + bf2 + bf3 + bf4 + bf5 + set_rescor(FALSE)
data0 <- d
prior0 <- c(prior(constant(1), coef = miX, resp = z1), 
            prior(normal(1, 1), coef = miX, resp = z2),
            prior(normal(1, 1), coef = miX, resp = z3),
            prior(normal(0, 1), coef = miX, resp = y))

my_stan_code <- make_stancode(
  formula = bf0,
  data = data0,
  prior = prior0,
  autocor = NULL,
  data2 = NULL,
  cov_ranef = NULL,
  sparse = NULL,
  sample_prior = "no",
  stanvars = NULL,
  stan_funs = NULL,
  knots = NULL,
  #threads = threading(threads = nthreads),
  normalize = getOption("brms.normalize", TRUE),
  save_model = '/home/rstudio/user_projects/Mantas/Pathfinder/psc_20220208.stan'
)

# make a native stan data structure from brm objects
my_stan_data <- make_standata(
  formula = bf0,
  data = data0,
  # !!!!! no need to specify the family here because the model is multivariate
  #family = gaussian(),
  prior = prior0,
  autocor = NULL,
  data2 = NULL,
  cov_ranef = NULL,
  sample_prior = "no",
  stanvars = NULL,
  #threads = threading(threads = nthreads),
  knots = NULL
)

# create cmdstan model object
ex_mod = cmdstan_model(stan_file = '/home/rstudio/user_projects/Mantas/Pathfinder/psc_20220208.stan', 
                       compile = FALSE)

# compile
ex_mod$compile(
  quiet = FALSE,
  dir = NULL,
  pedantic = TRUE,
  include_paths = NULL,
  cpp_options = list(stan_threads = TRUE),
  stanc_options = list(),
  force_recompile = TRUE
)

# NUTS (HMC)
ex_fit_sample = ex_mod$sample(data = my_stan_data,
                              iter_sampling = 1000, 
                              chains = 3, 
                              parallel_chains = 3,
                              threads_per_chain = 4,
                              # added these parameters
                              adapt_delta = 0.99, 
                              max_treedepth = 15,
                              output_dir = '/home/rstudio/user_projects/Mantas/Pathfinder/')

# meanfield variational inference
nthreads = 12
ex_fit_vb <- ex_mod$variational(
  data = my_stan_data,
  seed = NULL,
  refresh = 5,
  init = NULL,
  save_latent_dynamics = FALSE,
  output_dir = '/home/rstudio/user_projects/Mantas/Pathfinder/',
  output_basename = 'psc_variational',
  sig_figs = NULL,
  threads = nthreads,
  opencl_ids = NULL,
  algorithm = NULL,
  iter = 10000,
  grad_samples = NULL,
  elbo_samples = NULL,
  eta = NULL,
  adapt_engaged = NULL,
  adapt_iter = NULL,
  tol_rel_obj = 0.0000001,
  eval_elbo = NULL,
  output_samples = 10000
)

ex_fit_sample$summary()
ex_fit_vb$summary()

#pathfinder

ex_fit = ex_mod$pathfinder(algorithm = "single", data = my_stan_data,
                           refresh = 6, threads = nthreads, iter = 10000, num_elbo_draws = 1000,
                           history_size = 6, init_alpha = 0.0000000001,
                           num_draws = 10001, init = 3,
                           num_eval_attempts = 5, tol_rel_obj = 0.0000001
                           ) #, tol_obj = 0, tol_grad = 0, tol_param = 0, tol_rel_grad = 0, tol_rel_obj = 0)
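
One thing that may be worth trying first (a sketch, not a confirmed fix): the call above sets init_alpha = 1e-10, which forces L-BFGS to start with an extremely small initial step size, and init = 3 draws fairly wide random initial values; either could plausibly make every L-BFGS iteration fail. A minimal debugging run leaves the optimizer settings at their defaults and narrows the inits. Argument names here mirror the call above and assume the same development-branch cmdstanr API:

```r
# Minimal-settings Pathfinder run (assumes the same dev-branch pathfinder()
# method used above): let L-BFGS choose its own initial step size and start
# the chains closer to the default inits instead of init = 3.
ex_fit_default <- ex_mod$pathfinder(
  algorithm = "single",
  data = my_stan_data,
  init = 0.5  # narrower random inits on the unconstrained scale
)
ex_fit_default$summary()
```

If this runs, the tuning parameters (init_alpha, the tolerances) were the problem; if it still fails on every iteration, the issue is more likely in how L-BFGS handles this posterior.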

ex_fit$summary()

While fitting with Pathfinder, I get the error:

Pathfinder Failure: None of the LBFGS iterations completed successfully

Maybe I should change some of my parameters, and that is why all iterations diverge? Or is it some deeper problem, i.e. Pathfinder cannot be used for structural equation modelling (SEM)? Where could I read documentation, or maybe see the raw code, to find out what each parameter does?

Thanks for help!

emilijapur commented 2 years ago

Hey, @SteveBronder, how are you doing? Did you have a chance to look into this issue?

SteveBronder commented 2 years ago

Hi, sorry, I meant to get to this during the week but had to work on the Laplace approximations. I should have some time next week to take a look at this.

SteveBronder commented 2 years ago

Hello, just an update: I'm looking at this now. It seems odd, but optimize also gives back wonky results, so it may just be because Pathfinder is built on L-BFGS. Another issue here is that the program spends a very long time just generating error log messages, which is the main cause of the slowdown.

@bob-carpenter, right now in Pathfinder we log an error for each approximate sample if we go over the number of evaluation attempts. Would it be okay if we only log an error when we are unable to draw any approximate samples at all?

bob-carpenter commented 2 years ago

Yes, I think it's fine to cut down on the logging. The point is to get a message to the user so they can diagnose their code.

We can't really do anything if optimize is giving wonky results other than use better optimizers.

emilijapur commented 2 years ago

Sorry, I didn't follow you. Is it a bug in my code? This code works with other inference methods (e.g. HMC). Are you going to look deeper into this problem?

SteveBronder commented 2 years ago

> Is it a bug in my code?

I don't know, but your code works with HMC, so this could very well just be a problem that L-BFGS handles poorly.

> Are you going to look deeper into this problem?

Yes, I'm sending this over to Lu to look at as well. It's good to have models that break, so this is very much appreciated!