arviz-devs / arviz

Exploratory analysis of Bayesian models with Python
https://python.arviz.org
Apache License 2.0

boolean arguments in cmdstan csv comments no longer parse correctly #2353

Closed: NicholasCowie closed this issue 2 weeks ago

NicholasCowie commented 4 weeks ago

Describe the bug

When ArviZ parses CmdStan CSV files in the io_cmdstan module, it assumes that boolean arguments in the header comments are represented as the numbers 0 or 1. However, since PR https://github.com/stan-dev/cmdstan/pull/1260 these are written as the strings 'false' and 'true'. This causes arviz.from_cmdstan() to fail:

To Reproduce

test.py file:

from cmdstanpy import CmdStanModel
import arviz as az

model = CmdStanModel(stan_file="test.stan")
data_input = {
    "N": 6,
    "y": [1, 0, 1, 0, 0, 1]
}
fit = model.sample(
    data=data_input,
    save_warmup=True,
)
# Fails with cmdstan 2.35.0; see the traceback below.
idata = az.from_cmdstan(fit.runset.csv_files)

test.stan file:

data {
  int<lower=0> N;
  array[N] int<lower=0,upper=1> y;
}
parameters {
  real<lower=0,upper=1> theta;
}
model {
  theta ~ beta(1,1);  // uniform prior on interval 0,1
  y ~ bernoulli(theta);
}

The output from running test.py is as follows:

Traceback (most recent call last):

    idata = az.from_cmdstan(fit.runset.csv_files)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.11/site-packages/arviz/data/io_cmdstan.py", line 1013, in from_cmdstan
    return CmdStanConverter(
           ^^^^^^^^^^^^^^^^^
  File "user/.venv/lib/python3.11/site-packages/arviz/data/io_cmdstan.py", line 107, in __init__
    self._parse_posterior()
  File "user/.venv/lib/python3.11/site-packages/arviz/data/base.py", line 67, in wrapped
    return func(cls)
           ^^^^^^^^^
  File "user/.venv/lib/python3.11/site-packages/arviz/data/io_cmdstan.py", line 129, in _parse_posterior
    output_data = _read_output(path)
                  ^^^^^^^^^^^^^^^^^^
  File "user/.venv/lib/python3.11/site-packages/arviz/data/io_cmdstan.py", line 803, in _read_output
    int(pconf.get("save_warmup", 0))
ValueError: invalid literal for int() with base 10: 'true'
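The last frame shows the whole problem: `int()` is applied to a header value that is now the string 'true'. A minimal sketch reproducing this in plain Python, assuming the header values reach the parser as raw strings; `parse_flag_as_int` is a hypothetical helper that only mimics the integer-only assumption and is not ArviZ code:

```python
# Hypothetical helper mimicking the integer-only assumption: int(), then bool().
def parse_flag_as_int(value):
    return bool(int(value))

# Illustrative values only: "1" stands for the older 0/1 encoding,
# "true" for the string encoding written by newer CmdStan versions.
print(parse_flag_as_int("1"))  # True

try:
    parse_flag_as_int("true")
except ValueError as err:
    print(err)  # invalid literal for int() with base 10: 'true'
```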

Expected behavior

from_cmdstan() loads the CSV files into an InferenceData object.

Additional context

cmdstan version 2.35.0, cmdstanpy==1.2.3, arviz==0.18.0
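To check which encoding a particular CSV file uses, one can read the raw value straight from its header comments. This is a minimal sketch: `read_header_value` is a hypothetical helper, and it assumes the `# key = value` layout of CmdStan CSV headers, where a value may be followed by a `(Default)` marker that the regex simply ignores.

```python
import re


def read_header_value(path, key):
    """Hypothetical helper: return the raw string written for `key` in the
    CSV header comments (e.g. "1" or "true"), or None if it is absent."""
    # Matches lines such as "#     save_warmup = true (Default)".
    pattern = re.compile(r"^#\s*" + re.escape(key) + r"\s*=\s*(\S+)")
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if not line.startswith("#"):
                break  # config comments end where the column names begin
            match = pattern.match(line)
            if match:
                return match.group(1)
    return None
```

With the `fit` object from test.py, calling `read_header_value(path, "save_warmup")` for each `path` in `fit.runset.csv_files` would presumably return "1" for files written by older CmdStan versions and "true" for files written by 2.35.0.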

OriolAbril commented 4 weeks ago

Thanks for reporting. I guess we should add a try/except in the parser, or some check for the "true"/"false" strings. Not sure when I'll be able to take a look. After quickly skimming the linked PR, it looks like this only affects the sampling/model metadata and arguments, not the actual samples; can you confirm?
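One way to implement the suggested check for the "true"/"false" strings is a small helper that accepts both encodings. This is a sketch only: `parse_cmdstan_bool` is a hypothetical name, not an ArviZ function, and the accepted spellings are an assumption based on the 0/1 and 'false'/'true' values described in this issue.

```python
def parse_cmdstan_bool(value):
    """Hypothetical helper: accept the older 0/1 encoding and the newer
    "false"/"true" strings for a CmdStan boolean argument."""
    # str() also handles non-string defaults such as the integer 0.
    text = str(value).strip().lower()
    if text in ("1", "true"):
        return True
    if text in ("0", "false"):
        return False
    # Fail loudly on anything unexpected instead of guessing.
    raise ValueError(f"unrecognized CmdStan boolean value: {value!r}")
```

The failing expression `int(pconf.get("save_warmup", 0))` could then be replaced by something like `parse_cmdstan_bool(pconf.get("save_warmup", 0))`. An explicit mapping is used here rather than a bare try/except, so an unexpected value raises a clear error instead of being silently coerced.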

NicholasCowie commented 4 weeks ago

Thank you, and I can confirm this does not affect any of the actual samples.

Regards, Nick