Data4DM / stanify

Bridging System dynamics tool (Vensim) and Bayesian computation tool (Stan)
https://data4dm.github.io/stanify/
2 stars 1 forks source link

Putting prior, prior predictive, posterior into one inferencedata for hierarchical sbc #12

Open hyunjimoon opened 1 year ago

hyunjimoon commented 1 year ago

Two large tasks, discussed with @Dashadower.

draws_data_mapping wise

To implement this, first apply this arviz sbc_cmdstanpy by @OriolAbril to stanify for S3R1 case. From commit https://github.com/Data4DM/stanify/commit/3b65e7741bfb2c2bbf4516e71b5aa917d6a88d58, I marked three parts for code injection. Then try S3R2 case.

With current implementation (before arviz wrapper code injection), S3R1 runs, creating one generator.nc and three estimator.nc files.

However, S3R2 doesn't run with the following error which requires in-depth exploration for size of each data.

Traceback (most recent call last):
  File "/Users/hyunjimoon/Dropbox/stanify/prey_predator.py", line 35, in <module>
    draws2data2draws('vensim_models/prey_predator/prey_predator.mdl', setting, numeric, prior, S, M, N, R)
  File "/Users/hyunjimoon/Dropbox/stanify/stanify/calibrator/draws_data_mapper.py", line 168, in draws2data2draws
    obs_dict = {k: v.values.flatten().reshape((model.n_t, R)) for (k, v) in obs_s_xr[setting_obs].items()} #xr.concat([ posterior_lst])
  File "/Users/hyunjimoon/Dropbox/stanify/stanify/calibrator/draws_data_mapper.py", line 168, in <dictcomp>
    obs_dict = {k: v.values.flatten().reshape((model.n_t, R)) for (k, v) in obs_s_xr[setting_obs].items()} #xr.concat([ posterior_lst])
ValueError: cannot reshape array of size 200 into shape (200,2)

builder (auto stan file generator) wise

update for three parts:

  1. draws2data gq block from
    real pred_birth_frac = normal_rng(0.05, 0.005);

    to

    for (region in 1:r)
        pred_birth_frac[region] = normal_rng(0.05, 0.005);
  2. data2draws parameters block from
    real<lower=0> pred_birth_frac;

    to

    array[region] real<lower=0> pred_birth_frac;
  3. data2draws model block from
    pred_birth_frac ~ normal(0.05, 0.005);

    to

    for (region in 1:r)
        pred_birth_frac[region] ~ normal(0.05, 0.005);
hyunjimoon commented 1 year ago

Niko's monster model with adaptive precision, this line uses ode_bdf where we can learn the output format for integrated result.

I also wonder the reason for using ragged array from ragged.stan file which I questioned in discourse.

hyunjimoon commented 1 year ago

Without hierarchy, integrated_result was able to be declared and initialized at the same time. With hierarchy as for loop for region is coming in which forces it to be pre-declared than value is assigned from ode function.

Before I made the change, I faced identifier integrate_result not in scope error which is not informative error message.

    array[n_t, r] vector[3] integrated_result;
    for (region in 1:r){
        integrated_result[:, region, :]  = ode_rk45(vensim_ode_func, initial_outcome, initial_time, times, prey_birth_frac, pred_birth_frac[region], time_step, process_noise_scale);
    }
    array[n_t] vector [r] prey = integrated_result[1];
    array[n_t] vector [r] process_noise = integrated_result[ 2];
    array[n_t] vector [r] predator = integrated_result[3];
hyunjimoon commented 1 year ago

Treatment of hierarchical and SBC multi S should be the same.

Questions from sbc-cmdstanpy

prior_fit = prior_model.sample(data="linreg_prior.data.json", chains=2, iter_sampling=100)

idata_kwargs = dict(
    prior_predictive=["slack_comments_hat","github_commits_hat"],
    posterior_predictive=["slack_comments_hat","github_commits_hat"],
    log_likelihood={
        "slack_comments": "log_likelihood_slack_comments",
        "github_commits": "log_likelihood_github_commits"
    },
    coords={"developer": names},
    dims={
        "slack_comments": ["developer"],
        "github_commits" : ["developer"],
        "slack_comments_hat": ["developer"],
        "github_commits_hat": ["developer"],
        "time_since_joined": ["developer"],
    }
)
idata_orig = az.from_cmdstanpy(
    prior=prior_fit, **idata_kwargs
).stack(prior_draw=["chain", "draw"], groups="prior_groups")

prior_predictive Dimensions: (developer: 5, prior_draw: 200) Coordinates:

sample_stats_prior Dimensions: (prior_draw: 200) Coordinates:

The nc-to-rvars conversion is done by the class_composition R6 class here, specifically the nc_to_rvar method here

  1. plots for ode model (e.g. degeneracy diagnosis for prey-predator and inventory model)
    • funnel only from hierarchical model? (trying to introduce hierarchical prey-predator) + funnel degeneracy is included in..? a. multimodality of hierarchical prey-predator from this issue image

      This adds 4 regions with variety in the alpha/beta/gamma/delta parameters, loosely structured after https://betanalpha.github.io/assets/case_studies/hierarchical_modeling.html that @hyunjimoon linked. Part of the intent was to create the opportunity for funnels. I haven't run it much yet, but I think it does show evidence of multimodality, or of pathologies that cause Powell to get stuck occasionally. I haven't tested MCMC either, but will get there soon.

b. how to set # of estimated parameters in ode model? inventory model from this issue

  1. how often do you use diagnose() function, and does arviz provide plot which use the output of diagnose()?
    02:49:23 - cmdstanpy - WARNING - Some chains may have failed to converge.
    Chain 1 had 48 divergent transitions (96.0%)
    Chain 2 had 47 divergent transitions (94.0%)
    Use function "diagnose()" to see further information.
hyunjimoon commented 1 year ago

One example of prior predictive (synthetic data generated from s'th prior draw) are as follows, on which posterior is fit.

Dimensions:             (predator_obs_dim_0: 200, prey_obs_dim_0: 200)
Coordinates:
  * predator_obs_dim_0  (predator_obs_dim_0) int64 0 1 2 3 4 ... 196 197 198 199
  * prey_obs_dim_0      (prey_obs_dim_0) int64 0 1 2 3 4 ... 195 196 197 198 199
    prior_draw          object (0, 0)
    chain               int64 0
    draw                int64 0
Data variables:
    predator_obs        (predator_obs_dim_0) float64 4.027 4.049 ... 8.142 7.973
    prey_obs            (prey_obs_dim_0) float64 30.16 30.63 ... 6.133 6.205
Attributes: (4)

Error I am facing is:

ValueError: different number of dimensions on data and dims: 3 vs 4