Putting prior, prior predictive, posterior into one inferencedata for hierarchical sbc

hyunjimoon commented 1 year ago

Two large tasks, discussed with @Dashadower.

draws_data_mapping wise

To implement this, first apply this arviz sbc_cmdstanpy by @OriolAbril to stanify for S3R1 case. From commit https://github.com/Data4DM/stanify/commit/3b65e7741bfb2c2bbf4516e71b5aa917d6a88d58, I marked three parts for code injection. Then try S3R2 case.

With current implementation (before arviz wrapper code injection), S3R1 runs, creating one generator.nc and three estimator.nc files.

However, S3R2 doesn't run with the following error which requires in-depth exploration for size of each data.

Traceback (most recent call last):
  File "/Users/hyunjimoon/Dropbox/stanify/prey_predator.py", line 35, in <module>
    draws2data2draws('vensim_models/prey_predator/prey_predator.mdl', setting, numeric, prior, S, M, N, R)
  File "/Users/hyunjimoon/Dropbox/stanify/stanify/calibrator/draws_data_mapper.py", line 168, in draws2data2draws
    obs_dict = {k: v.values.flatten().reshape((model.n_t, R)) for (k, v) in obs_s_xr[setting_obs].items()} #xr.concat([ posterior_lst])
  File "/Users/hyunjimoon/Dropbox/stanify/stanify/calibrator/draws_data_mapper.py", line 168, in <dictcomp>
    obs_dict = {k: v.values.flatten().reshape((model.n_t, R)) for (k, v) in obs_s_xr[setting_obs].items()} #xr.concat([ posterior_lst])
ValueError: cannot reshape array of size 200 into shape (200,2)

builder (auto stan file generator) wise

update for three parts:

draws2data gq block from

real pred_birth_frac = normal_rng(0.05, 0.005);

to

for (region in 1:r)
    pred_birth_frac[region] = normal_rng(0.05, 0.005);

data2draws parameters block from

real<lower=0> pred_birth_frac;

to

array[region] real<lower=0> pred_birth_frac;

data2draws model block from

pred_birth_frac ~ normal(0.05, 0.005);

to

for (region in 1:r)
    pred_birth_frac[region] ~ normal(0.05, 0.005);

hyunjimoon commented 1 year ago

Niko's monster model with adaptive precision, this line uses ode_bdf where we can learn the output format for integrated result.

I also wonder the reason for using ragged array from ragged.stan file which I questioned in discourse.

hyunjimoon commented 1 year ago

Without hierarchy, integrated_result was able to be declared and initialized at the same time. With hierarchy as for loop for region is coming in which forces it to be pre-declared than value is assigned from ode function.

Before I made the change, I faced identifier integrate_result not in scope error which is not informative error message.

    array[n_t, r] vector[3] integrated_result;
    for (region in 1:r){
        integrated_result[:, region, :]  = ode_rk45(vensim_ode_func, initial_outcome, initial_time, times, prey_birth_frac, pred_birth_frac[region], time_step, process_noise_scale);
    }
    array[n_t] vector [r] prey = integrated_result[1];
    array[n_t] vector [r] process_noise = integrated_result[ 2];
    array[n_t] vector [r] predator = integrated_result[3];

hyunjimoon commented 1 year ago

Treatment of hierarchical and SBC multi S should be the same.

Questions from sbc-cmdstanpy

cause of scale parameter 0; need for a zero avoiding prior?

Exception: normal_lpdf: Scale parameter is 0, but must be positive! (in '/home/oriol/Public/arviz/doc/source/user_guide/wrappers/linreg_model.stan', line 27, column 2 to column 64)

benefit of write_stan_json? Faster? (I am using dict, and "Values for all data variables in the model, specified either as a dictionary with entries matching the data variables, or as the path of a data file in JSON or Rdump format." from cmdstanpy).
Rdump?

prior_fit = prior_model.sample(data="linreg_prior.data.json", chains=2, iter_sampling=100)

idata_kwargs = dict(
    prior_predictive=["slack_comments_hat","github_commits_hat"],
    posterior_predictive=["slack_comments_hat","github_commits_hat"],
    log_likelihood={
        "slack_comments": "log_likelihood_slack_comments",
        "github_commits": "log_likelihood_github_commits"
    },
    coords={"developer": names},
    dims={
        "slack_comments": ["developer"],
        "github_commits" : ["developer"],
        "slack_comments_hat": ["developer"],
        "github_commits_hat": ["developer"],
        "time_since_joined": ["developer"],
    }
)
idata_orig = az.from_cmdstanpy(
    prior=prior_fit, **idata_kwargs
).stack(prior_draw=["chain", "draw"], groups="prior_groups")

why MultiIndex? (multi means what and what?)


prior
Dimensions:      (prior_draw: 200)
Coordinates:
* prior_draw   (prior_draw) object MultiIndex
* chain        (prior_draw) int64 0 0 0 0 0 0 0 0 0 0 ... 1 1 1 1 1 1 1 1 1 1
* draw         (prior_draw) int64 0 1 2 3 4 5 6 7 ... 92 93 94 95 96 97 98 99
Data variables:
b0           (prior_draw) float64 104.4 -239.1 -249.9 ... 222.8 -46.17
b1           (prior_draw) float64 116.8 -175.4 -71.92 ... -162.4 7.715 274.3
log_b_sigma  (prior_draw) float64 6.163 6.122 5.205 ... 3.633 6.172 6.076
b_sigma      (prior_draw) float64 474.6 455.6 182.3 ... 37.84 479.3 435.3
c0           (prior_draw) float64 -2.87 -6.738 11.82 ... -10.68 -11.76
c1           (prior_draw) float64 18.08 -7.793 4.934 ... -2.836 -14.52
log_c_sigma  (prior_draw) float64 6.163 6.122 5.205 ... 3.633 6.172 6.076
c_sigma      (prior_draw) float64 9.529 6.93 6.605 ... 1.642 4.487 3.449

prior_predictive Dimensions: (developer: 5, prior_draw: 200) Coordinates:

developer (developer) object 'Alice' 'Bob' ... 'Danielle' 'Erika'
prior_draw (prior_draw) object MultiIndex
chain (prior_draw) int64 0 0 0 0 0 0 0 0 0 ... 1 1 1 1 1 1 1 1
draw (prior_draw) int64 0 1 2 3 4 5 6 ... 94 95 96 97 98 99 Data variables: slack_comments_hat (developer, prior_draw) float64 701.6 ... 4.755e+03 github_commits_hat (developer, prior_draw) float64 82.57 -39.29 ... -270.4

sample_stats_prior Dimensions: (prior_draw: 200) Coordinates:

prior_draw (prior_draw) object MultiIndex
chain (prior_draw) int64 0 0 0 0 0 0 0 0 0 ... 1 1 1 1 1 1 1 1 1
draw (prior_draw) int64 0 1 2 3 4 5 6 7 ... 93 94 95 96 97 98 99 Data variables: lp (prior_draw) float64 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 acceptance_rate (prior_draw) float64 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0

for next (week, month) meeting, need help in
1. loglik-based plots
2. rvar support from posterior? Mike's comment

The nc-to-rvars conversion is done by the class_composition R6 class here, specifically the nc_to_rvar method here

plots for ode model (e.g. degeneracy diagnosis for prey-predator and inventory model)
- funnel only from hierarchical model? (trying to introduce hierarchical prey-predator) + funnel degeneracy is included in..? a. multimodality of hierarchical prey-predator from this issue
  
  This adds 4 regions with variety in the alpha/beta/gamma/delta parameters, loosely structured after https://betanalpha.github.io/assets/case_studies/hierarchical_modeling.html that @hyunjimoon linked. Part of the intent was to create the opportunity for funnels. I haven't run it much yet, but I think it does show evidence of multimodality, or of pathologies that cause Powell to get stuck occasionally. I haven't tested MCMC either, but will get there soon.

b. how to set # of estimated parameters in ode model? inventory model from this issue

how often do you use diagnose() function, and does arviz provide plot which use the output of diagnose()?

02:49:23 - cmdstanpy - WARNING - Some chains may have failed to converge.
Chain 1 had 48 divergent transitions (96.0%)
Chain 2 had 47 divergent transitions (94.0%)
Use function "diagnose()" to see further information.

hyunjimoon commented 1 year ago

One example of prior predictive (synthetic data generated from s'th prior draw) are as follows, on which posterior is fit.

Dimensions:             (predator_obs_dim_0: 200, prey_obs_dim_0: 200)
Coordinates:
  * predator_obs_dim_0  (predator_obs_dim_0) int64 0 1 2 3 4 ... 196 197 198 199
  * prey_obs_dim_0      (prey_obs_dim_0) int64 0 1 2 3 4 ... 195 196 197 198 199
    prior_draw          object (0, 0)
    chain               int64 0
    draw                int64 0
Data variables:
    predator_obs        (predator_obs_dim_0) float64 4.027 4.049 ... 8.142 7.973
    prey_obs            (prey_obs_dim_0) float64 30.16 30.63 ... 6.133 6.205
Attributes: (4)

Error I am facing is:

ValueError: different number of dimensions on data and dims: 3 vs 4

Data4DM / stanify

Putting prior, prior predictive, posterior into one inferencedata for hierarchical sbc #12

draws_data_mapping wise

builder (auto stan file generator) wise