arviz-devs / preliz

Exploring and eliciting probability distributions
https://preliz.readthedocs.io
Apache License 2.0

Handle ppe for `nc_parents` (Partial Dependency) and add tests #437

Open rohanbabbar04 opened 4 months ago

rohanbabbar04 commented 4 months ago

Description

# One such example
import pandas as pd
import pymc as pm
import preliz as pz

# Difference between theoretical and experimental chemical shifts
cs_data = pd.read_csv('testdata/chemical_shifts_theo_exp.csv')
diff = cs_data.theo - cs_data.exp
# Integer code of the 'aa' category for each observation
cat_encode = pd.Categorical(cs_data['aa'])
idx = cat_encode.codes
coords = {"aa": cat_encode.categories}
with pm.Model(coords=coords) as model:
    # hyper_priors
    a = pm.Normal("a", mu=1, sigma=10)
    z = pm.HalfNormal("z", sigma=10)

    # Only mu comes from a hyperprior; sigma is a constant
    x = pm.Normal("x", mu=a, sigma=10, dims="aa")

    y = pm.Normal("y", mu=x[idx], sigma=z, observed=diff)

# Target for the prior predictive distribution
target = pz.Normal(mu=40, sigma=7)

prior, new_prior, pymc_string = pz.ppe(model, target)
print(pymc_string)

aloctavodia commented 4 months ago

Just to clarify: the problem arises only when we have partial dependence. Currently, a case like

pm.Normal('μ', mu=μ_mu, sigma=μ_sigma, dims="aa")

is handled properly.

rohanbabbar04 commented 4 months ago

@aloctavodia For the example above, which has a partial dependency, the pymc_string currently generated (which does not include x) is:

with pm.Model() as model:
   a = pm.Normal("a", mu=40.01,sigma=0.17)
   z = pm.HalfNormal("z", sigma=6.99)

Can you tell me approximately what pymc_string, covering a, z, and x, should be generated so that it satisfies the target distribution?
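A quick way to see why the string above cannot be complete is to simulate the prior predictive by hand. The sketch below uses only the Python standard library and assumes that x keeps its original Normal(a, 10) prior, since x is missing from the generated string. The helper name `prior_predictive_sd` is made up for illustration and is not part of PreliZ.

```python
import math
import random
import statistics


def prior_predictive_sd(a_mu, a_sigma, z_sigma, x_sigma, draws=50_000, seed=0):
    """Simulate a single observation y many times; return (mean, sd) of the draws."""
    rng = random.Random(seed)
    ys = []
    for _ in range(draws):
        a = rng.gauss(a_mu, a_sigma)      # a ~ Normal(a_mu, a_sigma)
        z = abs(rng.gauss(0.0, z_sigma))  # z ~ HalfNormal(z_sigma)
        x = rng.gauss(a, x_sigma)         # x ~ Normal(a, x_sigma), assumed unchanged
        ys.append(rng.gauss(x, z))        # y ~ Normal(x, z)
    return statistics.fmean(ys), statistics.stdev(ys)


# Values of a and z taken from the generated pymc_string; x_sigma=10 is the original prior.
mean, sd = prior_predictive_sd(a_mu=40.01, a_sigma=0.17, z_sigma=6.99, x_sigma=10.0)

# Closed form: Var(y) = a_sigma**2 + x_sigma**2 + E[z**2], with E[z**2] = z_sigma**2
# for a HalfNormal.
analytic_sd = math.sqrt(0.17**2 + 10.0**2 + 6.99**2)

print(f"simulated mean={mean:.2f}, sd={sd:.2f}, analytic sd={analytic_sd:.2f}")
```

Under that assumption the standard deviation of y comes out near 12.2, well above the target of 7, so the prior on x would also have to be updated for the model to match the target.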

rohanbabbar04 commented 4 months ago

Description

  • Calculate priors for partially dependent arguments, as in the example
  • Add tests for hierarchical models with partial dependency.

What would be the appropriate pymc_string to generate for a partial dependency so that it satisfies the target distribution?

aloctavodia commented 4 months ago

I will need to check this, but I think a proper way to find out is to fit the model with PyMC (i.e. use pm.sample), using a sample from target as the observations.

aloctavodia commented 4 months ago

Still checking, but I think we should expect a vector with values between ~0.5 and ~1. I recommend that you first get the function to work, in the sense that it returns something, and then report the results back to me so we can discuss whether they make sense.