Closed · tomicapretto closed this 4 months ago
This is temporarily suspended because the `var_names` parameter in `pm.sample()` is not working as expected (https://github.com/pymc-devs/pymc/issues/7258), and we need it for the modifications here.
Now there's a new blocker: https://github.com/pymc-devs/pymc/issues/7312
I would also like to have https://github.com/pymc-devs/pymc/pull/7290 merged. Currently, the progress bar performs many updates, which slows down the sampler in Jupyter notebooks.
Everything should be fixed now. The `pyproject.toml` file points to the dev version of PyMC. This should be good to go once PyMC releases a new version.
Now I'm going to re-run examples.
I like the changes, but I find `include_params` and `kind="params"` ambiguous. Could we use something like `linear_predictor`, `linear_term`, or something like that?
Thanks for the review! I'm not sure I understand the suggestion (i.e. whether you want argument(s) called `linear_predictor` or `linear_term`, or whether those should be argument values). I also agree the current names are not the best, as "params" can mean many things in a model.
I am saying that the name "params" is very vague.
@GStechschulte thanks for the review! Do you have any ideas for the issue @aloctavodia mentioned? I agree the current approach is a bit vague. But I don't have lots of ideas right now :D
@tomicapretto @aloctavodia apologies for the delayed response. This notification slipped through my GitHub 😞
Before, `include_mean` would compute the "posterior of the mean response", which technically may not always be a mean, as you stated above. This computation is $g^{-1}(\eta)$, which relates the linear predictor to the response. Since it is a response parameter, but not always the mean, maybe `include_response_params`. It's a longer name, but less vague?
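To make the $g^{-1}(\eta)$ computation concrete: with a Bernoulli family and a logit link, the response parameter is the inverse logit of the linear predictor. A minimal plain-Python sketch (illustrative only, not Bambi code):

```python
import math

def inverse_logit(eta):
    """Inverse link g^{-1} for the logit link: maps the linear
    predictor eta to the probability of success p."""
    return 1.0 / (1.0 + math.exp(-eta))

# A linear predictor of 0 maps to p = 0.5. Here the response
# parameter happens to be the mean of the Bernoulli distribution,
# but for other families the parent parameter need not be the mean.
p = inverse_logit(0.0)  # 0.5
```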
To be consistent in `model.predict`:

- `kind="response_params"`, as calling `model.fit(include_response_params=True)` will result in the same InferenceData as the sequence of calls: (1) `model.fit(include_response_params=False)`, then (2) `model.predict(kind="response_params", data=None)`.
- `kind="response_obs"` or `response`, so users "know" posterior predictive samples are returned and the samples are on the response scale.
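The fit/predict equivalence suggested above can be read as a contract. A toy sketch of that contract in plain Python (the class and variable names are made up for illustration; this is not Bambi's actual implementation):

```python
class ToyModel:
    """Toy stand-in for a model, tracking only which variables
    end up in each InferenceData-like group."""

    def __init__(self):
        self.posterior_vars = ["Intercept", "sigma"]

    def fit(self, include_response_params=False):
        idata = {"posterior": list(self.posterior_vars)}
        if include_response_params:
            # Response parameter computed at fit time.
            idata["posterior"].append("mu")
        return idata

    def predict(self, idata, kind="response_params", data=None):
        if kind == "response_params":
            # Add the response parameter after the fact.
            if "mu" not in idata["posterior"]:
                idata["posterior"].append("mu")
        elif kind == "response_obs":
            # Posterior predictive samples on the response scale.
            idata["posterior_predictive"] = ["y_obs"]
        return idata

model = ToyModel()
direct = model.fit(include_response_params=True)
two_step = model.predict(model.fit(include_response_params=False),
                         kind="response_params")
assert direct == two_step  # both paths yield the same groups
```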
This started as a refactor with the goal to change how we name the parameters of the response distribution. So far, we named the parent parameter with `f"{response_name}_mean"` and all the other parameters with `f"{response_name}_{param_name}"`. I no longer think appending `_mean` is a sensible approach, as the parent parameter can be something different than the mean. Also, I realized prepending the name of the response resulted in very long and confusing names. For that reason, I decided it's better to just use the name of the parameter (i.e. `mu` for the mean in many families, `p` for the probability of success, `kappa`, `sigma`, and so on).

But that is not the only change. Many other changes come with this PR. I summarize them below, and they will be added to the changelog. After this is merged, we can have a 0.14.0 release.
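A tiny illustration of the naming change (plain Python; the response name `y` and the helper functions are made up for the example):

```python
response_name = "y"  # illustrative response name

def old_param_name(param_name, is_parent=False):
    # Previous convention: prepend the response name; the parent
    # parameter of the family got the "_mean" suffix.
    if is_parent:
        return f"{response_name}_mean"
    return f"{response_name}_{param_name}"

def new_param_name(param_name):
    # New convention: just the distribution's own parameter name,
    # whether or not it is the parent parameter.
    return param_name

old_param_name("mu", is_parent=True)  # "y_mean"
old_param_name("sigma")               # "y_sigma"
new_param_name("sigma")               # "sigma"
```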
Summary of changes
- Use `"__obs__"` instead of `f"{response_name}_obs"` as the dimension name for the observation index. This makes it easier to avoid errors in our code base and results in a clear and unique name for all cases. Previously we had to worry about the possible different names of the response and their aliases.
- Use `param_name` instead of `f"{response_name}_{param_name}"` (mentioned above).
- Stop using `f"{response_name}_mean"` for the parent parameter (mentioned above).
- The options for `kind` in `Model.predict()` now use `"params"` and `"response"` instead of `"mean"` and `"pps"`.
- `include_mean` has been replaced by `include_params` in `Model.fit()`. The old version still works, with a future warning.
- Stop wrapping response parameters in `pm.Deterministic`. The benefit is that the model graphs are clearer and we don't incur any penalty related to the computation and storage of deterministics, as they are not computed and stored when sampling. This is thanks to some recent developments in PyMC.
- Use `bayeux-ml` as the single direct JAX-related dependency. Bayeux itself requires all the other libraries providing sampling backends, so we don't list them directly. This may change in the future if we need to pin specific versions, but I hope that's not the case.
- Add `create_posterior_bayeux`, which creates the `xarray.Dataset` that holds the samples from the posterior when doing inference via bayeux. This makes sure we use the right dim/coord names and coord levels.
- Use `model.components["mu"]`. Before, we needed `model.response_component`. Now, that component still exists but it doesn't hold information about `mu`.

Other important notes
- EDIT: Bayeux-based inferences don't include the `observed_data` group. This is problematic if we want to do posterior predictive checks. We need to add them. Done.
- EDIT: It also closes #814.