GStechschulte closed 11 months ago
Attention: 1 lines in your changes are missing coverage. Please review.
Comparison is base (dcd879b) 89.90% compared to head (6208fe8) 89.91%.
Files | Patch % | Lines |
---|---|---|
bambi/interpret/utils.py | 90.00% | 1 Missing :warning: |
Thanks for the nice feature! Just a couple of suggestions.
Thanks for the review. I will incorporate these once we finalize the implementation per our conversation on Slack.
Closing in favor of #762
This PR addresses issues https://github.com/bambinos/bambi/issues/703 and #751 by adding a parameter `return_idata: bool = False` to `comparisons()`, `predictions()`, and `slopes()`. When enabled, it merges the posterior draws with the corresponding observation that "produced" each draw and returns the result as a dataframe.

Most of the code diff comes from a new test file that tests non-plotting functionality of the interpret sub-package not tested in `test_plots.py`.

With `return_idata=True`, one dataframe is returned. This dataframe contains the inference data from the posterior group of the `InferenceData` object, the observed data, and the parameter estimates. If a user calls `predictions` with `pps=True`, the posterior predictive group is used instead. {marginaleffects} has similar functionality for Bayesian models.

Below are a few examples:
1200000 rows × 15 columns
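As a rough illustration of the merge described above (not bambi's actual implementation), the sketch below flattens a toy array of posterior draws into a long table and joins each row to the observation it belongs to. The array shape, the column names `obs`, `x`, and `estimate`, and the sizes are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical sizes; real models typically have far more draws.
n_chains, n_draws, n_obs = 2, 3, 4

# Toy stand-in for the observed data used to generate predictions.
data = pd.DataFrame({
    "obs": np.arange(n_obs),
    "x": [0.1, 0.5, 0.9, 1.3],
})

# Toy stand-in for posterior draws with shape (chain, draw, obs).
draws = rng.normal(size=(n_chains, n_draws, n_obs))

# Build a long table with one row per (chain, draw, obs) combination.
# ravel() uses C order, which matches the ordering of from_product.
index = pd.MultiIndex.from_product(
    [range(n_chains), range(n_draws), range(n_obs)],
    names=["chain", "draw", "obs"],
)
long_draws = pd.DataFrame({"estimate": draws.ravel()}, index=index).reset_index()

# Attach the observation that "produced" each draw.
merged = long_draws.merge(data, on="obs", how="left")
print(merged.shape)
```

The row count is the product of chains, draws, and observations, which is why the returned dataframe grows quickly.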
Returning the inference data when calling comparisons will allow the user to conduct more specific or complex comparisons leveraging group by aggregations:
48000 rows × 15 columns
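A minimal sketch of the kind of group-by aggregation this enables, assuming a long dataframe with one row per (chain, draw, observation). The column names `group` and `estimate`, the toy sizes, and the 3%/97% quantile bounds are illustrative choices, not names or defaults taken from bambi.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Hypothetical long-format result: 2 chains x 3 draws x 4 observations.
df = pd.MultiIndex.from_product(
    [range(2), range(3), range(4)],
    names=["chain", "draw", "obs"],
).to_frame(index=False)

# Assign each observation to a made-up group and add toy estimates.
df["group"] = df["obs"].map({0: "a", 1: "a", 2: "b", 3: "b"})
df["estimate"] = rng.normal(size=len(df))

# Step 1: average over observations within each posterior draw and group.
per_draw = df.groupby(["group", "chain", "draw"], as_index=False)["estimate"].mean()

# Step 2: summarize the per-draw averages across draws for each group.
grouped = per_draw.groupby("group")["estimate"]
bounds = grouped.quantile([0.03, 0.97]).unstack()
bounds.columns = ["lower", "upper"]
summary = bounds.assign(mean=grouped.mean())
print(summary)
```

Aggregating within each draw first and summarizing afterwards keeps the posterior uncertainty of the group-level quantity, rather than summarizing individual rows.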
Initially, I wanted to return the `az.InferenceData` object. However, due to the following limitations I settled on a DataFrame:

- I could add new groups to the inference data object, but then it isn't clear to me how to perform group by aggregations on the posterior dataset while taking into account this new group.
- Additionally, the data shouldn't be merged as a data variable in the `az.InferenceData.posterior` dataset because, when aggregations are performed along the coordinates, these aggregations will also be applied to the data used to generate the predictions (since they were merged as a data variable).
- Lastly, the data could be merged as a coordinate so you can specify along which dimension(s) you want to compute the aggregation. However, I can't seem to group by more than one coordinate. For example, `xr.Dataset.groupby([coord1, coord2])` results in the following error: `TypeError: group must be an xarray.DataArray or the name of an xarray variable or dimension. Received ['coord1', 'coord2'] instead.`
Note: depending on the model specification and number of chains and draws, it is possible there will be millions of rows returned.
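To make the note above concrete, here is a small sketch of how the row count scales, assuming one row per (chain, draw, observation). The sampler settings and data size below are hypothetical; they are simply one combination that would yield the 1200000 rows shown in the first example, not values stated in this PR.

```python
def expected_rows(n_chains: int, n_draws: int, n_obs: int) -> int:
    """Rows in a long table with one row per (chain, draw, observation)."""
    return n_chains * n_draws * n_obs


# Hypothetical settings, chosen only for illustration.
rows = expected_rows(n_chains=4, n_draws=1000, n_obs=300)
print(rows)  # 1200000
```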
To do:
- `pps=True` (posterior predictive samples) in `predictions`. Currently, I only access the posterior group of the InferenceData to build the dataframe.