bambinos / bambi

BAyesian Model-Building Interface (Bambi) in Python.
https://bambinos.github.io/bambi/
MIT License
1.08k stars 124 forks source link

plot_slopes and slopes #699

Closed GStechschulte closed 1 year ago

GStechschulte commented 1 year ago

This draft PR introduces slopes and plot_slopes. Slopes can be defined as the "partial derivative of the regression equation with respect to (wrt) a regressor of interest" (definition taken from marginaleffects). slopes and plot_slopes allow the user to view a summary dataframe and or plot the slope of the regressor of interest (wrt). This PR supports the following quantities of interest:

Additionally, this PR also supports the interpretation of slopes as an elasticity (percent change in $x$ is associated with a percentage change in $y$). If the regressor of interest is categorical, computing an elasticity is not possible. For example, there is no % change in penguin species. Furthermore, if wrt is categorical when calling slopes, then comparisons is called to compute the difference in contrast means.

slopes and comparisons are closely related. Thus, I have refactored the code such that these two functions can use the same code when building the summary dataframe and plotting. Lastly, I simplified the code to build the summary data frames.

GStechschulte commented 1 year ago

A demo is shown below:

import bambi as bmb
from bambi.plots import slopes, plot_slopes

data = bmb.load_data("mtcars")
model = bmb.Model("mpg ~ hp * wt + drat", data=data, family="gaussian")
idata = model.fit(draws=1000, chains=2)

Unit level slopes

slopes(
    model,
    idata,
    wrt="hp",
    conditional=None
).head(10)
term estimate_type value drat wt estimate lower_3.0% upper_97.0%
hp dydx (110.0, 110.0001) 3.90 2.620 -0.046077 -0.061826 -0.027424
hp dydx (110.0, 110.0001) 3.90 2.875 -0.039438 -0.055346 -0.024399
hp dydx (93.0, 93.0001) 3.85 2.320 -0.053887 -0.074023 -0.033815
hp dydx (110.0, 110.0001) 3.08 3.215 -0.030586 -0.045647 -0.016320
hp dydx (175.0, 175.0001) 3.15 3.440 -0.024728 -0.040884 -0.009271
hp dydx (105.0, 105.0001) 2.76 3.460 -0.024207 -0.040335 -0.008568
hp dydx (245.0, 245.0001) 3.21 3.570 -0.021343 -0.039777 -0.006217
hp dydx (62.0, 62.0001) 3.69 3.190 -0.031237 -0.046218 -0.016832
hp dydx (95.0, 95.0001) 3.92 3.150 -0.032278 -0.046864 -0.017400
hp dydx (123.0, 123.0001) 3.92 3.440 -0.024728 -0.040884 -0.009271

Average slopes

slopes(
    model,
    idata,
    wrt="hp",
    conditional=None,
    average_by=True
)
term estimate_type estimate lower_3.0% upper_97.0%
hp dydx -0.030527 -0.051032 -0.01007

Average by group slopes (and plot)

fig, ax = plot_slopes(
    model,
    idata,
    wrt="hp",
    conditional=None,
    average_by="wt"
)
fig.set_size_inches(7, 3)

image

Condition on weight and plot elasticity

fig, ax = plot_slopes(
    model,
    idata,
    wrt="hp",
    conditional="wt",
    slope="eyex"
)
fig.set_size_inches(7, 3)

image

User provided values

fig, ax = plot_slopes(
    model,
    idata,
    wrt="hp",
    conditional={"wt": [2, 3, 5], "drat": [3.5, 4, 4.5]},
)
fig.set_size_inches(7, 3)

image

GStechschulte commented 1 year ago

Providing a quick review. I couldn't find big issues with the code, it's already almost done. Still, I'm requesting some changes (stylistic and minor updates). As always, fantastic work. Thanks a lot!

Edit It would be good to explain what is the value column in the returned data frame

Thanks a lot for the review. :) Much appreciated. In the latest commits I realised I forgot to explain what the value column is. I will add this as an inline comment and also explain it in the docs.

GStechschulte commented 1 year ago

To Do:

GStechschulte commented 1 year ago

The latest commits added the following functionality:

and added tests for plot_slopes. Below, I provide an example of added functionality (using the model and data from above):

User provided multiple values with wrt. The value column represents the values of the wrt arg. used to compute the derivative. If the user passes multiple values, a small amount $\epsilon$ eps is added to each value and then divided by that eps to obtain the "instantaneous rate of change":

slopes(
    model,
    idata,
    wrt={"hp": [150, 200, 250]},
    conditional=["wt"]
)
term estimate_type value wt drat estimate lower_3.0% upper_97.0%
hp dydx (150.0, 150.0001) 1.513000 3.596563 -0.074025 -0.104541 -0.041595
hp dydx (200.0, 200.0001) 1.513000 3.596563 -0.072006 -0.102318 -0.041779
hp dydx (250.0, 250.0001) 1.513000 3.596563 -0.069988 -0.099388 -0.040974

eydx: unit increase in $x$ (wrt) is associated with a % change in $y$:

plot_slopes(
    model,
    idata,
    wrt={"hp": 150},
    conditional=["wt"],
    slope="eydx"
)

image

dyex: % change in $x$ (wrt) is associated with a unit increase in $y$:

fig, ax = plot_slopes(
    model,
    idata,
    wrt={"hp": 150},
    conditional=["wt"],
    slope="dyex"
)

image

codecov-commenter commented 1 year ago

Codecov Report

Merging #699 (541621b) into main (cbbf955) will increase coverage by 0.70%. The diff coverage is 90.00%.

@@            Coverage Diff             @@
##             main     #699      +/-   ##
==========================================
+ Coverage   88.87%   89.58%   +0.70%     
==========================================
  Files          43       43              
  Lines        3362     3523     +161     
==========================================
+ Hits         2988     3156     +168     
+ Misses        374      367       -7     
Files Changed Coverage Δ
bambi/plots/plotting.py 84.96% <87.01%> (-3.82%) :arrow_down:
bambi/plots/effects.py 89.24% <87.89%> (+11.08%) :arrow_up:
bambi/plots/utils.py 86.76% <93.22%> (+5.91%) :arrow_up:
bambi/plots/__init__.py 100.00% <100.00%> (ø)
bambi/plots/create_data.py 100.00% <100.00%> (+24.39%) :arrow_up:

... and 1 file with indirect coverage changes

:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more

GStechschulte commented 1 year ago

Hi @GStechschulte! I'm requesting some changes. In terms of code, very minor things. But in general, I'm asking to add many docstrings.

Thanks for the review! In regard to docstrings, I feel I must explain myself. The reason for me not adding docstrings to non-public modules was because I was following the PEP 8 section on Documentation Strings:

Write docstrings for all public modules, functions, classes, and methods. Docstrings are not necessary for non-public methods, but you should have a comment that describes what the method does. This comment should appear after the "def" line.

Nonetheless, I recognise that there are docstrings for non-public modules in the Bambi codebase, and that they help in explainability. I will add them, it is no problem 😄

tomicapretto commented 1 year ago

Hi @GStechschulte! I'm requesting some changes. In terms of code, very minor things. But in general, I'm asking to add many docstrings.

Thanks for the review! In regard to docstrings, I feel I must explain myself. The reason for me not adding docstrings to non-public modules was because I was following the PEP 8 section on Documentation Strings:

Write docstrings for all public modules, functions, classes, and methods. Docstrings are not necessary for non-public methods, but you should have a comment that describes what the method does. This comment should appear after the "def" line.

Nonetheless, I recognise that there are docstrings for non-public modules in the Bambi codebase, and that they help in explainability. I will add them, it is no problem 😄

I agree it's not necessary to add docstrings to everything, especially when it starts to become redundant. However, some internal documentation is also nice to help others understand how things work. Thanks for the flexibility in adding the requested docstrings :)