bambinos / bambi

BAyesian Model-Building Interface (Bambi) in Python.
https://bambinos.github.io/bambi/
MIT License

[715] Adding Mr P docs #716

Closed NathanielF closed 1 year ago

NathanielF commented 1 year ago

Mister P

The structure of the notebook is separated into two parts: (a) a refresher on how to think about regression modelling as a method for generating an appropriately weighted average across the conditional strata in your X matrix. We demonstrate this in some detail because MrP can be viewed as a correction to the effects of regression weighting when regression is applied to non-representative data.

(b) We then replicate the kind of analysis achieved with R here
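To make the first point concrete, here is a minimal sketch (all numbers are made up for illustration, not taken from the notebook) of why the weighting matters: the same stratum-level estimates give different population averages depending on whether they are weighted by the biased sample's cell shares or by the true population's cell shares. Correcting the weights is the "P" (poststratification) in MrP.

```python
# Toy cell means by education stratum (hypothetical numbers)
cell_means = {"no_hs": 0.30, "hs": 0.45, "college": 0.60}

# A non-representative sample that over-represents college graduates...
sample_shares = {"no_hs": 0.10, "hs": 0.30, "college": 0.60}
# ...versus the shares a census would report for the population.
population_shares = {"no_hs": 0.30, "hs": 0.40, "college": 0.30}

def weighted_average(means, shares):
    """Average the stratum means using the given stratum weights."""
    return sum(means[k] * shares[k] for k in means)

naive = weighted_average(cell_means, sample_shares)          # sample-weighted
poststratified = weighted_average(cell_means, population_shares)  # corrected
print(round(naive, 3), round(poststratified, 3))  # → 0.525 0.45
```

The gap between the two numbers is exactly the distortion that regression on non-representative data inherits, and that poststratification removes.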

review-notebook-app[bot] commented 1 year ago

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


NathanielF commented 1 year ago

Quarto site seems to render ok too:

[screenshot: rendered Quarto page]
NathanielF commented 1 year ago

Issue with new contrast function:

[screenshot: error from the contrast function]
NathanielF commented 1 year ago

issue with new predictions function:

[screenshot: error from the predictions function]
tomicapretto commented 1 year ago

@NathanielF thanks a lot for proactively writing this notebook! I'm adding some minor comments.

tomicapretto commented 1 year ago

> issue with new predictions function: [screenshot]

I'll try to review this ASAP

NathanielF commented 1 year ago

I've added some usages of the new plot_comparisons and comparisons functionality to this notebook. I specified a base model without the (1 | edu) syntax and, using this new functionality, demonstrated how the variables of race and education interact within states, then used this to justify the complexity of our final model. I think I'm happy enough with the notebook as it stands, so I'm marking this as ready for review.

GStechschulte commented 1 year ago

> issue with new predictions function: [screenshot]
>
> I'll try to review this ASAP

I am back from holidays and should be able to review this issue this week.

NathanielF commented 1 year ago

Sorry @GStechschulte, I must have deleted my earlier comment. Let me know if you have any questions about the errors I found. They only seemed to occur in my "complex" model, where I had a lot of group-specific intercept terms (1 | group).

tomicapretto commented 1 year ago

> issue with new predictions function: [screenshot]
>
> I'll try to review this ASAP
>
> I am back from holidays and should be able to review this issue this week.

@GStechschulte I did a quick and dirty check (translation: added a print before .predict() is called inside predictions()) and saw that the dataframe passed to predict does not contain the variable state.

[screenshots: debug output showing state missing from the dataframe]

GStechschulte commented 1 year ago

The problem is that no default values are being computed for model terms specified as group level effects. This is because group level effects do not have a components attribute, and thus never satisfy the logic in the set_default_values() function of ../interpret/utils.py.

For example, the output of the for loop in utils.py with print statements added:

# ../interpret/utils.py
    for term in terms.values():
        print(f"term variable names: {term.var_names}")
        if hasattr(term, "components"):
            print(f"component variable names: {term.var_names}")
            for component in term.components:
                # If the component is a function call, use the argument names
                if isinstance(component, Call):
                    names = [arg.name for arg in component.call.args]
                else:
                    names = [component.name]
                for name in names:
                    if name not in data_dict:
                        # For numeric predictors, select the mean.
                        if component.kind == "numeric":
                            data_dict[name] = np.mean(model.data[name])
                        # For categoric predictors, select the most frequent level.
                        elif component.kind == "categoric":
                            data_dict[name] = mode(model.data[name])
term variable names: set()
term variable names: {'male'}
component variable names: {'male'}
term variable names: {'repvote'}
component variable names: {'repvote'}
term variable names: {'state'}
term variable names: {'eth'}
term variable names: {'edu'}
term variable names: {'male', 'eth'}
term variable names: {'age', 'edu'}
term variable names: {'edu', 'eth'}

The created cap_data indeed includes the user-provided predictors as well as repvote and male. However, no default values are computed for terms that do not have the components attribute.

Group specific effects do have attributes such as expr, factor, and groups that could be used to identify such terms. However, this is still problematic: since a group specific effect is not a component, its kind attribute (used to determine the data type) is intercept or slope rather than numeric or categoric.

Furthermore, it is possible to specify interaction terms within a group level effect. This would require us to find the unique set of term names defined by the modeller, to ensure default values are not computed more than once.
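The deduplication step can be sketched as a simple set union over the per-term variable-name sets printed above (the sets below are copied from that debug output; this is an illustration, not Bambi's actual code):

```python
# Variable-name sets as printed by the debug loop above
common_terms = [{"male"}, {"repvote"}]
group_terms = [
    {"state"}, {"eth"}, {"edu"},
    {"male", "eth"}, {"age", "edu"}, {"edu", "eth"},  # interactions repeat names
]

# Union collapses the repeats, so each variable gets one default value
unique_names = set().union(*common_terms, *group_terms)
print(sorted(unique_names))  # → ['age', 'edu', 'eth', 'male', 'repvote', 'state']
```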

I am currently attempting to find a solution.

GStechschulte commented 1 year ago

predictions, comparisons, and slopes now work for models with both common and group specific effects, as the bug in computing default values for group specific effects is resolved.

Now, to compute default values, all unique common and group specific effect variable names are identified and collected into a list. From there, the data type is determined from the data types used to fit the model.

However, with group specific effects, when the user passes values (or default values are computed), it is possible these values are new, unseen data. Thus, I leverage PR #693 and add the argument sample_new_groups to the functions predictions, comparisons, and slopes (and the associated plotting functions) to allow plotting of new groups.
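The default-value strategy described above can be illustrated with a minimal stand-alone sketch (not Bambi's actual implementation; the data and names here are hypothetical): for each model variable the user did not supply, fall back on the mean for numeric columns and the most frequent level for categoric ones.

```python
from statistics import mean, mode

# Toy stand-in for the data the model was fit on
model_data = {
    "repvote": [0.4, 0.5, 0.6],   # numeric predictor
    "state": ["NY", "NY", "CA"],  # categoric predictor (group variable)
}

# Unique variable names collected across common and group specific terms
term_var_names = {"repvote", "state"}

def default_values(data, var_names, user_supplied):
    """Fill in a default for any model variable the user did not pass."""
    defaults = dict(user_supplied)
    for name in var_names:
        if name in defaults:
            continue  # respect user-supplied values
        column = data[name]
        if all(isinstance(v, (int, float)) for v in column):
            defaults[name] = mean(column)  # numeric -> mean
        else:
            defaults[name] = mode(column)  # categoric -> most frequent level
    return defaults

print(default_values(model_data, term_var_names, {}))
```

Because the group variable (state here) now always receives a value, the dataframe handed to .predict() is complete, which is exactly what the earlier debug session showed was missing.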

The code and plots below are for the hierarchical model with group specific effects.

bmb.interpret.plot_predictions(
    model=model_hierarchical,
    idata=result_hierarchical,
    covariates=["edu", "eth"],
    sample_new_groups=True,
    pps=False,
    legend=False,
    fig_kwargs={"figsize": (12, 5), "sharey": True}
)

[screenshot: plot_predictions output]

fig, ax = bmb.interpret.plot_comparisons(
    model=model_hierarchical,
    idata=result_hierarchical,
    contrast={"eth": ["Black", "White"]},
    conditional=["age", "edu"],
    comparison_type="diff",
    sample_new_groups=True,
    subplot_kwargs={"main": "age", "group": "edu"},
    fig_kwargs={"figsize": (7, 3), "sharey": True},
    legend=True,
)

[screenshot: plot_comparisons output]

@NathanielF I will create a separate PR today, and then you can pull these changes into your branch 👍🏼

NathanielF commented 1 year ago

Great!

GStechschulte commented 1 year ago

Great!

Changes are now up in PR #721

NathanielF commented 1 year ago

I've tightened some of the writing and updated some of the plots. The broad thrust of the article is where I want it to be.

It moves from a discussion of regression as a means of automating stratum-specific effect modification into a discussion of MrP as a technique for recovering the correct modification effects.

Throughout, I make use of the new model interpretability functions to emphasise the variation of the response across different strata. I think the themes of the piece tie together well with a demonstration of these new functions.

On reflection, I think the way I've plotted the output of the final model doesn't need the new plotting functions, because in a sense the plots I need require that I adjust the model, not just plot the predictions as they are.

I'm happy enough with it now as it stands, but would like a review of the content - maybe in particular with an emphasis on what aspects of the new functionality I could add to tie things together better... @tomicapretto, @GStechschulte

tomicapretto commented 1 year ago

The error in the test is because of the 1.26 release of NumPy. See https://github.com/pymc-devs/pytensor/issues/441

tomicapretto commented 1 year ago

@NathanielF thanks a lot for this great example. Couldn't finish the review now but I'll try to do it later today!

tomicapretto commented 1 year ago

@NathanielF the notebook is in great shape. It's a lot of great work. I'm still requesting some changes. Many of them are minor. In most cases I'm asking to expand a bit on the explanations/interpretations. Once that is done this will be ready to be merged. Thanks a lot!

NathanielF commented 1 year ago

Great to hear. Will adjust today

tomicapretto commented 1 year ago

@NathanielF thanks for all the updates! If you confirm you're not modifying it anymore, I can merge.

NathanielF commented 1 year ago

Thank you! Confirmed