DistrictDataLabs / yellowbrick

Visual analysis and diagnostic tools to facilitate machine learning model selection.
http://www.scikit-yb.org/
Apache License 2.0

Diagnostic Plots for Linear Regression Analysis #1211

Open pkaf opened 2 years ago

pkaf commented 2 years ago

If this is not already planned or in development, it would be neat to have the standard diagnostic plots for linear regression analysis, mainly:

a. residuals vs. fitted
b. normal Q-Q
c. scale-location
d. residuals vs. leverage

as shown in https://data.library.virginia.edu/diagnostic-plots/. Example plots:

[Four screenshots of example diagnostic plots]
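For readers unfamiliar with these plots, here is a minimal sketch of the quantities behind the first two (residuals vs. fitted and normal Q-Q). It uses only NumPy and the standard library, not Yellowbrick. The function names, the synthetic data, and the `(i - 0.5) / n` plotting positions are assumptions made for illustration.

```python
import numpy as np
from statistics import NormalDist


def fit_and_residuals(X, y):
    """Ordinary least squares with an intercept; returns fitted values and raw residuals."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])  # prepend an intercept column
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    fitted = design @ beta
    return fitted, y - fitted


def normal_qq_points(residuals):
    """Pair theoretical standard-normal quantiles with the sorted residuals."""
    n = len(residuals)
    # Plotting positions (i - 0.5) / n: one common convention among several.
    probs = (np.arange(1, n + 1) - 0.5) / n
    theoretical = np.array([NormalDist().inv_cdf(p) for p in probs])
    return theoretical, np.sort(residuals)


# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 1.0 + X @ np.array([2.0, -1.0]) + rng.normal(scale=0.5, size=50)

fitted, residuals = fit_and_residuals(X, y)      # plot (a): residuals against fitted
theoretical, ordered = normal_qq_points(residuals)  # plot (b): ordered against theoretical
```

Plot (a) is a scatter of `residuals` against `fitted`; plot (b) is a scatter of `ordered` against `theoretical`, where points close to a straight line suggest roughly normal residuals.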

I am happy to open a PR.

pkaf commented 2 years ago

I would love to hear your thoughts on the above, @bbengfort.

bbengfort commented 2 years ago

@pkaf We'd certainly be open to more regression analysis tools or adaptations of our current tools to support these types of analyses.

The ResidualsPlot currently plots the residuals against the fitted value, so I think it covers plot 1. It also has the option to show a Q-Q plot alongside it, which I think is plot 2. Perhaps that plot could be modified to plot the residuals against the actual value instead of the predicted value?

Scale-location vs fitted values (your third plot) also seems like it might be an adaptation of the ResidualsPlot to standardize the residuals rather than using the raw residuals - this would be a great param to add!
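To make the standardization concrete, here is a rough NumPy-only sketch of the values a scale-location plot would show. It is not an existing ResidualsPlot parameter. It assumes "standardized" means internally studentized residuals (raw residual divided by its estimated standard deviation, which depends on leverage), and the function name is hypothetical.

```python
import numpy as np


def scale_location_points(X, y):
    """Return fitted values and sqrt(|standardized residuals|) for an OLS fit."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])  # intercept plus features
    n, p = design.shape
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    fitted = design @ beta
    residuals = y - fitted

    # Leverage is the diagonal of the hat matrix H = Q Q^T, where Q comes
    # from the reduced QR decomposition of the design matrix.
    q, _ = np.linalg.qr(design)
    leverage = np.sum(q**2, axis=1)

    sigma2 = residuals @ residuals / (n - p)  # residual variance estimate
    standardized = residuals / np.sqrt(sigma2 * (1.0 - leverage))
    return fitted, np.sqrt(np.abs(standardized))


# Synthetic data, for illustration only.
rng = np.random.default_rng(2)
X = rng.normal(size=(40, 2))
y = 0.3 + X @ np.array([1.5, 0.7]) + rng.normal(scale=0.4, size=40)

fitted, sqrt_abs_std = scale_location_points(X, y)
```

Scattering `sqrt_abs_std` against `fitted` gives the scale-location plot; a roughly flat band suggests constant residual variance.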

We also have a CooksDistance visualizer, which may be related to your last plot of standardized residuals versus leverage, or might be a building block towards that visualization.
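The link between the two is that Cook's distance can be written in terms of the residual and the leverage of each point. Here is a small sketch of that relationship in plain NumPy. It does not describe how the CooksDistance visualizer is implemented; the function name and the data are made up for illustration.

```python
import numpy as np


def leverage_and_cooks_distance(X, y):
    """Leverage and Cook's distance for an OLS fit with an intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])
    n, p = design.shape

    # Hat matrix H = X (X'X)^-1 X'; its diagonal is the leverage of each point.
    hat = design @ np.linalg.solve(design.T @ design, design.T)
    leverage = np.diag(hat)

    residuals = y - hat @ y
    mse = residuals @ residuals / (n - p)

    # D_i = e_i^2 / (p * mse) * h_i / (1 - h_i)^2
    cooks = residuals**2 / (p * mse) * leverage / (1.0 - leverage) ** 2
    return leverage, cooks


# Synthetic data with one injected outlier, for illustration only.
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 1))
y = 0.5 + 3.0 * X[:, 0] + rng.normal(scale=0.3, size=30)
y[0] += 5.0

leverage, cooks = leverage_and_cooks_distance(X, y)
```

A residuals vs. leverage plot scatters the standardized residuals against `leverage`, usually with contours of constant Cook's distance drawn on top, so the same two ingredients feed both visualizations.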

If the ResidualsPlot is not sufficient, perhaps you could look into creating a ResidualsDiagnostics visualizer that plots all four of these graphs in 4 separate axes? We haven't done a lot of multi-axes plotting, but this could be a good start toward that.
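As a starting point for discussion, the four-axes layout could look something like the sketch below. It uses matplotlib and NumPy directly and is not a Yellowbrick visualizer: `plot_residual_diagnostics` is a hypothetical helper, the residuals are standardized as internally studentized residuals (an assumption), and none of the smoothing lines or Cook's distance contours from the reference plots are drawn.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from statistics import NormalDist


def plot_residual_diagnostics(X, y):
    """Hypothetical helper (not a Yellowbrick API): four diagnostics on a 2x2 grid."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])
    n, p = design.shape
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    fitted = design @ beta
    resid = y - fitted

    # Leverage from the reduced QR decomposition, then standardized residuals.
    q, _ = np.linalg.qr(design)
    leverage = np.sum(q**2, axis=1)
    std_resid = resid / np.sqrt(resid @ resid / (n - p) * (1.0 - leverage))
    quantiles = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]

    fig, axes = plt.subplots(2, 2, figsize=(10, 8))
    panels = [
        (axes[0, 0], fitted, resid,
         "Residuals vs Fitted", "Fitted values", "Residuals"),
        (axes[0, 1], quantiles, np.sort(std_resid),
         "Normal Q-Q", "Theoretical quantiles", "Standardized residuals"),
        (axes[1, 0], fitted, np.sqrt(np.abs(std_resid)),
         "Scale-Location", "Fitted values", "sqrt(|standardized residuals|)"),
        (axes[1, 1], leverage, std_resid,
         "Residuals vs Leverage", "Leverage", "Standardized residuals"),
    ]
    for ax, xs, ys, title, xlabel, ylabel in panels:
        ax.scatter(xs, ys, s=12)
        ax.set(title=title, xlabel=xlabel, ylabel=ylabel)
    axes[0, 0].axhline(0.0, linestyle="--", linewidth=1)  # zero-residual reference
    fig.tight_layout()
    return fig, axes


# Synthetic data, for illustration only.
rng = np.random.default_rng(3)
X = rng.normal(size=(60, 2))
y = 2.0 + X @ np.array([1.0, -0.5]) + rng.normal(scale=0.5, size=60)

fig, axes = plot_residual_diagnostics(X, y)
```

Returning both `fig` and `axes` leaves the caller free to save the figure or adjust individual panels afterwards.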

pkaf commented 2 years ago

@bbengfort I recently contributed an example to statsmodels that produces the graphs above: https://www.statsmodels.org/devel/examples/notebooks/generated/linear_regression_diagnostics_plots.html. We can adapt it here too.

bbengfort commented 2 years ago

@pkaf awesome - we welcome any PRs that you might open for Yellowbrick!