DistrictDataLabs / yellowbrick

Visual analysis and diagnostic tools to facilitate machine learning model selection.
http://www.scikit-yb.org/
Apache License 2.0
4.26k stars 555 forks

Effect Plot for Linear Models #604

Open mattharrison opened 6 years ago

mattharrison commented 6 years ago

Describe the solution you'd like

Would love to have an Effect Plot to aid in interpreting linear models. I realize that a feature importance plot does some of this. An effect plot shows the weights as a bar plot so you can see whether the impact is positive or negative and also how large the variance is.

Examples

There is a great example here: https://christophm.github.io/interpretable-ml-book/limo.html#visual-parameter-interpretation

My scouring has not turned up any Python code to generate this plot in the wild.

smile2snail commented 5 years ago

Hi Mattharrison,

Have you tried Random Forest? It can help you create a feature importance chart in Python.

I found a good example to create the feature importance chart:

http://www.agcross.com/2015/02/random-forests-scikit-learn/

Is this the kind of solution you are asking about? If not, please clarify and I can continue to help with this issue.

bbengfort commented 5 years ago

@mattharrison great suggestion - I think an effect plot would be a very interesting feature to add to yellowbrick.regressor for any estimator that has a learned coef_ attribute. I was a bit confused about how to determine the variance in the weight plot - but it looks like this is not required, since the effect can simply be computed from the training data.

Matplotlib has a box plot implementation, so it would be straightforward to pass a 2D array of effects to produce this plot. However, I'm especially intrigued about the possibility of also including a single point as in 5.1.5 (or points). Perhaps we could provide this functionality by having the user pass in the point data to be plotted as test data?
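A minimal sketch of that idea, assuming the effect of feature j on instance i is simply the learned weight times the feature value (coef[j] * X[i, j]). The function names compute_effects and plot_effects, and the point_effects parameter for overlaying a single instance, are hypothetical and not part of Yellowbrick:

```python
import numpy as np


def compute_effects(X, coef):
    """Return per-instance feature effects: effects[i, j] = coef[j] * X[i, j]."""
    X = np.asarray(X, dtype=float)
    coef = np.asarray(coef, dtype=float).ravel()
    if X.ndim != 2 or X.shape[1] != coef.shape[0]:
        raise ValueError("X must be 2D with one column per coefficient")
    # Broadcasting scales every column of X by its weight.
    return X * coef


def plot_effects(effects, feature_names, point_effects=None, ax=None):
    """Draw one box per feature; optionally overlay one instance's effects."""
    # Imported here so compute_effects works without matplotlib installed.
    import matplotlib.pyplot as plt

    if ax is None:
        _, ax = plt.subplots()
    positions = np.arange(1, effects.shape[1] + 1)
    # A 2D array yields one box per column, i.e. one box per feature.
    ax.boxplot(effects, positions=positions)
    ax.set_xticks(positions)
    ax.set_xticklabels(feature_names)
    if point_effects is not None:
        # Mark where a single instance (e.g. a test point) falls in each box.
        ax.scatter(positions, point_effects, color="red", marker="x", zorder=3)
    ax.axhline(0, color="gray", linewidth=0.8)
    ax.set_ylabel("feature effect (weight * value)")
    return ax


# Toy data: 3 instances, 2 features; the weights are made up for illustration.
X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
weights = np.array([0.5, -2.0])
effects = compute_effects(X_train, weights)
```

With a fitted scikit-learn linear model, the weights would come from model.coef_, and a test point's effects could be obtained with compute_effects(x_test.reshape(1, -1), model.coef_)[0] and passed as point_effects.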

@smile2snail thank you for chiming in here. I think what @mattharrison is looking for is a new visualizer that can create this visualization for regression models. Please also note that Yellowbrick already has a FeatureImportances visualizer that does something very similar to the plot you suggested!

@mattharrison as always, thank you for being an excellent resource for new visualizers!

souravsingh commented 5 years ago

@bbengfort I am interested in working on the issue.

bbengfort commented 5 years ago

@souravsingh that'd be great - feel free to open a PR when you're ready to discuss it!

naresh-bachwani commented 5 years ago

Hello @bbengfort, I have been working on this issue for several days and have built a class (beta version) to address it. The output looks like this: Screenshot (126)

The code snippet looks like this:

from sklearn.linear_model import LinearRegression

# `effect` is the beta class described in this comment (its import is not shown)
model = LinearRegression()
viz = effect(model=model)
viz.fit(dataset, Y)
viz.finalize()

I would like to hear your feedback on this; any suggestions would be valuable.

lwgray commented 5 years ago

@naresh-bachwani Thanks for commenting on this issue. We are just coming off a hiatus, so it might take a bit to get to this, but we will as soon as possible. I encourage you to open a PR. Our contributing guide can be found at http://www.scikit-yb.org/en/latest/contributing.html

naresh-bachwani commented 5 years ago

Dear @bbengfort @mattharrison @rebeccabilbro @lwgray, I have been working with effect plots and PCA for some time. I have my GSoC'19 proposal ready and would like to have reviews and help from mentors. I have made a PR related to the proposal and the link for the PR is this.

naresh-bachwani commented 5 years ago

Hello @lwgray, I have done some work on the effect plot and would like to open a PR, but I have a question: which directory should I put my effect plot file in? I think it should go in yellowbrick/features. Correct me if I am wrong!

bbengfort commented 5 years ago

Hi @naresh-bachwani, actually I propose that this plot go into yellowbrick/regressor/effect.py and extend RegressionScoreVisualizer. I completely understand your point about the similarity of this to the FeatureImportances visualizer; however, my feeling is that this plot is more directly about the analysis and interpretation of a linear model and is coupled more deeply to this type of model than the importances plots are (which might be about classification, clustering, etc).

Why don't you go ahead and start with it there, and in the course of reviewing the PR we can see if it continues to make sense in the regressor module?

bbengfort commented 5 years ago

@naresh-bachwani I'm slowly getting back involved with PRs and issues - I noticed that you currently have two PRs open, #806 and #807; I really appreciate your enthusiasm and desire to contribute to YB - but perhaps we could focus on getting those shipped before opening a new PR for effect plots?

We're quite a small group and we do this in our spare time -- as you can probably tell we don't have a lot of surface area to deal with a large number of PRs!

naresh-bachwani commented 5 years ago

Hello @bbengfort, thank you for the guidance and for clearing up my doubts. I have finished building a simple base class for the effect plot, and we can work through the hyperparameter setup in a GitHub Gist once my two PRs get shipped!