Closed NealHumphrey closed 7 years ago
@rebeccabilbro another option for our Visualizer this sprint.
@NealHumphrey this sounds a bit like the one we have here. Would this be a subclass of ClassificationReport?
Re: for our Visualizer for this sprint, I'd like us to try to stick to user-driven design. We want to look for blog posts, articles, etc where machine learning practitioners are using visualizations to guide their model selection/evaluation process. Then we can think about how to implement those visualizations in the Yellowbrick API. The advantage of doing it this way is that we can address a specific use case, which is helpful for scoping purposes. Whatever preliminary implementation we build can be generalized in later sprints.
Let me know if you find any interesting posts or articles that you think might work!
Just wanted to add a couple of things to this issue, but don't want to get in the way of your conversation regarding this Sprint (so feel free to ignore the following):
Question: how do we add k-fold cross-validation or something like cross_val_predict to the mix?
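One way to picture the cross_val_predict idea is the minimal sketch below. It is only an illustration, not something from this thread: the iris dataset, the LogisticRegression model, and cv=5 are all assumptions chosen to keep the example small.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_predict

# Assumed toy setup: iris data and a logistic regression model.
X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# Each sample is predicted by a model fitted on the folds that exclude it,
# so every row of X gets exactly one out-of-fold prediction.
y_pred = cross_val_predict(model, X, y, cv=5)

# Rows are true labels, columns are predicted labels.
cm = confusion_matrix(y, y_pred)
print(cm)
```

Because the predictions are out-of-fold, the resulting matrix covers the whole dataset rather than a single held-out split, which may be what a cross-validated version of the visualizer would want to display.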
A few related posts I found:
@NealHumphrey thoughts?
The EnsembleMatrix is an interesting extension of this, and a great example of visual steering. We should definitely dig into that more.
To start with it probably makes sense to just do this as the basic version, similar to the sklearn example Ben linked. What about:
Down the line, consider something like the ensemble version, realizing that is for a more niche application.
@rebeccabilbro re: your comment on basing this on existing articles, blogs, etc. for this sprint, I did suggest this based on finding it in existing published approaches. Without an internet connection while traveling I couldn't search for blogs, so I pulled this from the Python Data Science Handbook - you can see the example from the book at the end of this Jupyter Notebook. The user study example I picked also involved multi-class classification, and this visualization ended up being one that I wanted to have during my analysis as well.
Not sure if this makes sense as a subclass of ClassificationReport? I'd need to dig into the code a bit.
@NealHumphrey gotcha - thanks for sending the example from the Python Data Science Handbook. I was having trouble picturing what you had in mind! It sounds like you'd like us to reimplement the Seaborn heatmap as a Yellowbrick ConfusionMatrixVisualizer so that it can be used within the Yellowbrick API for Scikit-Learn confusion matrices?
Note: I think seeing your user study will help me to understand the specific requirements for this a bit better. Can you point me to which multi-class dataset you're working with, and explain the problems you encountered with interpreting the Sklearn confusion matrix? Is your notebook at a point where you can do a pull request?
The next step will be to sketch out how this will work with Yellowbrick's API. When you have a chance (and a steady internet connection), take a look at _HeatMapper, confusion_matrix, and the ClassificationScoreVisualizer class, which the ConfusionMatrixVisualizer will inherit from. Then let's chat!
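To make the API discussion a bit more concrete, here is a rough sketch of how a fit/score/draw visualizer could wrap Scikit-Learn's confusion_matrix and draw it as a heatmap. Everything in it is an assumption for illustration: ConfusionMatrixSketch is a hypothetical stand-in (it does not inherit from Yellowbrick's ClassificationScoreVisualizer and is not the implementation that was merged), and the digits dataset, LogisticRegression model, and colormap are arbitrary choices.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split


class ConfusionMatrixSketch:
    """Hypothetical stand-in for the proposed visualizer (not Yellowbrick code)."""

    def __init__(self, model, ax=None):
        self.model = model
        self.ax = ax

    def fit(self, X, y):
        # Fit the wrapped estimator and remember its class labels.
        self.model.fit(X, y)
        self.classes_ = self.model.classes_
        return self

    def score(self, X, y):
        # Compute the confusion matrix on held-out data, then draw it.
        y_pred = self.model.predict(X)
        self.matrix_ = confusion_matrix(y, y_pred, labels=self.classes_)
        self.draw()
        return self.model.score(X, y)

    def draw(self):
        # Heatmap with true labels on the rows and predicted labels on the columns.
        if self.ax is None:
            _, self.ax = plt.subplots()
        image = self.ax.imshow(self.matrix_, cmap="YlOrRd")
        ticks = np.arange(len(self.classes_))
        self.ax.set_xticks(ticks)
        self.ax.set_xticklabels(self.classes_)
        self.ax.set_yticks(ticks)
        self.ax.set_yticklabels(self.classes_)
        self.ax.set_xlabel("Predicted label")
        self.ax.set_ylabel("True label")
        self.ax.figure.colorbar(image, ax=self.ax)
        return self.ax


# Assumed example data: the multi-class digits dataset bundled with scikit-learn.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

viz = ConfusionMatrixSketch(LogisticRegression(max_iter=5000))
viz.fit(X_train, y_train)
accuracy = viz.score(X_test, y_test)
```

Bright off-diagonal cells are the frequently confused pairs. Calling viz.ax.figure.savefig(...) with a path of your choice would write the figure to disk.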
Pulling into ready; sounds like this is the card @NealHumphrey and I will be working on for #125
See #144
Closed in #144
Here is the info about the packaging I have done to make confusion matrix easy to generate: https://fraka6.blogspot.com/2013/05/generating-confusion-matrix-great.html
Create a ScoreVisualizer that draws a color-coded heatmap of the confusion matrix, showing true label vs. predicted label for multi-class classifiers. The goal of the heatmap is to make frequently confused categories easier to identify.