DistrictDataLabs / yellowbrick

Visual analysis and diagnostic tools to facilitate machine learning model selection.
http://www.scikit-yb.org/
Apache License 2.0
4.3k stars, 559 forks

PrePredict Estimator #1189

Closed bbengfort closed 2 years ago

bbengfort commented 3 years ago

A contrib estimator that wraps inferences made outside of the Yellowbrick workflow. It can wrap already-predicted data for use in many visualizers (though not all: visualizers that need access to learned attributes will not work), and it can also load that data from disk or obtain it from a callable.
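To make the idea concrete, here is a minimal sketch of what such a wrapper might look like. It is not the code in `yellowbrick/contrib/prepredict.py`: the class name `PrePredictSketch`, its constructor arguments, and the choice of JSON as the on-disk format are all assumptions made for illustration, and only the standard library is used.

```python
import json
import os
import tempfile


class PrePredictSketch:
    """Hypothetical stand-in for a pre-predicted estimator (illustration only)."""

    def __init__(self, data, estimator_type=None):
        # data may be a sequence of predictions, a path to a file, or a callable
        self.data = data
        # scikit-learn style hint such as "classifier" or "clusterer" (assumed)
        self._estimator_type = estimator_type

    def fit(self, X, y=None):
        # Nothing to learn: the predictions were made elsewhere
        return self

    def predict(self, X):
        # X is ignored; the stored predictions are returned as-is
        if callable(self.data):
            return list(self.data())
        if isinstance(self.data, (str, os.PathLike)):
            # JSON is an assumption of this sketch, not the library's format
            with open(self.data) as f:
                return json.load(f)
        return list(self.data)


labels = [0, 1, 1, 0]

# 1. Wrap predictions that are already in memory
in_memory = PrePredictSketch(labels, "classifier").fit(None).predict(None)

# 2. Wrap a callable that produces the predictions on demand
lazy = PrePredictSketch(lambda: labels, "classifier").fit(None).predict(None)

# 3. Wrap a path to predictions stored on disk
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "preds.json")
    with open(path, "w") as f:
        json.dump(labels, f)
    from_disk = PrePredictSketch(path, "classifier").fit(None).predict(None)
```

Because `fit` is a no-op and `predict` ignores its input, a wrapper like this can only serve visualizers that need predictions, which is why visualizers relying on learned attributes are excluded.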

I have made the following changes:

  1. Implemented PrePredict, a contrib estimator
  2. Created tests for the PrePredict estimator
  3. Wrote documentation for the PrePredict estimator

Sample Code and Plot

Please see the documentation for sample code.

codecov[bot] commented 3 years ago

Codecov Report

Merging #1189 (497c15f) into develop (df45161) will increase coverage by 0.03%. The diff coverage is 96.55%.

@@             Coverage Diff             @@
##           develop    #1189      +/-   ##
===========================================
+ Coverage    90.42%   90.46%   +0.03%     
===========================================
  Files           90       91       +1     
  Lines         5097     5126      +29     
===========================================
+ Hits          4609     4637      +28     
- Misses         488      489       +1     

Impacted Files                       Coverage Δ
yellowbrick/contrib/prepredict.py    96.55% <96.55%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update df45161...497c15f.

bbengfort commented 3 years ago

Tests passed, whew!

bbengfort commented 2 years ago

@lwgray thanks for the review! It's been so long since I did this that I don't really remember all the details, but I'll try to answer:

  1. I didn't include working sample code because we didn't have a dataset well suited to the documentation. In the tests I used a generated dataset, but I don't think that helps much with understanding how this works, since this is really just a tool that gets you to another Visualizer.
  2. Yeah, the contrib module is meant for third-party or not fully fleshed out code; it's auxiliary, so it doesn't necessarily follow the same rules as the other YB visualizers.
  3. As I mentioned, the score method was really just there to get visualizers to work with this data. I could see the scoring method being a useful argument, and I recommend that we open a story to add that feature in the future.
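
For the follow-up story, one possible shape for a configurable scoring method is sketched below. This is purely hypothetical: the `scoring` argument, the `ScoredPrePredictSketch` class, and the `accuracy` helper do not exist in this PR and are only meant to illustrate the suggestion.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the true label."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)


class ScoredPrePredictSketch:
    """Hypothetical wrapper whose score method delegates to a callable."""

    def __init__(self, data, scoring=accuracy):
        self.data = data        # predictions made outside the workflow
        self.scoring = scoring  # assumed signature: scoring(y_true, y_pred)

    def fit(self, X, y=None):
        # Nothing to learn from X or y
        return self

    def predict(self, X):
        # X is ignored; the stored predictions are returned
        return list(self.data)

    def score(self, X, y):
        # Compare the true labels against the stored predictions
        return self.scoring(y, self.predict(X))


y_true = [0, 1, 0, 0]
y_pred = [0, 1, 1, 0]

# Default scoring: three of the four predictions match
default_score = ScoredPrePredictSketch(y_pred).score(None, y_true)

# Custom scoring passed as an argument: count the mismatches instead
def error_count(y_true, y_pred):
    return sum(1 for t, p in zip(y_true, y_pred) if t != p)

custom_score = ScoredPrePredictSketch(y_pred, scoring=error_count).score(None, y_true)
```

Passing the metric as a callable would keep the estimator itself free of any learned state while still letting score-based visualizers choose a metric that fits the wrapped predictions.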