DistrictDataLabs / yellowbrick

Visual analysis and diagnostic tools to facilitate machine learning model selection.
http://www.scikit-yb.org/
Apache License 2.0

Improved Rank2D (implement other metrics) #68

Open bbengfort opened 8 years ago

bbengfort commented 8 years ago

The Rank2D visualizer is a feature analysis visualizer that ranks pairs of feature columns (similar to a SPLOM) using a metric with range [-1, 1] or [0, 1]. The rankings are visualized as a heatmap with only the lower-left triangle visible, using a diverging or sequential colormap that shows the relative ranks of the feature pairs.
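To make the description above concrete, here is a minimal, dependency-free sketch of the ranking step: score every pair of feature columns with a metric and keep only the lower triangle. It is not the Rank2D implementation; the names `pearson` and `rank2d` are hypothetical, and masking the diagonal along with the upper triangle is an assumption of this sketch.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences, in [-1, 1]."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rank2d(columns, metric=pearson):
    """Score each pair of feature columns with `metric`.

    `columns` is a list of feature columns. Only cells below the
    diagonal are filled; the rest are None, standing in for the
    masked part of the heatmap (assumption: diagonal masked too).
    """
    k = len(columns)
    return [
        [metric(columns[i], columns[j]) if j < i else None for j in range(k)]
        for i in range(k)
    ]


# Hypothetical toy data: three feature columns with four samples each.
features = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],  # perfectly correlated with the first column
    [4.0, 3.0, 2.0, 1.0],  # perfectly anti-correlated with the first column
]
ranks = rank2d(features)
```

Passing a different callable as `metric` is how another ranking metric would plug into this sketch; a heatmap would then color each non-None cell with a diverging colormap for [-1, 1] metrics or a sequential one for [0, 1] metrics.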

By using different ranking metrics (Pearson, covariance, etc.), data scientists can detect issues in dependent variables that might impact machine learning, for example covariance, entropy, or non-uniformity.

See #6 for more

Note to contributors: items in the checklist below don't need to be completed in a single PR; if you see one that catches your eye, feel free to pick it off the list!

The following ranking metrics should be added:

See: Seo, Jinwook, and Ben Shneiderman. "A rank-by-feature framework for interactive exploration of multidimensional data." Information visualization 4.2 (2005): 96-113.

The following visual improvements need to be made:

See: https://github.com/mwaskom/seaborn/blob/master/seaborn/matrix.py#L94

and

https://stanford.edu/~mwaskom/software/seaborn/examples/many_pairwise_correlations.html

NOTE: New correlation metrics should also be considered for addition to the JointPlot visualizer. See #721 for more details.

tabishsada commented 6 years ago

I created a pull request (https://github.com/DistrictDataLabs/yellowbrick/pull/429) to add Spearman correlation to the list of ranking metrics.
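For readers unfamiliar with the metric, Spearman correlation is the Pearson correlation of the rank-transformed data. The sketch below illustrates that definition in plain Python; it is not the code from the pull request, and the helper names (`average_ranks`, `pearson`, `spearman`) are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# A monotonic but non-linear relationship (hypothetical data):
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
rho = spearman(x, y)   # ranks agree exactly
r = pearson(x, y)      # below 1 because the relationship is not linear
```

Because it depends only on ranks, Spearman rewards any monotonic relationship, which is why it can rank a feature pair highly where Pearson does not.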

bbengfort commented 5 years ago

#645 adds Kendall-Tau
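As a rough illustration of the Kendall-Tau metric mentioned above, the sketch below counts concordant and discordant pairs and applies the tau-b tie correction. It is a simple O(n²) version for exposition, not the code from that change; the function name `kendall_tau_b` is hypothetical, and choosing the tau-b variant is an assumption.

```python
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b for two equal-length sequences, in [-1, 1].

    tau_b = (C - D) / sqrt((C + D + Tx) * (C + D + Ty)), where C and D
    count concordant and discordant pairs, Tx counts pairs tied only
    in x, and Ty counts pairs tied only in y.
    """
    concordant = discordant = ties_x = ties_y = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both variables: counted nowhere
            if dx == 0:
                ties_x += 1
            elif dy == 0:
                ties_y += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    denom = sqrt(
        (concordant + discordant + ties_x) * (concordant + discordant + ties_y)
    )
    return (concordant - discordant) / denom


# Hypothetical data: one swapped pair out of six gives (5 - 1) / 6.
tau = kendall_tau_b([1, 2, 3, 4], [1, 3, 2, 4])
```

Like Spearman, this is rank-based and takes values in [-1, 1], so it would fit the diverging-colormap case of the heatmap.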