The Rank2D visualizer is a feature analysis visualizer that ranks pairwise joint plots of feature columns together (similar to a SPLOM) using a metric in the space [-1, 1] or [0, 1]. The rankings are visualized by a heatmap with only the lower left triangle visible and a diverging or sequential color map scheme that shows the relative ranks of pairs of features.
By using different ranking metrics (Pearson, covariance, etc.), data scientists can detect issues in dependent variables that might impact machine learning, for example covariance, entropy, or non-uniformity.
See #6 for more
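To make the idea concrete, here is a minimal sketch of the ranking step using plain NumPy rather than the visualizer itself. The function name `rank2d_pearson` is hypothetical, and keeping the diagonal visible is an assumption; the actual implementation may differ.

```python
import numpy as np


def rank2d_pearson(X):
    """Pairwise Pearson ranks with the upper triangle masked out.

    Hypothetical helper for illustration only; not the Rank2D API.
    """
    X = np.asarray(X, dtype=float)
    # rowvar=False treats each column as a feature, giving an
    # (n_features, n_features) matrix with values in [-1, 1].
    ranks = np.corrcoef(X, rowvar=False)
    # Hide everything strictly above the diagonal so only the lower
    # left triangle (plus the diagonal, by assumption) stays visible.
    mask = np.triu(np.ones_like(ranks, dtype=bool), k=1)
    return np.ma.masked_array(ranks, mask=mask)
```

The masked array can be passed to a heatmap call with a diverging color map, since Pearson scores are centered on zero; a metric in [0, 1] would suit a sequential color map instead.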
Note to contributors: items in the checklist below don't need to be completed in a single PR; if you see one that catches your eye, feel free to pick it off the list!
The following ranking metrics should be added:
[x] Pearson correlation
[x] Covariance
[x] Spearman correlation
[x] Kendall Tau correlation
[ ] mutual-info classification
[ ] mutual-info regression
[ ] Least Squares Error
[ ] Quadracity
[ ] Density based outlier detection
[ ] Uniformity (entropy of grids)
[ ] Number of items in most dense region
See: Seo, Jinwook, and Ben Shneiderman. "A rank-by-feature framework for interactive exploration of multidimensional data." Information visualization 4.2 (2005): 96-113.
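As a starting point for the "Uniformity (entropy of grids)" item, one possible reading is to bin each pair of features into a grid and score the pair by the entropy of the cell counts. The sketch below is an assumption about how that could look, not the definition from the paper: the function name `grid_entropy`, the default of 10 bins, and the normalization into [0, 1] are all hypothetical choices.

```python
import numpy as np


def grid_entropy(x, y, bins=10):
    """Normalized entropy of a 2D histogram of (x, y), in [0, 1].

    Hypothetical helper for illustration only; 1.0 means the points
    are spread evenly over the grid, 0.0 means a single occupied cell.
    """
    counts, _, _ = np.histogram2d(x, y, bins=bins)
    # Convert cell counts to probabilities and drop empty cells,
    # since 0 * log(0) is taken to be 0.
    p = counts.ravel() / counts.sum()
    p = p[p > 0]
    entropy = -np.sum(p * np.log2(p))
    # Divide by the maximum possible entropy (all cells equally full)
    # so the score fits the [0, 1] range the heatmap expects.
    return float(entropy / np.log2(counts.size))
```

Because the score lands in [0, 1], it would pair with a sequential color map rather than a diverging one.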
The following visual improvements need to be made:
[ ] Make the colorbar smaller and nicer
[ ] Add xlabels, ylabels and ticks that are nicely spaced
See: https://github.com/mwaskom/seaborn/blob/master/seaborn/matrix.py#L94
and
https://stanford.edu/~mwaskom/software/seaborn/examples/many_pairwise_correlations.html
NOTE: New correlation metrics should also be considered for addition to the JointPlot visualizer. See #721 for more details.