DistrictDataLabs / yellowbrick

Visual analysis and diagnostic tools to facilitate machine learning model selection.
http://www.scikit-yb.org/
Apache License 2.0

ProjectionVisualizer: unifying functionality of PCA and Manifold #874

Closed by bbengfort 5 years ago

bbengfort commented 5 years ago

One of the basic high-dimensional visualization techniques that Yellowbrick uses is to decompose or project a high-dimensional space into 2 or 3 dimensions and display the data as a scatter plot. Projections of this kind reduce the amount of space between points (decreasing sparsity) but can still give us some intuition about structure in the higher-dimensional space. Currently, we have three primary decomposition methods that use this technique:
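To make the projection idea concrete, here is a minimal sketch using plain NumPy rather than any Yellowbrick class. The helper name `project_pca` and the random data are assumptions for illustration only; the function centers the data and projects it onto the top principal directions from an SVD, giving coordinates that could be handed to a 2D scatter plot.

```python
import numpy as np


def project_pca(X, n_components=2):
    """Project the rows of X onto the top principal components via SVD.

    Hypothetical helper for illustration; not part of Yellowbrick.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # center each feature
    # Rows of Vt are the principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))        # 100 points in 10 dimensions
X2d = project_pca(X, n_components=2)  # coordinates for a 2D scatter plot
print(X2d.shape)                      # (100, 2)
```

Passing `n_components=3` would give coordinates for a 3D scatter plot instead.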

These visualizers have a lot of shared functionality that can be combined to streamline these kinds of visualizations and make it easier to extend them (e.g. to add ICA, Fast PCA, etc. to the PCA decompositions, or to extend the text visualizers to use the manifold visualizations).

I propose we create a ProjectionVisualizer base class or mixin that knows how to:

This shared functionality could then be easily used by PCA, Manifold, etc.

The following are notes about the class hierarchy:

This implies that the ProjectionVisualizer is a DataVisualizer and that the DataVisualizer needs to be updated to handle the target identification logic that is currently in Manifold. It also implies that JointPlot should be a DataVisualizer as well.
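A rough sketch of how such a hierarchy might look, using only the standard library. Everything here is an assumption made for discussion: the method names (`identify_target`, `project`, `draw`), the ten-value threshold for a discrete target, and the toy `LeadingColumns` subclass are hypothetical and do not reflect the actual Yellowbrick API or the implementation that was eventually merged. Drawing is replaced by recording the points so the sketch runs without matplotlib.

```python
from abc import ABC, abstractmethod


class DataVisualizer(ABC):
    """Stand-in base that identifies the kind of target being plotted."""

    def identify_target(self, y):
        # Simplified assumption: no target means a single color;
        # a handful of distinct values means discrete classes.
        if y is None:
            return "single"
        return "discrete" if len(set(y)) <= 10 else "continuous"


class ProjectionVisualizer(DataVisualizer):
    """Hypothetical base class: project to 2 or 3 dimensions, then draw."""

    def __init__(self, projection=2):
        if projection not in (2, 3):
            raise ValueError("projection must be 2 or 3")
        self.projection = projection

    @abstractmethod
    def project(self, X):
        """Return one row of `self.projection` coordinates per row of X."""

    def fit_transform(self, X, y=None):
        self.target_type_ = self.identify_target(y)
        Xp = self.project(X)
        if any(len(row) != self.projection for row in Xp):
            raise ValueError("projection produced the wrong dimensionality")
        self.draw(Xp, y)
        return Xp

    def draw(self, Xp, y=None):
        # A real visualizer would scatter these points on matplotlib axes;
        # this sketch only records what would be drawn.
        labels = y if y is not None else [None] * len(Xp)
        self.points_ = [(tuple(row), label) for row, label in zip(Xp, labels)]


class LeadingColumns(ProjectionVisualizer):
    """Toy subclass standing in for PCA or Manifold: keeps the first columns."""

    def project(self, X):
        return [list(row[: self.projection]) for row in X]


viz = LeadingColumns(projection=2)
Xp = viz.fit_transform([[1, 2, 3], [4, 5, 6]], y=["a", "b"])
print(viz.target_type_, viz.points_)
# discrete [((1, 2), 'a'), ((4, 5), 'b')]
```

The point of the split is that a subclass only supplies `project`; validation of the 2D/3D choice, target identification, and drawing live in the shared base classes.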

More investigation on this topic is necessary, but I wanted to propose this solution to allow for further discussion by @DistrictDataLabs/team-oz-maintainers and @naresh-bachwani who is working on PCA this summer.

naresh-bachwani commented 5 years ago

Thanks, @bbengfort, for summarizing this. This simplifies things for me.

bbengfort commented 5 years ago

This might also be useful for #889

rebeccabilbro commented 5 years ago

Just a note that this issue would have the potential to close (or at least address portions of) a lot of existing issues:

bbengfort commented 5 years ago

This was finished in #930 and #937.