Closed bbengfort closed 5 years ago
Thanks, @bbengfort, for summarizing this. It simplifies things for me.
This might also be useful for #889
Just a note that this issue would have the potential to close (or at least address portions of) a lot of existing issues:
This was finished in #930 and #937.
One of the basic high-dimensional visualization techniques that Yellowbrick uses is to decompose or project a high-dimensional space into 2 or 3 dimensions and display the data as a scatter plot. Projections of this kind reduce the amount of space between points (decreasing sparsity) but can still give us some intuition about structure in the higher-dimensional space. Currently, we have three primary decomposition methods that use this technique:
- `sklearn.manifold` to produce embeddings

These visualizers have a lot of shared functionality that can be combined to streamline these kinds of visualizations and make it easier to extend them (e.g. to add ICA, Fast PCA, etc. to the PCA decompositions, or to extend the text visualizers to use the manifold visualizations).
I propose we create a `ProjectionVisualizer` base class or mixin that knows how to:

- `X` into `X'` of shape `(n_instances, 2)` or `(n_instances, 3)`
- `X` for the projection

This shared functionality could then be easily used by PCA, Manifold, etc.
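To make the shape contract concrete, here is a minimal sketch of what such a base class might look like. It is not the Yellowbrick implementation: `ProjectionSketch` and `TakeFirstColumns` are hypothetical names, the transformer is a trivial stand-in for a real PCA or manifold estimator, and plain lists are used instead of arrays so the example runs without dependencies. The only convention borrowed from scikit-learn is that the wrapped transformer exposes `fit_transform(X, y)`.

```python
class TakeFirstColumns:
    """Hypothetical stand-in for a real decomposition (e.g. a PCA or manifold estimator)."""

    def __init__(self, n_components=2):
        self.n_components = n_components

    def fit_transform(self, X, y=None):
        # Keep only the leading columns; a real transformer would learn a projection.
        return [list(row[: self.n_components]) for row in X]


class ProjectionSketch:
    """Hypothetical base class: project X to 2 or 3 columns and prepare scatter data."""

    def __init__(self, transformer, projection=2):
        if projection not in (2, 3):
            raise ValueError("projection must be 2 or 3")
        self.transformer = transformer
        self.projection = projection

    def fit_transform(self, X, y=None):
        Xp = self.transformer.fit_transform(X, y)
        # Enforce the (n_instances, 2) or (n_instances, 3) contract on the result.
        if len(Xp) != len(X) or any(len(row) != self.projection for row in Xp):
            raise ValueError(
                f"expected shape ({len(X)}, {self.projection}) from the transformer"
            )
        self.points_ = Xp
        return Xp

    def scatter_columns(self):
        # Transpose rows into one coordinate list per axis, the form scatter calls take.
        return [list(axis) for axis in zip(*self.points_)]


X = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0], [9.0, 10.0, 11.0, 12.0]]
viz = ProjectionSketch(TakeFirstColumns(n_components=2), projection=2)
Xp = viz.fit_transform(X)
xs, ys = viz.scatter_columns()
```

Because the base class only depends on `fit_transform`, swapping in a different decomposition would not change the validation or the scatter preparation, which is the kind of reuse the proposal is after.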
The following notes about the class hierarchy apply:

- `MultiFeatureVisualizer` produces a `self.features_` attribute on `fit()`, which is useful in PCA for biplots and to understand the original feature set.
- `DataVisualizer` produces `self.classes_` from `y` and is supposed to "provide helper functionality related to target identification" but does not yet implement this (it is implemented on `Manifold`).
- `yellowbrick.contrib.ScatterVisualizer` might be worth moving to `yellowbrick.draw.scatter` and using as a mixin to handle part of these cases, though I don't necessarily want to confuse things too much.
- The `JointPlot` visualizer would also benefit from the target color handling described above.

This implies that the `ProjectionVisualizer` is a `DataVisualizer` and that the `DataVisualizer` needs to be updated to handle the target identification that is currently in `Manifold`. It also implies that `JointPlot` should be a `DataVisualizer` as well.

More investigation on this topic is necessary, but I wanted to propose this solution to allow for further discussion by @DistrictDataLabs/team-oz-maintainers and @naresh-bachwani, who is working on PCA this summer.
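As a starting point for that discussion, the hierarchy could be sketched roughly as follows. Everything here is an assumption for illustration: the `*Sketch` class names are hypothetical, the `target_type_` attribute is invented for this example, and the rule for telling discrete from continuous targets (few distinct values, none of them floats, with an arbitrary cutoff of 10) is a placeholder heuristic, not what `Manifold` actually does.

```python
class DataVisualizerSketch:
    """Hypothetical base: records feature names and identifies the target type."""

    def fit(self, X, y=None):
        # Stand-in for features_; column indices are used in place of real names.
        self.features_ = list(range(len(X[0])))
        self.target_type_ = self._identify_target(y)
        # classes_ only makes sense when the target is discrete.
        self.classes_ = sorted(set(y)) if self.target_type_ == "discrete" else None
        return self

    @staticmethod
    def _identify_target(y, max_classes=10):
        # Assumed heuristic: no y means a single color; a small number of
        # distinct non-float values is treated as discrete; otherwise continuous.
        if y is None:
            return "single"
        distinct = set(y)
        if len(distinct) <= max_classes and not any(
            isinstance(value, float) for value in distinct
        ):
            return "discrete"
        return "continuous"


class ProjectionVisualizerSketch(DataVisualizerSketch):
    """Would add the 2D/3D projection on top of the shared target handling."""


class JointPlotSketch(DataVisualizerSketch):
    """Would add the joint plot drawing on top of the same target handling."""


X = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
proj = ProjectionVisualizerSketch().fit(X, ["a", "b", "a"])
joint = JointPlotSketch().fit(X, [0.5, 1.7, 2.9])
unlabeled = ProjectionVisualizerSketch().fit(X)
```

With the target handling living in the shared base, both subclasses get `classes_` and the target type without duplicating the logic, so a legend or colorbar choice could be made in one place.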