ContextLab / hypertools

A Python toolbox for gaining geometric insights into high-dimensional data
http://hypertools.readthedocs.io/en/latest/
MIT License

Change procrustes and reduce API to look like scikit-learn #98

Closed andrewheusser closed 6 years ago

andrewheusser commented 7 years ago

Since many are familiar with the scikit-learn fit/transform design, we could change the procrustes and reduce APIs to have a similar design. This would also allow transforms to be fit on one dataset and applied to another.

jeremymanning commented 7 years ago

@andrewheusser can you clarify what help you're looking for here?

andrewheusser commented 7 years ago

Essentially, we would extend the procrustes and reduce (and maybe align?) APIs to return a "fit model". Scikit-learn is set up like this:

from sklearn.decomposition import PCA  # or any other scikit-learn estimator

m = PCA(n_components=3)       # data is an observations-by-features array
m.fit(data)
transformed_data = m.transform(data)
# or, equivalently, in one step:
transformed_data = m.fit_transform(data)

This allows you to pass new data to a model fit on another dataset. Allowing behavior like this would help us in cases where we want to apply a precomputed model to new data, for cross-validation or other purposes.
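
For example (a minimal scikit-learn sketch with made-up arrays, not hypertools code), fitting on one dataset and applying the same mapping to another looks like:

import numpy as np
from sklearn.decomposition import PCA

train_data = np.random.rand(100, 10)  # hypothetical datasets with matching features
test_data = np.random.rand(20, 10)

m = PCA(n_components=3)
m.fit(train_data)                      # fit the model on one dataset...
reduced_test = m.transform(test_data)  # ...then apply it to held-out data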

One idea would be to keep the API as we have it currently, but extend its functionality:

from hypertools import tools

reduced_data = tools.reduce(data)         # same as before
fit_model = tools.reduce.fit(data)        # proposed: return a fitted model
reduced_data = fit_model.transform(data)  # proposed: apply the fitted model to (new) data

jeremymanning commented 7 years ago

That sounds great to me!

andrewheusser commented 7 years ago

Now that I'm in the weeds here, this is actually trickier than I thought. It looks like all of the scikit-learn decomposition algorithms (PCA, FastICA, NMF, ...) have fit, transform, and fit_transform methods. However, the manifold learning algorithms (TSNE, MDS, ...) have only fit and fit_transform (no standalone transform). Thus, the transform method would only work for the decomposition-style algorithms.
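
A quick way to check this (a sketch; the exact set of methods depends on the installed scikit-learn version):

from sklearn.decomposition import PCA, FastICA, NMF
from sklearn.manifold import TSNE, MDS

# decomposition estimators expose a standalone transform(); manifold estimators do not
for cls in (PCA, FastICA, NMF, TSNE, MDS):
    print(cls.__name__, hasattr(cls(), 'transform'))
# expected: PCA/FastICA/NMF -> True, TSNE/MDS -> False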

jeremymanning commented 7 years ago

Perhaps we could provide a standard interface for these functions, even if scikit-learn doesn't. This could be really useful. What I'm thinking is:

reduced = hyp.tools.reduce(data, method='PCA', ndims=3) returns the reduced data. method can be one of: PCA, PPCA, ICA, NMF, MDS, or tSNE.

xform = hyp.tools.reduce(data, method='PCA', ndims=3, return_xform=True) returns a transform object, fit using data, that can be applied to any new dataset of the same shape as data (if data is a single matrix/dataframe) or of the same shape as any element of data (if data is a list of arrays/dataframes).

Then, given xform, we could get the reduced data using reduced = xform.apply(new_data), where new_data could be a list of matrices, a single matrix, etc.

We will probably need to manually define all of these functions (i.e., we can't just use a common interface to scikit-learn), since it sounds like they're all implemented differently. We may also need to find other existing libraries that provide these algorithms and/or implement some ourselves.
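
One way such a unified interface could work (an illustrative sketch, not the hypertools implementation; the ReduceXform class name and the linear-map fallback for TSNE/MDS are assumptions) is to wrap each scikit-learn estimator and fall back to a learned mapping whenever the estimator has no transform() of its own:

import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.linear_model import LinearRegression

class ReduceXform(object):
    """Wrap a scikit-learn reducer behind a uniform fit/apply interface (hypothetical)."""
    def __init__(self, model):
        self.model = model
        self._fallback = None  # used when the wrapped model has no transform()

    def fit(self, data):
        if hasattr(self.model, 'transform'):
            self.model.fit(data)
        else:
            # e.g. TSNE/MDS: learn a linear map from the input space to the embedding
            # so that new data can be projected approximately
            embedding = self.model.fit_transform(data)
            self._fallback = LinearRegression().fit(data, embedding)
        return self

    def apply(self, new_data):
        if self._fallback is None:
            return self.model.transform(new_data)
        return self._fallback.predict(new_data)

# usage with made-up data
data = np.random.rand(100, 10)
new_data = np.random.rand(20, 10)

reduced = ReduceXform(PCA(n_components=3)).fit(data).apply(new_data)    # uses PCA.transform
embedded = ReduceXform(TSNE(n_components=3)).fit(data).apply(new_data)  # uses the linear fallback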

Benefits of this design:

Other considerations:

jeremymanning commented 7 years ago

Ah... I think there's some mixup in these comments with issue 106. This issue is about hyperalignment; 106 is about data reduction.

jeremymanning commented 6 years ago

This issue is now redundant with this one. Closing...