Closed by andrewheusser 7 years ago
@andrewheusser can you clarify what help you're looking for here?
essentially, we would extend the `procrustes` and `reduce` (and maybe `align`?) APIs to return a "fit model". Scikit-learn is set up like this:
```python
m = model()
m.fit(data)
transformed_data = m.transform(data)
# or
transformed_data = m.fit_transform(data)
```
This allows you to pass new data to a model that was fit on another dataset. Supporting this would help in cases where we want to apply a precomputed model to new data, e.g. for cross-validation.
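As a concrete illustration of that pattern in scikit-learn itself (a minimal sketch using PCA and random data, standing in for a train/test split):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
train = rng.normal(size=(100, 10))  # data the model is fit on
test = rng.normal(size=(20, 10))    # new data, seen only at transform time

pca = PCA(n_components=3)
pca.fit(train)                   # fit the model on one dataset...
projected = pca.transform(test)  # ...then apply it to another
print(projected.shape)           # (20, 3)
```

The key point is that `transform` projects `test` into the space learned from `train`, which is exactly the behavior needed for cross-validation.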
one idea would be to keep the API as we have it currently, but extend its functionality:
```python
from hypertools import tools
reduced_data = tools.reduce(data)  # same as before
fit_model = tools.reduce.fit(data)
reduced_data = fit_model.transform(data)
```
That sounds great to me!
now that I'm in the weeds here, this is actually trickier than I thought. It looks like all of the scikit-learn decomposition algorithms (PCA, FastICA, NMF..) have `fit`, `transform`, and `fit_transform` methods. However, the manifold learning algorithms (TSNE, MDS..) have just the `fit` and `fit_transform` methods (not `transform` alone). Thus, the `transform` method would only work for the decomposition-style algorithms.
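That split can be checked directly against scikit-learn (a quick sketch; the exact set of methods depends on the installed scikit-learn version):

```python
from sklearn.decomposition import PCA, FastICA, NMF
from sklearn.manifold import TSNE, MDS

# decomposition estimators expose transform; manifold estimators do not
for est in (PCA(), FastICA(), NMF(), TSNE(), MDS()):
    print(type(est).__name__, hasattr(est, 'transform'))
```

This prints `True` for PCA, FastICA, and NMF, and `False` for TSNE and MDS.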
perhaps we could provide a standard interface for these functions, even if scikit-learn doesn't. this could be really useful. what i'm thinking is:
```python
reduced = hyp.tools.reduce(data, method='PCA', ndims=3)
```

returns the reduced data. `method` can be one of: PCA, PPCA, ICA, NMF, MDS, or tSNE
```python
xform = hyp.tools.reduce(data, method='PCA', ndims=3, return_xform=True)
```

returns a transform object, fit using `data`, that can be applied to any new dataset of the same shape as `data` (if `data` is a single matrix/dataframe) or of the same shape as any element of `data` (if `data` is a list of arrays/dataframes).

then, given `xform`, we could get the reduced data using:

```python
reduced = xform.apply(new_data)
```

where `new_data` could be a list of matrices, a single matrix, etc.
we will probably need to manually define all of these functions (e.g. we can't just use a common interface to scikit-learn), since it sounds like they're all implemented differently. we may also need to find other existing libraries that provide these algorithms and/or implement some ourselves.
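One way such a manually defined, unified interface could look (a rough sketch; `Reducer` and `apply` are hypothetical names, not existing hypertools or scikit-learn API): use `transform` when the wrapped model supports it, and fall back to `fit_transform` otherwise, with the caveat that the fallback re-embeds the new data rather than projecting it into the originally fitted space:

```python
from sklearn.decomposition import PCA

class Reducer:
    """Hypothetical wrapper giving every model a fit/apply interface."""

    def __init__(self, model):
        self.model = model

    def fit(self, data):
        self.model.fit(data)
        return self

    def apply(self, new_data):
        # decomposition models (PCA, FastICA, NMF) can project new data
        # into the already-fitted space
        if hasattr(self.model, 'transform'):
            return self.model.transform(new_data)
        # manifold models (TSNE, MDS) cannot: re-fitting embeds new_data,
        # but NOT into the same space as the original fit
        return self.model.fit_transform(new_data)
```

Usage would look like `Reducer(PCA(n_components=3)).fit(train).apply(test)`. For the manifold branch we'd still need a real solution (e.g. an out-of-sample extension or a different library), since the fallback silently changes the semantics.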
benefits of this design:
other considerations: we could add an `align=True` flag to hyperalign the result after applying the transform. If the user calls `xform = hyp.tools.reduce(data, method='PCA', ndims=3, return_xform=True, align=True)`, the `xform.apply` method should also hyperalign any new data into the same space as the original data. this will facilitate analyses like variants of the decoding analysis we used in the paper.

ah...i think there's some mixup in these comments with issue 106. this issue is about hyperalignment; 106 is about data reduction.
This issue is now redundant with this one. Closing...
since many are familiar with the scikit-learn fit/transform design, we could change the `procrustes` and `reduce` APIs to have a similar design. This would also allow transforms to be fit on one dataset and applied to another.