ContextLab / hypertools

A Python toolbox for gaining geometric insights into high-dimensional data
http://hypertools.readthedocs.io/en/latest/
MIT License
1.81k stars 161 forks

Allow option to use DataGeometry objects à la scikit-learn pipelines #227

Open paxtonfitzpatrick opened 4 years ago

paxtonfitzpatrick commented 4 years ago

Currently, if you want to repeatedly transform text samples with hypertools.tools.format_data() using the same parameters, the function re-fits both the vectorizer and the text model on every call. This is inefficient, and for expensive or numerous operations it makes working directly with the underlying sklearn classes the better option.
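For reference, here is a rough sketch of the workaround of using the sklearn classes directly: fit the vectorizer and text model once, then reuse them for every later transform. The choice of CountVectorizer and LatentDirichletAllocation, the toy corpus, and the `transform_text` helper are assumptions for illustration only, not what format_data() necessarily does internally.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Toy corpus, made up for this example.
corpus = [
    "the cat sat on the mat",
    "dogs chase cats in the yard",
    "stock prices rose sharply today",
    "investors sold shares as markets fell",
]

# Assumed model choices; swap in whichever vectorizer/text model you use.
vectorizer = CountVectorizer()
topic_model = LatentDirichletAllocation(n_components=2, random_state=0)

# Fit both steps exactly once on the training corpus.
topic_model.fit(vectorizer.fit_transform(corpus))


def transform_text(samples):
    """Hypothetical helper: reuse the fitted models, nothing is re-fit here."""
    return topic_model.transform(vectorizer.transform(samples))


# Repeated calls only pay the cost of transform, not fit.
first = transform_text(["the cat chased the dog"])
second = transform_text(["markets fell today", "shares rose"])
```

Each call returns one row of topic weights per input sample, so the expensive fitting step is paid once regardless of how many batches are transformed afterwards.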

We could add an argument to return the fitted models for reuse, but a really nice feature would be something like a scikit-learn Pipeline object that you could create, fit, save, and reuse to perform various processing steps with a single call. This would also be a very attractive feature for hypertools, since it could additionally implement methods like .plot() and .describe().
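To make the create/fit/save/reuse idea concrete, here is a minimal sketch using a plain scikit-learn Pipeline, which is the behavior a hypertools equivalent might mirror. The step names, model choices, and documents are placeholders, and pickle stands in for whatever persistence format hypertools would actually use; none of this is an existing hypertools API.

```python
import pickle

from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import Pipeline

# Toy training documents, made up for this example.
train_docs = [
    "the cat sat on the mat",
    "dogs chase cats in the yard",
    "stock prices rose sharply today",
    "investors sold shares as markets fell",
]

# Create: chain the vectorizer and text model into one object.
text_pipeline = Pipeline([
    ("vectorizer", CountVectorizer()),
    ("topics", LatentDirichletAllocation(n_components=2, random_state=0)),
])

# Fit: both steps are fitted with a single call.
text_pipeline.fit(train_docs)

# Save and reload: serialized in memory here; writing the bytes to a file
# would work the same way.
saved = pickle.dumps(text_pipeline)
restored = pickle.loads(saved)

# Reuse: the restored pipeline transforms new text without re-fitting.
new_docs = ["the dog sat in the yard"]
original_out = text_pipeline.transform(new_docs)
restored_out = restored.transform(new_docs)
```

The restored pipeline should give the same output as the original for the same input, which is the property that would let a saved object be shared across sessions or scripts.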