Open koaning opened 1 year ago
Hi! Thanks for raising this issue, and sorry for the delay in my response.
I'm happy to consider adding an API that's compatible with scikit-learn.
I'm assuming you're talking about scikit-learn's estimator and transform APIs (fit, transform, and fit_transform).
Off the top of my head:
We could have versions of preserve_neighbors and preserve_distances that implemented this API. That makes sense to me, because these functions take raw vector data and preprocess it (conceptually, fit). The transform method would actually compute the embedding.
Would that be helpful?
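To make the fit/transform split above concrete, here is a rough sketch of what such a wrapper could look like. The class name NeighborEmbedder is made up for illustration and is not part of PyMDE; the only PyMDE calls assumed are pymde.preserve_neighbors (which builds an MDE problem from raw vectors) and the embed method on the object it returns.

```python
class NeighborEmbedder:
    """Hypothetical wrapper for illustration; PyMDE does not ship this class."""

    def __init__(self, embedding_dim=2):
        self.embedding_dim = embedding_dim

    def fit(self, X, y=None):
        # Imported lazily so the class can be defined without PyMDE installed.
        import pymde

        # "fit" is the preprocessing step: build the MDE problem from raw vectors.
        self.mde_ = pymde.preserve_neighbors(X, embedding_dim=self.embedding_dim)
        return self

    def transform(self, X=None):
        # "transform" computes the embedding of the data seen in fit.
        # X is ignored in this sketch; embedding unseen rows is out of scope.
        return self.mde_.embed()

    def fit_transform(self, X, y=None):
        return self.fit(X).transform()
```

Note that embed returns a torch tensor, so a real version would probably convert the result to a NumPy array before handing it to other scikit-learn components.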
I'm assuming you're talking about scikit-learn's estimator and transform APIs (fit, transform, and fit_transform).
Yep! That's the one! I'm interested in such an API because it might help users in my bulk labelling interface.
In terms of implementation, maybe the neatest way is to add a class, maybe something like:
import pymde
from pymde import PyMDE
component = PyMDE(method="preserve_neighbors", constraint=pymde.Standardized())
If you want to go the extra mile, I might even go as far as accepting the constraint parameter as a string and allowing keyword arguments to pass through. That way, if folks want to use GridSearchCV, they still get nice output: strings and numbers work a bit better in summary tables than Python objects. But I think just having a scikit-learn compatible class, even if it only uses standard parameters, would go a long way toward getting more people to try out your library.
ps. I'm also a huge fan of cvxpy by the way!
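For what it's worth, the string-parameter idea could be sketched roughly like this. Everything here is an assumption for illustration: the class (named PyMDE only to mirror the snippet above), the accepted strings, and the mde_kwargs parameter are hypothetical, and get_params/set_params are written by hand so the sketch runs without scikit-learn. In practice, inheriting from sklearn.base.BaseEstimator and TransformerMixin would provide those methods.

```python
class PyMDE:
    """Hypothetical scikit-learn-style wrapper; not an existing PyMDE class."""

    _PARAMS = ("method", "constraint", "embedding_dim", "mde_kwargs")

    def __init__(self, method="preserve_neighbors", constraint="centered",
                 embedding_dim=2, mde_kwargs=None):
        # Plain strings and numbers keep grid-search summary tables readable.
        self.method = method
        self.constraint = constraint
        self.embedding_dim = embedding_dim
        self.mde_kwargs = mde_kwargs  # extra keyword arguments passed through

    def get_params(self, deep=True):
        return {name: getattr(self, name) for name in self._PARAMS}

    def set_params(self, **params):
        for name, value in params.items():
            if name not in self._PARAMS:
                raise ValueError(f"Unknown parameter: {name!r}")
            setattr(self, name, value)
        return self

    def fit_transform(self, X, y=None):
        # Imported lazily so the class can be defined without PyMDE installed.
        import pymde

        # Resolve the string parameters to PyMDE functions and constraint objects.
        methods = {
            "preserve_neighbors": pymde.preserve_neighbors,
            "preserve_distances": pymde.preserve_distances,
        }
        constraints = {
            "centered": pymde.Centered,
            "standardized": pymde.Standardized,
        }
        if self.method not in methods:
            raise ValueError(f"Unknown method: {self.method!r}")
        if self.constraint not in constraints:
            raise ValueError(f"Unknown constraint: {self.constraint!r}")

        mde = methods[self.method](
            X,
            embedding_dim=self.embedding_dim,
            constraint=constraints[self.constraint](),
            **(self.mde_kwargs or {}),
        )
        self.embedding_ = mde.embed()
        return self.embedding_
```

With this shape, a parameter grid such as {"constraint": ["centered", "standardized"]} contains only strings, which is the property that makes search results easy to tabulate.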
Okay, great! I'd love for PyMDE to be useful for bulk, which looks awesome, by the way.
Thanks for the code snippet --- something like that could definitely work. I'll put something together in the coming weeks.
Is there a reason why the library doesn't offer a scikit-learn compatible API? A class that can work via the fit_transform() API?