cvxgrp / pymde

Minimum-distortion embedding with PyTorch
https://pymde.org
Apache License 2.0
526 stars, 27 forks

scikit-learn compatible API #67

Open: koaning opened this issue 1 year ago

koaning commented 1 year ago

Is there a reason why the library doesn't offer a scikit-learn compatible API? For example, a class that works via the fit_transform() API?

akshayka commented 1 year ago

Hi! Thanks for raising this issue, and sorry for the delay in my response.

I'm happy to consider adding an API that's compatible with scikit-learn.

I'm assuming you're talking about scikit-learn's estimator and transform APIs (fit, transform, and fit_transform).

Off the top of my head:

We could have versions of preserve_neighbors and preserve_distances that implement this API. That makes sense to me, because these functions take raw vector data and preprocess it, which is conceptually the fit step. The transform method would then actually compute the embedding.
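A rough sketch of that split, for illustration only. The class name NeighborEmbedder and its parameters are hypothetical and not part of pymde; the only library calls assumed are pymde.preserve_neighbors, which builds the MDE problem, and MDE.embed, which solves it. Note that scikit-learn's transform normally maps whatever data it is given, whereas here it only returns the embedding of the data passed to fit.

```python
class NeighborEmbedder:
    """Hypothetical wrapper (not part of pymde): fit preprocesses, transform embeds."""

    def __init__(self, embedding_dim=2, n_neighbors=None):
        self.embedding_dim = embedding_dim
        self.n_neighbors = n_neighbors

    def fit(self, X, y=None):
        # Imported lazily so the class can be defined without pymde installed.
        import pymde

        # Preprocessing step: build the MDE problem from raw vector data.
        self.mde_ = pymde.preserve_neighbors(
            X,
            embedding_dim=self.embedding_dim,
            n_neighbors=self.n_neighbors,
        )
        return self

    def transform(self, X=None):
        # X is ignored: only the data seen in fit can be embedded in this sketch.
        if not hasattr(self, "mde_"):
            raise RuntimeError("Call fit before transform.")
        # embed() returns a torch.Tensor; convert it to a NumPy array.
        return self.mde_.embed().detach().cpu().numpy()

    def fit_transform(self, X, y=None):
        return self.fit(X).transform()
```

With pymde installed, NeighborEmbedder(embedding_dim=2).fit_transform(X) on an (n_items, n_features) array would return an (n_items, 2) array.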

Would that be helpful?

koaning commented 1 year ago

> I'm assuming you're talking about scikit-learn's estimator and transform APIs (fit, transform, and fit_transform).

Yep! That's the one! I'm interested in such an API because it might help users in my bulk labelling interface.

In terms of implementation, maybe the neatest way is to add a class, maybe something like:

import pymde
from pymde import PyMDE  # proposed class, not an existing pymde export

component = PyMDE(method="preserve_neighbors", constraint=pymde.Standardized())

If you want to go the extra mile, you could even accept the constraint parameter as a string and allow extra keyword arguments to pass through. That way, if folks want to use GridSearchCV, they still get nice output, since strings and numbers work a bit better in summary tables than Python objects. But I think just having a scikit-learn compatible class, even if it only uses the standard parameters, will go a long way toward getting more people to try out your library.
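To make the string-based variant concrete, here is a minimal sketch under stated assumptions. The PyMDE class, its parameter names, and the string-to-constraint mapping are all hypothetical; only pymde.preserve_neighbors, pymde.preserve_distances, pymde.Centered, pymde.Standardized, and MDE.embed are existing pymde names. get_params and set_params are written by hand so that the pass-through keyword arguments also show up as plain parameters, which is the interface tools like GridSearchCV rely on.

```python
# Hypothetical string names mapped to pymde names (an assumption of this sketch).
_METHODS = ("preserve_neighbors", "preserve_distances")
_CONSTRAINTS = {"centered": "Centered", "standardized": "Standardized"}
_NAMED = ("method", "constraint", "embedding_dim")


class PyMDE:
    """Hypothetical scikit-learn style wrapper; not part of pymde."""

    def __init__(self, method="preserve_neighbors", constraint="standardized",
                 embedding_dim=2, **kwargs):
        self.method = method
        self.constraint = constraint
        self.embedding_dim = embedding_dim
        self.kwargs = kwargs  # forwarded unchanged to the pymde function

    def get_params(self, deep=True):
        # Report everything as flat strings/numbers for readable summaries.
        params = {name: getattr(self, name) for name in _NAMED}
        params.update(self.kwargs)
        return params

    def set_params(self, **params):
        for name, value in params.items():
            if name in _NAMED:
                setattr(self, name, value)
            else:
                self.kwargs[name] = value
        return self

    def fit_transform(self, X, y=None):
        # Validate the string parameters before touching pymde.
        if self.method not in _METHODS:
            raise ValueError(f"Unknown method: {self.method!r}")
        if self.constraint not in _CONSTRAINTS:
            raise ValueError(f"Unknown constraint: {self.constraint!r}")

        import pymde  # lazy import keeps the class importable without pymde

        constraint = getattr(pymde, _CONSTRAINTS[self.constraint])()
        build = getattr(pymde, self.method)
        mde = build(X, embedding_dim=self.embedding_dim,
                    constraint=constraint, **self.kwargs)
        self.embedding_ = mde.embed().detach().cpu().numpy()
        return self.embedding_
```

For example, PyMDE(constraint="centered", n_neighbors=10) would report all four settings as plain values from get_params().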

koaning commented 1 year ago

ps. I'm also a huge fan of cvxpy by the way!

akshayka commented 1 year ago

Okay, great! I'd love for PyMDE to be useful for bulk, which looks awesome, by the way.

Thanks for the code snippet. Something like that could definitely work. I'll put something together in the coming weeks.