PythonOT / POT

POT : Python Optimal Transport
https://PythonOT.github.io/
MIT License

(Semi-)Supervised Domain Adaptation for regression problem using POT #348

Open MrPr3ntice opened 2 years ago

MrPr3ntice commented 2 years ago

🚀 Feature

Extend the methods in ot.da.* to regression problems (so far they appear to support only classification (?)).

Motivation

I have already used ot.da.SinkhornLpl1Transport for domain adaptation in (semi-)supervised classification problems, i.e. via ot.da.SinkhornLpl1Transport.fit(Xs, ys, Xt, yt), where yt contains either the class label of a sample (a positive scalar) or -1 if the label is unknown. The only way I have found to transfer this method to a (metric) regression problem is to convert it into a classification problem, e.g. by discretising the metric target value y into, say, 10 classes. Of course this conversion is not ideal, since both the natural order of y and the distances between the ys are lost in a classification problem.
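The discretisation workaround described above could look roughly like the following sketch. The helper names discretise_targets and fit_lpl1, the equal-width binning, and the regularisation values are assumptions made for illustration; only ot.da.SinkhornLpl1Transport and its fit(Xs, ys, Xt, yt) call come from POT itself.

```python
import numpy as np


def discretise_targets(y, n_classes=10, edges=None):
    """Map continuous targets to integer class labels 1..n_classes.

    NaN entries (unknown targets) are mapped to -1, the value POT
    uses for unlabelled target samples.
    """
    y = np.asarray(y, dtype=float)
    known = ~np.isnan(y)
    if edges is None:
        # Equal-width bin edges over the range of the known targets.
        edges = np.linspace(y[known].min(), y[known].max(), n_classes + 1)
    labels = np.full(y.shape, -1, dtype=int)
    # Only interior edges are used, so digitize returns 0..n_classes-1;
    # the +1 shifts the labels to be strictly positive.
    labels[known] = np.digitize(y[known], edges[1:-1]) + 1
    return labels, edges


def fit_lpl1(Xs, ys_cls, Xt, yt_cls, reg_e=1.0, reg_cl=0.1):
    """Fit the class-regularised Sinkhorn transport on the binned labels.

    The regularisation values are placeholders, not recommendations.
    """
    import ot  # imported lazily so the binning helper works without POT

    da = ot.da.SinkhornLpl1Transport(reg_e=reg_e, reg_cl=reg_cl)
    return da.fit(Xs=Xs, ys=ys_cls, Xt=Xt, yt=yt_cls)


# Toy regression targets; the last target value is unknown.
ys = np.array([0.0, 0.2, 0.5, 0.9, 1.0])
yt = np.array([0.1, 0.95, np.nan])

# Reuse the source bin edges for the target so the class ids agree.
ys_cls, edges = discretise_targets(ys, n_classes=2)
yt_cls, _ = discretise_targets(yt, edges=edges)
```

Reusing the source edges for the target keeps the class ids comparable across domains, but the sketch also makes the drawback visible: the values 0.5 and 1.0 end up in the same class, while 0.2 and 0.5 are treated as entirely different.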

Pitch

Ideally, yt would accept either class labels or metric target values. Samples without label information would be marked with, e.g., numpy.nan instead of -1. Whether the problem is regression or classification would either be specified with an additional parameter, e.g. is_cls=True/False, or detected automatically (harder).
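To make the pitch concrete, here is one way a NaN-aware regression variant might build its ground cost. This is purely an illustrative sketch and not an existing POT feature: the function semi_supervised_cost, the weight alpha, and the choice of a squared target difference as penalty are all assumptions.

```python
import numpy as np


def semi_supervised_cost(Xs, ys, Xt, yt, alpha=1.0):
    """Hypothetical ground cost for semi-supervised regression DA.

    Feature cost: squared Euclidean distance between samples.
    Target penalty: alpha * (ys_i - yt_j)**2, applied only to target
    samples j whose yt is known (i.e. not NaN).
    """
    Xs = np.asarray(Xs, dtype=float)
    Xt = np.asarray(Xt, dtype=float)
    ys = np.asarray(ys, dtype=float)
    yt = np.asarray(yt, dtype=float)

    # Pairwise squared Euclidean distances, shape (n_source, n_target).
    M = ((Xs[:, None, :] - Xt[None, :, :]) ** 2).sum(axis=2)

    # Unlabelled target columns keep a zero penalty.
    known = ~np.isnan(yt)
    penalty = np.zeros_like(M)
    penalty[:, known] = (ys[:, None] - yt[None, known]) ** 2

    return M + alpha * penalty


# Two source samples, two target samples; the second target is unlabelled.
Xs = np.array([[0.0], [1.0]])
ys = np.array([0.0, 1.0])
Xt = np.array([[0.0], [1.0]])
yt = np.array([1.0, np.nan])

C = semi_supervised_cost(Xs, ys, Xt, yt, alpha=1.0)
```

Unlike the binned version, this penalty grows smoothly with the distance between target values, so the order and scale of y are preserved. The resulting matrix could then be handed to any OT solver that accepts a custom cost, for instance ot.emd(a, b, C) with uniform weights a and b.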

Alternatives

Maybe I am missing something: either regression problems are already supported somehow, or the extension is impossible because OT cannot work with metric-scale yt values.

Additional context

Nothing to add here.

ncourty commented 2 years ago

Hello @MrPr3ntice. Indeed, this extension could be useful. Please note that nothing prevents it in theory. You can take a look at this repo, https://github.com/rflamary/JDOT, where we use an OT-based strategy to perform DA on a regression task.

MrPr3ntice commented 2 years ago

Thanks @ncourty for your insights! Thanks also for mentioning JDOT, which I had come across earlier. If I remember correctly, though, it does not provide any (semi-)supervised strategies (i.e. yt is only used for validation, not for fitting the data alignment)? I will take a deeper look at the class regularization theory in https://arxiv.org/pdf/1507.00504.pdf (section 4) and may come up with a proposal for the (semi-)supervised regression problem. As you mentioned, nothing should prevent this in theory.

ncourty commented 2 years ago

Yes! You can also take a look at https://arxiv.org/pdf/2202.06208.pdf (this is shameless self-promotion, sorry), a work (under review) on a specific type of regularizer for DA in the regression setting. Hopefully, when it is accepted, we will add it to POT.