Closed ego-thales closed 1 year ago
May I ask which use case do you have in mind for this functionality?
If one only has a unique template, then FisherRaoElasticRegistration().fit(template)
works fine.
But it is not the case if one uses multiple templates. A use case might be for example having a family of samples X
, and a transformation t -> X(t)
which could be a small deformation, and you could imagine wanting to observe the overall influence of t
on the amplitude of X
. Just an idea, there might be lots of others.
Sorry, maybe I have not been clear. I am asking what would be the purpose of having a call to fit()
without data. If you want to just transform one group of curves, you can use fit_transform()
. If you have a train/test partition you have data to be used in fit()
(even if the call does nothing). Am I misunderstanding something?
The main point is syntactical. The word "fit" makes the user believe that something is happening when it's not. So it would seem unnatural / misleading (to me), for example, to write
>>> # Call `fit_transform` everytime
>>> op = FisherRaoElasticRegistration(template=template)
>>> for t in range(N):
... transformed = op.fit_transform(X[t])
... # Do something
or even
>>> # Call `fit` once on random data
>>> op = FisherRaoElasticRegistration(template=template)
>>> op.fit(template_or_whatever) # Does nothing with `template_or_whatever` since `op` already has `self.template_`
>>> for t in range(N):
... transformed = op.transform(X[t])
... # Do something
Instead, I think the user would expect the following syntax to be working, acknowledging that the fit()
call is necessary for API coherence:
>>> # More natural syntax
>>> op = FisherRaoElasticRegistration(template=template).fit()
>>> for t in range(N):
... transformed = op.transform(X[t])
... # Do something
This is certainly how I encountered the problem the first time: trying the third syntax and getting errors. I thought "wait, does FisherRaoElasticRegistration
not do what I thought it would do? What does op.fit(whatever)
do with whatever
?". Because of this, I had to go back to docs, and then to source code to verify that fit
actually behaved like I expected, and didn't "logically" need any data.
Finally, I think it improves conciseness:
>>> op = FisherRaoElasticRegistration(template=template)
>>> op.fit(template_or_whatever)
takes 2 lines, and
>>> op = FisherRaoElasticRegistration(template=template).fit(template_or_whatever)
looks redundant while probably often going past 80 characters with indentation.
Thanks for your answers by the way, and more broadly, for this package ♥️
I see your concerns. It seems that when you pass a template, FisherRaoElasticRegistration
becomes a stateless transformer. In that case, and following scikit-learn, we should probably allow to NOT call fit
(but be able to do so for additional validation). This requires that in that case we compute some things in transform (without storing them) if they were not previously stored in a call to fit
. Note that scikit-learn still recommends to use fit_transform
in that case.
Although the changes are likely to be small, I would need a test that checks this case of not calling fit
, to prevent future regressions.
I prefer not calling fit
(rather than calling it without parameters). The later pattern is not used in scikit-learn and would raise more questions. Moreover, from a typing perspective, it would allow to pass None
as X
, which is something that we do not want.
As an additional warning, note that if we do this, the template_
attribute will not be created if you do not call fit
.
Also: skip the loop, you can call transform
using all the data at once (and it will be faster).
Thanks for your inputs.
I agree that allowing None
isn't ideal, so I think I also prefer being able to call transform
(and fit_transform
) without calling fit
before. But still, fit
is useful when users don't have template
precomputed, so should it really be stateless?
I guess that in this case, much more needs to be done in __init__
since no new attribute is supposed to be added in fit
according to scikit-learn.
Finally, the example I gave could not really skip the loop since for all we know, X[t+1]
might depend on X[t]
.
I agree that allowing
None
isn't ideal, so I think I also prefer being able to calltransform
(andfit_transform
) without callingfit
before. But still,fit
is useful when users don't havetemplate
precomputed, so should it really be stateless?
It would be stateless just in that case. For a non-precomputed template you will be required to call fit
.
I guess that in this case, much more needs to be done in
__init__
since no new attribute is supposed to be added infit
according to scikit-learn.
Doing work in __init__
is against scikit-learn API. For the stateless case, work has to be done in transform
, without storing attributes. If you want to store the attributes you need to call fit
or fit_transform
.
Finally, the example I gave could not really skip the loop since for all we know,
X[t+1]
might depend onX[t]
.
How? I do not understand you here.
Doing work in
__init__
is against scikit-learn API. For the stateless case, work has to be done intransform
, without storing attributes. If you want to store the attributes you need to callfit
orfit_transform
.
Ok, it makes sense. I may get around updating the PR in accordance to our discussion if I have the time.
Finally, the example I gave could not really skip the loop since for all we know,
X[t+1]
might depend onX[t]
.How? I do not understand you here.
Well, you can't rule out this possibility in general. The user might want to use X[t]
info to determine what X[t+1]
might be. Or you could imagine a while loop instead of a for loop, continually updating X
based on its previous state. It is not important here anyway, it was just a side note on possible usage.
Thanks for your time. All the best!
I suppose you did not think that it is worthy enough at the end?
Sorry, I should have closed with a comment! I actually think it would be nice to implement the way we discussed, but I currently don't have enough time to get around doing this (as you saw, I only made the minor doc modif last week as opposed to last month haha). Since it is not really a big issue, I didn't want to leave an open issue for your project that I wasn't going to fix!
Enables using
FisheRaoElasticRegistration(template=template).fit()
without anyX
data if atemplate
was provided.