Feature: handler functions should be the default fit/fit_transform/transform/predict

vincent-antaki commented 3 years ago

Is your feature request related to a problem? Please describe.

ExecutionContext instances are an important piece of Neuraxle framework. They keep services, an execution stack and various information about the computation going on. While not all pipeline need them, most elaborate computations benefit from their use because of the amount of customisation and flexibility they allow in the execution of pipeline. Not only that, but I believe they'll play a central role in providing run-time optimisations at some point.

Neuraxle steps have that weird duality where, they have context-less defined behaviour through the fit, fit_transform, transform methods (hereafter FFTT methods) and they have a behaviour for the handler methods (i.e. with context) defined by _fit_data_container, _fit_transform_data_container and _transform_data_container (hereafter _X_data_container methods). Theoretically, a step that defines both FFTT methods and data container methods could have to completely different behaviour whether it's called by handler method or FFTT methods (which itself could depend on a step which wraps it). I believe a user should never have to write both FFTT methods and their data_container counterparts.

As a bridge between these two interfaces (and to avoid writing the code twice), we have the following tools:

BaseStep default _X_data_container methods which simply calls their FFTT counterpart and thus forget about the context. This is ok if the step doesn't have children step but problematic if a step deeper within the pipeline requires the context later on.
ForceHandleMixin which, on call of an FFTT method, creates a new ExecutionContext and pass it to a handler call. This can be problematic because the new context is effectively a bogus one.
HandleOnlyMixin which simply crashes on FFTT method call (which I think is good praxis to use).
- The _CustomHandlerMethods which I'm not sure what is it useful for (and is only used in a single class).

The point I'm getting at is that I believe this duality is problematic. Switching back and forth between these two interfaces lead to unforeseen behaviour, loss of context and, in an attempt to keep the "user-friendly" FFTT interface, makes implementing step within the framework more complex than it should be.

Describe the solution you'd like I feel like it will soon be time to replace the FFTT functions with what are currently the handler functions.

On the downside, such a change would complexify Neuraxle for users which do not use ExecutionContext and DataContainer / expect an interface that matches scikit-learn. Implementing steps will require writing data container methods instead of FFTT which may not be the initial reflex of a new user.

Describe alternatives you've considered We could keep things as they are and have the default FFTT functions behave like their ForceHandleMixin counterparts. This doesn't fully solve the problems I mentioned but it may make this a non-breaking change.

guillaume-chevalier commented 3 years ago

I think this could be changed so that in version 1.0.0 we don't have this duality anymore really.

On the downside, such a change would complexify Neuraxle for users which do not use ExecutionContext and DataContainer / expect an interface that matches scikit-learn. Implementing steps will require writing data container methods instead of FFTT which may not be the initial reflex of a new user.

At least, a mixin (or a direct base class step) could exist to be inherited from to already have the handle_FFTT defined so as to just implement the simple version. So users could still inherit from a single class to do just that if really wanted. And if users don't inherit from this specific mixin, then the regular simple method would simply be nonexistent.

I think that for now we could use some deprecated warnings for overriding the simple/classical methods.

What bothers me most is the breaking change. Especially the confusion if we apply the breaking change FFTT --> deleted and then handle_FFTTT --> FFTT (or so). The signature of the classical FFTT methods would all change.

Well, if the release is big enough (e.g.: like TF v2), then we could do such a change, as well as a maximum of breaking change in one release the day that we'll do this.

I would like the opinion of @jeromebedard12 on this as well.

guillaume-chevalier commented 3 years ago

@vincent-antaki

guillaume-chevalier commented 3 years ago

Also, we could probably rename the terminology of step to component, and wrapper to higher order component in the breaking release ? (only if component is so much more explicit than step, and for the wrapper thing, the word wrapper is already quite clear, but could be more precise with higher order component).

vincent-antaki commented 3 years ago

I have no preference between step and component. As for higher order component, I think it makes less clear then wrapper.

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs in the next 180 days. Thank you for your contributions.

Neuraxio / Neuraxle

Feature: handler functions should be the default fit/fit_transform/transform/predict #437