alegonz / baikal

A graph-based functional API for building complex scikit-learn pipelines.
https://baikal.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Target transformations/inverse transformations #11

Closed ablaom closed 4 years ago

ablaom commented 4 years ago

This looks very nice.

Could you explain how one handles learned target transformations/inverse transformations with this API? For example, suppose I want to apply ridge regression, but I first want to normalise the target, and the final predictions should be inverse-transformed using the learned scale and shift.
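Concretely, the workflow I have in mind, sketched with plain scikit-learn objects outside any pipeline (synthetic data just for illustration):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=200, n_features=5, random_state=0)
X_train, y_train, X_new = X[:150], y[:150], X[150:]

# Learn the scale and shift of the target on the training data only.
scaler = StandardScaler()
y_train_scaled = scaler.fit_transform(y_train.reshape(-1, 1)).ravel()

# Fit the regressor against the scaled target.
ridge = Ridge().fit(X_train, y_train_scaled)

# Predictions on new data are mapped back with the previously learned
# scale and shift; the scaler is not re-fit on the new data.
y_pred = scaler.inverse_transform(ridge.predict(X_new).reshape(-1, 1)).ravel()
```

The question is how to express that final inverse step inside a baikal pipeline.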

alegonz commented 4 years ago

@ablaom Thank you for your interest!

Your questions in this issue and the other two are very good, and I'll need a bit of time to write an adequate answer. Please allow me some time to get back to you (most likely during this coming weekend).

alegonz commented 4 years ago

@ablaom

I think you are aware of this, but just in case, to avoid confusion: baikal does not do any learning/optimization/backpropagation of parameters. Parameters are learned by their respective steps on their own "local" fitting. What you can tune jointly with baikal is hyper-parameters.

With that out of the way, the API should let you learn target transformations if you treat them as hyper-parameters. The target transformation would be handled by a transformer step applied on the targets, and another step that does the inverse transformation on the predictions downstream in the pipeline. The knobs of those steps would be tuned via, say, grid search, so you could, for example, search for the best pair out of a set such as log/exp, boxcox/inv_boxcox, etc., and/or tune their hyper-parameters.
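A rough sketch of the wiring I mean, assuming the Input/Model/make_step API (or the equivalent Step subclassing) and wrapping scikit-learn's FunctionTransformer for the fixed forward/inverse functions; the exact graph construction may need adjusting:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import FunctionTransformer
from baikal import Input, Model, make_step

RidgeStep = make_step(Ridge)
FunctionStep = make_step(FunctionTransformer)

x = Input()
y_t = Input()

# Fixed ("static") forward transform applied to the targets ...
y_t_log = FunctionStep(func=np.log1p)(y_t)
# ... regressor fit against the transformed targets ...
y_pred_log = RidgeStep()(x, y_t_log)
# ... and the fixed inverse transform applied to the predictions downstream.
y_pred = FunctionStep(func=np.expm1)(y_pred_log)

model = Model(x, y_pred, y_t)
```

The choice of forward/inverse pair (log1p/expm1, boxcox/inv_boxcox, ...) and any parameters of those functions would then be exposed as hyper-parameters and searched over.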

ablaom commented 4 years ago

I think you are aware of this, but just in case, to avoid confusion: baikal does not do any learning/optimization/backpropagation of parameters. Parameters are learned by their respective steps on their own "local" fitting. What you can tune jointly with baikal is hyper-parameters.

Yes, thanks for the clarification. I do understand this point. I'm not talking about tuning hyperparameters, just training parameters on training data. Let me clarify my question somewhat. Sorry if I say a lot of things that are obvious to you.

There are two kinds of transformations commonly applied in ML. The first kind is "static", in the sense that there are no parameters to learn (only, possibly, user-specified hyperparameters). Taking the log of the target is one example; re-encoding categorical features as integers is another. Other transformations need to be fit to some data before they can be applied. This includes PCA, for instance, but whitening (normalising) a target also falls into this category (and, to prevent data leakage, new data should be whitened using the previously learned parameters, not parameters re-learnt from the new data).
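A tiny sketch of the distinction (illustrative data only):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X_train, X_new = rng.normal(size=(100, 5)), rng.normal(size=(10, 5))
y_train = rng.lognormal(size=100)

# Static: a fixed function, nothing is learned from the data.
y_train_log = np.log(y_train)

# Learned: parameters are estimated on the training data and must be
# reused on new data (re-fitting on new data would leak information).
pca = PCA(n_components=2).fit(X_train)
X_new_reduced = pca.transform(X_new)
```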

My understanding, from this long-standing issue (see also this unmerged PR), is that the existing scikit-learn pipelines can only handle static target transformations. The problem, as I roughly understand it, is that the inverse transform at the end of the pipeline needs to apply parameters learned in an earlier part of the pipeline, which the API doesn't allow because different nodes of the pipeline cannot point to the same learned parameters - or something like this, no?
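(As an aside, my understanding is that the single-estimator version of this is handled in scikit-learn not by Pipeline but by a dedicated wrapper, roughly:)

```python
from sklearn.compose import TransformedTargetRegressor
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=200, n_features=5, random_state=0)

# The transformer is fit on y, the regressor is fit on the transformed y,
# and predict() applies the learned inverse transform automatically.
model = TransformedTargetRegressor(regressor=Ridge(), transformer=StandardScaler())
model.fit(X, y)
y_pred = model.predict(X)
```

But that wraps a single regressor; it does not let a node later in a general pipeline graph reuse parameters learned by an earlier node.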

So, one version of my query is: Does baikal essentially resolve this issue?

alegonz commented 4 years ago

@ablaom Sorry for the late reply.

Thanks for the clarification! I now understand the use case. I was not aware this was such a long-standing issue in sklearn. Thank you for the pointers.

The short answer: no, you cannot do that with baikal yet.

To apply parameters learned earlier in the pipeline in a later step, you need to be able to reuse that step (what I call a "shared step", similar to what Keras does with shared layers) on new inputs. Currently, calling a step with new inputs overrides the connectivity of the first call, so this is not possible yet. You could perhaps hack your way to it by having another (non-trainable) step that holds pointers to the parameters of the earlier step, but that might end up being unwieldy.
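Conceptually, the hack would look something like this (a plain scikit-learn-style sketch; InverseOf is a hypothetical helper, not part of baikal's API):

```python
from sklearn.base import BaseEstimator, TransformerMixin

class InverseOf(BaseEstimator, TransformerMixin):
    """Non-trainable transformer that holds a reference to an already-fitted
    transformer and applies its inverse_transform."""

    def __init__(self, wrapped):
        self.wrapped = wrapped  # pointer to the step fitted earlier in the pipeline

    def fit(self, X, y=None):
        return self  # nothing to learn here

    def transform(self, X):
        return self.wrapped.inverse_transform(X)
```

This could then be wrapped as a step (e.g. with make_step) and placed downstream, but keeping that reference in sync with the fitted upstream step is exactly the part that gets unwieldy.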

I do believe this should be possible with baikal, though. Actually, I have already pondered this recently, and I have a note in the docstrings and a TODO for it. The idea is that steps could be called an arbitrary number of times on different inputs, with different behaviors at each call (e.g. trainable + transform function on the first call, non-trainable + inverse-transform function on the second call). I haven't figured it out completely yet, as there are some edge cases to consider, such as how such steps would behave when they also have a fit_predict method like the one discussed in issue #13 :)

ablaom commented 4 years ago

Thanks again for your thoughtful and thorough response.

In my assessment, the difficulty that many model composition frameworks have with the target transformation use case, and with the use cases I brought up in #14 and #12, arises from the following:

(a) The fusion of two or more of the following objects: "nodes", "operations" (e.g., transform, inverse_transform), "model hyper-parameters", and "model learned parameters". For example, in target transformations you want two nodes with different operations that point to the same learned parameters. In a homogeneous ensemble, you want multiple nodes pointing to the same model hyper-parameters.

(b) The conflation of (i) nodes representing the information flow during prediction (production) mode; and (ii) nodes representing locations from which to fetch training data. So, for example, in a stack you want, for each base learner: (i) a node delivering its ordinary predictions in production, and (ii) a node delivering its out-of-sample predictions, to serve as training data for the adjudicating model.

In MLJ, a machine learning framework for Julia currently under development, the model composition API keeps the various objects mentioned above separate, and is hence able to handle the aforementioned use cases. As you might expect, it pays a price in higher complexity, manifest in the introduction of a new object, there called a machine.

You can learn more about the model composition API from here and some of the tutorials here.

For a stacking example, and a simple illustration of nodes with coupled hyper-parameters, see here.

I hope some of the ideas can be of use in your own commendable project.

Thanks again for your answers.