adriangb / scikeras

Scikit-Learn API wrapper for Keras.
https://www.adriangb.com/scikeras/
MIT License
240 stars 47 forks

Pass the `loss` from the `compile` call to the target_encoder instantiation #277

Open german1608 opened 2 years ago

german1608 commented 2 years ago

Scikeras version: 0.8.0

(I feel) related to #206

I was following the MLPClassifier tutorial on the wiki page. It was great that the model function could handle both binary and multi-class classification. However, I ran into this error while executing the test:

    ValueError: Shapes (None, 1) and (None, 6) are incompatible

My y has 6 classes. I'm using KerasClassifier directly, i.e. no subclassing. This is how I was creating the classifier:
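To illustrate the mismatch with a minimal, Keras-free sketch (the labels here are made up, standing in for the real y): an ordinally encoded target has shape (n, 1), while a 6-unit softmax head expects (n, 6):

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# hypothetical 6-class target, standing in for the real y
y = np.array([["a"], ["b"], ["c"], ["d"], ["e"], ["f"], ["a"]])

y_ord = OrdinalEncoder().fit_transform(y)          # ordinal: shape (7, 1)
y_oh = OneHotEncoder().fit_transform(y).toarray()  # one-hot: shape (7, 6)

# a 6-unit softmax head expects (n, 6) targets; ordinally encoded
# targets are (n, 1), which reproduces the shape mismatch above
print(y_ord.shape, y_oh.shape)
```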

import logging
from typing import Any, Dict

from scikeras.wrappers import KerasClassifier
from tensorflow.keras import Sequential, layers

log = logging.getLogger(__name__)

def init_model_object(**params) -> KerasClassifier:
    log.info('HP-PARAMS: %s', params)

    def get_clf_model(meta: Dict[str, Any], compile_kwargs: Dict[str, Any]) -> Sequential:
        model = Sequential(name='LSTM-cf-genie')

        model.add(
            layers.ZeroPadding1D(
                padding=3,
                name='zero-padding-layer',
                input_shape=(
                    meta['n_features_in_'],
                    1)))

        model.add(layers.Bidirectional(layers.LSTM(16, name='lstm-layer', return_sequences=True)))

        model.add(layers.LSTM(50, name='lstm-layer-2', return_sequences=False))

        if meta['target_type_'] == 'multiclass':
            n_output_units = meta['n_classes_']
            output_activation = 'softmax'
            loss = 'categorical_crossentropy'
            metrics = ['categorical_accuracy']
        elif meta['target_type_'] == 'binary':
            n_output_units = 1
            output_activation = 'sigmoid'
            loss = 'binary_crossentropy'
            metrics = ['binary_accuracy']
        else:
            raise ValueError('Model does not support target type: ' + meta['target_type_'])

        model.add(layers.Dense(n_output_units, name='output', activation=output_activation))

        model.compile(loss=loss, metrics=metrics, optimizer=compile_kwargs['optimizer'])

        model.summary()
        return model

    clf = KerasClassifier(
        model=get_clf_model,
        epochs=50,
        batch_size=500,
        verbose=1,
        # We have to set this value even for binary classification. Otherwise, the target encoder won't use One hot encoding
        # loss='categorical_crossentropy',
        optimizer='adam',
        optimizer__learning_rate=0.001,
    )
    return clf

Initially, I was passing the loss as a KerasClassifier parameter, and it trained fine. But since I wanted to make my model as plug-and-play as possible, I moved the loss selection inside the model function. That is when the exception started to show up. I took a look at how scikeras initializes the target encoder:

https://github.com/adriangb/scikeras/blob/d50e75a90c7ac0966d8583ef487bb7e9fed656c6/scikeras/wrappers.py#L1395-L1415

https://github.com/adriangb/scikeras/blob/d50e75a90c7ac0966d8583ef487bb7e9fed656c6/scikeras/utils/transformers.py#L154-L175

Before, it was using one-hot encoding because I was passing loss='categorical_crossentropy' to KerasClassifier.

What ended up working for me was to keep passing loss='categorical_crossentropy' to KerasClassifier. As far as I can tell it doesn't affect the scores reported by sklearn's cross_validate (correct me if I'm wrong), and it keeps the target_encoder from falling back to ordinal encoding. The drawback of this workaround is that it doesn't look clean and may confuse newcomers.

I considered other workarounds for my particular problem, but to solve this issue properly I propose extracting the loss (and perhaps the optimizer?) from the compiled model, I suppose around these lines (I don't have any experience with this repository):

https://github.com/adriangb/scikeras/blob/d50e75a90c7ac0966d8583ef487bb7e9fed656c6/scikeras/wrappers.py#L897-L901

german1608 commented 2 years ago

errata: I'm not sure why it wasn't failing before, but now I get this exception when setting loss='categorical_crossentropy' on KerasClassifier:

ValueError: loss=categorical_crossentropy but model compiled with binary_crossentropy. Data may not match loss function!

Which makes sense. Still, my proposal holds. My workaround was to subclass KerasClassifier and add a custom target_encoder that always assumes categorical_crossentropy.

adriangb commented 2 years ago

Thank you for the detailed issue report.

Currently the transformers are initialized and fit before the model is created, so there's no introspection possible:

https://github.com/adriangb/scikeras/blob/d50e75a90c7ac0966d8583ef487bb7e9fed656c6/scikeras/wrappers.py#L828-L835

If we switched the order, the model-building function wouldn't have access to certain metadata that is pretty useful for dynamically creating models:

https://github.com/adriangb/scikeras/blob/d50e75a90c7ac0966d8583ef487bb7e9fed656c6/scikeras/utils/transformers.py#L306-L311

But since I wanted to make my model as plug-and-play as possible, I moved the loss setting inside the model function.

So your goal is to automatically choose a loss based on the input data, right? Currently it works the other way around: you can hardcode the loss to "categorical_crossentropy" and the input will automatically get one-hot encoded.

german1608 commented 2 years ago

So your goal is to have automatically choose a loss based on the input data, right?

Based on the output data, actually. That would work.

NOTE: Don't feel that I'm imposing this. I'm just raising something that caught my attention. Perhaps there is another solution than automatically setting the loss based on the output dimensions.

adriangb commented 2 years ago

Based on the output data, actually.

yup, sorry, bad wording on my part. I'm referring to y, which is the output of the model but also an input in the Python function-argument sense...

Is there a problem with the loss always being "categorical_crossentropy" and y being encoded to match? IIRC that's what scikit-learn's MLPClassifier does. I guess a small performance hit?
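For context, scikit-learn binarizes classification targets internally with LabelBinarizer, which one-hot encodes multiclass targets but keeps binary targets as a single 0/1 column. A quick check of that behavior (plain sklearn, independent of scikeras):

```python
from sklearn.preprocessing import LabelBinarizer

# multiclass targets get one-hot encoded...
y_multi = LabelBinarizer().fit_transform([0, 1, 2, 1])  # shape (4, 3)
# ...but binary targets stay as a single 0/1 column
y_bin = LabelBinarizer().fit_transform([0, 1, 0])       # shape (3, 1)

print(y_multi.shape, y_bin.shape)
```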

german1608 commented 2 years ago

Is there a problem with the loss always being "categorical_crossentropy" and y being encoded to match? IIRC that's what scikit-learn's MLPClassifier does. I guess a small performance hit?

Even for binary classification? Would that affect how the target_encoder is initialized for binary classification?

adriangb commented 2 years ago

I think it should still work for binary classification, yes.

But I'm looking at the MLPClassifier notebook/guide again, and it already sets the loss function dynamically. It uses "sparse_categorical_crossentropy" for multiclass targets so that they do not need to be one-hot encoded (and thus the transformer doesn't need to know about the model's loss function at all). Could you do that instead, or do you need "categorical_crossentropy" for multiclass targets?
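The two losses compute the same quantity; only the label encoding differs. A quick numpy check of that equivalence (the probabilities here are made up, not from the issue's model):

```python
import numpy as np

# predicted class probabilities for two samples, three classes
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6]])
y_int = np.array([0, 2])         # integer labels ("sparse" form)
y_onehot = np.eye(3)[y_int]      # one-hot form of the same labels

# sparse_categorical_crossentropy: index the true-class probability
sparse_ce = -np.log(probs[np.arange(len(y_int)), y_int])
# categorical_crossentropy: dot the one-hot row with log-probabilities
cat_ce = -np.sum(y_onehot * np.log(probs), axis=1)

print(np.allclose(sparse_ce, cat_ce))  # the two losses agree
```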

german1608 commented 2 years ago

I'll test soon and give you feedback. Thanks for your suggestions.