**german1608** opened this issue 2 years ago
**german1608:** Errata: I don't know why it was not failing before, but now I get this exception when setting `categorical_crossentropy`:

`ValueError: loss=categorical_crossentropy but model compiled with binary_crossentropy. Data may not match loss function!`

Which makes sense. Still, my proposal holds. My solution was to subclass `KerasClassifier` and add a custom `target_encoder` that always "uses" `categorical_crossentropy`.
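For concreteness, a minimal sketch of that subclass, assuming scikeras's `ClassifierLabelEncoder` from `scikeras.utils.transformers` (the encoder the default `target_encoder` property builds, linked further down in the issue) accepts the loss as a constructor argument:

```python
from scikeras.wrappers import KerasClassifier
from scikeras.utils.transformers import ClassifierLabelEncoder

class OneHotKerasClassifier(KerasClassifier):
    @property
    def target_encoder(self):
        # Pretend the loss is categorical_crossentropy so the default
        # encoder always one-hot encodes y, regardless of the loss the
        # model actually compiles with.
        return ClassifierLabelEncoder(loss="categorical_crossentropy")
```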
**adriangb:** Thank you for the detailed issue report.

Currently the transformers are initialized and fit *before* the model is created, so there's no way to introspect the model at that point. And if we switched the order, the model-building function wouldn't have access to certain metadata that is pretty useful for dynamically creating models.
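To make the ordering constraint concrete, an illustrative analogy (not scikeras internals) of where that metadata comes from:

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

y = np.array(["cat", "dog", "bird", "dog"])

# Fitting the target transformer is what discovers the classes...
enc = LabelEncoder().fit(y)
n_classes = len(enc.classes_)  # ...which becomes meta["n_classes_"]

# ...and the model-building function consumes that, e.g. to size the
# output layer. Hence transformers must be fit before the model exists.
print(n_classes)  # 3
```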
> But since I wanted to make my model as plug-and-play as possible, I moved the loss setting inside the model function.
So your goal is to have the loss chosen automatically based on the input data, right? Currently it works the other way around: you can hardcode the loss to `"categorical_crossentropy"` and the input will automatically get one-hot encoded.
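i.e. roughly this, with `get_clf`, `X`, and `y` as in the tutorial (only the placement of the loss matters here):

```python
from scikeras.wrappers import KerasClassifier

# Loss passed to the wrapper: the target encoder sees it up front
# and one-hot encodes integer-labeled y before fitting the model.
clf = KerasClassifier(model=get_clf, loss="categorical_crossentropy")
clf.fit(X, y)
```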
**german1608:**

> So your goal is to have the loss chosen automatically based on the input data, right?

Based on the output data, actually. That would work.
NOTE: Don't feel that I'm imposing this; I'm just raising something that caught my attention. Perhaps there is another solution besides automatically setting the loss based on the output dimensions.
**adriangb:**

> Based on the output data, actually.

Yup, sorry, bad wording on my part. I'm referring to `y`, which is the output of the model but also an input in the Python function argument sense...
Is there a problem with the loss always being `"categorical_crossentropy"` and `y` being encoded to match? IIRC that's what scikit-learn's `MLPClassifier` does. I guess a small performance hit?
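For a binary target, that would mean two one-hot columns paired with a 2-unit softmax head, e.g. (illustrative only):

```python
import numpy as np
from tensorflow.keras.utils import to_categorical

y = np.array([0, 1, 1, 0])   # binary labels
print(to_categorical(y))     # shape (4, 2): one column per class
```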
**german1608:**

> Is there a problem with the loss always being `"categorical_crossentropy"` and `y` being encoded to match? IIRC that's what scikit-learn's `MLPClassifier` does. I guess a small performance hit?

Even for binary classification? Would that affect how the target_encoder is initialized for binary classification?
**adriangb:** I think it should still work for binary classification, yes.

But looking at the MLPClassifier notebook/guide again, it already sets the loss function dynamically: it uses `"sparse_categorical_crossentropy"` for multi-class targets so that they do not need to be one-hot encoded (and thus the transformer doesn't need to know about the model's loss function at all). Could you do that instead, or do you need to use `"categorical_crossentropy"` for multi-class targets?
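For reference, a sketch of that pattern along the lines of the guide (SciKeras injects the `meta` keyword; keys like `n_classes_` and `target_type_` are from its docs, and the layer sizes are placeholders):

```python
from tensorflow import keras

def get_clf(meta):
    model = keras.Sequential()
    model.add(keras.layers.Input(shape=(meta["n_features_in_"],)))
    model.add(keras.layers.Dense(32, activation="relu"))
    if meta["target_type_"] == "binary":
        # one sigmoid unit; y stays as 0/1
        model.add(keras.layers.Dense(1, activation="sigmoid"))
        model.compile(loss="binary_crossentropy")
    else:
        # sparse loss consumes integer labels directly, so the target
        # encoder never needs to know which loss the model uses
        model.add(keras.layers.Dense(meta["n_classes_"], activation="softmax"))
        model.compile(loss="sparse_categorical_crossentropy")
    return model
```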
**german1608:** I'll test this soon and give you feedback. Thanks for your suggestions.
---

**Original issue** (german1608):

Scikeras version: 0.8.0

(I feel this is) related to #206.
I was following the MLPClassifier tutorial on the wiki page. It was great that the model function could handle both binary and multi-class classification. However, I encountered this issue while running the tests:

My `y` has 6 classes. I'm using `KerasClassifier` directly, i.e. no sub-classing. This is how I was creating the classifier:

Initially, I was passing the `loss` as a `KerasClassifier` parameter, and it was training fine. But since I wanted to make my model as plug-and-play as possible, I moved the loss setting inside the model function. This is where the exception started to show up. I took a look at how scikeras initializes the target encoder:

https://github.com/adriangb/scikeras/blob/d50e75a90c7ac0966d8583ef487bb7e9fed656c6/scikeras/wrappers.py#L1395-L1415

https://github.com/adriangb/scikeras/blob/d50e75a90c7ac0966d8583ef487bb7e9fed656c6/scikeras/utils/transformers.py#L154-L175
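The original snippet isn't reproduced here, but the two setups looked roughly like this (hypothetical reconstruction; `build_model` stands in for my actual model function, which returns an uncompiled model):

```python
from scikeras.wrappers import KerasClassifier

# Setup 1 -- trained fine: the loss lives on the wrapper, so the
# target encoder knows to one-hot encode y.
clf = KerasClassifier(model=build_model, loss="categorical_crossentropy")

# Setup 2 -- raises the ValueError: the loss is set only inside the
# model function, so the wrapper never sees it and the target encoder
# falls back to its default (ordinal) encoding.
def build_model_plug_and_play(meta):
    model = build_model(meta)
    model.compile(loss="categorical_crossentropy")
    return model

clf = KerasClassifier(model=build_model_plug_and_play)
```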
Before, it was using one-hot encoding because I was passing `loss='categorical_crossentropy'` to `KerasClassifier`. What ended up working for me was to keep using `loss='categorical_crossentropy'`. It looks like it doesn't affect scores when using sklearn's `cross_validate` (correct me if I'm wrong), nor does it affect the target_encoder's use of ordinal encoding. The drawback of this solution is that it doesn't look right and may confuse newcomers.
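For reference, the workaround is just this (sketch; `build_model`, `X`, and `y` as above):

```python
from sklearn.model_selection import cross_validate
from scikeras.wrappers import KerasClassifier

clf = KerasClassifier(model=build_model, loss="categorical_crossentropy")
scores = cross_validate(clf, X, y, cv=3, scoring="accuracy")
print(scores["test_score"])
```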
Other solutions that I thought of for my particular problem were:

To finally solve this issue, I propose extracting the `loss` (and perhaps the `optimizer`?) from the model, I suppose around these lines (I don't have any experience with this repository):

https://github.com/adriangb/scikeras/blob/d50e75a90c7ac0966d8583ef487bb7e9fed656c6/scikeras/wrappers.py#L897-L901
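To illustrate why that should be possible, a compiled Keras model already carries its loss; a standalone check (not scikeras code):

```python
from tensorflow import keras

model = keras.Sequential([keras.layers.Dense(3, activation="softmax")])
model.compile(loss="categorical_crossentropy")

# The wrapper could introspect this after building the model instead
# of requiring loss= to be passed to KerasClassifier separately.
print(model.loss)  # categorical_crossentropy
```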