Unable to figure out the Target variable for binary class

munchcrunch commented 2 years ago

I am applying PiML models to my custom data in a Jupyter notebook and am receiving the attached error. Please help and guide me in this regard. Thank you.

ZebinYang commented 2 years ago

Hi @munchcrunch, it seems that you are using the old version of PiML (0.1.4). Can you try it again using the latest version? I can help debug if you could provide a reproducible colab notebook.

dogefeeder commented 2 years ago

I am applying PiML models to my custom data in a Jupyter notebook and am receiving the attached error. Please help and guide me in this regard. Thank you.

You have to set the name of the target column as 'FlagDefault'.

munchcrunch commented 2 years ago

@ZebinYang I tried it in Colab and am now receiving this error. The installed version is piml-0.2.2.

munchcrunch commented 2 years ago

@dogefeeder I tried this modification and am still receiving the same error.

ZebinYang commented 2 years ago

@munchcrunch Can you share the data or the notebook link so that I can do some debugging? My email address is yangzb2010@connect.hku.hk

munchcrunch commented 2 years ago

@ZebinYang I have sent the data and notebook via email. Thank you.

dogefeeder commented 2 years ago

@ZebinYang I tried it in Colab and am now receiving this error. The installed version is piml-0.2.2.

Now you have the same error as mine.

ZebinYang commented 2 years ago

Hi @munchcrunch.

This happens because we treat the last column of the data as the target variable by default. If that column is identified as "categorical", we treat the task as a classification problem. Here the feature "Posted Speed Limit" is the last column and has 4 distinct values, so it is automatically treated as a multi-class classification task, which is not supported yet.

There are two ways to avoid this:

1. Specify the target variable, e.g., by exp.data_prepare(target="0/1"). Not sure if "0/1" is the expected target.
2. Manually change the feature type of "Posted Speed Limit" to numerical in exp.data_summary(), and then select the true target and task type in exp.data_prepare().
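
To make the default behaviour described above concrete, here is a minimal sketch in plain Python. It is not PiML's implementation: the function infer_task, the max_categories cutoff, and the sample rows are all hypothetical and only mimic the rule "last column is the target; a categorical target means classification".

```python
# Hypothetical illustration only; this is NOT PiML's actual code.
def infer_task(columns, rows, target=None, max_categories=5):
    """Guess the target and task type following the default described above.

    columns: list of column names; rows: list of row tuples.
    max_categories is an assumed cutoff for treating a column as categorical.
    """
    if target is None:
        target = columns[-1]  # default: the last column is taken as the target
    idx = columns.index(target)
    distinct = {row[idx] for row in rows}
    if len(distinct) == 2:
        task = "binary classification"
    elif len(distinct) <= max_categories:
        task = "multi-class classification (unsupported)"
    else:
        task = "regression"
    return target, task


# Made-up sample data: the binary label is NOT the last column.
columns = ["0/1", "Posted Speed Limit"]
rows = [(0, 25), (1, 35), (0, 45), (1, 55), (1, 25), (0, 35)]

default_guess = infer_task(columns, rows)                  # falls back to the last column
explicit_guess = infer_task(columns, rows, target="0/1")   # target named explicitly
print(default_guess)
print(explicit_guess)
```

With the target left unspecified, the sketch picks "Posted Speed Limit" and its 4 distinct values look like a multi-class problem; naming the target explicitly resolves it to a binary task, which is the effect of the first method.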

Thanks for pointing out this important issue; we will improve this module in the next release.

munchcrunch commented 2 years ago

Thank you for your prompt response. It works now. I would suggest fixing how categorical variables are handled instead of converting them into label encoding, since label-encoded variables are sometimes difficult to interpret. In addition, I would suggest adding support for multi-class classification in the next release. Thank you.

SelfExplainML / PiML-Toolbox

Unable to figure out the Target variable for binary class #9