Ekeany / Boruta-Shap

A Tree based feature selection tool which combines both the Boruta feature selection algorithm with shapley values.
MIT License
591 stars 88 forks source link

[BUG] check_missing_values() for methods that can handle missing data is faulty #59

Closed rahimai closed 3 years ago

rahimai commented 3 years ago

Description

in the function check_missing_values() the check for models that can handle missing values is faulty. Simply comparing against the set models_to_check is wrong. This leads to the second if statement never getting executed even if for example an instance XGBRegressor() that can handle missing data is passed.

models_to_check = ('xgb', 'catboost', 'lgbm', 'lightgbm')
model_name = str(type(self.model)).lower()        

if X_missing or Y_missing:
    if any([x in model_name for x in models_to_check]):                
        print('Warning there are missing values in your data !')
    else:                
         raise ValueError('There are missing values in your Data')

Reasoning

str(type(self.model)).lower() returns the instance type which for the XGBRegressor() is <class 'xgboost.sklearn.xgbregressor'>. Comparing this against the models_to_check set will result in a False. Consequently, the second if statement will not get executed.

Ekeany commented 3 years ago

Hi Rahimai,

Thanks for the issue I am not sure if I understand the problem, so please correct me if I am wrong. When running the following code snippet I get the outcome I expect. Am I missing something ?

image

image

rahimai commented 3 years ago

Hi Ekeany,

thanks for the quick response and sorry for the delay on my end - I have been a bit busy over the last few days.

Rechecked the files. It seems like the source file used for the pip package is different which leads to the ValueError being raised. There, the check is as follows

borutashap

The version on the github repo works fine. It would be great if you could update the pip package. I have fixed it internally for myself anyways, but just so that the pip package reflects the up-to-date version of the code for others.

Ekeany commented 3 years ago

Hi Lawan,

No worries, I have updated the pypi repository with the latest version.

Thanks, Eoghan.

On Tue, Jun 22, 2021 at 10:43 AM Lawan Rahim @.***> wrote:

Hi Ekeany,

thanks for the quick response and sorry for the delay on my end - I have been a bit busy over the last few days.

Rechecked the files. It seems like the source file used for the pip package is different which leads to the ValueError being raised. There, the check is as follows

` model_name = str(type(self.model)).lower() if X_missing or Y_missing:

    if model_name.startswith(models_to_check):
        print('Warning there are missing values in your data !')

    else:
        raise ValueError('There are missing values in your Data')`

The version on the github repo works fine. It would be great if you could update the pip package. I have fixed it internally for myself anyways, but just so that the pip package reflects the up-to-date version of the code for others.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/Ekeany/Boruta-Shap/issues/59#issuecomment-865833108, or unsubscribe https://github.com/notifications/unsubscribe-auth/AMDEERRKWQGQCKFVEM2UJFLTUBLLDANCNFSM457H6TDQ .