Ekeany / Boruta-Shap

A tree-based feature selection tool that combines the Boruta feature selection algorithm with Shapley values.
MIT License
559 stars, 86 forks

Sample Weight Support for Regression Problems [ENH] #37

Closed kmedved closed 2 years ago

kmedved commented 3 years ago

Current Situation

Most scikit-learn estimators allow for passing sample weights along with the X and y values as an optional parameter. These weights are often key for regression problems. Currently it seems like Boruta-Shap does not support this (unless I'm missing something).

Enhancement

Add support for sample weights (the sample_weight parameter that scikit-learn estimators accept in .fit()).

Implementation

Given that most scikit-learn-compatible regression estimators already allow this, including RF and XGBoost, I think this should be possible to add just by passing a sample_weight parameter to the .fit() call of the relevant estimator. I may be missing some complexity here, however. I appreciate the help.

Ekeany commented 3 years ago

Hi,

I think this is a good idea, feel free to make a pull request if you want. I think it may be possible to add a **kwargs list to the sklearn fit() function.
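The forwarding idea discussed above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the BorutaShap implementation: `WeightedMeanRegressor` and `SelectorSketch` are hypothetical stand-ins written in plain Python so the example runs without any dependencies. The point is simply that extra keyword arguments given to the wrapper's fit() are handed on to the underlying model's fit().

```python
class WeightedMeanRegressor:
    """Toy stand-in for an sklearn-style regressor (hypothetical, for illustration only)."""

    def fit(self, X, y, sample_weight=None):
        # Default to equal weights when none are supplied.
        if sample_weight is None:
            sample_weight = [1.0] * len(y)
        total = sum(sample_weight)
        # The "model" is just the weighted mean of the targets.
        self.mean_ = sum(w * t for w, t in zip(sample_weight, y)) / total
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


class SelectorSketch:
    """Hypothetical wrapper; NOT the real BorutaShap class."""

    def __init__(self, model):
        self.model = model

    def fit(self, X, y, **fit_kwargs):
        # Forward any extra keyword arguments (e.g. sample_weight) to the model.
        self.model.fit(X, y, **fit_kwargs)
        return self


X = [[0.0], [1.0], [2.0]]
y = [1.0, 2.0, 6.0]

unweighted = SelectorSketch(WeightedMeanRegressor()).fit(X, y)
weighted = SelectorSketch(WeightedMeanRegressor()).fit(
    X, y, sample_weight=[1.0, 1.0, 2.0]
)

print(unweighted.model.mean_)  # 3.0
print(weighted.model.mean_)    # 3.75
```

With the third row counted twice, the fitted value moves from 3.0 to 3.75, which is the kind of effect sample weights are meant to have on the underlying estimator.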

LuiNov commented 3 years ago

Hey @kmedved,

I saw your file, but how do I run the Boruta function together with my survey weights now?

Could you give a short "tutorial" on how to run the code?

Thanks, Luise

kmedved commented 3 years ago

Sure - it's very simple. You just add the sample weights to the .fit() call. I've put up an example here on colab:

https://colab.research.google.com/drive/1h6qj4naCkEfgaU_Af3DWaf062JJjHK2A#scrollTo=LOMPgfOsRKu6

So here's a fit call without sample weights:

Feature_Selector.fit(X=X[features],
                     y=y,
                     n_trials=100,
                     sample=False,
                     train_or_test='test',
                     normalize=True,
                     verbose=True)

And then here's one with sample weights:

Feature_Selector.fit(X=X[features],
                     y=y,
                     sample_weight=X['sample_weight'],
                     n_trials=100,
                     sample=False,
                     train_or_test='test',
                     normalize=True,
                     verbose=True)

Sample weights are part of essentially every sklearn-compatible regressor, so this should work with any regressor that supports them.
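If it is unclear whether a given regressor supports sample weights, one rough way to check is to inspect the signature of its fit() method. The sketch below uses only the standard library; `accepts_sample_weight`, `WithWeights` and `WithoutWeights` are hypothetical names made up for this example. scikit-learn also provides a comparable helper, `has_fit_parameter` in `sklearn.utils.validation`.

```python
import inspect


def accepts_sample_weight(estimator):
    """Return True if estimator.fit declares an explicit sample_weight parameter."""
    params = inspect.signature(estimator.fit).parameters
    return "sample_weight" in params


class WithWeights:
    """Hypothetical estimator whose fit() takes sample_weight."""

    def fit(self, X, y, sample_weight=None):
        return self


class WithoutWeights:
    """Hypothetical estimator whose fit() does not take sample_weight."""

    def fit(self, X, y):
        return self


print(accepts_sample_weight(WithWeights()))     # True
print(accepts_sample_weight(WithoutWeights()))  # False
```

Note that this check only finds an explicitly named parameter; an estimator whose fit() accepts weights through **kwargs would be reported as unsupported.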

Ekeany commented 3 years ago

Looks great,

Do you want to make a pull request to add it to the main package?


LuiNov commented 3 years ago

Ok, I'm so sorry. I wondered where to find the package, and then I saw it's Python, not R. Do you know of a way to implement this in R? I know that I could switch to Python, but I am not familiar with the language.

Ekeany commented 3 years ago

Yeah, the package is in Python. I honestly don't think there is an equivalent package in R.

Sorry


LuiNov commented 3 years ago

:-( OK, but thanks again for your help! For the moment I am using it without weights. I am comparing three countries from the WVS dataset. My weights correct for population size, gender, age and region. I thought about "cheating" by multiplying my outcome variable by the weights and running it on the resulting "outcome tilde". Could I do this, or is this completely wrong? I certainly won't fit my regression that way.
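As a small numeric illustration of the question above (made-up numbers, not an answer given in the thread): multiplying the outcome by the weights changes the target itself, whereas sample weights only change how much each row counts. In the sketch below the outcome is constant, so there is nothing to explain, yet the multiplied outcome varies purely because the weights vary. The helper names are hypothetical.

```python
y = [5.0, 5.0, 5.0]   # constant outcome: no signal at all
w = [0.5, 0.5, 2.0]   # made-up survey weights


def weighted_mean(values, weights):
    return sum(v * k for v, k in zip(values, weights)) / sum(weights)


def weighted_variance(values, weights):
    m = weighted_mean(values, weights)
    return sum(k * (v - m) ** 2 for v, k in zip(values, weights)) / sum(weights)


# The "cheat": multiply each outcome by its weight.
y_tilde = [yi * wi for yi, wi in zip(y, w)]

print(weighted_variance(y, w))                # 0.0
print(weighted_variance(y_tilde, [1.0] * 3))  # 12.5
```

The weighted outcome still has zero variance, while the multiplied outcome has a variance of 12.5 that comes entirely from the weights, so a feature selector run on it could pick up on whatever drives the weights rather than the outcome.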

kmedved commented 3 years ago

"Looks great, Do you want to make a pull request to add it to the main package?"

Do you mean something other than: https://github.com/Ekeany/Boruta-Shap/pull/60? Just want to make sure I understand.

Ekeany commented 3 years ago

Oh sorry, I didn't see that pull request. I will merge it now. Thanks again for the help.


kmedved commented 3 years ago

Thanks @Ekeany