ivanpanshin closed this issue 2 years ago.
The shadow features should not be included in the final dataset, as they are just random features used to select which features are important.
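For readers unfamiliar with the mechanism, here is a minimal sketch of how shadow features are typically used in a single round of Boruta-style selection, assuming plain Python dictionaries instead of a DataFrame. Each shadow feature is a shuffled copy of a real feature, and a real feature scores a "hit" when its importance beats the best shadow importance. The `shadow_` prefix, the helper names and the importance values are hypothetical and for illustration only; they are not BorutaShap's API.

```python
import random


def make_shadow_features(columns, seed=0):
    """Return a shuffled copy of every column, keyed 'shadow_<name>'.

    `columns` maps a feature name to its list of values. The 'shadow_'
    prefix is a hypothetical naming convention used for illustration.
    """
    rng = random.Random(seed)
    shadows = {}
    for name, values in columns.items():
        shuffled = list(values)
        rng.shuffle(shuffled)  # breaks any link between this column and the target
        shadows[f"shadow_{name}"] = shuffled
    return shadows


def boruta_hits(importances, shadow_prefix="shadow_"):
    """Return the real features whose importance beats the best shadow feature."""
    shadow_max = max(
        score for name, score in importances.items() if name.startswith(shadow_prefix)
    )
    return [
        name
        for name, score in importances.items()
        if not name.startswith(shadow_prefix) and score > shadow_max
    ]


# Hypothetical importances from one training round (real + shadow features).
importances = {
    "age": 0.30,
    "income": 0.25,
    "noise": 0.02,
    "shadow_age": 0.05,
    "shadow_income": 0.04,
    "shadow_noise": 0.03,
}
print(boruta_hits(importances))  # ['age', 'income']
```

The full algorithm repeats this over many rounds and applies a statistical test to each feature's hit count; the sketch shows one round only.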
Then why don't you filter them out by default at the end of training?
That is what should have been done, but I forgot to implement it. Thanks for spotting this bug.
(In reply to Ivan Panshin's comment of Mon 18 Oct 2021, 07:48: https://github.com/Ekeany/Boruta-Shap/issues/72#issuecomment-945415278)
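A minimal sketch of the end-of-training filter discussed here, assuming the final result is a list of `(feature, importance)` pairs and that shadow features can be recognised by a hypothetical `shadow_` prefix (check the actual column names produced by your version of BorutaShap). Shadow rows are simply dropped before the table is returned.

```python
def drop_shadow_features(rows, shadow_prefix="shadow_"):
    """Remove shadow-feature rows from a final (feature, importance) table.

    `shadow_prefix` is an assumed naming convention, not a documented API.
    """
    return [(name, score) for name, score in rows if not name.startswith(shadow_prefix)]


# Hypothetical final table that still contains shadow rows.
final_table = [
    ("age", 0.30),
    ("shadow_age", 0.05),
    ("income", 0.25),
    ("shadow_income", 0.04),
]
print(drop_shadow_features(final_table))  # [('age', 0.3), ('income', 0.25)]
```

With a pandas DataFrame the same idea would be a boolean mask such as `df[~df["feature"].str.startswith("shadow_")]`, where the `"feature"` column name is hypothetical.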
Okay, great :) Then I guess you should also fix the number of selected/accepted features that Boruta returns, since it sometimes includes shadow features.
Thanks!
(In reply to Ivan Panshin's comment of Mon 18 Oct 2021, 22:06: https://github.com/Ekeany/Boruta-Shap/issues/72#issuecomment-946165796)
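For the count of selected features, a sketch assuming each feature ends up labelled "Accepted", "Tentative" or "Rejected" (the three possible outcomes of a Boruta run) in a plain dictionary. The dictionary layout and the `shadow_` prefix are assumptions for illustration. Counting only real features keeps the reported number consistent with a table from which shadow rows have been removed.

```python
def count_accepted(decisions, shadow_prefix="shadow_"):
    """Count accepted real features, ignoring any shadow feature that slipped in.

    `decisions` maps feature name -> "Accepted" / "Tentative" / "Rejected";
    this layout and the prefix are hypothetical.
    """
    return sum(
        1
        for name, decision in decisions.items()
        if decision == "Accepted" and not name.startswith(shadow_prefix)
    )


decisions = {
    "age": "Accepted",
    "income": "Accepted",
    "zip_code": "Tentative",
    "noise": "Rejected",
    "shadow_age": "Accepted",  # a leaked shadow feature: must not be counted
}
print(count_accepted(decisions))  # 2
```

A naive count of every "Accepted" label would report 3 here, which is the inflated number described in the comment above.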
In the end we get a DataFrame with the features and their importances, plus the number of selected features. Why do we still have shadow features at this point? Shouldn't we eliminate them?
Also, suppose Boruta says we should select the top 10 features. If some shadow features are in the top 10, are they counted too?
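On the top-10 question: if shadow features are left in the ranking, any that make the top 10 take up slots, so fewer than 10 real features come back. A sketch that ranks by importance and skips shadow features, so that `k` always means `k` real features. The dictionary input, the helper name and the `shadow_` prefix are assumptions for illustration.

```python
def top_k_real_features(importances, k, shadow_prefix="shadow_"):
    """Return the k most important real features, skipping shadow features."""
    real = [
        (name, score)
        for name, score in importances.items()
        if not name.startswith(shadow_prefix)
    ]
    real.sort(key=lambda item: item[1], reverse=True)  # highest importance first
    return [name for name, _ in real[:k]]


# Hypothetical ranking where a shadow feature sits in second place.
importances = {
    "age": 0.30,
    "shadow_age": 0.28,
    "income": 0.25,
    "tenure": 0.10,
    "noise": 0.02,
}
print(top_k_real_features(importances, 3))  # ['age', 'income', 'tenure']
```

Taking the top 3 rows without filtering would return `age`, `shadow_age` and `income`, i.e. only two real features.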