aerdem4 / lofo-importance

Leave One Feature Out Importance
MIT License

usage question #35

Closed skadio closed 3 years ago

skadio commented 4 years ago

This is not an issue, but rather a quick question for clarification.

From the brief definition of the method, it is a little hard to tell how LOFO and RFE/Backward Selection differ from each other. Could you please compare & contrast?

Thank you again for sharing this lib with the community!

Serdar

aerdem4 commented 4 years ago

Hi @skadio, thanks for the question. Since there are different implementations of RFE, I will compare against the sklearn implementation:

1. LOFO lets you pick your own validation scheme. You can use a time split, GroupKFold, or even a custom split. RFE ignores validation-set performance, and hence generalization. LOFO can tell you not only which features are important but also which features are harmful and cause overfitting.
2. Similarly, your model doesn't have to expose a feature importance attribute the way sklearn's RFE requires. You can also pick any evaluation metric you want.
3. LOFO allows you to group features. Say you have 900 features representing cell responses to a drug; you can group them and treat them as one.

So this package doesn't invent anything new; it tries to encapsulate best practice for getting realistic feature importance. Please let me know if you have further questions.
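The three points above can be sketched in a few lines with plain sklearn (this is a minimal illustration of the leave-one-feature-out idea, not the library's internals; the group names and the synthetic data are made up for the example):

```python
# LOFO sketch: score the model with all features under your own CV
# scheme, then re-score with one feature (or feature group) left out.
# The drop in validation score is that feature's importance; negative
# values flag features that hurt generalization.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = make_classification(n_samples=300, n_features=6, n_informative=3,
                           random_state=0)

# Hypothetical grouping: treat columns 4 and 5 as one unit, like the
# 900 cell-response features mentioned above.
groups = {"f0": [0], "f1": [1], "f2": [2], "f3": [3], "f4+f5": [4, 5]}

cv = KFold(n_splits=5, shuffle=True, random_state=0)  # any split scheme works
model = LogisticRegression(max_iter=1000)  # no feature_importances_ needed
baseline = cross_val_score(model, X, y, cv=cv, scoring="roc_auc").mean()

importances = {}
for name, cols in groups.items():
    kept = [c for c in range(X.shape[1]) if c not in cols]
    score = cross_val_score(model, X[:, kept], y, cv=cv,
                            scoring="roc_auc").mean()
    importances[name] = baseline - score  # > 0 helpful, < 0 harmful

for name, imp in sorted(importances.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {imp:+.4f}")
```

Because the score comes from your own `cv` and `scoring`, any model and any metric plug in directly, which is the generalization point above.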

skadio commented 3 years ago

Exactly the information I was looking for! Thank you @aerdem4 for the details; this helps a lot.

Generalization is the key here then (hence the custom validation/evaluation). And the grouping idea: I haven't seen it supported before. Very neat!

Please feel free to mark this issue as closed/resolved, and thanks again for the prompt reply!