JakeColtman / bartpy

Bayesian Additive Regression Trees For Python
https://jakecoltman.github.io/bartpy/
MIT License
216 stars 44 forks

Support feature importance / variable selection #26

Open JakeColtman opened 5 years ago

JakeColtman commented 5 years ago

In many real-world use cases, it's important to be able to identify the truly important features.

Implementing some of the approaches from https://repository.upenn.edu/cgi/viewcontent.cgi?article=1555&context=statistics_papers seems like a good start.

A side constraint is that the solution should scale to large datasets, which might pose a problem for the permutation approach, as it needs repeated model fits. It might be useful to have two modes: a fully principled one and a rough-and-ready one for large datasets.
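To make the scaling concern concrete, here is a minimal sketch of a permutation-based selection loop. It assumes the permutation approach means shuffling the response, recomputing importances, and keeping features whose observed importance beats their own null distribution. None of these names exist in bartpy: `permutation_null_selection`, `importance_fn` and `abs_correlation` are hypothetical, and the absolute-correlation score is only a cheap stand-in for a real model-based importance.

```python
import random
from typing import Callable, List, Sequence

Matrix = Sequence[Sequence[float]]
ImportanceFn = Callable[[Matrix, Sequence[float]], List[float]]


def abs_correlation(X: Matrix, y: Sequence[float]) -> List[float]:
    """Stand-in importance: |Pearson correlation| of each column with y.

    A real implementation would fit a model here and return, for example,
    how often each feature is used in splits.
    """
    n = len(y)
    mean_y = sum(y) / n
    sd_y = sum((v - mean_y) ** 2 for v in y) ** 0.5
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        mean_x = sum(col) / n
        sd_x = sum((v - mean_x) ** 2 for v in col) ** 0.5
        cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(col, y))
        scores.append(abs(cov) / (sd_x * sd_y) if sd_x and sd_y else 0.0)
    return scores


def permutation_null_selection(
    X: Matrix,
    y: Sequence[float],
    importance_fn: ImportanceFn,
    n_permutations: int = 50,
    quantile: float = 0.95,
    seed: int = 0,
) -> List[int]:
    """Return indices of features whose importance exceeds the chosen
    quantile of their own permutation null distribution."""
    rng = random.Random(seed)
    observed = importance_fn(X, y)

    # Each permutation costs one full call to importance_fn (one model fit),
    # which is where large datasets become expensive.
    null_draws = []
    for _ in range(n_permutations):
        y_perm = list(y)
        rng.shuffle(y_perm)  # breaks the X-y link, keeps y's marginal
        null_draws.append(importance_fn(X, y_perm))

    selected = []
    for j, obs in enumerate(observed):
        null_j = sorted(draw[j] for draw in null_draws)
        cutoff = null_j[min(len(null_j) - 1, int(quantile * len(null_j)))]
        if obs > cutoff:
            selected.append(j)
    return selected


# Toy data: only feature 0 drives the response.
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1) for _ in range(3)] for _ in range(200)]
y = [3.0 * row[0] + data_rng.gauss(0, 0.5) for row in X]
selected = permutation_null_selection(X, y, abs_correlation)
print(selected)
```

Because the model only enters through `importance_fn`, the same loop could wrap any estimator that exposes per-feature importances. A rough-and-ready mode could reduce `n_permutations` or compute importances on a subsample.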

JakeColtman commented 5 years ago

Given the claims in the paper, it would be interesting for the solution to be general enough to apply to implementations of other models, such as random forests (RF), in other libraries.