ersilia-os / chempfn

Ensemble-based, size-agnostic wrapper for the TabPFN classifier
GNU General Public License v3.0

Add feature subsampling #3

Closed DhanshreeA closed 1 year ago

DhanshreeA commented 1 year ago

One of TabPFN's limitations is that it cannot handle more than 100 features in a dataset. Our goal is to find a feature subsampling strategy under which TabPFN's performance guarantees are still maintained. We are incorporating the following approaches:

We can also parameterize the sampling strategy, and potentially support an option like "mix" that combines multiple sampling approaches in a single run. Parameterization would also make it easier to apply grid search or other forms of hyperparameter tuning.
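
To make the parameterization idea concrete, here is a minimal sketch of what a strategy-driven subsampler could look like. Everything in it is an assumption for illustration: the function name `sample_feature_subsets`, its parameters, and the two concrete strategies (`"random"` and `"variance"`) are hypothetical and are not taken from chempfn. Only the 100-feature limit and the `"mix"` option come from the discussion above.

```python
import numpy as np

MAX_FEATURES = 100  # TabPFN's feature limit, as stated in this issue


def sample_feature_subsets(X, n_subsets=4, strategy="random",
                           max_features=MAX_FEATURES, seed=0):
    """Return n_subsets arrays of column indices, each of size <= max_features.

    Hypothetical helper: the strategy names are illustrative, not chempfn's API.
    """
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    k = min(max_features, n_features)

    def random_subset():
        # Uniformly sample k distinct columns.
        return np.sort(rng.choice(n_features, size=k, replace=False))

    def variance_subset():
        # Sample k distinct columns, weighted by per-column variance.
        var = X.var(axis=0) + 1e-12  # keep every probability non-zero
        return np.sort(
            rng.choice(n_features, size=k, replace=False, p=var / var.sum())
        )

    samplers = {
        "random": [random_subset],
        "variance": [variance_subset],
        "mix": [random_subset, variance_subset],  # alternate between strategies
    }
    if strategy not in samplers:
        raise ValueError(f"unknown strategy: {strategy!r}")
    chosen = samplers[strategy]
    return [chosen[i % len(chosen)]() for i in range(n_subsets)]


# Usage: 250 features is over the limit, so each subset keeps 100 columns.
X = np.random.default_rng(1).normal(size=(50, 250))
subsets = sample_feature_subsets(X, n_subsets=4, strategy="mix")
```

Each index array could then select the columns for one ensemble member, with predictions combined across members. Because `strategy`, `n_subsets`, and `max_features` are plain arguments, they could be swept in a grid search like any other hyperparameter.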