MilesCranmer / PySR

High-Performance Symbolic Regression in Python and Julia
https://astroautomata.com/PySR
Apache License 2.0

Recursive feature selection #15

Open · MilesCranmer opened this issue 3 years ago

MilesCranmer commented 3 years ago

I think something that may give a large algorithmic performance boost is recursive feature selection. Here is the idea:

  1. Hold "next-mutation" feature importances for every equation separately.
  2. Start these at uniform importances.
  3. Randomly choose to update the feature importances for an equation each iteration, with low probability.
  4. To calculate the next-mutation feature importances, compute the residual between the current equation's prediction and the target, fit an XGBoost.jl model to predict that residual from the features, and use the trained model's feature importances (see the sketch after this list).
  5. Every time one performs a mutation that involves selecting a new feature, the random selection would be weighted by the equation's current recorded feature importances. I think this will help the model when there are a large number of features.
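
To make step 4 concrete, here is a minimal sketch in Python using the xgboost package (the actual implementation would presumably go through XGBoost.jl instead; `predict_equation` is a hypothetical stand-in for evaluating the current equation on the dataset):

```python
import numpy as np
import xgboost as xgb

def next_mutation_importances(X, y, predict_equation):
    """Estimate per-feature weights for the next feature-selecting mutation.

    X: (n_samples, n_features) feature matrix
    y: (n_samples,) target values
    predict_equation: evaluates the current equation on X (hypothetical)
    """
    # Step 4a: residual between the current equation's prediction and the target.
    residual = y - predict_equation(X)

    # Step 4b: fit a small gradient-boosted model to predict the residual.
    model = xgb.XGBRegressor(n_estimators=50, max_depth=3)
    model.fit(X, residual)

    # Step 4c: normalize the learned importances into sampling weights.
    importances = model.feature_importances_
    total = importances.sum()
    if total == 0:
        # No signal found: fall back to uniform weights (step 2).
        return np.full(X.shape[1], 1.0 / X.shape[1])
    return importances / total
```

Step 5 would then sample new features with these weights, e.g. `np.random.default_rng().choice(X.shape[1], p=weights)`.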

On a more general note, I think it could be interesting to look at the gradients of an equation with respect to each subtree. Perhaps that could provide information on which subtree to mutate next, and with what operation.
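
As a toy illustration of the subtree-gradient idea (sympy is used here purely for exposition, with `u` standing in for the subtree's output; this is not anything PySR currently does):

```python
# Treat a subtree's output as a free variable u and differentiate the full
# expression with respect to it. A large |d(expr)/du| means the equation is
# sensitive to that subtree, so mutating it has the largest effect.
import sympy as sp

x, u = sp.symbols("x u")
# Full equation: sin(x) + x*cos(x); take the subtree to be cos(x).
expr = sp.sin(x) + x * u   # cos(x) replaced by the placeholder u
print(sp.diff(expr, u))    # d(expr)/d(subtree) -> x
```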

verdverm commented 3 years ago

You might find these links interesting for improving SR (they are my own research).

The second contains my original paper, which includes some optimizations for reducing the search space, applicable to both GP- and PGE-based SR.

I have a hunch that RL + PGE could produce some better decision making.

MilesCranmer commented 3 years ago

Thanks Tony!! I've actually read your PGE paper several times over the years and really like the approach! PyPGE was one of the packages I tried when I was searching for an alternative to Eureqa. Would be awesome to have the algorithm integrated with the API in PySR somehow, even if it's just re-using the pysr(...) API to call different algorithms.

PySR's internal algorithm right now is a fairly simple regularized evolution search (with annealing + separate constant optimization), so I would also be really interested in introducing some internal hook to call more complex algorithms as types of "mutations" inside the internal loop!
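
A rough sketch of what such a hook could look like (all names here are hypothetical, not PySR's actual API):

```python
# Hypothetical plugin hook: external algorithms register as extra
# "mutations" and get called from inside the evolution loop.
import random
from typing import Callable, List

# An external mutation maps (expression, X, y) to a new candidate expression.
ExternalMutation = Callable[[object, object, object], object]

_external_mutations: List[ExternalMutation] = []

def register_mutation(fn: ExternalMutation) -> None:
    """Register an external algorithm (e.g. a PGE step) as a mutation."""
    _external_mutations.append(fn)

def maybe_apply_external_mutation(expr, X, y, prob=0.05):
    """Called inside the evolve loop: with some probability, hand the
    expression to an external algorithm instead of a built-in mutation."""
    if _external_mutations and random.random() < prob:
        return random.choice(_external_mutations)(expr, X, y)
    return expr
```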

verdverm commented 3 years ago

You might look into the island model, the Pareto selection optimizations, and the longest-period RNG you can find. They all help with premature convergence, the biggest problem with GP.
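
For anyone unfamiliar with the island model, here is a self-contained toy sketch (illustrative only, not how PySR or PGE implements it): several populations evolve independently and periodically exchange their best individuals, which preserves diversity and counters premature convergence.

```python
import random

def fitness(x: float) -> float:
    return -(x - 3.0) ** 2  # toy problem: the optimum is x = 3

def step_generation(pop: list) -> None:
    # Hill-climb: replace the worst individual with a mutated copy of the best.
    best = max(pop, key=fitness)
    worst = min(range(len(pop)), key=lambda i: fitness(pop[i]))
    pop[worst] = best + random.gauss(0.0, 0.1)

def evolve_islands(islands, n_generations=100, migrate_every=10):
    for gen in range(n_generations):
        for pop in islands:
            step_generation(pop)
        if gen % migrate_every == 0:
            # Ring migration: each island sends its best to the next island.
            best = [max(pop, key=fitness) for pop in islands]
            for i, pop in enumerate(islands):
                pop[random.randrange(len(pop))] = best[i - 1]
    return max((x for pop in islands for x in pop), key=fitness)

islands = [[random.uniform(-10, 10) for _ in range(20)] for _ in range(4)]
print(evolve_islands(islands))  # typically converges near 3.0
```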

Is there a non-linear regression package in Python?