cavalab / ellyn

python-wrapped version of ellen, a linear genetic programming system for symbolic regression and classification.
http://cavalab.org/ellyn

Uniqueness of solution #13

Closed: h-vijayakumaran closed this issue 2 years ago

h-vijayakumaran commented 2 years ago

Thanks a lot for this python version of ellenGP!

I have been experimenting with a simple synthetic dataset, something along the lines of y = polynomial(x). Ideally the generating equation would be recovered exactly, but often this doesn't happen. Is there a way to get ellyn to converge to a unique solution, at least for synthetic datasets created from simple closed-form expressions?

Since I couldn't resolve the uniqueness issue, as an academic exercise I tried to "force" ellyn to give a "unique" solution by controlling the randomness involved in the process. As a first step, I was able to control the train_test_split step that happens within ellyn by setting random_state="some-seed-number". However, this by itself isn't sufficient, since there is also randomness in the population generation. Is there a way to freeze or control this randomness?
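To make the two sources of randomness concrete, here is a toy sketch in plain Python. It does not use ellyn: `split_indices` and `initial_population` are hypothetical stand-ins for a train/test split and a population initializer. The point is only that each step needs its own fixed seed before a run becomes repeatable.

```python
import random

def split_indices(n, test_fraction, seed):
    """Hypothetical stand-in for a seeded train/test split."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    n_test = int(n * test_fraction)
    return idx[n_test:], idx[:n_test]

def initial_population(size, genome_length, seed=None):
    """Hypothetical population initializer; seed=None draws an unseeded RNG."""
    rng = random.Random(seed)
    return [[rng.randrange(10) for _ in range(genome_length)]
            for _ in range(size)]

# Fixing only the split seed pins down the data partition...
train_a, test_a = split_indices(20, 0.25, seed=42)
train_b, test_b = split_indices(20, 0.25, seed=42)
split_is_fixed = (train_a, test_a) == (train_b, test_b)

# ...but the population is only repeatable once it is seeded as well.
pop_a = initial_population(5, 8, seed=7)
pop_b = initial_population(5, 8, seed=7)
population_is_fixed = pop_a == pop_b
```

Calling `initial_population(5, 8)` without a seed would generally return a different population on every call, which mirrors the situation described above.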

lacava commented 2 years ago

hi there, apologies for the delay.

there are different ways to restrict the search space, like limiting the operator set, that might help with generating consistent models for different seeds.
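a rough way to see why a smaller operator set helps is to count how many candidate expressions exist. the sketch below is an illustration only: it counts syntactic expression trees up to a given depth, which is an assumption made for simplicity (ellyn uses a linear program representation), and `count_expressions` is a hypothetical helper, not part of ellyn.

```python
def count_expressions(n_terminals, n_unary, n_binary, max_depth):
    """Count syntactically distinct expression trees of depth <= max_depth."""
    count = n_terminals  # depth 0: a single terminal (variable or constant)
    for _ in range(max_depth):
        # A tree is a terminal, a unary operator over one subtree,
        # or a binary operator over two subtrees.
        count = n_terminals + n_unary * count + n_binary * count * count
    return count

# Two terminals (say x and one constant), trees up to depth 2.
full_set = count_expressions(2, n_unary=4, n_binary=4, max_depth=2)
restricted_set = count_expressions(2, n_unary=0, n_binary=2, max_depth=2)
```

with four unary and four binary operators there are 2810 such trees, while keeping only two binary operators (for example + and *) leaves 202, so different seeds have far fewer distinct models to land on.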

for the same seed, performance should be the same if the random state is fixed and you are on a single thread. let me know if this works for you

with multiple threads, ellyn sets a fixed seed for each thread. however, it's difficult to guarantee the order of operations since threads finish asynchronously, so you may still get variation in results for a fixed seed.
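a small sketch of that effect in plain Python, not ellyn's actual threading code: each worker gets its own seeded generator (seeding with `base_seed + thread_id` is an assumption for illustration), so the values each thread produces are repeatable, but the order in which the threads finish is not.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def worker(thread_id, base_seed):
    """Each worker owns an RNG seeded from the base seed and its id."""
    rng = random.Random(base_seed + thread_id)
    time.sleep(random.random() * 0.01)  # uneven, unseeded amount of "work"
    return thread_id, rng.random()

def run(base_seed, n_threads=4):
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        futures = [pool.submit(worker, i, base_seed) for i in range(n_threads)]
        # Results are collected in whatever order the threads finish.
        arrival_order = [f.result() for f in as_completed(futures)]
    by_thread = sorted(arrival_order)  # reorder by thread id
    return arrival_order, by_thread

arrival_1, by_thread_1 = run(base_seed=0)
arrival_2, by_thread_2 = run(base_seed=0)
values_match = by_thread_1 == by_thread_2
```

`by_thread_1` and `by_thread_2` always agree, while `arrival_1` and `arrival_2` may differ between runs. any step that depends on arrival order can therefore change the final result even with a fixed seed.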