RobertTLange / evosax

Evolution Strategies in JAX 🦎
Apache License 2.0
479 stars, 44 forks

popsize #1

Closed. pharringtonp19 closed this issue 2 years ago

pharringtonp19 commented 2 years ago

Congrats on a great library that is super easy to jump into!

To better understand how things work, I am running Differential_ES on a regression problem (Colab Notebook).

One thing that I'm running into in this toy example is that the performance becomes worse when I increase the popsize of the Differential_ES strategy. Have you ever encountered something like this?

Attached are two figures corresponding to population sizes of 10 and 500 (10.pdf and 500.pdf).

RobertTLange commented 2 years ago

Hi @pharringtonp19 😄

Thank you again for actively probing evosax and for the kind words! In general, stable ES hyperparameters can depend on the population size (similar to how the GD learning rate can depend on the batch size). If you change the differential weight (diff_w) in your example from the default of 0.8 to 0.5, you get much more stable dynamics for a population size of 500 (see attached image). For a population size of 10, on the other hand, this slows down progress and dampens the exploration of the Differential ES.
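To make the role of the differential weight concrete, here is a minimal sketch of the classic DE/rand/1 mutation step in plain Python. It is a textbook illustration and not evosax's actual implementation: the function names `de_rand_1_mutation` and `mean_step`, the fixed seeds, and the toy 2-D population are all assumptions made for this example.

```python
import math
import random


def de_rand_1_mutation(population, diff_w, seed=0):
    """Classic DE/rand/1 mutation: mutant = a + diff_w * (b - c).

    Hypothetical helper for illustration only, not the evosax API.
    """
    # Fixed seed so that calls with different diff_w pick the same members.
    rng = random.Random(seed)
    n = len(population)
    mutants = []
    for i in range(n):
        # Sample three distinct members, none of which is member i.
        others = [j for j in range(n) if j != i]
        ia, ib, ic = rng.sample(others, 3)
        a, b, c = population[ia], population[ib], population[ic]
        mutants.append([ak + diff_w * (bk - ck) for ak, bk, ck in zip(a, b, c)])
    return mutants


def mean_step(population, diff_w):
    """Average distance between each mutant and its base vector a."""
    # With diff_w = 0 the mutation returns the base vectors themselves.
    base = de_rand_1_mutation(population, diff_w=0.0)
    mutants = de_rand_1_mutation(population, diff_w=diff_w)
    return sum(math.dist(m, b) for m, b in zip(mutants, base)) / len(mutants)


# Toy population: 10 members in 2 dimensions (assumed for illustration).
data_rng = random.Random(42)
population = [[data_rng.gauss(0.0, 1.0) for _ in range(2)] for _ in range(10)]

for w in (0.8, 0.5):
    print(f"diff_w={w}: mean mutation step = {mean_step(population, w):.3f}")
```

In this sketch the size of each mutation step scales linearly with diff_w, so lowering it from 0.8 to 0.5 shrinks every perturbation by the same factor. That is the sense in which a smaller differential weight dampens exploration.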

I am actively working on setting good default parameters for all strategies that are robust across standard tasks (supervised learning, control, etc.), but tuning parameters across tasks and strategies takes some time :) I guess that tells us something about how much tuned defaults like the standard 3e-04 Adam learning rate matter for algorithm adoption. Let me know if you have any recommendations/experience! Rob

[Attached image: Screenshot 2022-03-12 at 09 10 50]

pharringtonp19 commented 2 years ago

@RobertTLange That's a nice figure. I will spend some time playing with the hyperparameters then. Thanks!

twoletters commented 1 year ago

Hi Rob,

Speaking of hyperparameters, I am trying to understand the constructor parameter memory_size of the LM_MA_ES optimizer. I could not find an explanation for it in the original paper. How did you come up with its default value of 10, and what is a good guideline for modifying it?

Fantastic work BTW!

twoletters commented 1 year ago

Got it! It is referred to as mmax in the original LM-MA-ES paper and is initialized to, unsurprisingly, 4+floor(3*log(n)), where n is the number of search dimensions.
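For anyone who wants to evaluate that rule for their own problem size, here is a small sketch. It assumes log means the natural logarithm, and the helper name `lm_ma_es_memory_size` is made up for this example; it is not part of evosax.

```python
import math


def lm_ma_es_memory_size(num_dims):
    """Rule quoted above: 4 + floor(3 * log(n)).

    Assumes the natural logarithm. Hypothetical helper, not an evosax function.
    """
    if num_dims < 1:
        raise ValueError("num_dims must be a positive integer")
    return 4 + math.floor(3 * math.log(num_dims))


for n in (10, 100, 1000):
    print(f"n={n}: memory size = {lm_ma_es_memory_size(n)}")
```

Under that assumption the rule gives 10 for n = 10, which happens to equal the default mentioned above, and it grows only logarithmically with the number of dimensions.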