Closed pharringtonp19 closed 2 years ago
Hi @pharringtonp19 😄
Thank you again for actively probing evosax
and the kind words! In general, stable ES hyperparameters can depend on the population size (similar to how the GD learning rate can depend on the batch size). If, in your example, you change the differential weight (`diff_w`) from the default of 0.8 to 0.5, you get much more stable dynamics for a population size of 500 (see attached image). For a population size of 10, on the other hand, this will slow down progress/dampen the exploration of the Differential ES.
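For intuition on why `diff_w` interacts with population size, here is a minimal sketch of the classic DE/rand/1 mutation step (a generic textbook version, not necessarily evosax's exact implementation): `diff_w` scales the difference vector between two randomly chosen members, so with a large, diverse population a smaller weight keeps the perturbations from overshooting.

```python
import numpy as np

def de_rand_1_mutation(pop, diff_w, rng):
    """DE/rand/1 mutation: combine three distinct members per candidate.

    A larger diff_w amplifies the difference vector (more exploration);
    with a large population the difference vectors are already diverse,
    so a smaller diff_w (e.g. 0.5) can stabilize the dynamics.
    """
    popsize, _ = pop.shape
    mutants = np.empty_like(pop)
    for i in range(popsize):
        # pick three distinct indices, all different from i
        choices = [j for j in range(popsize) if j != i]
        r1, r2, r3 = rng.choice(choices, size=3, replace=False)
        mutants[i] = pop[r1] + diff_w * (pop[r2] - pop[r3])
    return mutants

rng = np.random.default_rng(0)
pop = rng.normal(size=(10, 4))
mutants = de_rand_1_mutation(pop, diff_w=0.5, rng=rng)
```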
I am actively working on setting good default parameters for all strategies that are robust across standard tasks (supervised learning, control, etc.). But tuning parameters across tasks and strategies takes some time :) I guess that tells us something about how much well-tuned defaults, like the standard `3e-04` Adam learning rate, matter for algorithm adoption. Let me know if you have any recommendations/experience!
Rob
@RobertTLange That's a nice figure. I will spend some time playing with the hyperparameters then. Thanks!
Hi Rob,
Speaking of hyperparameters, I am trying to understand the constructor parameter `memory_size` of the `LM_MA_ES` optimizer. I could not find an explanation for it in the original paper. How did you come up with its default value of `10`, and what is a good guideline for modifying it?
Fantastic work BTW!
Got it! It was referred to as `mmax` in the original LM-MA-ES paper and initialized to, unsurprisingly, `4 + floor(3 * log(n))`.
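Evaluating that formula confirms where the default comes from. A quick sketch (using the natural log, as in the paper's formula):

```python
import math

def lmmaes_default_memory(num_dims):
    # m_max = 4 + floor(3 * ln(n)) from the LM-MA-ES paper
    return 4 + math.floor(3 * math.log(num_dims))

for n in (10, 100, 1000):
    print(n, lmmaes_default_memory(n))
```

For `n = 10` this gives exactly 10, which matches the default; it grows only logarithmically with the problem dimension (17 at `n = 100`, 24 at `n = 1000`).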
Congrats on a great library that is super easy to jump into!
To better understand how things work, I am running `Differential_ES` on a regression problem (Colab Notebook). One thing I'm running into in this toy example is that performance becomes worse when I increase the `popsize` of the `Differential_ES` strategy. Have you ever encountered something like this? Attached are two figures corresponding to population sizes of 10 (10.pdf) and 500 (500.pdf).
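For anyone wanting to reproduce this interaction without the notebook, here is a self-contained toy experiment (a from-scratch DE/rand/1/bin in numpy, not evosax itself; the problem setup, crossover rate, and iteration budget are all illustrative assumptions) that makes it easy to vary `popsize` and `diff_w` on a least-squares loss:

```python
import numpy as np

def run_de(popsize, diff_w=0.8, cr=0.9, iters=200, seed=0):
    """Minimal DE/rand/1/bin on a toy least-squares regression problem.

    Returns the best loss found, so the effect of popsize and diff_w
    on convergence can be compared directly.
    """
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(50, 3))
    w_true = np.array([1.0, -2.0, 0.5])
    y = X @ w_true

    def loss(w):
        return np.mean((X @ w - y) ** 2)

    pop = rng.normal(size=(popsize, 3))
    fit = np.array([loss(w) for w in pop])
    for _ in range(iters):
        for i in range(popsize):
            # DE/rand/1 mutation scaled by diff_w
            idx = [j for j in range(popsize) if j != i]
            r1, r2, r3 = rng.choice(idx, size=3, replace=False)
            mutant = pop[r1] + diff_w * (pop[r2] - pop[r3])
            # binomial crossover with at least one mutant coordinate
            mask = rng.random(3) < cr
            mask[rng.integers(3)] = True
            trial = np.where(mask, mutant, pop[i])
            # greedy selection
            f = loss(trial)
            if f < fit[i]:
                pop[i], fit[i] = trial, f
    return fit.min()

print(run_de(popsize=10, diff_w=0.8))
print(run_de(popsize=500, diff_w=0.5))
```

Sweeping `diff_w` over, say, {0.5, 0.8} for each population size should show the stabilizing effect Rob describes above.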