Widening the interval for the initial means $\mu$ from [-0.5, 0.5] to [-1, 1], which might be a better starting point.
Reducing the initial search variances $\sigma$ from 0.1 to $(3 + \ln N) / (5 \sqrt{N})$, which might make training smoother, particularly at the beginning.
Reducing the default regularization strength by a factor of 10, in line with the reduced $\sigma$. A good regularization parameter is now on the order of 0.005.
Now I understand why SNES previously needed such strong regularization: the initial search variance (which acts as the learning rate for the parameters) was too large.
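
To get a feel for the new default, the formula $(3 + \ln N) / (5 \sqrt{N})$ can be evaluated for a few sizes. This is only a minimal sketch: the function name `initial_sigma` is hypothetical, and $N$ is assumed here to be the number of parameters being searched, which these notes do not state explicitly.

```python
import math


def initial_sigma(n: int) -> float:
    """Evaluate (3 + ln N) / (5 * sqrt(N)) from the notes above.

    Assumption: n stands for N, taken here to be the number of
    parameters being optimized.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    return (3 + math.log(n)) / (5 * math.sqrt(n))


OLD_SIGMA = 0.1  # the previous fixed default mentioned in the notes

if __name__ == "__main__":
    # Compare the new size-dependent value with the old fixed default.
    for n in (10, 100, 1000, 10000):
        print(f"N={n:>6}  new sigma={initial_sigma(n):.4f}  old sigma={OLD_SIGMA}")
```

Under this formula the value falls below the old 0.1 only once $N$ exceeds a few hundred, so the change is a reduction for larger problems.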