Closed VikasChidananda closed 1 year ago

Hi all,

I have a couple of questions regarding the chosen hyperparameters (i.e., network architecture, PPO hyperparameters, etc.). How did you decide on these specific values? Did you run a hyperparameter study, or were they taken from Bifarle / the original PPO paper?

My understanding is that RL algorithms are extremely sensitive to these hyperparameters, and I am curious how to decide upon them.

Thank you
In my experience, PPO is actually relatively robust to reasonable changes in the metaparameters. @VignonColin could say more / probably remembers these details better, but the "make or break" factors were really the normalization of the (state, action, reward) to O(1), a good choice of the different timescales, and the use of MARL. I am pretty sure that we never did too much tuning of the PPO parameters per se and just kept the defaults from tensorforce.
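For concreteness, here is a minimal sketch of the kind of O(1) normalization described above; the wrapper interface and the scale constants are illustrative assumptions, not code from this repository:

```python
import numpy as np


class NormalizedEnv:
    """Illustrative wrapper rescaling states, actions, and rewards to O(1).

    The scale constants are placeholders: in practice they come from the
    physics of the problem (typical sensor magnitudes, actuation bounds,
    and the expected range of the reward).
    """

    def __init__(self, env, state_scale, action_scale, reward_scale):
        self.env = env
        self.state_scale = np.asarray(state_scale)
        self.action_scale = np.asarray(action_scale)
        self.reward_scale = float(reward_scale)

    def reset(self):
        return self.env.reset() / self.state_scale

    def step(self, action):
        # The agent works in [-1, 1]; rescale to physical actuation units.
        state, reward, done = self.env.step(action * self.action_scale)
        return state / self.state_scale, reward / self.reward_scale, done
```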
This has also been my experience with PPO so far. But it would be very insightful to know how to tune these parameters, especially net_architecture.
You mean the architecture of the neural network? Honestly, the control tasks we have been looking into so far have been relatively simple from the neural network point of view, and typically a couple of fully connected layers was enough. A long time ago I tested (on some 2D cylinder cases) some more variations of the architecture etc., but it was not making a big difference. I just think that while the control strategies we find are useful / working well, they are actually not that complicated.
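For reference, a "couple of fully connected layers" policy in tensorforce can be specified as a simple list of layer dicts. The environment, layer sizes, activations, and PPO settings below are illustrative guesses, not necessarily the exact values used in this project:

```python
from tensorforce import Agent, Environment

# Placeholder environment; in practice this would be the flow-control
# environment from the repository.
environment = Environment.create(
    environment='gym', level='Pendulum-v1', max_episode_timesteps=200
)

# Two fully connected hidden layers; sizes here are illustrative.
network = [
    dict(type='dense', size=64, activation='tanh'),
    dict(type='dense', size=64, activation='tanh'),
]

agent = Agent.create(
    agent='ppo',
    environment=environment,
    network=network,
    batch_size=10,       # episodes per update; problem-dependent
    learning_rate=1e-3,
)
```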
A good example of this is https://link.springer.com/article/10.1140/epje/s10189-023-00285-8 (the code is available on GH too): the control law that beats the traditional opposition control by nearly doubling the drag reduction in some cases is actually "just" a "simple" bang-bang controller-ish control law. You cannot easily recover this from linearization of the system and analysis of the perturbation theory dynamics as far as I know, but this is not very "complex" really :) .
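To make the "bang-bang controller-ish" description concrete, here is a toy sketch of the two control laws side by side; the sensing variable and amplitudes are illustrative, and the actual setup is in the paper and its repository:

```python
import numpy as np

def opposition_control(v_sensed, gain=1.0):
    # Classical opposition control: actuate proportionally, opposing the
    # wall-normal velocity sensed at a detection plane above the wall.
    return -gain * v_sensed

def bang_bang_control(v_sensed, u_max=1.0):
    # Bang-bang-style law: always actuate at full amplitude, with the sign
    # chosen to oppose the sensed velocity.
    return -u_max * np.sign(v_sensed)
```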
This was a good read, and it puts things in perspective in terms of control behavior. I had tried with 3 hidden layers, but it didn't converge to an optimal control at all; just two hidden layers did the work. On the other hand, with the same number of trainable parameters, deeper nets are more expressive than shallower ones [Bengio et al.], e.g., discarding the output and input layers, for 16 trainable parameters a 2x4x2 net is more expressive than 8x8. In this case net_arch is a decision that has to be made. But yes, keeping to two layers will do the trick for not-so-"complex" controls.
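As a quick way to check such parameter-count comparisons, a small helper along these lines can be used; it counts only the weights between consecutive hidden layers, with biases and the input/output layers excluded, as in the example above (exact totals depend on which connections one includes):

```python
def dense_weight_count(hidden_sizes):
    # Weights between consecutive dense hidden layers; biases and the
    # input/output layers are excluded.
    return sum(a * b for a, b in zip(hidden_sizes, hidden_sizes[1:]))

print(dense_weight_count([2, 4, 2]))  # 2*4 + 4*2 = 16
print(dense_weight_count([8, 8]))     # 8*8 = 64
```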
Thanks very much!