Closed JaCoderX closed 5 years ago
@JacobHanouna, smart hyperparameter search is an excellent idea but extremely computationally expensive in the DRL case; note the chilling comment in the example code you pointed at :) :
Note that this requires a cluster with at least 8 GPUs in order for all trials
to run concurrently,
....
I'll take a closer look to see what can be done here but no earlier than in 3-5 days / a bit busy developing a combined model-based/model-free approach which looks very promising.
@Kismuz actually PBT shouldn't be so computationally expensive; this was one of the DeepMind team's objectives when they created this search-optimization framework.
The idea was to strike a balance between random search, which is trivially parallel and makes hyperparameter selection simple but requires many trials, and Bayesian optimization, which selects hyperparameters more efficiently but is computationally costly.
PBT uses random hyperparameter selection, but in a smart way: it compares the performance of the currently running models and replaces poorly performing ones with copies of the well-performing ones whose hyperparameters are randomly perturbed. A smart random evolutionary optimizer.
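The exploit/explore cycle described above can be sketched in a few lines of plain Python (a simplified illustration of the PBT idea, not DeepMind's or Ray's actual implementation; the dict layout and the `perturb`/`cutoff` values are assumptions):

```python
import random

def exploit_and_explore(population, perturb=0.2, cutoff=0.25):
    """One PBT step (simplified sketch): workers in the bottom quantile
    copy hyperparameters from a top-quantile worker, then perturb them."""
    ranked = sorted(population, key=lambda w: w["score"], reverse=True)
    n_cut = max(1, int(len(ranked) * cutoff))
    top, bottom = ranked[:n_cut], ranked[-n_cut:]
    for worker in bottom:
        donor = random.choice(top)
        # exploit: inherit the better worker's hyperparameters
        worker["hparams"] = dict(donor["hparams"])
        # explore: randomly perturb each hyperparameter up or down
        for key in worker["hparams"]:
            factor = random.choice([1.0 - perturb, 1.0 + perturb])
            worker["hparams"][key] *= factor
    return population
```

Because all workers train concurrently and only periodically exchange hyperparameters, the search costs little more than running the population itself.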
@JacobHanouna, agree; I already run through the paper and idea is captivating indeed; I'll take a closer study and respond on.
@Kismuz I've been doing quite a bit of research and experimentation with BTGym lately. You have built a really impressive framework that is very rich and interesting to work with. There are so many possibilities and research directions already present in the library to explore and play with.
The system has a lot of moving parts, so I started looking for ways to boost my experimentation and explore different architectures and hyperparameters more easily.
DeepMind had proposed a framework to efficiently explore and exploit the hyperparameter space (mentioned in #82 under Population Based Training).
Ray's 'Tune' library already has this framework implemented and ready for integration into RL projects. General integration steps are as follows:
For small projects the integration is straightforward; for BTGym, having examined the code, it seems to be more complex.
Ideally, we could have a `tune_config` in the launcher that controls hyperparameters for the other configs (env_config, policy_config, trainer_config, cluster_config), so we can dynamically choose which hyperparameters stay fixed and which ones the system should explore. We would also need a section to control the Tune Trial Scheduler parameters.
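As a rough illustration of what such a `tune_config` could look like (all key names and value ranges here are hypothetical, not existing BTGym options; the scheduler keys mirror the parameters Tune's PBT scheduler expects, but should be checked against the Tune docs):

```python
import random

# Hypothetical sketch: separate fixed hyperparameters from searchable
# ones, plus the PBT Trial Scheduler settings Tune would need.
tune_config = {
    "fixed": {
        "gamma": 0.99,              # stays constant across all trials
    },
    "search": {
        # callables sample a fresh value when a trial is perturbed
        "opt_learn_rate": lambda: random.uniform(1e-5, 1e-3),
        "rollout_length": lambda: random.choice([10, 20, 40]),
    },
    "scheduler": {                  # PBT Trial Scheduler parameters
        "time_attr": "training_iteration",
        "reward_attr": "episode_reward_mean",
        "perturbation_interval": 10,
    },
}
```

The launcher would merge `fixed` into the existing configs as-is and hand `search` and `scheduler` over to Tune.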
And finally we need a way for `launcher.run()` to properly interact with `tune.run_experiments(...)`.
An example from Ray Tune can be found here.
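One possible shape for that interaction is an adapter that wraps the launcher in a Tune-style trainable function; the sketch below is pure Python with placeholder names (`launcher_cls`, the assumption that `run()` yields per-iteration scores, and the metric names are all hypothetical, and the real version would import `btgym` and `ray.tune`):

```python
def make_trainable(launcher_cls, base_kwargs):
    """Build a Tune-style trainable: Tune calls it with a sampled
    `config` and a `reporter` used to feed scores back to the
    scheduler. `launcher_cls` stands in for BTGym's Launcher."""
    def trainable(config, reporter):
        kwargs = dict(base_kwargs)
        kwargs.update(config)          # inject sampled hyperparameters
        launcher = launcher_cls(**kwargs)
        # assumed here: run() yields one score per training iteration
        for iteration, score in enumerate(launcher.run()):
            # report per-iteration results so PBT can exploit/explore
            reporter(training_iteration=iteration,
                     episode_reward_mean=score)
    return trainable
```

Such a trainable could then be registered with `tune.run_experiments(...)` together with the PBT scheduler, with each trial becoming one member of the population.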
@Kismuz, if you think this is worthwhile and feasible to implement, and we can come up with a good design, I can try to implement it.