jcallaham opened 2 years ago
Updating with some of the more recent discussion (also see #37):
The biggest thing we need to sort out is some kind of scalable infrastructure so that RL agents can be trained in a distributed context on HPC resources. It seems like Ray+RLlib might be a good combination for this, especially in combination with support for Kubernetes and SLURM.
As part of this process, we will probably have to flip from using the Docker container to hold the entire hydrogym package to a more stripped-down standard package configuration that then spins out Docker containers (each of which can have many MPI processes) to run the simulations and communicate with the host/agent. That said, I think continuing to support a self-contained image would also be useful, particularly for anyone trying to do classical control.
I'm thinking the roadmap for developing this might look like the following:
Fixed some minor compatibility/interface issues and set up an example of training with RLlib (currently running on a UW workstation... I expect this will take a while).
So far this is 100% serial - the CFD simulation and RL agent are both on a single CPU, and only one environment is active at a time. The immediate next steps will be to add a GPU resource to the Docker container and figure out how to do MPI-parallelization within the environment using a collection of Ray actors.
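The MPI-parallelization via Ray actors could look roughly like the pattern below. This is a minimal sketch using only the standard library (a thread pool standing in for Ray actors), and ToyEnv/rollout are hypothetical placeholders, not hydrogym code:

```python
from concurrent.futures import ThreadPoolExecutor

class ToyEnv:
    """Hypothetical stand-in for a (possibly MPI-parallel) CFD environment."""
    def __init__(self, env_id: int):
        self.env_id = env_id
        self.t = 0

    def step(self, action: float):
        self.t += 1
        obs = self.env_id + action  # fake observation
        reward = -abs(action)       # fake reward
        done = self.t >= 3
        return obs, reward, done

def rollout(env_id: int, actions):
    """What one actor would do: step its own env and return the trajectory."""
    env = ToyEnv(env_id)
    return [env.step(a) for a in actions]

# The driver farms rollouts out to a pool of workers and gathers results;
# with Ray this would be remote actor calls plus ray.get instead of a pool.
actions = [0.1, -0.2, 0.3]
with ThreadPoolExecutor(max_workers=2) as pool:
    trajectories = list(pool.map(rollout, [0, 1], [actions, actions]))
```

The key property is that each worker owns its environment exclusively, so the simulations never share mutable state with the driver.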
More info:
I am currently bogged down finishing 2 papers, but I would be happy to write the logic for the parallel execution; I have done something similar before, so maybe in ~2 weeks?
For the interface/API design we can probably also take some notes from one of the more recent RL-environment releases, Gymnax. The API is really clean and should afford us all the syntactical freedom we will need to define the control problems etc.
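For reference, the functional, stateless style of that API could be mimicked in plain Python like this. Everything here (EnvState, reset, step, the toy dynamics) is illustrative, not the actual Gymnax or hydrogym interface:

```python
from typing import NamedTuple

class EnvState(NamedTuple):
    x: float  # toy "flow state"
    t: int    # step counter

def reset() -> tuple[float, EnvState]:
    """Return (observation, initial state); no hidden mutable state."""
    state = EnvState(x=1.0, t=0)
    return state.x, state

def step(state: EnvState, action: float) -> tuple[float, EnvState, float, bool]:
    """Pure function of (state, action) -> (obs, new state, reward, done)."""
    x = 0.9 * state.x + action        # toy linear dynamics
    new_state = EnvState(x=x, t=state.t + 1)
    reward = -x * x                   # drive the state to zero
    done = new_state.t >= 100
    return x, new_state, reward, done

obs, state = reset()
obs, state, reward, done = step(state, action=0.0)
```

Because the state is threaded through explicitly, the same step function can be vectorized or distributed without worrying about shared environment objects.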
Also crossed my radar this week: @eigensteve, if we want to make this even more of a community effort post initial release, it would probably make sense to prepare something akin to the OpenFold consortium, i.e. AlphaFold in the open with big support from AWS. Would this go in the direction you initially envisioned?
Considerations from my side for the initial release:
Just came across this here:
https://github.com/sail-sg/envpool
@jcallaham do we want to give it a try? Looks fairly applicable to me :)
Definitely! EnvPool looks great. Do you think it could be better/easier than Ray Cluster? Or we could always try them both out.
Gymnax looks very nice as well. I also haven't had much time for this project lately, but I'm hoping to get back to it soon. Let me know if you do anything with the parallelization, but I might take a crack at reconfiguring things so that the environments spin out their own Docker containers, as you suggested over here:
@jcallaham maybe we should consider disentangling the two levels of computation from each other sooner rather than later to more easily advance development.
I.e. the gym environment and its API as a separate package, which then launches Firedrake application containers when prompted to do so. Would be easy to test on a single machine, and we could then slowly look towards scaling it to proper reinforcement learning training sizes.
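That two-level split might be sketched like so. The image name, solver_server module, and port below are placeholder assumptions; the point is just that the lightweight gym package builds and launches the container command rather than containing the solver itself:

```python
import shlex

def container_command(image: str, num_mpi_ranks: int, port: int) -> list[str]:
    """Build (but do not run) the docker invocation for one environment.

    A driver would pass this to subprocess.Popen and then connect to the
    exposed port to exchange observations and actions with the solver.
    """
    cmd = (
        f"docker run --rm -p {port}:{port} {image} "
        f"mpiexec -n {num_mpi_ranks} python -m solver_server --port {port}"
    )
    return shlex.split(cmd)

cmd = container_command("hydrogym/firedrake:latest", num_mpi_ranks=4, port=5005)
```

Testing on a single machine then just means launching one container locally, and scaling up means launching many of them across the cluster.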
Also I guess I should probably prioritize finishing validating all the basic test cases. The leaderboard is a great idea - I think Steve may have set up something similar for their dynamics challenge set, though I'm not sure if that ever went live. I'd be good with starting with RLlib for benchmarks, especially for a first release, though ideally we could add in linear optimal control as well (at least as an illustration for the cylinder).
Sorry for the silence - slowly seeing the light at the end of the tunnel for the two papers. Shall we do a general meeting, e.g. tomorrow or next week, to really look at work packages and get to a first alpha version that can do PPO on a large-ish server?
I am not sure about EnvPool vs Ray Cluster, but my impression is that there is much more development momentum behind Ray, which hence might be the more future-proof choice. It also has much better documentation.
My impression is what we would then want for the initial release is:
I think our goal should be to reach the "training board" benchmarking stage by the end of August, and then eye a submission in ~October?
No worries! I also haven't been doing much of the lifting. But I did get PPO running with RLlib in serial on a workstation... not sure how strong the results are yet, but at least it ran. That only required a couple of minor tweaks to the existing code.
The benchmarking and timeline sounds right to me. I'll email everyone today or this weekend to try to find some time to meet soon.
Linking the WIP pull request for the distributed backend here: #41. It includes logic under /packaging to automatically initialize a node in the Ray cluster when neither Slurm nor Kubernetes is used.
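The scheduler detection could be as simple as checking the standard environment variables. A hedged sketch: the mode strings are illustrative, while SLURM_JOB_ID and KUBERNETES_SERVICE_HOST are the usual variables set by SLURM jobs and Kubernetes pods:

```python
import os

def cluster_mode(env: dict[str, str]) -> str:
    """Decide how to bring up the Ray node based on the scheduler environment."""
    if "SLURM_JOB_ID" in env:
        return "slurm"       # launched by srun/sbatch
    if "KUBERNETES_SERVICE_HOST" in env:
        return "kubernetes"  # running inside a k8s pod
    return "local"           # fall back to a single-node Ray instance

mode = cluster_mode(dict(os.environ))
```

The local fallback is what makes the package easy to try on a laptop before pointing it at a real cluster.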
What do we think would be the key features to have in place before releasing to the public/writing an initial paper?
Some things that have already been brought up: