dynamicslab / hydrogym

An RL-Gym for Challenge Problems in Data-Driven Modeling and Control of Fluid Dynamics.
https://hydrogym.readthedocs.io
MIT License

First release planning #36

Open · jcallaham opened this issue 2 years ago

jcallaham commented 2 years ago

What do we think would be the key features to have in place before releasing to the public/writing an initial paper?

Some things that have already been brought up:

jcallaham commented 2 years ago

Updating with some of the more recent discussion (also see #37):

The biggest thing we need to sort out is some kind of scalable infrastructure so that RL agents can be trained in a distributed context on HPC resources. Ray+RLlib looks like a good fit for this, especially since Ray supports both Kubernetes and SLURM deployments.
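For concreteness, a minimal sketch of how the training driver would attach to such a cluster, assuming a Ray cluster has already been started on the allocation (e.g. via `ray start` in a SLURM batch script or the KubeRay operator):

```python
import ray

# Attach this driver process to the already-running cluster instead of
# starting a local instance; workers on other nodes then become available
# as Ray resources for rollouts.
ray.init(address="auto")

# Sanity check: CPUs/GPUs visible across the allocation.
print(ray.cluster_resources())
```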

As part of this process, we will probably have to flip from using the Docker container to hold the entire hydrogym package to a more stripped-down standard package configuration that spins out Docker containers (each of which can hold many MPI processes) to run the simulations and communicate with the host/agent. That said, I think continuing to support a self-contained image would also be useful, particularly for anyone trying to do classical control.

I'm thinking the roadmap for developing this might look like the following:

  1. Add Ray+RLlib to the current image and test PPO training on the cylinder environment with the current architecture (just as a proof of concept, no need to fully tune and optimize; see the sketch after this list).
  2. Prototype the distributed architecture with an inexpensive environment (e.g. CartPole)
  3. Combine and test new architecture on cylinder environment (eventually with tuning, etc as a benchmark for publication)
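
A rough sketch of what step 1 might look like using RLlib's config API (Ray 2.x). The environment constructor `CylFlowEnv` and the `hydrogym.env` module path are placeholders, not the actual hydrogym API:

```python
import ray
from ray.tune.registry import register_env
from ray.rllib.algorithms.ppo import PPOConfig

def env_creator(env_config):
    import hydrogym  # placeholder import; actual module layout may differ
    return hydrogym.env.CylFlowEnv(env_config)  # hypothetical constructor

ray.init()
register_env("cylinder", env_creator)

config = (
    PPOConfig()
    .environment(env="cylinder")
    .rollouts(num_rollout_workers=0)  # fully serial: agent + CFD on one CPU
    .framework("torch")
)
algo = config.build()
for i in range(10):
    result = algo.train()
    print(i, result["episode_reward_mean"])
```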
jcallaham commented 2 years ago

Fixed some minor compatibility/interface issues and set up an example of training with RLlib (currently running on a UW workstation... I expect this will take a while).

So far this is 100% serial - the CFD simulation and RL agent are both on a single CPU, and only one environment is active at a time. The immediate next steps will be to add a GPU resource to the Docker container and figure out how to do MPI-parallelization within the environment using a collection of Ray actors.
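
A minimal sketch of the "collection of Ray actors" idea: each actor owns one environment (which could itself run MPI ranks inside its container) and steps it independently. The `make_env` factory is hypothetical:

```python
import ray

@ray.remote
class EnvWorker:
    """One actor per CFD environment; the driver steps them concurrently."""

    def __init__(self, env_id: int):
        self.env = make_env(env_id)  # hypothetical environment factory
        self.obs = self.env.reset()

    def step(self, action):
        self.obs, reward, done, info = self.env.step(action)
        if done:
            self.obs = self.env.reset()
        return self.obs, reward, done

ray.init()
workers = [EnvWorker.remote(i) for i in range(4)]
# All four environments advance in parallel; the agent gathers the results.
results = ray.get([w.step.remote(0.0) for w in workers])
```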

More info:

ludgerpaehler commented 2 years ago

I am currently bogged down finishing two papers, but I would be happy to write the logic for the parallel execution in ~2 weeks, as I have done something similar before.

For the interface/API design we can probably also take some notes from one of the more recent RL-environment releases, Gymnax. The API is really clean, and should afford us all the syntactic freedom we will need to define the control problems etc.
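For reference, the functional style Gymnax uses (this follows its documented usage; state is passed explicitly, so environments vectorize trivially with `jax.vmap`):

```python
import jax
import gymnax

rng = jax.random.PRNGKey(0)
rng, key_reset, key_act, key_step = jax.random.split(rng, 4)

# Instantiate the environment and its default parameters.
env, env_params = gymnax.make("CartPole-v1")

# Reset, sample a random action, and take one step; all state is explicit.
obs, state = env.reset(key_reset, env_params)
action = env.action_space(env_params).sample(key_act)
obs, state, reward, done, info = env.step(key_step, state, action, env_params)
```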

Something else that crossed my radar this week: @eigensteve, if we want to make this even more of a community effort after the initial release, it would probably make sense to prepare something akin to the OpenFold consortium, i.e. AlphaFold in the open with big support from AWS. Would this go in the direction you initially envisioned?

ludgerpaehler commented 2 years ago

Considerations from my side for the initial release:

ludgerpaehler commented 2 years ago

Just came across this:

https://github.com/sail-sg/envpool

@jcallaham do we want to give it a try? Looks fairly applicable to me :)
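
For a quick look at what EnvPool's batched interface offers (per its docs; CartPole here is just a stand-in, not one of our environments):

```python
import envpool
import numpy as np

# Many C++-backed environments behind a single gym-like handle.
env = envpool.make("CartPole-v1", env_type="gym", num_envs=8)
obs = env.reset()                       # batched observations, shape (8, 4)
actions = np.zeros(8, dtype=int)        # one action per sub-environment
obs, rewards, dones, info = env.step(actions)
```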

jcallaham commented 2 years ago

Definitely! Envpool looks great. Do you think that could be better/easier than Ray Cluster? Or we could always try them both out.

Gymnax looks very nice as well. I also haven't had much time for this project lately, but I'm hoping to get back to it soon. Let me know if you do anything with the parallelization, but I might take a crack at reconfiguring so that the environments spin out their own Docker containers, as you suggested over here:

> @jcallaham maybe we should consider disentangling the two levels of computation from each other sooner rather than later, to more easily advance development.
>
> I.e. the gym environment and its API as a separate package, which then launches Firedrake application containers when prompted to do so. Would be easy to test on a single machine, and we could then slowly look toward scaling it up to proper reinforcement-learning training sizes.
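
A minimal sketch of that two-level split using the Docker SDK; the image name, entrypoint, and port are placeholders, not the real hydrogym setup:

```python
import docker

client = docker.from_env()
container = client.containers.run(
    "hydrogym/firedrake:latest",          # hypothetical solver image
    command="python -m hydrogym.solver",  # hypothetical entrypoint
    detach=True,
    ports={"5555/tcp": 5555},             # channel for obs/action exchange
)
# The env object would talk to the container over this channel (e.g. a
# socket or gRPC) inside reset()/step(), and tear it down on close:
container.stop()
```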

Also, I should probably prioritize finishing the validation of all the basic test cases. The leaderboard is a great idea - I think Steve may have set up something similar for their dynamics challenge set, though I'm not sure if that ever went live. I'd be good with starting with RLlib for benchmarks, especially for a first release, though ideally we could add in linear optimal control as well, at least as an illustration for the cylinder (see the sketch below).
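
A minimal sketch of such a linear baseline: an LQR gain computed from a linearized model (A, B). The matrices here are placeholders; in practice they would come from linearizing the cylinder flow about its steady state:

```python
import numpy as np
from scipy.linalg import solve_continuous_are

A = np.array([[0.0, 1.0], [2.0, -0.1]])  # placeholder linearized dynamics
B = np.array([[0.0], [1.0]])             # placeholder actuation matrix
Q = np.eye(2)                            # state cost
R = np.array([[1.0]])                    # actuation cost

P = solve_continuous_are(A, B, Q, R)     # solve the Riccati equation
K = np.linalg.solve(R, B.T @ P)          # optimal feedback gain: u = -K x
```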

ludgerpaehler commented 2 years ago

Sorry for the silence - slowly seeing the light at the end of the tunnel for the two papers. Shall we do a general meeting, e.g. tomorrow or next week, to really look at work packages and get to a first alpha version that can do PPO on a large-ish server?

I am not sure about Envpool vs Ray Cluster, but my impression is that there is much more development momentum behind Ray, which would make it the more future-proof choice. It also has much better documentation.

My impression is that what we would then want for the initial release is:

I think our goal should be to reach the "training board" benchmarking stage by the end of August, and then aim for a submission in ~October?

jcallaham commented 2 years ago

No worries! I also haven't been doing much of the lifting. But I did get PPO running with RLlib in serial on a workstation... not sure how strong the results are yet, but at least it ran. That only required a couple of minor tweaks to the existing code.

The benchmarking and timeline sounds right to me. I'll email everyone today or this weekend to try to find some time to meet soon.

ludgerpaehler commented 2 years ago

Linking the WIP pull request for the distributed backend here: #41

ludgerpaehler commented 2 years ago