JuliaReinforcementLearning / ReinforcementLearning.jl

A reinforcement learning package for Julia
https://juliareinforcementlearning.org

Improving Collaboration: Separate out the environment interface #954

Open · zsunberg opened this issue 1 year ago

zsunberg commented 1 year ago

Hi everyone,

It has been cool to see the recent flurry of contributions to this package, especially by @jeremiahpslewis. In a recent discussion, someone asked what would facilitate cooperation between the POMDPs.jl and JuliaRL communities. I was thinking about this a bit more and came to the conclusion:

Separating out the environment interface would be the most helpful change for expanding collaboration.

There are a few reasons for this:

  1. There are many different reasons for writing RL algorithms. I assign homework where students write RL algorithms ranging from tabular SARSA to DQN or policy gradient; someone else might want a single very-high-performance PPO implementation to reliably deploy to a web service; another person might want a library of research-quality algorithms to compare against; another person might want a CleanRL-style set of implementations that maximize readability. These should not all be in the same package, but they should all use the same environment interface.
  2. Since this environment interface will have many stakeholders, there must be a way for all of the stakeholders to monitor and weigh in on interface design decisions. Currently, any discussion about the environment interface will also be mixed in with discussion about GPUs, hooks, etc.
  3. Let's say that I write a package that uses the environment interface in RLCore.jl but not the policy interface. If I say that I use RLCore.jl, it is unclear whether I am committing to just the environment interface or also the policy interface. Furthermore, a user who wants to write an environment will find the RL.jl documentation and could be distracted by all of the information about experiments, agents, etc., which my package does not use.
  4. It would be easier to understand the environment interface if it and its documentation were separated from the RL.jl documentation (though the current environment interface documentation has improved a lot already!).
  5. In the successful Python RL ecosystem, the environment interface in gym/gymnasium/pettingzoo is separated from the packages that implement learning agents.

If the environment is separated out (and is sufficiently flexible), I would probably convert some important packages like MCTS and POMCP to use it. Then, they could be much more compatible with RL.jl.

A final note: In principle, CommonRLInterface could be a candidate for a separated-out environment interface, but I do not think it can be successful unless RL.jl chooses to use it directly. To be clear, I would vigorously advocate for this, and I am happy to discuss why, but I recognize that this would be biased since I wrote most of that package.
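
For context, implementing an environment purely against CommonRLInterface looks roughly like the sketch below. This is my own illustrative example, not code from either package, and the Corridor type is made up:

```julia
# Illustrative sketch only: a toy environment written purely against
# CommonRLInterface, with no dependency on RL.jl or POMDPs.jl.
using CommonRLInterface
const CRL = CommonRLInterface

mutable struct Corridor <: CRL.AbstractEnv
    pos::Int                              # agent position in 1..10; reaching 10 ends the episode
end
Corridor() = Corridor(1)

CRL.reset!(env::Corridor) = (env.pos = 1; nothing)
CRL.actions(env::Corridor) = (-1, 1)      # step left or right
CRL.observe(env::Corridor) = env.pos
CRL.terminated(env::Corridor) = env.pos >= 10

function CRL.act!(env::Corridor, a)       # act! applies the action and returns the reward
    env.pos = clamp(env.pos + a, 1, 10)
    return env.pos >= 10 ? 1.0 : -0.01
end
```

Any solver that only speaks this interface (a tabular SARSA homework, MCTS, a DQN implementation) could then run such an environment without caring which ecosystem it came from.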

jeremiahpslewis commented 1 year ago

Thanks for kicking off this discussion! What do you mean by use CommonRL directly? I've been wondering for some time whether we should use consistent naming with CommonRL. I haven't gone through every method in CommonRL, so I'm not willing to commit to the exact naming 100%, but I would love to converge on one set of terms & apis. My thought would be that we first do this for the methods which are already included in CommonRL, then in a second step look into env's. Thoughts? @HenriDeh

jeremiahpslewis commented 1 year ago

Concretely, I mean things like https://github.com/JuliaReinforcementLearning/ReinforcementLearning.jl/blob/abae05c358635a29c9e7c0311e3db5066d5de724/src/ReinforcementLearningBase/src/CommonRLInterface.jl#L51, where I'd be happy to use valid_actions and drop legal_action_space.
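
For anyone who hasn't opened that file, the kind of naming bridge being discussed looks roughly like the sketch below. This is illustrative only, not the actual code behind the link, and the wrapper type name is made up:

```julia
# Illustrative sketch: answering CommonRL's valid_actions with RLBase's
# legal_action_space via a thin wrapper type (name is hypothetical).
using CommonRLInterface
const CRL = CommonRLInterface
import ReinforcementLearningBase as RLBase

struct RLBaseCommonEnv{E<:RLBase.AbstractEnv} <: CRL.AbstractEnv
    env::E                                 # the wrapped RLBase environment
end

# Forward the optional CommonRL query to the existing RLBase method.
# (In real code this optional function would typically be advertised with CRL.@provide.)
CRL.valid_actions(w::RLBaseCommonEnv) = RLBase.legal_action_space(w.env)
```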

zsunberg commented 1 year ago

> What do you mean by use CommonRL directly?

By this option, I mean completely deprecating and then removing RLCore.AbstractEnv and using CommonRL.AbstractEnv and the methods from CommonRL everywhere within RL.jl. This would be a big change, and I don't understand all the consequences yet.
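
To make the scale of that change a bit more concrete, one possible migration path (my own sketch under assumptions, not a decided plan) would be to keep the old names as deprecated aliases for a release cycle before removing them:

```julia
# Rough sketch of a possible migration, not a decided plan: old RLBase names
# become deprecated aliases that forward to their CommonRL counterparts.
using CommonRLInterface

# Old type name kept as a deprecated alias during the transition period.
Base.@deprecate_binding AbstractEnv CommonRLInterface.AbstractEnv

# Old method names emit a depwarn and forward to the CommonRL equivalents.
Base.@deprecate legal_action_space(env) CommonRLInterface.valid_actions(env)
Base.@deprecate action_space(env) CommonRLInterface.actions(env)
```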

HenriDeh commented 1 year ago

> Thoughts? @HenriDeh

I am so in favor of this. I don't think it would be that overwhelming of a change. Deprecating first, then dropping is a good idea because many algorithms are not tested at the moment.