JuliaReinforcementLearning / ReinforcementLearning.jl

A reinforcement learning package for Julia
https://juliareinforcementlearning.org

Question: Can ReinforcementLearning.jl handle Partially Observable Markov Decision Processes (POMDPs)? #608

Closed 00krishna closed 1 year ago

00krishna commented 2 years ago

Hello, I was just wondering if the ReinforcementLearning.jl package can handle partially observable MDPs (POMDPs)? I know that the POMDPs.jl package can work with these, but the interface is very different. Are there any plans to connect both RL ecosystems in Julia?

zsunberg commented 2 years ago

This is my attempt at such a connection: https://github.com/JuliaReinforcementLearning/CommonRLInterface.jl in terms of the environment interface.

Here is a simple POMDP example in the ReinforcementLearning.jl interface: https://github.com/JuliaReinforcementLearning/ReinforcementLearning.jl/blob/e90240c587ba22c68c01ae78b743f29a447b8314/src/ReinforcementLearningEnvironments/src/environments/examples/TigerProblemEnv.jl#L7-L13 Instead of returning a true Markov state, the RLBase.state function just returns a noisy observation.
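To make that pattern concrete, here is a rough, hypothetical sketch (not the linked file's code: `NoisyTigerEnv`, the 0.85 listening accuracy, and the rewards are made up, and the method names reflect only how I read the RLBase environment interface). The hidden Markov state stays inside the struct, and RLBase.state exposes only a noisy observation of it.

```julia
# Hypothetical sketch, not the package's TigerProblemEnv: the true state is
# hidden inside the struct; RLBase.state returns only a noisy observation.
import ReinforcementLearningBase as RLBase

mutable struct NoisyTigerEnv <: RLBase.AbstractEnv
    tiger_on_left::Bool   # true (hidden) Markov state
    obs::Symbol           # last noisy observation
    reward::Float64
    done::Bool
end

NoisyTigerEnv() = NoisyTigerEnv(rand(Bool), :none, 0.0, false)

RLBase.action_space(::NoisyTigerEnv) = (:listen, :open_left, :open_right)
RLBase.state_space(::NoisyTigerEnv) = (:none, :hear_left, :hear_right)
RLBase.state(env::NoisyTigerEnv) = env.obs     # observation, not the true state
RLBase.reward(env::NoisyTigerEnv) = env.reward
RLBase.is_terminated(env::NoisyTigerEnv) = env.done

function RLBase.reset!(env::NoisyTigerEnv)
    env.tiger_on_left = rand(Bool)
    env.obs = :none
    env.reward = 0.0
    env.done = false
    env
end

# Stepping via the callable-environment convention.
function (env::NoisyTigerEnv)(action)
    if action == :listen
        correct = rand() < 0.85                # listening is only 85% reliable
        heard_left = correct ? env.tiger_on_left : !env.tiger_on_left
        env.obs = heard_left ? :hear_left : :hear_right
        env.reward = -1.0
    else
        opened_left = action == :open_left
        env.reward = opened_left == env.tiger_on_left ? -100.0 : 10.0
        env.done = true
    end
end
```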

The question of which algorithms are effective at solving POMDPs is more nuanced. In general, the optimal policy for a POMDP may depend on the entire history of observations rather than just the current observation as in an MDP. (I'll let others comment on what the preferred approach for this is in ReinforcementLearning.jl)

findmyway commented 2 years ago

Are there any plans to connect both RL ecosystems in Julia?

Yes!

But it might not be that quick. I'll come up with a roadmap in the next week to sort out what we want to achieve in the following minor version. One of the main goals is to improve the integration with other ecosystems in both Julia and Python.

zsunberg commented 2 years ago

I'll come up with a roadmap in the next week to sort out what we want to achieve in the following minor version.

@findmyway do you plan to build this around CommonRLInterface or a different approach?

findmyway commented 2 years ago

I'll come up with a roadmap in the next week to sort out what we want to achieve in the following minor version.

@findmyway do you plan to build this around CommonRLInterface or a different approach?

It will still be built around CommonRLInterface. The main change will happen in RLCore. I plan to decouple the experience-generation and algorithm parts, so that we can have deeper integrations.

00krishna commented 2 years ago

Hey folks, this is a very interesting and really quite helpful conversation. I suppose there are a lot of approaches to solving POMDPs by trying to turn them into MDPs, such as averaging a few recent video frames to recover velocity information. There are also some simple tricks like using RNNs in the model. But then there are more interesting algorithms for problems where these standard tricks don't work.
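As a purely illustrative sketch of the related observation-stacking trick (`StackedObsEnv` is a made-up name, not part of ReinforcementLearning.jl, and the wrapper assumes the callable-environment convention): wrap an environment and expose the last k observations as the "state", so an otherwise memoryless agent can still pick up short-term history.

```julia
# Illustrative sketch: a wrapper whose "state" is the window of the k most
# recent observations from the wrapped environment.
import ReinforcementLearningBase as RLBase

struct StackedObsEnv{E<:RLBase.AbstractEnv,O} <: RLBase.AbstractEnv
    env::E
    buffer::Vector{O}   # the k most recent observations, oldest first
end

StackedObsEnv(env::RLBase.AbstractEnv, k::Int) =
    StackedObsEnv(env, fill(RLBase.state(env), k))

# Delegate everything except `state` to the wrapped environment.
RLBase.action_space(w::StackedObsEnv) = RLBase.action_space(w.env)
RLBase.reward(w::StackedObsEnv) = RLBase.reward(w.env)
RLBase.is_terminated(w::StackedObsEnv) = RLBase.is_terminated(w.env)

# The augmented "state" is the stack of recent observations.
RLBase.state(w::StackedObsEnv) = Tuple(w.buffer)

function RLBase.reset!(w::StackedObsEnv)
    RLBase.reset!(w.env)
    fill!(w.buffer, RLBase.state(w.env))
    w
end

function (w::StackedObsEnv)(action)
    w.env(action)                          # step the inner environment
    popfirst!(w.buffer)                    # drop the oldest observation
    push!(w.buffer, RLBase.state(w.env))   # append the newest one
end
```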

I was wondering what you had in mind for the probabilistic programming language (PPL) on the state-estimation side? I think the POMDPs.jl packages have their own tooling for estimating the Bayesian networks used for state estimation, but there is a much larger set of PPL tools in Julia. I was just wondering what your thoughts are on this question.

One key consideration is to allow some flexibility in backends. There are PPLs like Turing.jl, Soss.jl, Gen.jl, and the newer ReactiveMP.jl, and each works better on some problems than others. So in a research setting, switching from one PPL backend to another can suddenly give a big speedup over existing approaches, which could be a very cheap and easy way to beat the state of the art in performance. ReactiveMP.jl is an interesting case in point: it does message passing, but it also makes distributional assumptions about the variables and thus exploits conjugacy. Since conjugate updates are deterministic, you don't need optimization or sampling to get the estimates, which makes it very fast. I am pretty sure there is no equivalent implementation in Python, so this is a nice chance to pick up some very good benchmarks.
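To make the state-estimation point concrete, here is a tiny sketch in plain Julia (no PPL at all; the 0.85 observation accuracy is just an illustrative number). For a discrete hidden state like the tiger problem, the belief update is exact Bayes' rule, the same kind of closed-form, deterministic update that ReactiveMP.jl gets from conjugacy in continuous models.

```julia
# Illustrative only: exact (deterministic) belief update for a two-state
# hidden variable, e.g. "is the tiger behind the left door?".  No sampling
# or optimization is needed.
function update_belief(belief_left::Float64, obs::Symbol; p_correct = 0.85)
    p_obs_given_left  = obs == :hear_left ? p_correct : 1 - p_correct
    p_obs_given_right = obs == :hear_left ? 1 - p_correct : p_correct
    num = p_obs_given_left * belief_left        # Bayes' rule, then normalise
    num / (num + p_obs_given_right * (1 - belief_left))
end

belief = 0.5                                    # uniform prior over the doors
belief = update_belief(belief, :hear_left)      # ≈ 0.85
belief = update_belief(belief, :hear_left)      # ≈ 0.97
```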

These are just some thoughts. But let me know if you have any thoughts about which probabilistic programming language you were considering for the state-estimation side. Thanks.

00krishna commented 2 years ago

Hey @findmyway , just wondering if you had a chance to think more about the probabilistic programming language choice for POMDPs, as mentioned above.

findmyway commented 2 years ago

Hey @findmyway , just wondering if you had a chance to think more about the probabilistic programming language choice for POMDPs, as mentioned above.

Hey @00krishna, sorry for my late reply. The last time I used a PPL in Julia (Turing.jl, to be more specific) was two years ago, so I may need some time to refresh my knowledge. (This is actually the first time I've heard of ReactiveMP.jl 😅)

00krishna commented 2 years ago

@findmyway hey, no problem. I totally understand; it is hard to keep up with all of the new technologies :). I posted this question a week or so ago, so you may find it helpful for understanding the different PPLs out there. They all do similar things, but it probably makes sense to see which one is the best fit for ReinforcementLearning.jl. We probably just need to understand the POMDP solver algorithms out there and see which PPLs fit with those algorithms, or we should allow for different PPL backends and let the user choose. I believe there was an AbstractPPL.jl project that was supposed to help with this. I am most familiar with Turing.jl as well, which is great, but Turing is mostly sampler-based, which might be very slow for RL applications--though Turing does have some variational inference tools that could work with RL. Anyhow, post if you have any more thoughts.

findmyway commented 2 years ago

Thanks for the link. That's very helpful. I think I may start with Gen.jl and ReactiveMP.jl first. I'll definitely let you know what I find here.

00krishna commented 2 years ago

Sounds good, @findmyway. Definitely message Chad Scherrer--he is the developer of Soss.jl and Tilde.jl; you can even Slack him. He has a lot of experience with these systems and can tell you about any hidden challenges that might not be obvious. I am more familiar with the algorithms for regular MDPs than for POMDPs, but I will start to learn that material as well. Then I can help you build out these tools.