JuliaPOMDP / POMDPs.jl

MDPs and POMDPs in Julia - An interface for defining, solving, and simulating fully and partially observable Markov decision processes on discrete and continuous spaces.
http://juliapomdp.github.io/POMDPs.jl/latest/

Reinforcement Learning Interface #126

Closed etotheipluspi closed 6 years ago

etotheipluspi commented 7 years ago

There is nothing stopping us from wrapping our POMDP models to work with RL algorithms. This would allow us to include deep reinforcement learning algorithms in our solver suite, and all the good things that come with that.

I think this interface should look like a simplified version of GenerativeModels.jl. I like what has been done in Reinforce.jl. They even mention JuliaPOMDP in one of their issues on what their API should look like. Their API uses an AbstractEnvironment type that implements a small set of required methods plus a few optional overrides, sketched below.
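From memory of the Reinforce.jl README at the time (treat this as a paraphrase - the exact names and signatures may differ):

```julia
# Required methods on a Reinforce.jl AbstractEnvironment (my recollection):
#   reset!(env)        -- reset the environment for a new episode
#   actions(env, s)    -- the set of actions available from state s
#   step!(env, s, a)   -- apply action a; returns (r, s′)
#   finished(env, s′)  -- true if the episode has terminated
#
# Optional overrides:
#   state(env)         -- the current state stored in the environment
#   reward(env)        -- the most recent reward
#   ismdp(env)         -- true if the problem is fully observable
#   maxsteps(env)      -- per-episode step limit
```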

I like this interface. We can include it either in GenerativeModels.jl or in a new package that would live in the JuliaPOMDP package directory (maybe RL.jl or something of the sort).

What are everyone's thoughts on the interface itself, and on how it should be included in POMDPs.jl?

zsunberg commented 7 years ago

I really think we should work with Reinforce.jl rather than making anything parallel (assuming that their interface makes sense). It would be a huge tragedy if we ended up with something slightly different and things that could have been compatible are not. (We may need to register all of our solvers after all if we want to engage with the community better.)

I am not sure I completely understand their interface. Is the state part of the environment or not? This seems very important to me.

etotheipluspi commented 7 years ago

The biggest issue in working with Reinforce.jl is that it's a pretty hefty package with a bunch of deps. The package is not just an interface; it also implements algorithms, has support tools, etc. It also has a number of abstract types that would conflict with types in POMDPs.jl, like AbstractPolicy for example. Working with Reinforce.jl would require close collaboration between the devs there and someone from POMDPs.jl, and, most likely, code changes to both packages to get them to work together, if that is even possible. As you mention, we may need to register our solvers with METADATA (which I think is a pretty big decision on its own, and one I'd like to avoid if possible).

It also seems like Reinforce.jl currently implements only the cross-entropy method, and it's not clear to me if and when deep RL algorithms will be implemented, which I think should be the main focus of our efforts with RL.

In the Reinforce.jl interface, the state doesn't have to be a part of the environment. It seems that the state(env) function is optional. I am not entirely sold on their API, but it is more flexible than the OpenAI gym API and could work with our tree-based solvers as well.

cho3 commented 7 years ago

Guys, I can't believe you've completely forgotten about me.

You can look at my two half-baked attempts at making a reinforcement learning package for inspiration:

Exhibit A Exhibit B

IIRC it's not too crazy to implement RL into the POMDPs.jl framework. You just have to add a bunch of annoying parameters relating to simulation into your RL type.
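For example, something like this (a hypothetical sketch - all the names are made up):

```julia
using Random

# Hypothetical sketch of an RL solver type in the POMDPs.jl style.
# The "annoying parameters" are the simulation bookkeeping the solver
# has to carry itself: episode counts, step caps, exploration, rng, etc.
mutable struct TabularRLSolver
    n_episodes::Int            # episodes to simulate during solve
    max_ep_length::Int         # step cap per episode
    learning_rate::Float64     # e.g. Q-learning step size
    exploration_rate::Float64  # epsilon for epsilon-greedy action selection
    rng::AbstractRNG           # so training runs are reproducible
end

TabularRLSolver(; n_episodes=1000, max_ep_length=200, learning_rate=0.1,
                exploration_rate=0.1, rng=Random.default_rng()) =
    TabularRLSolver(n_episodes, max_ep_length, learning_rate,
                    exploration_rate, rng)
```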

zsunberg commented 7 years ago

@cho3 , thanks! We haven't forgotten!

I don't think we would need to make any changes to POMDPs.jl to use our problems with Reinforce.jl - we would just need to write some lightweight wrappers, and I think those wrappers could conform to the Reinforce.jl type hierarchy. I don't think any code would need to be changed in either of the core packages (I may be missing something - let me know if I am).
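Something like this, maybe (just a sketch - it assumes the method names from the Reinforce.jl sketch above plus GenerativeModels' generate_sr and an initial_state helper, and I haven't tested it against either package):

```julia
import POMDPs
using Random

# Sketch of a lightweight wrapper exposing a POMDPs.jl MDP through a
# Reinforce.jl-style environment; neither core package is modified.
mutable struct MDPEnvironment{M<:POMDPs.MDP,S}
    mdp::M
    s::S               # the current state lives inside the environment
    r::Float64         # the most recent reward
    rng::AbstractRNG
end

function reset!(env::MDPEnvironment)
    env.s = initial_state(env.mdp, env.rng)  # assumed GenerativeModels helper
    return env
end

actions(env::MDPEnvironment, s) = POMDPs.actions(env.mdp, s)

function step!(env::MDPEnvironment, s, a)
    sp, r = generate_sr(env.mdp, s, a, env.rng)  # assumed GenerativeModels call
    env.s = sp
    env.r = r
    return r, sp
end

finished(env::MDPEnvironment, s′) = POMDPs.isterminal(env.mdp, s′)
```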

If we write our own separate package, then the work will be duplicated, and, in the long run, people will have to choose between them or spend a lot of time writing glue code. If we combine forces, even though it's more difficult now, I think it will be a better experience ultimately.

If we do write our own I think we should do it like gym with the state in the environment - IMO it will be too hard to communicate expectations to users if the state can be in both the environment and a separate variable. GenerativeModels covers the case where the environment and state are separated.

Also, I think only one-way compatibility is feasible: any RL solver should be able to solve a generative model problem, a transition distribution problem, or an RL problem, but MCTS should only be able to solve generative model or transition distribution problems, not problems specified only through the RL interface.

zsunberg commented 7 years ago

OK, ideally what there should be is one big package for all types of sequential decision problems, with three tiers of problem specification: full distribution modeling, generative models, and gym-style environments.

zsunberg commented 7 years ago

I think you might be right - Reinforce.jl might be so far from what we are envisioning that we should just create our own package - we may want to register it in METADATA separately from POMDPs.jl, though.

zsunberg commented 7 years ago

@etotheipluspi, is the goal of this to

  1. Allow us and others to write new RL solvers to be used on POMDPs.jl problems, or
  2. Allow us to use previously written RL solvers (like ones written for gym environments) with POMDPs problems, or
  3. Create an RL interface that is simpler and more familiar to people coming from an RL background, so that they will implement their fancy solvers in a way that is compatible with POMDPs problems?

If it is 1, I think GenerativeModels should work fine; if it is 2, it makes sense to have something small in the POMDPs.jl ecosystem like GymRLSolver.jl; if it is 3, I think we should register another package on METADATA that is a reinforcement learning package (if we don't want to work with Reinforce.jl).

zsunberg commented 7 years ago

Or is the goal to do in-place updates on the environment rather than creating a new state every time?
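That is, the difference between these two styles (a sketch - the step! signature here is hypothetical):

```julia
# Functional, GenerativeModels style: the state is explicit and a fresh
# one is returned on every call, so tree search can revisit old states:
sp, r = generate_sr(mdp, s, a, rng)   # s is left untouched

# In-place, gym style: the environment owns and mutates its state,
# which avoids copying when states are large:
r, done = step!(env, a)               # hypothetical signature; env's state updated in place
```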

etotheipluspi commented 7 years ago

@zsunberg I think we should aim for something along the lines of 1 and 2.

The problem with only using GenerativeModels.jl for 1 is that it is so different from the typical RL interface (i.e. gym) that it might be confusing.

I would be happy if we added more functions to GenerativeModels (like step, etc) that make it easier to define RL environments.
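For example (a sketch, not an existing function - it assumes an environment type like the MDPEnvironment wrapper sketched above):

```julia
# Sketch: a gym-style step! layered on the existing generative
# interface, so RL users get the familiar vocabulary without us
# maintaining a parallel interface.
function step!(env::MDPEnvironment, a)
    sp, r = generate_sr(env.mdp, env.s, a, env.rng)
    done = isterminal(env.mdp, sp)
    env.s = sp
    env.r = r
    return sp, r, done   # mirrors gym's (observation, reward, done)
end
```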

zsunberg commented 7 years ago

Ok, cool. When you say "it might be confusing", do you mean for a solver writer or a problem writer or someone else? Or do you just think it will not be adopted in general because it is different from the mainstream?

Can you outline the differences that seem important to you? Is it just a vocabulary problem (i.e. step versus generate_), or does it also have to do with the environment/state distinction? Or the rng? Or does it have to do with the presence of functions like reset!?

Would it be possible to code up a simple example solver to illustrate why the current GenerativeModels interface would be confusing? (Or a problem if that is what you think is confusing).

Sorry for being pedantic about this - it is just really hard to get right and I want to get it right. Introducing more functions could potentially make universal usability more of a challenge. Btw I am completely OK with revising GenerativeModels or completely nuking it if we come up with something better.

zsunberg commented 7 years ago

We could have a "Where is the step function?" FAQ that points to generative models and explains their relationship.

zsunberg commented 7 years ago

We should concretely define the use case. Is this a package that allows POMDPs solvers to be used with more problems (e.g. environments that are coded to include the state in them), or one that allows POMDPs problems to be solved by more solvers (yes the latter is definitely true), or both?

etotheipluspi commented 7 years ago

When you say "it might be confusing", do you mean for a solver writer or a problem writer or someone else? Or do you just think it will not be adopted in general because it is different from the mainstream?

It will be easier from the problem writer's standpoint, in particular for those who are used to interfaces like gym's.

Can you outline the differences that seem important to you? Does it just have to do with the function names (i.e. step versus generate_), or does it also have to do with the environment/state distinction? Or the rng? Or does it have to do with the presence of functions like reset!?

I think the simplest way to look at this is that gym is now the de facto benchmark suite in RL. Not following their interface has a number of potential drawbacks like turning potential users away, and making it more difficult to wrap problems and solvers that do rely on this interface.

Would it be possible to code up a simple example solver to illustrate why the current GenerativeModels interface would be confusing? (Or a problem if that is what you think is confusing).

This depends entirely on the user. One thing that could trip users up is that there are six generate functions.
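(For reference, from memory of the GenerativeModels.jl docs - argument lists omitted since I may misremember them:)

```julia
# The six generative functions, by what they sample:
#   generate_s    -- next state
#   generate_o    -- observation
#   generate_sr   -- next state + reward
#   generate_so   -- next state + observation
#   generate_or   -- observation + reward
#   generate_sor  -- next state + observation + reward
```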

Sorry for being pedantic about this - it is just really hard to get right and I want to get it right. Introducing more functions could potentially make universal usability more of a challenge. Btw I am completely OK with revising GenerativeModels or completely changing it if we come up with something better.

I don't think adding the step interface will make usability a challenge. Do you mean this in the context of the interfaces that a problem writer might need to choose? Currently there are two - the full-blown POMDPs.jl interface and the GenerativeModels interface - and the step interface would be a third. I think the step interface can be advertised as something for people with RL interests only.

also @cho3 we definitely haven't forgotten. Your implementations look really useful. I think they'll serve as a great starting point.

zsunberg commented 7 years ago

It will be easier from the problem writer's standpoint, in particular for those who are used to interfaces like gym's.

Ok great, so this is mostly about fitting more problems into our solvers like POMCP. It seems like, if the user implements the gym interface plus an extra state function that returns the state and a state! function that sets the state, then POMCP, for example, should be usable.
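Roughly (hypothetical names - state and state! are not part of gym):

```julia
# Hypothetical sketch of the two extra methods that would let a tree
# search solver drive a gym-style environment:
#   state(env)      -- snapshot the environment's internal state
#   state!(env, s)  -- restore the environment to state s
#
# which would let something like POMCP simulate and rewind:
#   s = state(env)            # remember the current node's state
#   r, done = step!(env, a)   # simulate one action
#   sp = state(env)           # the child node's state
#   state!(env, s)            # rewind to try a different action
```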

Another thing to note is that, in my experience, when students understood the GenerativeModels interface, they reacted with "Oh, that's exactly what I need!", so maybe we just need to focus on getting people to understand and use that interface.

I think the simplest way to look at this is that gym is now the de facto benchmark suite in RL. Not following their interface has a number of potential drawbacks like turning potential users away, and making it more difficult to wrap problems and solvers that do rely on this interface.

Yeah, definitely! Can you outline which distinctive things about gym you think we need to keep, and which ones we can/should change slightly? Is the lack of environment/state distinction important? Is using the exact same vocabulary important?

One thing that could trip users up is that there are six generate functions.

What do you think about JuliaPOMDP/GenerativeModels.jl#11? Would that be easier to understand or harder? (this question is sort of orthogonal to this current issue)

I don't think adding the step interface will make usability a challenge. Do you mean this in the context of the interfaces that a problem writer might need to choose?

I mean that if we introduce something parallel to generative models, then it might be difficult to make solvers (e.g. MCTS/POMCP) work on both interfaces, and if some solvers work on some problems and not others, the ecosystem will be frustrating to use.

zsunberg commented 7 years ago

I guess let's write it up and see how it goes. I think we just need to answer the question of whether the state is part of the environment first (I think it should be).

zsunberg commented 7 years ago

The more I think about this, the more I like it. Are you planning on working on it, @etotheipluspi?

etotheipluspi commented 7 years ago

Yep, putting something together.

zsunberg commented 6 years ago

This is implemented in https://github.com/sisl/DeepRL.jl