SforAiDl / genrl

A PyTorch reinforcement learning library for generalizable and reproducible algorithm implementations with an aim to improve accessibility in RL
https://genrl.readthedocs.io
MIT License

[Proposal] Stacks for individual modules to pour into a central stack for creation of a general purpose API #186

Closed Sharad24 closed 4 years ago

Sharad24 commented 4 years ago

This issue is centered more towards thinking and discussing the long-term plan of GenRL.

First, the real novelties we currently have:

  1. Base Classes/Structure of Deep RL Agents. (Not completely new, but it's probably something indigenous to us)
  2. Trainers, Loggers
  3. Bandits

Issues (Major):

  1. How can someone use agents (in all modules - classical, deep, bandits, multi) and extend them for their own purposes?
  2. Algorithms not training. Not very flexible.
  3. Env support for only pre-defined types.
  4. The maintenance of algorithms is hard.

I believe these are very common issues for a Python library that evolves implementation by implementation. Day after day, new functionality is added in terms of ABCs, algorithm implementations, support for new envs, etc., and it leaves rough edges in this project. We need to think of a more general-purpose RL solver, otherwise it becomes too hard to handle everything. So, to resolve these issues, I propose creating an API built around a reduction stack. I'll explain the concept of reduction stacks later on.

To create a general-purpose RL solver, we first need to identify the three basic modes of interaction RL usually has:

  1. Online
  2. Offline
  3. Partially off-policy (Off-policy e.g. DDPG, etc)

Each of these can have different algorithms, which can be model-based, model-free, etc. But why focus on the interaction rather than the type of algorithm? Why not make it algorithm-centric? When it comes to creating a generic API, especially one that can be used across a lot of datasets/environments/etc., this interaction specification should ideally come first. Otherwise, we're stuck with what we have right now. The good thing about creating this API in Python, though, is that optimization does not really happen through pre-specification of modules/functions/etc. as in C++, so with Python we can keep a combined focus on ease of implementation that couples very well with a generic solver driven through the reduction stack.

How does a reduction stack work? The term reduction comes from computational complexity theory: in short, it means solving a problem by transforming it into another problem. In our case, it means that if you want to solve an RL problem, and we have a good reduction stack developed, the API would go down the stack until it reaches the simplest solver it can run. For example, in the case of online Deep RL, the stack would be PPO --> A2C --> VPG. The selection of the algorithm happens through the command line, so it would be something like genrl --online --agent ppo --env CartPole-v1.
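As a purely illustrative sketch (the REDUCTION_STACKS mapping, resolve_reduction, and the single --mode flag standing in for --online/--offline are assumptions, not existing GenRL code), the driver could look roughly like this:

```python
import argparse

# Each interaction mode owns an ordered reduction stack: the driver walks the
# stack from the requested reduction down to simpler ones it can fall back to.
REDUCTION_STACKS = {
    "online": ["ppo", "a2c", "vpg"],
    "offpolicy": ["ddpg"],  # placeholder; other off-policy reductions go here
    "offline": [],          # placeholder; offline reductions to be filled in
}

def resolve_reduction(mode: str, agent: str) -> list:
    """Return the sub-stack starting at the requested agent,
    e.g. ("online", "ppo") -> ["ppo", "a2c", "vpg"]."""
    stack = REDUCTION_STACKS[mode]
    if agent not in stack:
        raise ValueError(f"{agent} is not a reduction in the {mode} stack")
    return stack[stack.index(agent):]

def main(argv=None):
    parser = argparse.ArgumentParser(prog="genrl")
    parser.add_argument("--mode", choices=REDUCTION_STACKS, default="online")
    parser.add_argument("--agent", required=True)
    parser.add_argument("--env", default="CartPole-v1")
    args = parser.parse_args(argv)

    reductions = resolve_reduction(args.mode, args.agent)
    print(f"Solving {args.env} via reduction stack: {' --> '.join(reductions)}")
    # A real driver would instantiate only the selected reduction here and
    # hand it to the existing Trainer/Logger machinery.

if __name__ == "__main__":
    main(["--agent", "ppo"])
```

Note that resolve_reduction only picks a sub-stack; nothing gets instantiated until the selected reduction is actually needed.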

It might look like this can be done directly with a main file that has all the algorithm specifications, like stable-baselines does it or like the current examples/deep.py. But let me point out what's not good there:

  1. One central file means that control is shifted away from individual algorithms to that central file, which allows no flexibility.
  2. One central file gets very dirty with heavy usage of if/else statements.

How would a reduction stack and this proposal reshape GenRL?

  1. There will have to be driver classes that set up the stack and all algorithms, and essentially run the API. This includes running the trainer, running evaluation (if specified), or doing something else.
  2. Each sub-stack / reduction set (a module, e.g. deep) or reduction (some algorithm, like PPO) will ideally have its own set of parsed arguments, but most of these can be inherited from the reduction set. For example, PPO will share far more of its arguments with the DeepRL module than with something like Bandits. So the only things exposed to the driver and the stack are the setup functions (see the sketch after this list).
  3. An algorithm will be instantiated only if it's specified in the arguments. This will keep the API very lightweight; we cannot go around instantiating every algorithm.
  4. Since we want to keep everything research-centric, algorithms can still use heavy encapsulation like we plan to do. But building an API means a stronger focus on integrating every algorithm with every environment in every interaction mode. For example, one could directly call genrl --offline dataset --algo
  5. Plus, since every algorithm can have its own set of arguments, there is greater flexibility in the final API. Etc. (I can list a lot more.)
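As a rough sketch of points 2 and 3 above (class names, argument names, and the REGISTRY mapping are assumptions for illustration, not current GenRL code), argument inheritance and lazy instantiation could look like this:

```python
import argparse

class DeepRLReductionSet:
    """Reduction set for the deep module: arguments shared by all deep agents."""

    @staticmethod
    def add_args(parser: argparse.ArgumentParser) -> None:
        parser.add_argument("--lr", type=float, default=3e-4)
        parser.add_argument("--gamma", type=float, default=0.99)

class PPOReduction(DeepRLReductionSet):
    """PPO inherits the DeepRL arguments and only adds its own."""

    @staticmethod
    def add_args(parser: argparse.ArgumentParser) -> None:
        DeepRLReductionSet.add_args(parser)
        parser.add_argument("--clip-ratio", type=float, default=0.2)

    @classmethod
    def setup(cls, args: argparse.Namespace) -> None:
        # Only the selected reduction is ever constructed, keeping the API lightweight.
        print(f"Instantiating PPO with clip_ratio={args.clip_ratio}, lr={args.lr}")

REGISTRY = {"ppo": PPOReduction}

def driver(argv=None):
    parser = argparse.ArgumentParser(prog="genrl")
    parser.add_argument("--agent", choices=REGISTRY, required=True)
    known, _ = parser.parse_known_args(argv)

    reduction = REGISTRY[known.agent]  # picked by name; no other algorithm is touched
    reduction.add_args(parser)         # expose only this reduction's arguments
    reduction.setup(parser.parse_args(argv))

driver(["--agent", "ppo", "--clip-ratio", "0.1"])
```

The point of the sketch is that the driver only ever sees add_args and setup; everything PPO-specific stays inside the reduction.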

Finally, I think if we can create this general-purpose RL solver, it would be much better than what we currently have.

threewisemonkeys-as commented 4 years ago

Is there an example of a stack using this kind of structure?

Sharad24 commented 4 years ago

Not in Python, some in C++.

ajaysub110 commented 4 years ago

I'm finding it hard to wrap my head around RL algorithms being reducible in such a way. For example, though PPO, A2C, and VPG share the broad common attributes of being on-policy, direct policy optimization methods, their implementations vary a great deal in small details like action selection, reward clamping, etc. So would there not be a lot of if conditions even in the reduction stack to tell the API where to go? Is such a framework really implementable in RL, where there are so many variations across seemingly similar algorithms?

Het-Shah commented 4 years ago

I agree with most of the points here. Here are some of my concerns:

  1. Most of the off-policy agents don't follow the structure that PPO, A2C, and VPG share.
  2. This might create problems when implementing new RL agents to fit into our stack.

Sharad24 commented 4 years ago

I'm finding it hard to wrap my head around RL algorithms being reducible in such a way.

This is true, no doubt. Our reduction stack will not be that big, also because we don't have a lot of algorithms. If anything, I think our reduction stack will have to look more tree-like, so not so much a stack. But just to give you an idea, @mehulrastogi is working on evolutionary methods, a lot of which just optimize the way normal algorithms learn. In implementation, a reduction stack, especially when building a general-purpose API, would be used more to select the reduction than to arrange reductions in a hierarchical order of solving.

For example, though PPO, A2C, and VPG share the broad common attributes of being on-policy, direct policy optimization methods, their implementations vary a great deal in small details like action selection, reward clamping, etc. So would there not be a lot of if conditions even in the reduction stack to tell the API where to go? Is such a framework really implementable in RL, where there are so many variations across seemingly similar algorithms?

The framework would not have to take care of that. Ideally, every algorithm will expose a small, fixed set of functions that get called, similar to how the trainer works. Only the specified arguments will select the reduction.
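Purely as an assumption about what that fixed set of functions might look like (these method names need not match GenRL's existing base classes), each reduction could hide its algorithm-specific details behind a minimal interface that the stack calls:

```python
from abc import ABC, abstractmethod

class ReductionAgent(ABC):
    """Whatever a reduction does internally, the stack only ever calls these."""

    @abstractmethod
    def select_action(self, state):
        ...

    @abstractmethod
    def update_params(self, batch):
        ...

class PPOLikeAgent(ReductionAgent):
    def select_action(self, state):
        # PPO-specific details (sampling, clipping, etc.) live here,
        # invisible to the stack and the driver.
        return 0

    def update_params(self, batch):
        # PPO-specific losses and reward clamping live here.
        pass
```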

Sharad24 commented 4 years ago
  1. Most of the off-policy agents don't follow the structure that PPO, A2C, and VPG share.

Yes, they don't, which is why off-policy agents might need a separate stack. I think it would probably make sense for the central API to just be used to select one of these stacks.

  1. This might create problems when implementing new RL agents to fit into our stack.

How? Our current way of avoiding this problem is to provide classes like the trainer and logger, plus heavy encapsulation, so that base agents can be extended easily. When creating this stack, these things will stay untouched.

Sharad24 commented 4 years ago

Any other thoughts?

github-actions[bot] commented 4 years ago

Stale issue message