Closed · Sharad24 closed this issue 4 years ago
Is there an example of a stack using this kind of structure?
Not in Python; there are some in C++.
I'm finding it hard to wrap my head around RL algorithms being reducible in such a way. For example, though PPO, A2C, and VPG share the broad common attributes of being on-policy, direct policy optimization methods, their implementations vary a great deal in small details like action selection, reward clamping, etc. So wouldn't there be a lot of if conditions even in the reduction stack to tell the API where to go? Is such a framework really implementable for RL, where there are many variations among seemingly similar algorithms?
I agree with most of the points here. Here are some of my concerns:
1. Most of the off-policy agents don't follow the structure that PPO, A2C, and VPG share.
2. This might make it hard to fit new RL agents into our stack.
> I'm finding it hard to wrap my head around RL algorithms being reducible in such a way.
This is true, no doubt. Our reduction stack will not be that big, partly because we don't have a lot of algorithms. I think our reduction stack would have to look more tree-like if anything, so not so much a stack. But just to give you an idea, @mehulrastogi is working on evolutionary methods, many of which just optimize the way normal algorithms learn. In implementation, especially when building a general-purpose API, the reduction stack would be used more to select a reduction than to arrange solvers in a hierarchical sense of solving.
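To make the tree-like idea concrete, here is a minimal sketch (the names and the dict layout are my assumptions, not GenRL's actual API): each branch of the tree holds a chain of reductions, and walking a branch yields the sequence of progressively simpler solvers.

```python
# Hypothetical tree of reductions: each branch maps an agent to the agent
# it reduces to, with None marking the simplest solver on that branch.
REDUCTION_TREE = {
    "on_policy": {"ppo": "a2c", "a2c": "vpg", "vpg": None},
    "off_policy": {"td3": "ddpg", "ddpg": None},
}

def reduction_chain(branch, agent):
    """Walk one branch of the tree, collecting the chain of reductions."""
    chain = [agent]
    while REDUCTION_TREE[branch][agent] is not None:
        agent = REDUCTION_TREE[branch][agent]
        chain.append(agent)
    return chain

print(reduction_chain("on_policy", "ppo"))  # ['ppo', 'a2c', 'vpg']
```

Selecting a branch first and then an entry point within it is what makes this a tree rather than a single flat stack.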
> For example, though PPO, A2C, and VPG share the broad common attributes of being on-policy, direct policy optimization methods, their implementations vary a great deal in small details like action selection, reward clamping, etc. So wouldn't there be a lot of if conditions even in the reduction stack to tell the API where to go? Is such a framework really implementable for RL, where there are many variations among seemingly similar algorithms?
The framework would not have to take care of that. Ideally, every algorithm exposes a particular set of functions that get called, similar to how the Trainer works. Only a few specified arguments would select the reduction.
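A rough sketch of what "particular functions being called" could look like (class and method names here are assumptions for illustration, not GenRL's real interface): agents implement a small fixed set of hooks, and the trainer only ever calls those hooks, so algorithm-specific details never leak into the framework.

```python
class BaseAgent:
    """Hypothetical agent interface: the framework only knows these hooks."""

    def select_action(self, state):
        raise NotImplementedError

    def update_params(self):
        raise NotImplementedError

class VPGAgent(BaseAgent):
    def select_action(self, state):
        return 0  # placeholder: sample an action from the policy here

    def update_params(self):
        return {"loss": 0.0}  # placeholder: policy-gradient update here

def train(agent, timesteps):
    # The trainer is agnostic to which algorithm it is running:
    # no if-conditions on the agent type, just the shared hooks.
    logs = []
    for _ in range(timesteps):
        agent.select_action(state=None)
        logs.append(agent.update_params())
    return logs

print(len(train(VPGAgent(), 3)))  # 3
```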
> Most of the off-policy agents don't follow the structure that PPO, A2C, and VPG share.
Yes, they don't, which is why off-policy agents might need a separate stack. I think it would probably make sense for the central API to just select one of these stacks.
> This might make it hard to fit new RL agents into our stack.
How? Our current way of avoiding this problem is to provide classes like Trainer and Logger, plus heavy encapsulation, so base agents can be extended easily. Creating this stack would leave those things untouched.
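As a sketch of that extension story (the class names and the clipping variant are hypothetical, chosen only for illustration): a new agent subclasses an existing one and overrides only the piece that differs, so nothing in the Trainer or Logger has to change.

```python
class A2C:
    """Stand-in for an existing base agent."""

    def get_value_loss(self, returns, values):
        # Simple squared-error value loss.
        return sum((r - v) ** 2 for r, v in zip(returns, values))

class ClippedA2C(A2C):
    """A hypothetical new variant: reuse everything, override one method."""

    def get_value_loss(self, returns, values):
        base = super().get_value_loss(returns, values)
        return min(base, 10.0)  # clip the loss; everything else is inherited
```

Because the new agent satisfies the same interface, it drops into the existing stack without touching the framework.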
Any other thoughts?
This issue is centered more towards thinking and discussing the long-term plan of GenRL.
First, current real novelties we have:
Issues (Major):
I believe these are very common issues for a Python library that evolves implementation by implementation. Day after day, new functionality is added in terms of ABCs, algorithm implementations, support for new envs, etc., and that introduces rough edges into this project. We need to think of a more general-purpose RL solver; otherwise it becomes too hard to handle everything. So to resolve these issues, I propose creating an API driven by a reduction stack. I'll explain the concept of reduction stacks later on.
To create a general-purpose RL solver, we first need to identify the three basic modes of interaction RL usually has:
Each of these can have different algorithms, which may be model-based, model-free, etc. But why focus on interactions rather than the type of algorithm? Why not make it algorithm-centric? When creating a generic API, especially one meant to work across many datasets/environments/etc., this interaction specification should ideally come first; otherwise we're stuck with what we have right now. The good thing about creating this API in Python, though, is that optimization does not really happen through pre-specification of modules/functions/etc. as in C++, so we can combine ease of implementation with a generic solver driven by the reduction stack.
How does a reduction stack work? The term "reduction" comes from computational complexity theory. In short, it means solving a problem by transforming it into another problem. So in our case, it just means that if you want to solve an RL problem, and we have a good reduction stack developed, the API walks down the stack until it reaches the simplest solver and runs that. For example, in the case of online Deep RL, it would be PPO --> A2C --> VPG. The algorithm is selected through the command line, so it would be something like
```
genrl --online --agent ppo --env CartPole-v1
```
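The command-line entry point might then look something like the following sketch (argparse-based; the flag names mirror the example above, but the dispatch logic is an assumption, not GenRL's actual implementation): flags pick an interaction mode, and the agent flag picks where to enter the on-policy reduction stack PPO --> A2C --> VPG.

```python
import argparse

# Hypothetical on-policy reduction stack, ordered from most to least complex.
ONLINE_STACK = ["ppo", "a2c", "vpg"]

def resolve_stack(agent):
    """Return the reductions the solver would walk through for `agent`."""
    return ONLINE_STACK[ONLINE_STACK.index(agent):]

def parse(argv):
    parser = argparse.ArgumentParser(prog="genrl")
    parser.add_argument("--online", action="store_true")
    parser.add_argument("--agent", choices=ONLINE_STACK, default="vpg")
    parser.add_argument("--env", default="CartPole-v1")
    return parser.parse_args(argv)

args = parse(["--online", "--agent", "ppo", "--env", "CartPole-v1"])
print(resolve_stack(args.agent))  # ['ppo', 'a2c', 'vpg']
```

The point is that the CLI only selects an entry into the stack; the framework itself carries no per-algorithm branching.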
It might look like this can be done directly with a main file that holds all the algorithm specifications, the way `stable-baselines` does it or the current `examples/deep.py`. But let me point out what's not good there:

How would a reduction stack and this proposal make GenRL better?
```
genrl --offline dataset --algo
```
Finally, I think that if we can create this general-purpose RL solver, it will be much better than what we currently have.