IntelLabs / coach

Reinforcement Learning Coach by Intel AI Lab enables easy experimentation with state of the art Reinforcement Learning algorithms
https://intellabs.github.io/coach/
Apache License 2.0

Coach as a library #212

Closed nzmora closed 5 years ago

nzmora commented 5 years ago

Take a sip of your coffee and sit back: this is going to be long :-(

Background: Recently I've created a sample Distiller application which uses RL agents (DDPG, Clipped PPO) to automate DNN compression (WIP). In this use-case, a compression application creates a DistillerWrapperEnvironment, a Coach graph_manager, and a Coach agent_params from a preset file that is stored in the Distiller repo.

The Automated Deep Compression (ADC) application populates the graph manager with the environment details, creates a graph and executes graph_manager.improve().
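
For context, a condensed, preset-style sketch of that flow might look roughly like the following (the schedule values are illustrative placeholders, not the ones used in the actual Distiller preset):

    from rl_coach.agents.ddpg_agent import DDPGAgentParameters
    from rl_coach.graph_managers.basic_rl_graph_manager import BasicRLGraphManager
    from rl_coach.graph_managers.graph_manager import ScheduleParameters
    from rl_coach.base_parameters import VisualizationParameters
    from rl_coach.core_types import EnvironmentEpisodes, EnvironmentSteps
    from rl_coach.environments.gym_environment import GymVectorEnvironment

    # Agent parameters (DDPG), left at library defaults for brevity
    agent_params = DDPGAgentParameters()

    # Environment: Coach imports the module and instantiates the class named after ':'
    env_params = GymVectorEnvironment()
    env_params.level = '../automated_deep_compression/ADC.py:DistillerWrapperEnvironment'

    # Training schedule (placeholder values)
    schedule_params = ScheduleParameters()
    schedule_params.improve_steps = EnvironmentEpisodes(400)
    schedule_params.steps_between_evaluation_periods = EnvironmentEpisodes(10)
    schedule_params.evaluation_steps = EnvironmentEpisodes(1)
    schedule_params.heatup_steps = EnvironmentSteps(100)

    # Build the graph and start improving the agent's policy
    graph_manager = BasicRLGraphManager(agent_params=agent_params,
                                        env_params=env_params,
                                        schedule_params=schedule_params,
                                        vis_params=VisualizationParameters())
    graph_manager.improve()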

In this use-case, Coach is used as a library, serving an application. As far as I know, this is the first time Coach is used as a library rather than as an application. The Coach-as-a-library use case works well, but Coach is lacking a few features that would make it a bit more user-friendly. Not all of these are equally important:

  1. Control over parallelization
  2. Access to signal objects.
  3. Simplification of the exported name-space. I pasted below some of the imports I need to perform in ADC. I'd prefer a flat, or near-flat namespace. Numpy is a good example. Matplotlib has a wider namespace, but it is not as deep as Coach. For example, the symbols in rl_coach.agents.ddpg_agent can be exported directly from rl_coach.agents (reducing the import depth from 3 to 2).
    
    from rl_coach.agents.ddpg_agent import DDPGAgentParameters
    from rl_coach.graph_managers.basic_rl_graph_manager import BasicRLGraphManager
    from rl_coach.graph_managers.graph_manager import ScheduleParameters
    from rl_coach.base_parameters import VisualizationParameters
    from rl_coach.core_types import EnvironmentEpisodes, EnvironmentSteps
    from rl_coach.environments.gym_environment import GymVectorEnvironment
    from rl_coach.exploration_policies.truncated_normal import TruncatedNormalParameters
    from rl_coach.exploration_policies.additive_noise import AdditiveNoiseParameters

    from rl_coach.memories.memory import MemoryGranularity
    from rl_coach.base_parameters import EmbedderScheme
    from rl_coach.architectures.tensorflow_components.layers import Dense

4. A way to pass a function that stops the learning process; I might employ complex logic to decide whether to stop. Ideally I'd also like to be able to call graph_manager.terminate() from my app.
5. [This may already exist] A way to save the agent and its state, and to later load the agent and run it only in “playing” mode. 
6. The way Coach creates the environment is a bit convoluted: (1) the path to the environment class is set in the preset; (2) Coach reads the preset and instantiates the environment.

    env_params = GymVectorEnvironment()
    env_params.level = '../automated_deep_compression/ADC.py:DistillerWrapperEnvironment'

Why change this? 
1.  This is confusing and tedious for casual users.
2.  I have my own environment, and its constructor receives a whole bunch of arguments. Because Coach instantiates the environment, I have to package these parameters inside `graph_manager.env_params.additional_simulator_parameters` (i.e. pass them to Coach) so that Coach will then pass them back as arguments to my environment's constructor. We can live with this, but I think we can improve on it by letting the app instantiate the environment and then pass it to Coach (a hypothetical sketch of what that could look like follows the example below).
    # These parameters are passed to the Distiller environment
    graph_manager.env_params.additional_simulator_parameters = {'model': model,
                                                                'app_args': app_args,
                                                                'amc_cfg': amc_cfg,
                                                                'services': services}
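
To make the proposal in (2) concrete, here is a purely hypothetical sketch (this is not an API Coach exposes today) of letting the application construct the environment itself and hand the instance to the graph manager:

    # Hypothetical, for illustration only: today the class path and constructor
    # arguments must be routed through env_params, as shown above.
    env = DistillerWrapperEnvironment(model=model,
                                      app_args=app_args,
                                      amc_cfg=amc_cfg,
                                      services=services)

    # agent_params and schedule_params built as in the preset
    graph_manager = BasicRLGraphManager(agent_params=agent_params,
                                        env=env,  # pass the instance directly
                                        schedule_params=schedule_params)
    graph_manager.improve()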

Thanks!
Neta
jamescasbon commented 5 years ago

Lots of good stuff in here!

  1. Control over parallelization

Yes, please. I really want to be able to do graph_manager.improve(num_workers=4) or something similar. I can't really work out how to do this at the moment. The advice seems to be to subclass CoachLauncher?

  3. Simplification of the exported name-space.

Yes, nice to have. PEP8 says "packages should also have short, all-lowercase names, although the use of underscores is discouraged".

from rl_coach.agents.ddpg_agent import DDPGAgentParameters

could be

from rl_coach import agents

agents.ddpg.Parameters

A short-term fix could be to add an api module with a saner structure, which would leave you free to change the underlying library structure later (e.g. import rl_coach.agents.api as agents).
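
As a rough illustration of that short-term fix (purely hypothetical, such a module does not exist in Coach today; it would only re-export existing symbols):

    # rl_coach/agents/api.py -- hypothetical re-export shim
    from rl_coach.agents.ddpg_agent import DDPGAgentParameters
    # ...and similarly for the other agent families

    # application code
    import rl_coach.agents.api as agents
    params = agents.DDPGAgentParameters()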

nitsanluke commented 5 years ago

@shadiendrawis How can I get access to the agent object from the graph_manager? I only have access to the agent parameters. It would be good to have a way to access the agent object, since that would enable many things after training, when the model is saved and re-loaded, e.g. saving the memory buffer when training ends, re-initializing/loading the memory, etc.

shadiendrawis commented 5 years ago

PR #348 addresses the points raised in this issue.

@nitsanluke if you are using BasicRLGraphManager you can use the get_agent method to get access to the agent object. In the more general case, where there could be several hierarchical levels and several agents, you'll need to access a specific agent at a specific level using graph_manager.level_managers[LM_index].agents['agent_name'].
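
For example (assuming a graph manager that has already been built, with LM_index and 'agent_name' standing in for the actual level index and agent name):

    # Single-agent case: BasicRLGraphManager exposes the agent directly
    agent = graph_manager.get_agent()

    # General case: pick a specific agent at a specific hierarchy level
    agent = graph_manager.level_managers[LM_index].agents['agent_name']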