IntelLabs / coach

Reinforcement Learning Coach by Intel AI Lab enables easy experimentation with state of the art Reinforcement Learning algorithms
https://intellabs.github.io/coach/
Apache License 2.0

[Question] OpenAI Gym Tutorial #383

Open saltypeanuts opened 4 years ago

saltypeanuts commented 4 years ago

I'm trying to port an OpenAI Gym environment and use Coach to do the learning on top of it.

The tutorial currently reads (emphasis mine):

Adding an Environment

Adding your custom environments to Coach will allow you to solve your own tasks using any of the predefined algorithms. There are two ways for adding your own environment to Coach:

    **Implementing your environment as an OpenAI Gym environment**
    Implementing a wrapper for your environment in Coach

In this tutorial, we'll follow the 2nd option, and add the DeepMind Control Suite environment to Coach. We will then create a preset that trains a DDPG agent on one of the levels of the new environment.

Calling dir() on an agent module (arbitrarily picking dqn_agent)

dir(dqn_agent)

yields:


['AgentParameters',
 'AlgorithmParameters',
 'DQNAgent',
 'DQNAgentParameters',
 'DQNAlgorithmParameters',
 'DQNNetworkParameters',
 'EGreedyParameters',
 'EnvironmentSteps',
 'ExperienceReplayParameters',
 'FCMiddlewareParameters',
 'InputEmbedderParameters',
 'LinearSchedule',
 'MiddlewareScheme',
 'NetworkParameters',
 'QHeadParameters',
 'Union',
 'ValueOptimizationAgent',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 'np']

and calling dir() on these doesn't provide much insight either.

Once I have my gym env, what's the appropriate way to plug it into one of the algorithms? In Ray (RLlib) this is accomplished with:

import gym
import ray
from ray.tune.registry import register_env
from ray.rllib.agents.ppo import PPOTrainer

class my_env(gym.Env):
    def __init__(self, env_config):
        ...
    def reset(self):
        ...
    def _next_observation(self):
        ...
    def step(self, action):
        ...
    def _take_action(self, action):
        ...

ray.init()
register_env("my_env_name", lambda env_config: my_env(env_config))
trainer = PPOTrainer(env="my_env_name", config={})
trainer.train()

saltypeanuts commented 4 years ago

Looking over the class rl_coach.environments.gym_environment.GymEnvironment from https://nervanasystems.github.io/coach/components/environments/index.html, it seems very specific to tasks run on the Atari system rather than to custom implementations that address other problems.

Parameters: level – (str) A string representing the gym level to run. This can also be a LevelSelection object. For example, BreakoutDeterministic-v0

Not relevant for all RL environments.

frame_skip – (int) The number of frames to skip between any two actions given by the agent. The action will be repeated for all the skipped frames.

Not relevant for all RL environments.

Then there's this chunk of the tutorial from https://github.com/NervanaSystems/coach/blob/master/tutorials/2.%20Adding%20an%20Environment.ipynb


The following functions cover the API expected from a new environment wrapper:

Is this the API being created in the tutorial, or Coach's internal API that all algorithms implement?

    _update_state - update the internal state of the wrapper (to be queried by the agent), which consists of:

Great, so the wrapper is expected to have an _update_state method.

However, the wrapper and the environment are both given the same name in the tutorial:

# Environment
class ControlSuiteEnvironment(Environment):

vs

class ControlSuiteEnvironment(Environment):

I presume the second code block, after the API requirements, is the wrapper. Is it intentional that the environment and the wrapper are named the same thing in the tutorial (and would that be required when creating a wrapper for a custom gym environment)?

        self.state - a dictionary containing all the observations from the environment and which follows the state space definition.
        self.reward - a float value containing the reward for the last step of the environment
        self.done - a boolean flag which signals if the environment episode has ended

Self-explanatory, fine.

        self.goal - a numpy array representing the goal the environment has set for the last step

Is this to be set in the environment or the wrapper? I don't see it defined or used anywhere in the tutorial. It's not exactly clear how this is different from the reward. Is this set by the training algorithm (agent, critic, etc.)?

        self.info - a dictionary that contains any additional information for the last step
    _take_action - gets the action from the agent, and make a single step on the environment
    _restart_environment_episode - restart the environment on a new episode

Self-explanatory, fine. self.info remains undefined in both the environment and the wrapper.

    get_rendered_image - get a rendered image of the environment in its current state

Is this required by Coach's API and RL functions, or simply because we define it here? In the above paragraph there are "expected functions" which are neither defined nor discussed in detail but are expected from a wrapper. A get_rendered_image function is irrelevant to some reinforcement learning tasks, and it seems very odd for Coach to require a wrapper to return a rendered image of an environment that doesn't involve any images.

# Parameters
class ControlSuiteEnvironmentParameters(EnvironmentParameters):

Is this for the environment itself or the wrapper?
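
To check my understanding, here is a rough sketch of how I read the expected wrapper API. Everything below is guesswork on my part: MyCustomEnvironment, MyCustomEnvironmentParameters and self.sim are placeholder names, and the method signatures are inferred from the tutorial rather than verified against Coach's base classes.

from rl_coach.environments.environment import Environment, EnvironmentParameters
from rl_coach.filters.filter import NoInputFilter, NoOutputFilter

# Environment wrapper (placeholder name)
class MyCustomEnvironment(Environment):
    def _take_action(self, action):
        # forward the agent's action to the underlying simulator for one step
        self.sim.step(action)

    def _update_state(self):
        # expose the simulator's output through the fields the tutorial lists
        self.state = {'observation': self.sim.observation()}
        self.reward = self.sim.last_reward()
        self.done = self.sim.episode_ended()
        self.info = {}

    def _restart_environment_episode(self, force_environment_reset=False):
        # reset the simulator at the start of a new episode
        self.sim.reset()

    def get_rendered_image(self):
        # return an RGB frame of the current state
        return self.sim.render_frame()

# Parameters (placeholder name)
class MyCustomEnvironmentParameters(EnvironmentParameters):
    def __init__(self):
        super().__init__()
        self.default_input_filter = NoInputFilter()
        self.default_output_filter = NoOutputFilter()

    @property
    def path(self):
        # module path to the wrapper class, so a preset can instantiate it
        return 'my_envs.my_custom_environment:MyCustomEnvironment'

Is that roughly the intended pattern?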

galnov commented 4 years ago

The Quick Start Guide notebook was modified recently to better explain this. Please take a look at the "Training an agent with a custom Gym environment" section, and let us know if you still have any questions.
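
For reference, that section boils down to something along the following lines. The module path 'my_module:MyEnv' is a placeholder for any Gym-compliant environment class, and ClippedPPO is just one possible choice of agent:

from rl_coach.agents.clipped_ppo_agent import ClippedPPOAgentParameters
from rl_coach.environments.gym_environment import GymVectorEnvironment
from rl_coach.graph_managers.basic_rl_graph_manager import BasicRLGraphManager
from rl_coach.graph_managers.graph_manager import SimpleSchedule

# point Coach at the custom Gym environment by module path - no wrapper class needed
env_params = GymVectorEnvironment(level='my_module:MyEnv')

graph_manager = BasicRLGraphManager(
    agent_params=ClippedPPOAgentParameters(),
    env_params=env_params,
    schedule_params=SimpleSchedule()
)
graph_manager.improve()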