saltypeanuts opened 4 years ago
Looking over the class rl_coach.environments.gym_environment.GymEnvironment from https://nervanasystems.github.io/coach/components/environments/index.html, it seems incredibly specific to tasks run on the Atari system, not to custom implementations that address other problems.
Parameters:
level – (str) A string representing the gym level to run. This can also be a LevelSelection object. For example, BreakoutDeterministic-v0
Not relevant for all RL environments.
frame_skip – (int) The number of frames to skip between any two actions given by the agent. The action will be repeated for all the skipped frames.
Not relevant for all RL environments.
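(For context, frame skipping just means repeating the chosen action; conceptually it behaves something like the sketch below, written against the classic gym step API rather than Coach's actual implementation.)

```python
def step_with_frame_skip(env, action, frame_skip):
    """Repeat `action` for `frame_skip` raw frames, accumulating reward.

    A conceptual sketch against the classic gym step API; not Coach's code.
    Assumes frame_skip >= 1.
    """
    total_reward = 0.0
    for _ in range(frame_skip):
        observation, reward, done, info = env.step(action)
        total_reward += reward
        if done:
            break
    return observation, total_reward, done, info
```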
Then there's this chunk of the tutorial from https://github.com/NervanaSystems/coach/blob/master/tutorials/2.%20Adding%20an%20Environment.ipynb
The following functions cover the API expected from a new environment wrapper:
Is this the API that's being created in the tutorial, or Coach's internal API which is implemented by all algorithms?
_update_state - update the internal state of the wrapper (to be queried by the agent), which consists of:
Great, so the wrapper expects an _update_state method.
However, the wrapper and the environment are named the same thing in the tutorial:
# Environment
class ControlSuiteEnvironment(Environment):
vs
class ControlSuiteEnvironment(Environment):
I presume the second code block, following the list of API requirements, is the wrapper. Is it intentional that the environment and the wrapper are named the same thing in the tutorial (and would that be required when creating a wrapper for a custom gym environment)?
self.state - a dictionary containing all the observations from the environment and which follows the state space definition.
self.reward - a float value containing the reward for the last step of the environment
self.done - a boolean flag which signals if the environment episode has ended
Self-explanatory, fine.
self.goal - a numpy array representing the goal the environment has set for the last step
Is this to be set in the environment or the wrapper? I don't see it defined or used anywhere in the tutorial. It's not exactly clear how this is different from the reward. Is this set by the training algorithm (agent, critic, etc.)?
self.info - a dictionary that contains any additional information for the last step
_take_action - gets the action from the agent, and makes a single step on the environment
_restart_environment_episode - restart the environment on a new episode
Self-explanatory, fine. self.info remains undefined in both the environment and the wrapper.
get_rendered_image - get a rendered image of the environment in its current state
Is this required by Coach's API and RL functions, or only because we define it here? The paragraph above lists "expected functions" that are neither defined nor discussed in detail but are expected from a wrapper. A get_rendered_image function is irrelevant to some reinforcement learning tasks, and it seems incredibly odd for Coach to require a wrapper to return a rendered image of an environment that doesn't consist of any images.
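Putting the expected functions together, a minimal wrapper skeleton would look something like the sketch below. This is not the tutorial's code: the constructor is elided, self.env stands in for whatever simulator is being wrapped, and the accessor names on it are hypothetical.

```python
from rl_coach.environments.environment import Environment


class MyCustomEnvironment(Environment):
    """Sketch of the wrapper API discussed above; accessors on self.env are hypothetical."""

    def _update_state(self):
        # Update the fields the agent queries after each step.
        self.state = {'observation': self.env.get_observation()}  # hypothetical accessor
        self.reward = self.env.get_last_reward()                  # hypothetical accessor
        self.done = self.env.is_episode_over()                    # hypothetical accessor
        self.info = {}

    def _take_action(self, action):
        # Apply the agent's action for a single step of the environment.
        self.env.step(action)  # hypothetical accessor

    def _restart_environment_episode(self, force_environment_reset=False):
        # Reset the underlying simulator for a new episode.
        self.env.reset()  # hypothetical accessor

    def get_rendered_image(self):
        # Only meaningful for environments that can actually render frames.
        return self.env.render()  # hypothetical accessor
```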
# Parameters
class ControlSuiteEnvironmentParameters(EnvironmentParameters):
Is this for the environment itself or the wrapper?
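For what it's worth, in Coach's bundled environments the parameters class carries a path property pointing back at the wrapper class, which suggests it describes the wrapper rather than the raw environment. A simplified sketch (the names below are placeholders):

```python
from rl_coach.environments.environment import EnvironmentParameters


class MyCustomEnvironmentParameters(EnvironmentParameters):
    @property
    def path(self):
        # module:class pointer to the wrapper defined above (placeholder path)
        return 'my_package.my_environment:MyCustomEnvironment'
```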
The Quick Start Guide notebook was modified recently to better explain this. Please take a look at the "Training an agent with a custom Gym environment" section, and let us know if you still have any questions.
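For reference, the pattern that section describes is roughly the following: point GymVectorEnvironment at your environment class via a module:class level string. The module path and class name below are placeholders:

```python
from rl_coach.agents.clipped_ppo_agent import ClippedPPOAgentParameters
from rl_coach.environments.gym_environment import GymVectorEnvironment
from rl_coach.graph_managers.basic_rl_graph_manager import BasicRLGraphManager
from rl_coach.graph_managers.graph_manager import SimpleSchedule

# 'my_package.my_env:MyGymEnv' is a placeholder module:class path to a gym.Env subclass.
graph_manager = BasicRLGraphManager(
    agent_params=ClippedPPOAgentParameters(),
    env_params=GymVectorEnvironment(level='my_package.my_env:MyGymEnv'),
    schedule_params=SimpleSchedule()
)
graph_manager.improve()
```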
I'm trying to port an OpenAI Gym environment and use Coach for the learning on top.
Calling dir() on an agent (arbitrarily, dqn_agent):
dir(dqn_agent)
yields a list of attribute names, and calling dir() on those doesn't provide much insight either.
Once I invoke my gym env, what's the appropriate way to call it inside one of the algorithms? In Ray this is accomplished with something along these lines:
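```python
from ray.tune.registry import register_env

def env_creator(env_config):
    # MyEnv is a placeholder for the custom gym.Env subclass being ported.
    return MyEnv(env_config)

register_env("my_env", env_creator)
```

register_env registers a creator function under a name, which can then be referenced as the env in an RLlib algorithm's config.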