IntelLabs / coach

Reinforcement Learning Coach by Intel AI Lab enables easy experimentation with state of the art Reinforcement Learning algorithms
https://intellabs.github.io/coach/
Apache License 2.0

Enable non-Atari environments for Gym, other custom environments #384

Open saltypeanuts opened 4 years ago

saltypeanuts commented 4 years ago

I followed the tutorial below (the first full tutorial that comes up when googling "custom OpenAI Gym environment tutorial") to create a custom OpenAI Gym environment. The environment works (it can be stepped through and reset) and can be interacted with from other RL libraries (ray[rllib]) and from hand-rolled RL functions (written in Python / sklearn, relatively slow).

Environment setup: https://towardsdatascience.com/creating-a-custom-openai-gym-environment-for-stock-trading-be532be3910e

It does not use Atari, does not require a level-select parameter, and does not require rendering.

Registering the custom gym environment with Coach:

my_visualization_parameters = VisualizationParameters(render = True, dump_csv = False, dump_signals_to_csv_every_x_episodes = 0)
my_env = gymenv(env = my_env)

This errors out, giving:

__init__() missing 3 required positional arguments: 'level', 'frame_skip', and 'visualization_parameters'

None of these are necessary for the custom OpenAI Gym environment we've successfully implemented and are trying to use with the Coach RL library.

Adding in one parameter at a time:

my_env = gymenv(env = my_env, visualization_parameters = my_visualization_parameters)

This works and allows us to skip a lot of the visual output. It's fine to require this parameter if it's needed for Coach's rendering of training progress rather than for actually interacting with the OpenAI Gym environment.

It still errors, giving us:

__init__() missing 2 required positional arguments: 'level' and 'frame_skip'

Next, adding in the frame_skip parameter:

my_env = gymenv(env = my_env, visualization_parameters = my_visualization_parameters, frame_skip = 0)

Because we're not rendering anything from Atari, it's largely irrelevant what we pass for this parameter. It's fine if we can pass in 0 and it just does nothing, as a temporary workaround to get the algorithm training. As expected, this gets us down to just one missing positional argument:

TypeError: __init__() missing 1 required positional argument: 'level'

Where a workaround is not possible is under 'level'. We do not need anything from the Atari environment loaded (or any game environment for that matter).

First, passing in null:

my_env = gymenv(env = my_env, visualization_parameters = my_visualization_parameters, frame_skip = 0, level = null)

gives us the error:

Error: Attempted to look up malformed environment ID: b"<module 'null' from '/home/my_user/.local/lib/python3.5/site-packages/null.py'>". (Currently all IDs must be of the form ^(?:[\w:-]+\/)?([\w:.-]+)-v(\d+)$.)

It clearly doesn't like the null. Let's try putting level = 0:

my_env = gymenv(env = my_env, visualization_parameters = my_visualization_parameters, frame_skip = 0, level = 0)

It clearly still does not like this (and passing 0 as a string, other integers, or None gives the same result):

Error: Attempted to look up malformed environment ID: b'0'. (Currently all IDs must be of the form ^(?:[\w:-]+\/)?([\w:.-]+)-v(\d+)$.)

Let's try passing in a level from the tutorial, level='BreakoutDeterministic-v4', from https://github.com/NervanaSystems/coach/blob/master/tutorials/1.%20Implementing%20an%20Algorithm.ipynb:

my_env = gymenv(env = my_env, level = 'BreakoutDeterministic-v4', visualization_parameters = my_visualization_parameters, frame_skip = 0)

Nope, there is an error:

error: No available video device

There doesn't appear to be a workaround for skipping the Atari level selection when loading a custom OpenAI Gym environment that doesn't use Atari at all. I think it would greatly benefit the Coach package to allow custom environments from any package rather than requiring Atari.

galnov commented 4 years ago

An example of training an agent with a custom Gym environment can be found in the “Using GraphManager Directly” section of the Quick Start Guide. Another example can be found in this issue.

Let us know if, after looking at these examples, it still doesn't work for you.
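Roughly, that approach builds the graph directly with a GymVectorEnvironment pointing at your environment class. A minimal sketch (the module path, class name, and config dict below are placeholders):

from rl_coach.agents.clipped_ppo_agent import ClippedPPOAgentParameters
from rl_coach.environments.gym_environment import GymVectorEnvironment
from rl_coach.graph_managers.basic_rl_graph_manager import BasicRLGraphManager
from rl_coach.graph_managers.graph_manager import SimpleSchedule

# 'level' accepts a "path/to/module.py:ClassName" string, so no Atari level name is needed
env_params = GymVectorEnvironment(level='/path/to/my_env_module.py:MyEnv')
env_params.additional_simulator_parameters = {'env_config': {}}  # constructor kwargs for MyEnv

graph_manager = BasicRLGraphManager(
    agent_params=ClippedPPOAgentParameters(),
    env_params=env_params,
    schedule_params=SimpleSchedule()
)
graph_manager.improve()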

saltypeanuts commented 4 years ago

Working through it right now.

So far:

for it in range(10):
    coach.graph_manager.log_signal('iteration', it)
    coach.graph_manager.train_and_act(EnvironmentSteps(100))
    training_reward = coach.graph_manager.get_signal_value('Training Reward')

which errors with:

module 'rl_coach' has no attribute 'graph_manager'

Should be graph_managers

module 'rl_coach.graph_managers' has no attribute 'log_signal'

should be:

rl_coach.graph_managers.graph_manager.Logger

I think?

    rl_coach.graph_managers.graph_manager.Logger('iteration', it)
TypeError: __init__() takes from 1 to 2 positional arguments but 3 were given

Not really sure if the above is correct, but using the below:

rl_coach.graph_managers.graph_manager.setup_logger()

errors out with

TypeError: setup_logger() missing 1 required positional argument: 'self'

Passing in 'iteration' or it also errors out.

    rl_coach.graph_managers.graph_manager.train_and_act(EnvironmentSteps(100))

simply doesn't exist. I assume this is the correct call:

 rl_coach.graph_managers.graph_manager.GraphManager.train_and_act()
TypeError: train_and_act() missing 2 required positional arguments: 'self' and 'steps'

Give me a bit; I'm instantiating my_graph_manager as a graph_managers.graph_manager.GraphManager object and going from there.
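My current guess is that the 'coach' in the tutorial snippet is a CoachInterface instance (not the rl_coach module), so graph_manager would be an attribute of that instance. A sketch of what I think the intended usage looks like, reusing the CartPole preset from further down this thread:

from rl_coach.coach import CoachInterface
from rl_coach.core_types import EnvironmentSteps

# 'coach' here is a CoachInterface instance; graph_manager is an attribute of it,
# not of the rl_coach.graph_managers module
coach = CoachInterface(preset='CartPole_ClippedPPO')

for it in range(10):
    coach.graph_manager.log_signal('iteration', it)
    coach.graph_manager.train_and_act(EnvironmentSteps(100))
    training_reward = coach.graph_manager.get_signal_value('Training Reward')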

saltypeanuts commented 4 years ago

I cannot figure out how this is supposed to work. Can you please update the documentation to be consistent with the latest release?

galnov commented 4 years ago

The documentation is up to date, and the tutorial is working fine on my local setup, so let's try this as a sanity check: Can you please try running the latest tutorial on master from a virtualenv following the installation instructions and see if it works ok for you?

saltypeanuts commented 4 years ago

I have an installation error on the dashboard step. I'm running Ubuntu, which is probably why.

# Dashboard
sudo -E apt-get install dpkg-dev build-essential python3.5-dev libjpeg-dev  libtiff-dev libsdl1.2-dev libnotify-dev 
freeglut3 freeglut3-dev libsm-dev libgtk2.0-dev libgtk-3-dev libwebkitgtk-dev libgtk-3-dev libwebkitgtk-3.0-dev
libgstreamer-plugins-base1.0-dev -y

E: Unable to locate package libnot
E: Unable to locate package libwebkitgtk-3.0-libgstreamer-plugins-base1.0-dev
E: Couldn't find any package by glob 'libwebkitgtk-3.0-libgstreamer-plugins-base1.0-dev'
E: Couldn't find any package by regex 'libwebkitgtk-3.0-libgstreamer-plugins-base1.0-dev'

Code:

#Intel coach RL
import os
import sys
from typing import Union
from rl_coach.agents.clipped_ppo_agent import ClippedPPOAgentParameters
from rl_coach.environments.gym_environment import GymVectorEnvironment
from rl_coach.graph_managers.basic_rl_graph_manager import BasicRLGraphManager
from rl_coach.graph_managers.graph_manager import SimpleSchedule
from rl_coach.architectures.embedder_parameters import InputEmbedderParameters

module_path = os.path.abspath(os.path.join('..'))
resources_path = os.path.abspath(os.path.join('Resources'))
if module_path not in sys.path:
    sys.path.append(module_path)
if resources_path not in sys.path:
    sys.path.append(resources_path)

env_config = {'df': my_df}

env_params = GymVectorEnvironment(level = '/home/my_user/rl.ipynb:my_env')
env_params.additional_simulator_parameters = {'env_config': env_config}

agent_params = ClippedPPOAgentParameters()
agent_params.network_wrappers['main'].input_embedders_parameters = {'state': InputEmbedderParameters(scheme = []),
                                                                   'desired_goal': InputEmbedderParameters(scheme = [])
                                                                   }

graph_manager = BasicRLGraphManager(
    agent_params=agent_params,
    env_params=env_params,
    schedule_params=SimpleSchedule()
)

graph_manager.improve()

Error:

Creating graph - name: BasicRLGraphManager

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<timed exec> in <module>

~/.local/lib/python3.5/site-packages/rl_coach/graph_managers/graph_manager.py in improve(self)
    531         """
    532 
--> 533         self.verify_graph_was_created()
    534 
    535         # initialize the network parameters from the global network

~/.local/lib/python3.5/site-packages/rl_coach/graph_managers/graph_manager.py in verify_graph_was_created(self)
    658         """
    659         if self.graph_creation_time is None:
--> 660             self.create_graph()
    661 
    662     def __str__(self):

~/.local/lib/python3.5/site-packages/rl_coach/graph_managers/graph_manager.py in create_graph(self, task_parameters)
    146 
    147         # create the graph modules
--> 148         self.level_managers, self.environments = self._create_graph(task_parameters)
    149 
    150         # set self as the parent of all the level managers

~/.local/lib/python3.5/site-packages/rl_coach/graph_managers/basic_rl_graph_manager.py in _create_graph(self, task_parameters)
     62         self.env_params.experiment_path = task_parameters.experiment_path
     63         env = short_dynamic_import(self.env_params.path)(**self.env_params.__dict__,
---> 64                                                          visualization_parameters=self.visualization_parameters)
     65 
     66         # agent loading

~/.local/lib/python3.5/site-packages/rl_coach/environments/gym_environment.py in __init__(self, level, frame_skip, visualization_parameters, target_success_rate, additional_simulator_parameters, seed, human_control, custom_reward_threshold, random_initialization_steps, max_over_num_frames, observation_space_type, **kwargs)
    270                 # environment in a an absolute path module written as a unix path or in a relative path module
    271                 # written as a python import path
--> 272                 env_class = short_dynamic_import(self.env_id)
    273             else:
    274                 # environment in a python package

~/.local/lib/python3.5/site-packages/rl_coach/utils.py in short_dynamic_import(module_path_and_attribute, ignore_module_case)
    346         """
    347         return dynamic_import_from_full_path(*module_path_and_attribute.split(':'),
--> 348                                              ignore_module_case=ignore_module_case)
    349     else:
    350         """

~/.local/lib/python3.5/site-packages/rl_coach/utils.py in dynamic_import_from_full_path(module_path, class_name, ignore_module_case)
    381                 module_path = '.'.join(module_path.split("/")[:-1] + [curr_module_name])
    382     spec = importlib.util.spec_from_file_location("module", module_path)
--> 383     module = importlib.util.module_from_spec(spec)
    384     spec.loader.exec_module(module)
    385     class_ref = getattr(module, class_name)

/usr/lib/python3.5/importlib/_bootstrap.py in module_from_spec(spec)

AttributeError: 'NoneType' object has no attribute 'loader'
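My best guess from the traceback: importlib can't build a module spec for an .ipynb file, so spec_from_file_location returns None and module_from_spec fails. If that's right, the level string needs to point at a regular .py module rather than the notebook, something like the following (the module filename is hypothetical):

# move the my_env class into a plain Python module, e.g. my_env_module.py, then:
env_params = GymVectorEnvironment(level='/home/my_user/my_env_module.py:my_env')
env_params.additional_simulator_parameters = {'env_config': env_config}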

saltypeanuts commented 4 years ago

Output for the copy/paste code example.

Code:

# Adding module path to sys path if not there, so rl_coach submodules can be imported
import os
import sys
module_path = os.path.abspath(os.path.join('..'))
resources_path = os.path.abspath(os.path.join('Resources'))
if module_path not in sys.path:
    sys.path.append(module_path)
if resources_path not in sys.path:
    sys.path.append(resources_path)
from rl_coach.coach import CoachInterface 

coach = CoachInterface(preset='CartPole_ClippedPPO',
                       # The optional custom_parameter enables overriding preset settings
                       custom_parameter='heatup_steps=EnvironmentSteps(5);improve_steps=TrainingSteps(3)',
                       # Other optional parameters enable easy access to advanced functionalities
                       num_workers=1, checkpoint_save_secs=10)

coach.run()

Output:

Creating graph - name: BasicRLGraphManager
Creating agent - name: agent

/home/my_user/.local/lib/python3.5/site-packages/tensorflow/python/framework/dtypes.py:526: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/my_user/.local/lib/python3.5/site-packages/tensorflow/python/framework/dtypes.py:527: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/my_user/.local/lib/python3.5/site-packages/tensorflow/python/framework/dtypes.py:528: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/my_user/.local/lib/python3.5/site-packages/tensorflow/python/framework/dtypes.py:529: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/my_user/.local/lib/python3.5/site-packages/tensorflow/python/framework/dtypes.py:530: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/my_user/.local/lib/python3.5/site-packages/tensorflow/python/framework/dtypes.py:535: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])

WARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
If you depend on functionality not listed there, please file an issue.

WARNING:tensorflow:From /home/my_user/.local/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From /home/my_user/.local/lib/python3.5/site-packages/rl_coach/architectures/tensorflow_components/layers.py:182: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.dense instead.
WARNING:tensorflow:From /home/my_user/.local/lib/python3.5/site-packages/tensorflow/contrib/layers/python/layers/layers.py:1624: flatten (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.flatten instead.
WARNING:tensorflow:From /home/my_user/.local/lib/python3.5/site-packages/tensorflow/python/ops/losses/losses_impl.py:667: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
WARNING:tensorflow:From /home/my_user/.local/lib/python3.5/site-packages/rl_coach/architectures/tensorflow_components/heads/ppo_head.py:113: Categorical.__init__ (from tensorflow.python.ops.distributions.categorical) is deprecated and will be removed after 2019-01-01.
Instructions for updating:
The TensorFlow Distributions library has moved to TensorFlow Probability (https://github.com/tensorflow/probability). You should update all references to use `tfp.distributions` instead of `tf.distributions`.
WARNING:tensorflow:From /home/my_user/.local/lib/python3.5/site-packages/tensorflow/python/ops/distributions/categorical.py:242: Distribution.__init__ (from tensorflow.python.ops.distributions.distribution) is deprecated and will be removed after 2019-01-01.
Instructions for updating:
The TensorFlow Distributions library has moved to TensorFlow Probability (https://github.com/tensorflow/probability). You should update all references to use `tfp.distributions` instead of `tf.distributions`.
WARNING:tensorflow:From /home/my_user/.local/lib/python3.5/site-packages/rl_coach/architectures/tensorflow_components/heads/ppo_head.py:66: kl_divergence (from tensorflow.python.ops.distributions.kullback_leibler) is deprecated and will be removed after 2019-01-01.
Instructions for updating:
The TensorFlow Distributions library has moved to TensorFlow Probability (https://github.com/tensorflow/probability). You should update all references to use `tfp.distributions` instead of `tf.distributions`.
WARNING:tensorflow:From /home/my_user/.local/lib/python3.5/site-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
simple_rl_graph: Starting heatup
Heatup - Name: main_level/agent Worker: 0 Episode: 1 Total reward: 18.0 Steps: 18 Training iteration: 0 
Starting to improve simple_rl_graph task index 0
Training - Name: main_level/agent Worker: 0 Episode: 2 Total reward: 8.0 Steps: 26 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 3 Total reward: 11.0 Steps: 37 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 4 Total reward: 13.0 Steps: 50 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 5 Total reward: 13.0 Steps: 63 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 6 Total reward: 25.0 Steps: 88 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 7 Total reward: 10.0 Steps: 98 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 8 Total reward: 11.0 Steps: 109 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 9 Total reward: 10.0 Steps: 119 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 10 Total reward: 11.0 Steps: 130 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 11 Total reward: 11.0 Steps: 141 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 12 Total reward: 15.0 Steps: 156 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 13 Total reward: 11.0 Steps: 167 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 14 Total reward: 10.0 Steps: 177 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 15 Total reward: 10.0 Steps: 187 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 16 Total reward: 9.0 Steps: 196 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 17 Total reward: 10.0 Steps: 206 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 18 Total reward: 11.0 Steps: 217 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 19 Total reward: 15.0 Steps: 232 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 20 Total reward: 16.0 Steps: 248 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 21 Total reward: 12.0 Steps: 260 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 22 Total reward: 14.0 Steps: 274 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 23 Total reward: 16.0 Steps: 290 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 24 Total reward: 13.0 Steps: 303 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 25 Total reward: 16.0 Steps: 319 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 26 Total reward: 9.0 Steps: 328 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 27 Total reward: 11.0 Steps: 339 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 28 Total reward: 9.0 Steps: 348 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 29 Total reward: 10.0 Steps: 358 Training iteration: 0 

Training - Name: main_level/agent Worker: 0 Episode: 30 Total reward: 9.0 Steps: 367 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 31 Total reward: 8.0 Steps: 375 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 32 Total reward: 10.0 Steps: 385 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 33 Total reward: 15.0 Steps: 400 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 34 Total reward: 10.0 Steps: 410 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 35 Total reward: 14.0 Steps: 424 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 36 Total reward: 13.0 Steps: 437 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 37 Total reward: 11.0 Steps: 448 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 38 Total reward: 12.0 Steps: 460 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 39 Total reward: 19.0 Steps: 479 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 40 Total reward: 13.0 Steps: 492 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 41 Total reward: 11.0 Steps: 503 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 42 Total reward: 12.0 Steps: 515 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 43 Total reward: 9.0 Steps: 524 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 44 Total reward: 9.0 Steps: 533 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 45 Total reward: 9.0 Steps: 542 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 46 Total reward: 16.0 Steps: 558 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 47 Total reward: 9.0 Steps: 567 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 48 Total reward: 13.0 Steps: 580 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 49 Total reward: 15.0 Steps: 595 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 50 Total reward: 11.0 Steps: 606 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 51 Total reward: 9.0 Steps: 615 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 52 Total reward: 12.0 Steps: 627 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 53 Total reward: 10.0 Steps: 637 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 54 Total reward: 13.0 Steps: 650 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 55 Total reward: 12.0 Steps: 662 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 56 Total reward: 11.0 Steps: 673 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 57 Total reward: 13.0 Steps: 686 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 58 Total reward: 11.0 Steps: 697 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 59 Total reward: 9.0 Steps: 706 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 60 Total reward: 12.0 Steps: 718 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 61 Total reward: 9.0 Steps: 727 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 62 Total reward: 11.0 Steps: 738 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 63 Total reward: 13.0 Steps: 751 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 64 Total reward: 10.0 Steps: 761 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 65 Total reward: 9.0 Steps: 770 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 66 Total reward: 11.0 Steps: 781 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 67 Total reward: 12.0 Steps: 793 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 68 Total reward: 14.0 Steps: 807 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 69 Total reward: 9.0 Steps: 816 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 70 Total reward: 13.0 Steps: 829 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 71 Total reward: 14.0 Steps: 843 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 72 Total reward: 17.0 Steps: 860 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 73 Total reward: 10.0 Steps: 870 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 74 Total reward: 12.0 Steps: 882 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 75 Total reward: 10.0 Steps: 892 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 76 Total reward: 11.0 Steps: 903 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 77 Total reward: 14.0 Steps: 917 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 78 Total reward: 8.0 Steps: 925 Training iteration: 0 

Training - Name: main_level/agent Worker: 0 Episode: 79 Total reward: 13.0 Steps: 938 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 80 Total reward: 10.0 Steps: 948 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 81 Total reward: 13.0 Steps: 961 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 82 Total reward: 13.0 Steps: 974 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 83 Total reward: 10.0 Steps: 984 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 84 Total reward: 12.0 Steps: 996 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 85 Total reward: 21.0 Steps: 1017 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 86 Total reward: 10.0 Steps: 1027 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 87 Total reward: 10.0 Steps: 1037 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 88 Total reward: 14.0 Steps: 1051 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 89 Total reward: 10.0 Steps: 1061 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 90 Total reward: 12.0 Steps: 1073 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 91 Total reward: 17.0 Steps: 1090 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 92 Total reward: 11.0 Steps: 1101 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 93 Total reward: 15.0 Steps: 1116 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 94 Total reward: 10.0 Steps: 1126 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 95 Total reward: 17.0 Steps: 1143 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 96 Total reward: 15.0 Steps: 1158 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 97 Total reward: 12.0 Steps: 1170 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 98 Total reward: 11.0 Steps: 1181 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 99 Total reward: 12.0 Steps: 1193 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 100 Total reward: 11.0 Steps: 1204 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 101 Total reward: 13.0 Steps: 1217 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 102 Total reward: 9.0 Steps: 1226 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 103 Total reward: 10.0 Steps: 1236 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 104 Total reward: 13.0 Steps: 1249 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 105 Total reward: 11.0 Steps: 1260 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 106 Total reward: 9.0 Steps: 1269 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 107 Total reward: 11.0 Steps: 1280 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 108 Total reward: 14.0 Steps: 1294 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 109 Total reward: 8.0 Steps: 1302 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 110 Total reward: 13.0 Steps: 1315 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 111 Total reward: 19.0 Steps: 1334 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 112 Total reward: 12.0 Steps: 1346 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 113 Total reward: 14.0 Steps: 1360 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 114 Total reward: 9.0 Steps: 1369 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 115 Total reward: 9.0 Steps: 1378 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 116 Total reward: 11.0 Steps: 1389 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 117 Total reward: 12.0 Steps: 1401 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 118 Total reward: 11.0 Steps: 1412 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 119 Total reward: 12.0 Steps: 1424 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 120 Total reward: 12.0 Steps: 1436 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 121 Total reward: 9.0 Steps: 1445 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 122 Total reward: 10.0 Steps: 1455 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 123 Total reward: 15.0 Steps: 1470 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 124 Total reward: 10.0 Steps: 1480 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 125 Total reward: 10.0 Steps: 1490 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 126 Total reward: 10.0 Steps: 1500 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 127 Total reward: 9.0 Steps: 1509 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 128 Total reward: 9.0 Steps: 1518 Training iteration: 0 
INFO:tensorflow:./experiments/12_08_2019-13_11/checkpoint/0_Step-1500.ckpt is not in all_model_checkpoint_paths. Manually adding it.

Checkpoint - Saving in path: ['./experiments/12_08_2019-13_11/checkpoint/0_Step-1500.ckpt'] 
Training - Name: main_level/agent Worker: 0 Episode: 129 Total reward: 14.0 Steps: 1532 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 130 Total reward: 10.0 Steps: 1542 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 131 Total reward: 11.0 Steps: 1553 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 132 Total reward: 12.0 Steps: 1565 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 133 Total reward: 12.0 Steps: 1577 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 134 Total reward: 14.0 Steps: 1591 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 135 Total reward: 9.0 Steps: 1600 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 136 Total reward: 14.0 Steps: 1614 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 137 Total reward: 10.0 Steps: 1624 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 138 Total reward: 13.0 Steps: 1637 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 139 Total reward: 10.0 Steps: 1647 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 140 Total reward: 11.0 Steps: 1658 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 141 Total reward: 10.0 Steps: 1668 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 142 Total reward: 10.0 Steps: 1678 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 143 Total reward: 14.0 Steps: 1692 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 144 Total reward: 10.0 Steps: 1702 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 145 Total reward: 19.0 Steps: 1721 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 146 Total reward: 12.0 Steps: 1733 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 147 Total reward: 9.0 Steps: 1742 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 148 Total reward: 10.0 Steps: 1752 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 149 Total reward: 11.0 Steps: 1763 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 150 Total reward: 10.0 Steps: 1773 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 151 Total reward: 13.0 Steps: 1786 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 152 Total reward: 10.0 Steps: 1796 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 153 Total reward: 11.0 Steps: 1807 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 154 Total reward: 17.0 Steps: 1824 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 155 Total reward: 10.0 Steps: 1834 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 156 Total reward: 10.0 Steps: 1844 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 157 Total reward: 17.0 Steps: 1861 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 158 Total reward: 9.0 Steps: 1870 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 159 Total reward: 9.0 Steps: 1879 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 160 Total reward: 10.0 Steps: 1889 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 161 Total reward: 17.0 Steps: 1906 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 162 Total reward: 9.0 Steps: 1915 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 163 Total reward: 12.0 Steps: 1927 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 164 Total reward: 16.0 Steps: 1943 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 165 Total reward: 10.0 Steps: 1953 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 166 Total reward: 13.0 Steps: 1966 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 167 Total reward: 11.0 Steps: 1977 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 168 Total reward: 10.0 Steps: 1987 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 169 Total reward: 11.0 Steps: 1998 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 170 Total reward: 13.0 Steps: 2011 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 171 Total reward: 10.0 Steps: 2021 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 172 Total reward: 15.0 Steps: 2036 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 173 Total reward: 11.0 Steps: 2047 Training iteration: 0 
Training - Name: main_level/agent Worker: 0 Episode: 174 Total reward: 10.0 Steps: 2057 Training iteration: 0 
Policy training - Surrogate loss: -0.016741858795285225 KL divergence: 0.01427377387881279 Entropy: 0.679304301738739 training epoch: 0 learning_rate: 0.0003 
Policy training - Surrogate loss: -0.022224819287657738 KL divergence: 0.01635449379682541 Entropy: 0.6825322508811951 training epoch: 1 learning_rate: 0.0003 
Policy training - Surrogate loss: -0.022269727662205696 KL divergence: 0.016053875908255577 Entropy: 0.6814897060394287 training epoch: 2 learning_rate: 0.0003 

Policy training - Surrogate loss: -0.0215325728058815 KL divergence: 0.015581163577735424 Entropy: 0.6800335645675659 training epoch: 3 learning_rate: 0.0003 
Policy training - Surrogate loss: -0.020753931254148483 KL divergence: 0.015484437346458435 Entropy: 0.6795852184295654 training epoch: 4 learning_rate: 0.0003 
Policy training - Surrogate loss: -0.022976987063884735 KL divergence: 0.017357708886265755 Entropy: 0.6813980340957642 training epoch: 5 learning_rate: 0.0003 
Policy training - Surrogate loss: -0.023274868726730347 KL divergence: 0.017804931849241257 Entropy: 0.6819442510604858 training epoch: 6 learning_rate: 0.0003 
Policy training - Surrogate loss: -0.023724138736724854 KL divergence: 0.0179484561085701 Entropy: 0.6814436912536621 training epoch: 7 learning_rate: 0.0003 
Policy training - Surrogate loss: -0.02237250655889511 KL divergence: 0.01311410777270794 Entropy: 0.6739133596420288 training epoch: 8 learning_rate: 0.0003 
Policy training - Surrogate loss: -0.024333715438842773 KL divergence: 0.017627976834774017 Entropy: 0.6793736815452576 training epoch: 9 learning_rate: 0.0003 
agent: Starting evaluation phase
Testing - Name: main_level/agent Worker: 0 Episode: 174 Total reward: 10.0 Steps: 2066 Training iteration: 1 
Testing - Name: main_level/agent Worker: 0 Episode: 174 Total reward: 10.0 Steps: 2066 Training iteration: 1 
Testing - Name: main_level/agent Worker: 0 Episode: 174 Total reward: 11.0 Steps: 2066 Training iteration: 1 
Testing - Name: main_level/agent Worker: 0 Episode: 174 Total reward: 10.0 Steps: 2066 Training iteration: 1 
Testing - Name: main_level/agent Worker: 0 Episode: 174 Total reward: 10.0 Steps: 2066 Training iteration: 1 
agent: Finished evaluation phase. Success rate = 0.0, Avg Total Reward = 10.2
saltypeanuts commented 4 years ago

The other example I'm working off of also works:

Code:

from rl_coach.agents.clipped_ppo_agent import ClippedPPOAgentParameters
from rl_coach.environments.gym_environment import GymVectorEnvironment
from rl_coach.graph_managers.basic_rl_graph_manager import BasicRLGraphManager
from rl_coach.graph_managers.graph_manager import SimpleSchedule
from rl_coach.architectures.embedder_parameters import InputEmbedderParameters

# define the environment parameters
bit_length = 10
env_params = GymVectorEnvironment(level='rl_coach.environments.toy_problems.bit_flip:BitFlip')
env_params.additional_simulator_parameters = {'bit_length': bit_length, 'mean_zero': True}

# Clipped PPO
agent_params = ClippedPPOAgentParameters()
agent_params.network_wrappers['main'].input_embedders_parameters = {
    'state': InputEmbedderParameters(scheme=[]),
    'desired_goal': InputEmbedderParameters(scheme=[])
}

graph_manager = BasicRLGraphManager(
    agent_params=agent_params,
    env_params=env_params,
    schedule_params=SimpleSchedule()
)

graph_manager.improve()

Some output:

Training - Name: main_level/agent Worker: 0 Episode: 4184 Total reward: -10 Steps: 41681 Training iteration: 20 
Training - Name: main_level/agent Worker: 0 Episode: 4185 Total reward: -10 Steps: 41691 Training iteration: 20 
Training - Name: main_level/agent Worker: 0 Episode: 4186 Total reward: -10 Steps: 41701 Training iteration: 20 
Training - Name: main_level/agent Worker: 0 Episode: 4187 Total reward: -10 Steps: 41711 Training iteration: 20 
Training - Name: main_level/agent Worker: 0 Episode: 4188 Total reward: -10 Steps: 41721 Training iteration: 20 
saltypeanuts commented 4 years ago

Environment looks like:

import gym

class my_env(gym.Env):
    def __init__(self, env_config):
        # initialize environment
        pass

    def reset(self):
        # procedures for resetting the environment
        return obs

    def _next_observation(self):
        # gets an observation
        return obs

    def step(self, action):
        # takes an action and acts on it
        return obs, reward, done, {}

    def _take_action(self, action):
        # updates environment variables as appropriate
        pass

    # def render(self, mode='human', close=False):
    #     prints out environment state
    #     (doesn't matter whether it's commented in or out)
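For context on how this class connects to the Coach setup above: as far as I can tell, GymVectorEnvironment imports the class named in the level string and forwards additional_simulator_parameters to its constructor as keyword arguments, roughly like this (illustrative only):

# rough equivalent of what Coach does internally with the parameters used earlier
env_config = {'df': my_df}           # the dict passed via additional_simulator_parameters
env = my_env(env_config=env_config)  # the constructor receives those parameters as kwargs
obs = env.reset()                    # Coach then drives the env through the standard gym.Env API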