astooke / rlpyt

Reinforcement Learning in PyTorch
MIT License

Doesn't work with (non-atari) env #164

Open drozzy opened 4 years ago

drozzy commented 4 years ago

It would be super useful for me to see an example of how to use a custom gym environment. Is there an example of this somewhere?

The problem with the built-in Atari environment is that I'm not sure where rlpyt begins and the environment ends.

One thing I find a bit confusing is the info dict. It's not clear to me at what point I have to wrap it (or does the env wrapper wrap it automatically)?

Let's say we had a simple env like:

import gym
from gym import spaces

class DummyEnv(gym.Env):
    def __init__(self):
        self.action_space = spaces.Discrete(2)
        self.observation_space = spaces.Discrete(10)

    def reset(self):
        return 0

    def step(self, action):
        obs, rew, done, info = 0, 1, True, {}
        return obs, rew, done, info

what are the steps I would need to take to wrap it?

LecJackS commented 4 years ago

Not sure if this is what you're looking for (I just started exploring rlpyt), but in my case I defined a custom env as a class like you did and then passed it to the serial sampler, the same way it's done in example 1:

sampler = SerialSampler(
    EnvCls=DummyEnv,
    env_kwargs=dict(mode="train", some_other_params=None),
    eval_env_kwargs=dict(mode="test", some_other_params=None),
    ...
)

In my case, I returned None as the info dict, since it is always empty.

About wrapping the entire env:

Output env_info is automatically converted from a dictionary to a corresponding namedtuple, which the rlpyt sampler expects. For this to work, every key that might appear in the gym environment's env_info at any step must appear at the first step after a reset, as the env_info entries will have sampler memory pre-allocated for them (so they also cannot change dtype or shape). (See EnvInfoWrapper, build_info_tuples, and info_to_nt in that file for more help/details.)

https://rlpyt.readthedocs.io/en/latest/pages/env.html#rlpyt.envs.gym.make

Examples of this can be found in files that import this line:

from rlpyt.envs.gym import make
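
A minimal sketch of that wrapping step for an unregistered env like the DummyEnv above (make_dummy_env is a made-up name, untested); note the env's step should return a dict for info, even an empty one, rather than None:

from rlpyt.envs.gym import GymEnvWrapper, EnvInfoWrapper
from rlpyt.samplers.serial.sampler import SerialSampler

def make_dummy_env(**kwargs):
    env = DummyEnv(**kwargs)
    # The info example pre-declares every env_info key (with fixed dtype and
    # shape), so the sampler can pre-allocate buffers for them.
    return GymEnvWrapper(EnvInfoWrapper(env, dict(timeout=0)))

sampler = SerialSampler(
    EnvCls=make_dummy_env,
    env_kwargs={},
    batch_T=10,
    batch_B=1,
)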

drozzy commented 4 years ago

Awesome, thanks for the tip.

So, just to be clear, you don't use make? You just return None for info?

drozzy commented 4 years ago

Ok, I'm getting this error now if I just pass the env class directly:

    agent.initialize(envs[0].spaces, share_memory=False,
AttributeError: 'DummyEnv' object has no attribute 'spaces'

Here is my full source:

import gym
from rlpyt.samplers.serial.sampler import SerialSampler
from rlpyt.agents.dqn.dqn_agent import DqnAgent
from rlpyt.algos.dqn.dqn import DQN
from rlpyt.envs.gym import GymEnvWrapper
from rlpyt.runners.minibatch_rl import MinibatchRlEval

def main(run_ID, cuda_idx):
    agent = DqnAgent()
    algo = DQN() 
    sampler = SerialSampler(
        EnvCls=DummyEnv,
        env_kwargs={},
        batch_T=10, # Timesteps per sample batch
        batch_B=1,  # Num environments to run in parallel
        max_decorrelation_steps=0
    )

    runner = MinibatchRlEval(
        algo=algo,
        agent=agent,
        sampler=sampler,
        n_steps=1000
    )

    runner.train()

class DummyEnv(gym.Env):
    """
    Runs env for 100 steps returning 0 reward except the last step returns 1
    """
    def __init__(self):

        self.n = 100
        self.action_space = gym.spaces.Discrete(2)
        self.observation_space = gym.spaces.Discrete(10)

    def reset(self):
        self.n = 100
        return 0

    def step(self, action):
        obs = 1
        self.n -= 1
        done = self.n <= 0
        rew = 1 if done else 0
        # info = {}

        return obs, rew, done, None

if __name__ == '__main__':
    import argparse
    parser = argparse.ArgumentParser(formatter_class=argparse.ArgumentDefaultsHelpFormatter)
    parser.add_argument('--run_ID', help='run identifier (logging)', type=int, default=0)
    parser.add_argument('--cuda_idx', help='gpu to use', type=int, default=None)
    args = parser.parse_args()
    main(
        run_ID=args.run_ID,
        cuda_idx=args.cuda_idx
    )
drozzy commented 4 years ago

https://rlpyt.readthedocs.io/en/latest/pages/env.html#rlpyt.envs.gym.make

Examples of this can be found in files that import this line:

from rlpyt.envs.gym import make

This seems to work only with environments that are registered with gym. :( https://github.com/astooke/rlpyt/blob/ca6483323c1ec372e9b4ec0ecde47bba620391d8/rlpyt/envs/gym.py#L163

drozzy commented 4 years ago

@astooke this should be a pretty simple use case. Could you give me a hint?

Thanks!

LecJackS commented 4 years ago

Ok, I'm getting this error now if I just pass the env class directly:

    agent.initialize(envs[0].spaces, share_memory=False,
AttributeError: 'DummyEnv' object has no attribute 'spaces'

I forgot about spaces. Try adding:

# From rlpyt/envs/gym.py (also needs: from rlpyt.envs.gym import EnvSpaces)
    @property
    def spaces(self):
        """Returns the rlpyt spaces for the wrapped env."""
        return EnvSpaces(
            observation=self.observation_space,
            action=self.action_space,
        )

To your env, so it can return the spaces dimensions as needed.

class DummyEnv(gym.Env):
    """
    Runs env for 100 steps returning 0 reward except the last step returns 1
    """
    def __init__(self):

        self.n = 100
        self.action_space = gym.spaces.Discrete(2)
        self.observation_space = gym.spaces.Discrete(10)

    # From rlpyt/envs/gym
    @property
    def spaces(self):
        """Returns the rlpyt spaces for the wrapped env."""
        return EnvSpaces(
            observation=self.observation_space,
            action=self.action_space,
        )

    def reset(self):
        self.n = 100
        return 0

    def step(self, action):
        obs = 1
        self.n -= 1
        done = self.n <= 0
        rew = 1 if done else 0
        # info = {}

        return obs, rew, done, None
drozzy commented 4 years ago

Nope, that doesn't work either ;-(

(myproj) andriy@whitelinux:~/Projects/myproj$ python 1_main.py 
2020-05-27 14:44:03.023122  | Runner  master CPU affinity: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15].
2020-05-27 14:44:03.023173  | Runner  master Torch threads: 8.
using seed 2852
Traceback (most recent call last):
  File "1_main.py", line 69, in <module>
    cuda_idx=args.cuda_idx
  File "1_main.py", line 26, in main
    runner.train()
  File "/home/andriy/Projects/myproj/src/rlpyt/rlpyt/runners/minibatch_rl.py", line 301, in train
    n_itr = self.startup()
  File "/home/andriy/Projects/myproj/src/rlpyt/rlpyt/runners/minibatch_rl.py", line 81, in startup
    world_size=world_size,
  File "/home/andriy/Projects/myproj/src/rlpyt/rlpyt/samplers/serial/sampler.py", line 51, in initialize
    global_B=global_B, env_ranks=env_ranks)
  File "/home/andriy/Projects/myproj/src/rlpyt/rlpyt/agents/dqn/dqn_agent.py", line 37, in initialize
    global_B=global_B, env_ranks=env_ranks)
  File "/home/andriy/Projects/myproj/src/rlpyt/rlpyt/agents/base.py", line 84, in initialize
    **self.model_kwargs)
TypeError: 'NoneType' object is not callable
(myproj) andriy@whitelinux:~/Projects/myproj$ 

here's my full source again:

from rlpyt.samplers.serial.sampler import SerialSampler
from rlpyt.agents.dqn.dqn_agent import DqnAgent
from rlpyt.algos.dqn.dqn import DQN
from rlpyt.envs.gym import GymEnvWrapper
from rlpyt.runners.minibatch_rl import MinibatchRlEval
from rlpyt.envs.gym import EnvSpaces

def main(run_ID, cuda_idx):
    agent = DqnAgent()
    algo = DQN() 
    sampler = SerialSampler(
        EnvCls=DummyEnv,
        env_kwargs={},
        batch_T=10, # Timesteps per sample batch
        batch_B=1,  # Num environments to run in parallel
        max_decorrelation_steps=0
    )

    runner = MinibatchRlEval(
        algo=algo,
        agent=agent,
        sampler=sampler,
        n_steps=1000
    )

    runner.train()

import gym

class DummyEnv(gym.Env):
    """
    Runs env for 100 steps returning 0 reward except the last step returns 1
    """
    def __init__(self):

        self.n = 100
        self.action_space = gym.spaces.Discrete(2)
        self.observation_space = gym.spaces.Discrete(10)

    # From rlpyt/envs/gym
    @property
    def spaces(self):
        """Returns the rlpyt spaces for the wrapped env."""
        return EnvSpaces(
            observation=self.observation_space,
            action=self.action_space,
        )

    def reset(self):
        self.n = 100
        return 0

    def step(self, action):
        obs = 1
        self.n -= 1
        done = self.n <= 0
        rew = 1 if done else 0

        return obs, rew, done, None

if __name__ == '__main__':
    import argparse
    parser = argparse.ArgumentParser(formatter_class=argparse.ArgumentDefaultsHelpFormatter)
    parser.add_argument('--run_ID', help='run identifier (logging)', type=int, default=0)
    parser.add_argument('--cuda_idx', help='gpu to use', type=int, default=None)
    args = parser.parse_args()
    main(
        run_ID=args.run_ID,
        cuda_idx=args.cuda_idx
    )
MauriceManning commented 4 years ago

Hello, I'm trying to get the recent release of the NetHack gym environment by Facebook working in the rlpyt framework, but I'm having issues as well. Wondering if anyone has experimented with this env. Thanks.

frankie4fingers commented 4 years ago

Actually, if we modify example_3.py to work with custom gym envs, we'll see that only the serial sampler works as expected. All parallel samplers fail, since they initialize the 'info' globals in the base process but then try to retrieve them from the child processes' globals, which are empty.

drozzy commented 4 years ago

Bump

benman1 commented 4 years ago

I have the same problem with a simple example based on one of the examples in the repo. It'd be good to have more documentation on how to do this (if it works). I've tried various other combinations of methods without success, such as using gym_make directly.

from rlpyt.samplers.serial.sampler import SerialSampler
from rlpyt.algos.dqn.dqn import DQN
from rlpyt.agents.dqn.catdqn_agent import CatDqnAgent
from rlpyt.runners.minibatch_rl import MinibatchRlEval
import gym
from rlpyt.envs.gym import GymEnvWrapper

def make_env(game):
    return GymEnvWrapper(gym.make(game))

sampler = SerialSampler(
    EnvCls=make_env,
    env_kwargs={'game': 'CartPole-v1'},
    batch_T=1,
    batch_B=1,
)
algo = DQN(min_steps_learn=1e3)
agent = CatDqnAgent()

runner = MinibatchRlEval(
    algo=algo,
    agent=agent,
    sampler=sampler,
    n_steps=500,
)
game = 'CartPole-v1'
config = dict(game=game)
runner.train()

2020-06-17 16:50:34.878147  | dqn_pong_0 dqn_pong_0 dqn_CartPole-v1_0 dqn_CartPole-v1_0 dqn_CartPole-v1_0 dqn_CartPole-v1_0 dqn_CartPole-v1_0 dqn_CartPole-v1_0 dqn_CartPole-v1_0 Runner  master CPU affinity: [0, 1, 2, 3, 4, 5].
2020-06-17 16:50:34.880999  | dqn_pong_0 dqn_pong_0 dqn_CartPole-v1_0 dqn_CartPole-v1_0 dqn_CartPole-v1_0 dqn_CartPole-v1_0 dqn_CartPole-v1_0 dqn_CartPole-v1_0 dqn_CartPole-v1_0 Runner  master Torch threads: 3.
using seed 3474
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-53-6a9a56aa8b66> in <module>
     38 )
     39 config = dict(game=game)
---> 40 runner.train()

~/anaconda3/lib/python3.7/site-packages/rlpyt/runners/minibatch_rl.py in train(self)
    299         specified log interval.
    300         """
--> 301         n_itr = self.startup()
    302         with logger.prefix(f"itr #0 "):
    303             eval_traj_infos, eval_time = self.evaluate_agent(0)

~/anaconda3/lib/python3.7/site-packages/rlpyt/runners/minibatch_rl.py in startup(self)
     79             traj_info_kwargs=self.get_traj_info_kwargs(),
     80             rank=rank,
---> 81             world_size=world_size,
     82         )
     83         self.itr_batch_size = self.sampler.batch_spec.size * world_size

~/anaconda3/lib/python3.7/site-packages/rlpyt/samplers/serial/sampler.py in initialize(self, agent, affinity, seed, bootstrap_value, traj_info_kwargs, rank, world_size)
     49         env_ranks = list(range(rank * B, (rank + 1) * B))
     50         agent.initialize(envs[0].spaces, share_memory=False,
---> 51             global_B=global_B, env_ranks=env_ranks)
     52         samples_pyt, samples_np, examples = build_samples_buffer(agent, envs[0],
     53             self.batch_spec, bootstrap_value, agent_shared=False,

~/anaconda3/lib/python3.7/site-packages/rlpyt/agents/dqn/catdqn_agent.py in initialize(self, env_spaces, share_memory, global_B, env_ranks)
     21     def initialize(self, env_spaces, share_memory=False,
     22             global_B=1, env_ranks=None):
---> 23         super().initialize(env_spaces, share_memory, global_B, env_ranks)
     24         # Overwrite distribution.
     25         self.distribution = CategoricalEpsilonGreedy(dim=env_spaces.action.n,

~/anaconda3/lib/python3.7/site-packages/rlpyt/agents/dqn/dqn_agent.py in initialize(self, env_spaces, share_memory, global_B, env_ranks)
     35         environment instance."""
     36         super().initialize(env_spaces, share_memory,
---> 37             global_B=global_B, env_ranks=env_ranks)
     38         self.target_model = self.ModelCls(**self.env_model_kwargs,
     39             **self.model_kwargs)

~/anaconda3/lib/python3.7/site-packages/rlpyt/agents/base.py in initialize(self, env_spaces, share_memory, **kwargs)
     82         self.env_model_kwargs = self.make_env_to_model_kwargs(env_spaces)
     83         self.model = self.ModelCls(**self.env_model_kwargs,
---> 84             **self.model_kwargs)
     85         if share_memory:
     86             self.model.share_memory()

TypeError: 'NoneType' object is not callable
jarlva commented 4 years ago

Having the same issue as above: 'NoneType' object is not callable

This issue is a month old. It would benefit new folks like me who are interested in adopting rlpyt and using it with non-Atari Gym environments. So, can someone please give a simple, yet complete, CartPole-v1 example?

Thanks!

astooke commented 4 years ago

Hi! Sorry for the long absence...let me try to help sort through these...

@benman1 The problem in your case is that when the agent tries to initialize the model (neural net), it doesn't have a self.ModelCls to call. The CatDqnAgent doesn't come with one of these, but the AtariCatDqnAgent is an example that has the model specific to the Atari environment.
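
A minimal sketch of what supplying such a model class could look like for a flat-observation env such as CartPole (MlpQModel and CartPoleDqnAgent are made-up names, untested; the idea is roughly analogous to what the Atari agents do with their Atari-specific models):

import torch
import torch.nn.functional as F

from rlpyt.agents.dqn.dqn_agent import DqnAgent
from rlpyt.utils.tensor import infer_leading_dims, restore_leading_dims

class MlpQModel(torch.nn.Module):
    """Tiny Q-network using rlpyt's (observation, prev_action, prev_reward) signature."""
    def __init__(self, obs_dim, action_dim, hidden_size=64):
        super().__init__()
        self.fc1 = torch.nn.Linear(obs_dim, hidden_size)
        self.fc2 = torch.nn.Linear(hidden_size, action_dim)

    def forward(self, observation, prev_action, prev_reward):
        obs = observation.float()
        # rlpyt may pass leading [T, B], [B], or no batch dims; flatten, then restore.
        lead_dim, T, B, _ = infer_leading_dims(obs, 1)
        q = self.fc2(F.relu(self.fc1(obs.view(T * B, -1))))
        return restore_leading_dims(q, lead_dim, T, B)

class CartPoleDqnAgent(DqnAgent):
    def __init__(self, ModelCls=MlpQModel, **kwargs):
        super().__init__(ModelCls=ModelCls, **kwargs)

    def make_env_to_model_kwargs(self, env_spaces):
        # Pass the env's sizes to the model constructor.
        return dict(obs_dim=env_spaces.observation.shape[0],
                    action_dim=env_spaces.action.n)

Then agent = CartPoleDqnAgent() would go in place of the bare DqnAgent() / CatDqnAgent() in the scripts above.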

astooke commented 4 years ago

@drozzy @LecJackS If you are making a custom env, it is better to just use the rlpyt base env class (https://github.com/astooke/rlpyt/blob/master/rlpyt/envs/base.py), and follow that interface. No need to go through gym. The main difference is that the env_info that your environment returns should be a namedtuple, not a dict, and the entries should be scalars or numpy arrays which are the same dtype and shape at every environment step (even if you just have to fill with zeros when not using it).
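
A rough sketch of the DummyEnv above rewritten against that base interface (the EnvInfo field here is made up, and this is untested; the agent still needs its own model class, as in the Q-model sketch in the previous comment):

from collections import namedtuple
import numpy as np

from rlpyt.envs.base import Env, EnvStep
from rlpyt.spaces.int_box import IntBox

# env_info must have the same fields, dtypes, and shapes at every step.
EnvInfo = namedtuple("EnvInfo", ["timeout"])

class DummyRlpytEnv(Env):
    """100-step episode: reward 0 on every step except 1 on the final step."""
    def __init__(self):
        self._n = 100
        self._action_space = IntBox(low=0, high=2)        # 2 discrete actions
        self._observation_space = IntBox(low=0, high=10)  # observations 0..9

    def reset(self):
        self._n = 100
        return np.array(0, dtype="int32")

    def step(self, action):
        self._n -= 1
        done = self._n <= 0
        obs = np.array(1, dtype="int32")
        reward = 1.0 if done else 0.0
        return EnvStep(obs, reward, done, EnvInfo(timeout=False))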

If you have an environment that's already registered in gym, then you can use the wrapper as provided in https://github.com/astooke/rlpyt/blob/master/rlpyt/envs/gym.py and shown in example_2.py, where you use the gym_make factory function as the EnvCls argument: https://github.com/astooke/rlpyt/blob/85d4e018a919118c6e42fac3e897aa346d84b9c5/examples/example_2.py#L23
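
And a sketch of that route for the DummyEnv above, assuming it gets registered with gym first (the id and entry point are made up, untested):

from gym.envs.registration import register
from rlpyt.envs.gym import make as gym_make
from rlpyt.samplers.serial.sampler import SerialSampler

# Register the custom env so gym.make() can find it by id.
register(id="DummyEnv-v0", entry_point="my_module:DummyEnv")

sampler = SerialSampler(
    EnvCls=gym_make,
    env_kwargs=dict(id="DummyEnv-v0"),
    eval_env_kwargs=dict(id="DummyEnv-v0"),
    batch_T=10,
    batch_B=1,
    max_decorrelation_steps=0,
)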

Hopefully that helps?

astooke commented 4 years ago

@frankie4fingers that's an unexpected problem! Could you provide more details? The environment should be instantiated separately within each child process in the parallel samplers.

jarlva commented 4 years ago

A simple, yet complete, CartPole code example is welcome!

"A sample code is worth a thousand responses"

benman1 commented 4 years ago

@astooke that's very helpful, thanks for that!


MauriceManning commented 4 years ago

Hello, I've been trying to modify example_2.py to work with Facebook's NetHack. I am able to load the environment 'NetHack-v0' via the gym wrapper (gym_make in the SerialSampler), but it seems that the structures returned from NetHack are not in the correct form. Is the correct approach here to go into the NetHack code and/or the gym wrapper code and adjust how the data is returned to the wrapper? Please see the attached screenshot; you can see that much of the structure is missing.

[Screenshot attached: 2020-07-02 at 10:43:44 AM]

thanks for any ideas.

im-ant commented 4 years ago

@benman1 The problem in your case is that when the agent tries to initialize the model (neural net), it doesn't have a self.ModelCls to call. The CatDqnAgent doesn't come with one of these, but the AtariCatDqnAgent is an example that has the model specific to the Atari environment.

Does this mean there is currently no way to use the CatDqnAgent with a non-atari environment?

If so, what additional files do we need to get a C51 agent to work with a custom (non-atari) gym environment? (presumably we have to write our own ModelCls class, anything else?)

frankie4fingers commented 3 years ago

@astooke sorry for the delay. I used code from example_3.py and the sampler is GpuSampler. It works as expected in serial mode and fails in parallel. As I remember, the main process does the first step, builds the info namedtuple, and saves the related stuff in its module globals; then all the child parallel instances, which don't have that content, try to get it and stop at build_info_tuples(info) in the GymEnvWrapper constructor (since ntc = globals().get(name) exists only in the main process). So my workaround for now is to disable build_info_tuples(info) and provide the info example separately for each parallel instance: Sampler(EnvCls=gym_make, env_kwargs=dict(id=env_id, info_example=dict(timeout=2000))). It works fine for me.
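
The workaround above as a sketch, assuming example_3.py's GpuSampler and a gym-registered env id ("MyEnv-v0" and the timeout key are placeholders, untested):

from rlpyt.samplers.parallel.gpu.sampler import GpuSampler
from rlpyt.envs.gym import make as gym_make

# gym_make builds an EnvInfoWrapper from info_example, so every env instance
# (including the ones in child processes) declares the env_info keys itself.
env_kwargs = dict(id="MyEnv-v0", info_example=dict(timeout=2000))
sampler = GpuSampler(
    EnvCls=gym_make,
    env_kwargs=env_kwargs,
    eval_env_kwargs=env_kwargs,
    batch_T=5,
    batch_B=16,
)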

astooke commented 3 years ago

@im-ant Yes the main thing to get C51 working with a custom environment is just to write your own model class. Or maybe your environment has the same observation and action spaces as Atari, in which case you could just use the same model, but maybe you want different default conv hyperparameters or something like that.
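
Sketching the model-class route for C51: the same MLP idea as the Q-model sketch further up, but the forward pass returns per-action probabilities over the distribution atoms, which is the convention the categorical agent expects (names made up, untested):

import torch
import torch.nn.functional as F

from rlpyt.utils.tensor import infer_leading_dims, restore_leading_dims

class MlpCatDqnModel(torch.nn.Module):
    """MLP C51 head: outputs [..., n_actions, n_atoms] probabilities."""
    def __init__(self, obs_dim, action_dim, n_atoms=51, hidden_size=64):
        super().__init__()
        self.action_dim, self.n_atoms = action_dim, n_atoms
        self.fc1 = torch.nn.Linear(obs_dim, hidden_size)
        self.fc2 = torch.nn.Linear(hidden_size, action_dim * n_atoms)

    def forward(self, observation, prev_action, prev_reward):
        obs = observation.float()
        lead_dim, T, B, _ = infer_leading_dims(obs, 1)
        logits = self.fc2(F.relu(self.fc1(obs.view(T * B, -1))))
        # Softmax over atoms gives a probability distribution per action.
        p = F.softmax(logits.view(T * B, self.action_dim, self.n_atoms), dim=-1)
        return restore_leading_dims(p, lead_dim, T, B)

It could be passed via CatDqnAgent(ModelCls=MlpCatDqnModel, model_kwargs=dict(obs_dim=4, action_dim=2)), or with a make_env_to_model_kwargs override as in the earlier sketch; keep the model's n_atoms matched to the agent's.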

astooke commented 3 years ago

@frankie4fingers OK, thanks for explaining the problem and the quick workaround. I'm still a bit surprised by this, because I've run gym envs in parallel before. And when the child process looks for ntc = globals().get(name) and it's not there, it should end up with ntc = None and build it within its own module globals... hmm, OK, I'll give example_3.py a run and see.