SforAiDl / genrl

A PyTorch reinforcement learning library for generalizable and reproducible algorithm implementations with an aim to improve accessibility in RL
https://genrl.readthedocs.io
MIT License

Examples #281

Closed Sharad24 closed 3 years ago

Sharad24 commented 4 years ago

Examples are broken and imports need to be fixed after #256

Thanks, @DarylRodrigo and @ajaysub110 for pointing this out.

DarylRodrigo commented 4 years ago

Happy for this to be assigned to me 💯

ajaysub110 commented 4 years ago

Great! Assigning this to you

DarylRodrigo commented 4 years ago

Hey all!

I've been looking into this and fixed most of the problems I could find but I am stuck on two.

The first error is in deep_cb.py: when running this file there is an error loading data from 'data/Covertype'. I'm not actually sure what this is used for or how it works, so a pointer here would be useful.

Stack Trace

Traceback (most recent call last):
  File "deep_cb.py", line 106, in <module>
    main(args)
  File "deep_cb.py", line 19, in main
    run_single_algos_on_bandit(args)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/genrl/bandit/run.py", line 116, in run_single_algos_on_bandit
    bandit = bandit_class(download=args.download)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/genrl/bandit/bandits/data_bandits/covertype_bandit.py", line 56, in __init__
    self._df = fetch_covertype_data(path)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/genrl/bandit/bandits/data_bandits/covertype_bandit.py", line 105, in fetch_covertype_data
    return fetch_zipped_data_without_header(path)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/genrl/bandit/bandits/data_bandits/utils.py", line 81, in fetch_zipped_data_without_header
    with gzip.open(gz_fpath, "rb") as f_in:
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/gzip.py", line 53, in open
    binary_file = GzipFile(filename, gz_mode, compresslevel)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/gzip.py", line 163, in __init__
    fileobj = self.myfileobj = builtins.open(filename, mode or 'rb')
FileNotFoundError: [Errno 2] No such file or directory: 'data/Covertype'

The second problem is when running genetic_rl.py: it looks like the

get_env_properties(env: Union[gym.Env, VecEnv], network: Union[str, Any] = "mlp") -> (Tuple[int]):

function in utils.py is receiving the wrong network type. It's expecting a string but is being passed the actual PyTorch network model. I think this might have been an error introduced post-refactor (not sure). I could rewrite the whole example, but I'm not sure what the plans for further development are, and I'd rather not do redundant work.

Stack Trace

Traceback (most recent call last):
  File "genetic_rl.py", line 217, in <module>
    main(args)
  File "genetic_rl.py", line 139, in main
    generic_agent = A2C(network, env, rollout_size=args.rollout_size)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/genrl/agents/deep/a2c/a2c.py", line 59, in __init__
    self._create_model()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/genrl/agents/deep/a2c/a2c.py", line 67, in _create_model
    self.env, self.network
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/genrl/utils/utils.py", line 155, in get_env_properties
    raise TypeError
TypeError
(base) 

Network variable provided:

MlpActorCritic(
  (actor): MlpPolicy(
    (model): Sequential(
      (0): Linear(in_features=4, out_features=1, bias=True)
      (1): ReLU()
      (2): Linear(in_features=1, out_features=1, bias=True)
      (3): ReLU()
      (4): Linear(in_features=1, out_features=2, bias=True)
      (5): Identity()
    )
  )
  (critic): MlpValue(
    (model): Sequential(
      (0): Linear(in_features=4, out_features=1, bias=True)
      (1): ReLU()
      (2): Linear(in_features=1, out_features=1, bias=True)
      (3): ReLU()
      (4): Linear(in_features=1, out_features=1, bias=True)
      (5): Identity()
    )
  )
)
DarylRodrigo commented 4 years ago

As a side note - having worked with quite a few RL libraries over the last few years, I found it comparatively challenging to get started with the current example setup. If you want beginners to get started easily and quickly, you might want to consider having a) a guide on how the classes fit together and b) examples that are broken down step by step, rather than a single file where you run examples with args.

sampreet-arthi commented 4 years ago

Cc: @threewisemonkeys-as for (1) deep_cb and @mehulrastogi for (2) genetic_rl

Sharad24 commented 4 years ago

Hey all!

I've been looking into this and fixed most of the problems I could find but I am stuck on two.

The first error is in deep_cb.py: when running this file there is an error loading data from 'data/Covertype'. I'm not actually sure what this is used for or how it works, so a pointer here would be useful.

Stack Trace

Traceback (most recent call last):
  File "deep_cb.py", line 106, in <module>
    main(args)
  File "deep_cb.py", line 19, in main
    run_single_algos_on_bandit(args)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/genrl/bandit/run.py", line 116, in run_single_algos_on_bandit
    bandit = bandit_class(download=args.download)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/genrl/bandit/bandits/data_bandits/covertype_bandit.py", line 56, in __init__
    self._df = fetch_covertype_data(path)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/genrl/bandit/bandits/data_bandits/covertype_bandit.py", line 105, in fetch_covertype_data
    return fetch_zipped_data_without_header(path)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/genrl/bandit/bandits/data_bandits/utils.py", line 81, in fetch_zipped_data_without_header
    with gzip.open(gz_fpath, "rb") as f_in:
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/gzip.py", line 53, in open
    binary_file = GzipFile(filename, gz_mode, compresslevel)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/gzip.py", line 163, in __init__
    fileobj = self.myfileobj = builtins.open(filename, mode or 'rb')
FileNotFoundError: [Errno 2] No such file or directory: 'data/Covertype'

Can you try running python deep_cb.py --download? That should download the dataset. I think that should resolve it.

sampreet-arthi commented 4 years ago

As a side note - having worked with quite a few RL libraries over the last few years, I found it comparatively challenging to get started with the current example setup. If you want beginners to get started easily and quickly, you might want to consider having a) a guide on how the classes fit together and b) examples that are broken down step by step, rather than a single file where you run examples with args.

Hey @DarylRodrigo, we actually have docs for GenRL. You can find the link in the README. They aren't very complete yet, and some things, like an explanation of how the classes fit together, are still missing. We'll be tackling that soon.

sampreet-arthi commented 4 years ago

Here's a rudimentary explanation:

Every RL problem has an environment, an agent and some sort of training process. We tried to abstract the process in order to make code shorter.

The following is an explanation for the deep agents (I haven't worked on bandits, and evolutionary is still a little new).

First, you have a VectorEnv, which is basically a wrapper for vectorized gym envs.

Then you have an Agent. Every Agent inherits from either OnPolicyAgent or OffPolicyAgent, both of which inherit from a BaseAgent class. The algorithms all share a certain set of methods that perform some function: for example, OffPolicyAgents all have get_q_values, get_target_q_values, etc. Every deep off-policy algo has the same subparts, just like every agent (not just the deep ones) must have a method to select its action.

The trainers are again similar to the agent classes. They mainly differ in their training loops.
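
If it helps, here's a rough sketch of how those pieces fit together. The import paths and constructor arguments below are from memory, so treat them as assumptions rather than the exact current API:

# Rough sketch of the env -> agent -> trainer flow described above.
# Import paths and argument names are assumed and may differ slightly
# in the current post-refactor layout.
from genrl.agents import A2C                # an OnPolicyAgent
from genrl.environments import VectorEnv    # wrapper around vectorized gym envs
from genrl.trainers import OnPolicyTrainer  # training loop for on-policy agents

env = VectorEnv("CartPole-v0")              # vectorized environment
agent = A2C("mlp", env)                     # "mlp" selects the default MLP actor-critic
trainer = OnPolicyTrainer(agent, env, epochs=10, log_mode=["stdout"])
trainer.train()                             # rollout -> loss -> parameter update, once per epoch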

Sharad24 commented 4 years ago

The second problem is when running genetic_rl.py: it looks like the

get_env_properties(env: Union[gym.Env, VecEnv], network: Union[str, Any] = "mlp") -> (Tuple[int]):

function in utils.py is receiving the wrong network type. It's expecting a string but is being passed the actual PyTorch network model. I think this might have been an error introduced post-refactor (not sure). I could rewrite the whole example, but I'm not sure what the plans for further development are, and I'd rather not do redundant work.

Stack Trace

Traceback (most recent call last):
  File "genetic_rl.py", line 217, in <module>
    main(args)
  File "genetic_rl.py", line 139, in main
    generic_agent = A2C(network, env, rollout_size=args.rollout_size)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/genrl/agents/deep/a2c/a2c.py", line 59, in __init__
    self._create_model()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/genrl/agents/deep/a2c/a2c.py", line 67, in _create_model
    self.env, self.network
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/genrl/utils/utils.py", line 155, in get_env_properties
    raise TypeError
TypeError
(base) 

I'm unable to reproduce this error, but it seems like it's hitting the final else condition (i.e., the network is not an instance of BasePolicy, BaseValue, or BaseActorCritic, nor the string "mlp" or "cnn").

Network variable provided:

MlpActorCritic(
  (actor): MlpPolicy(
    (model): Sequential(
      (0): Linear(in_features=4, out_features=1, bias=True)
      (1): ReLU()
      (2): Linear(in_features=1, out_features=1, bias=True)
      (3): ReLU()
      (4): Linear(in_features=1, out_features=2, bias=True)
      (5): Identity()
    )
  )
  (critic): MlpValue(
    (model): Sequential(
      (0): Linear(in_features=4, out_features=1, bias=True)
      (1): ReLU()
      (2): Linear(in_features=1, out_features=1, bias=True)
      (3): ReLU()
      (4): Linear(in_features=1, out_features=1, bias=True)
      (5): Identity()
    )
  )
)

Based on this network object, it seems like it's an instance of BaseActorCritic, so I'm unsure why this is running into a TypeError. Could you post the definition of get_env_properties from utils in your installed version of the library? Maybe it's missing a few statements. For reference, this is what I have:

def get_env_properties(
    env: Union[gym.Env, VecEnv], network: Union[str, Any] = "mlp"
) -> (Tuple[int]):
    """
        Finds important properties of environment

        :param env: Environment that the agent is interacting with
        :type env: Gym Environment
        :param network: Type of network architecture, eg. "mlp", "cnn"
        :type network: str
        :returns: (State space dimensions, Action space dimensions,
    discreteness of action space and action limit (highest action value)
        :rtype: int, float, ...; int, float, ...; bool; int, float, ...
    """
    if network == "cnn":
        state_dim = env.framestack
    elif network == "mlp":
        state_dim = env.observation_space.shape[0]
    elif isinstance(network, (BasePolicy, BaseValue)):
        state_dim = network.state_dim
    elif isinstance(network, BaseActorCritic):
        state_dim = network.actor.state_dim
    else:
        raise TypeError

Although I reckon the functions being different might not be the issue.
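
If pulling the latest version doesn't change anything, a quick way to narrow it down would be to print the type checks right before the A2C call in genetic_rl.py. The import path below is a guess and may differ in your installed version, so adjust it as needed:

# Hypothetical debugging snippet to drop into genetic_rl.py just before the
# A2C(network, env, ...) call; 'network' is the MlpActorCritic instance shown above.
# The import path is an assumption and may differ in your installed version.
from genrl.core import BaseActorCritic, BasePolicy, BaseValue

print(type(network))                                 # which concrete class was built
print(isinstance(network, BaseActorCritic))          # expected True for MlpActorCritic
print(isinstance(network, (BasePolicy, BaseValue)))  # expected False here

If that first isinstance check prints False, the example is probably building the network from a different copy of the base classes than the one get_env_properties checks against.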

As a side note - having worked with quite a few RL libraries over the last few years, I found it comparatively challenging to get started with the current example setup. If you want beginners to get started easily and quickly, you might want to consider having a) a guide on how the classes fit together and b) examples that are broken down step by step, rather than a single file where you run examples with args.

You're definitely right. I think the issue is that we don't have a clear line between "examples" and "tutorials" as of now. The current examples are clearly more like executables. There are a bunch of "examples" that are missing and could possibly be added:

Sharad24 commented 4 years ago

As a side note - having worked with quite a few RL libraries over the last few years, I found it comparatively challenging to get started with the current example setup. If you want beginners to get started easily and quickly, you might want to consider having a) a guide on how the classes fit together and b) examples that are broken down step by step, rather than a single file where you run examples with args.

Hey @DarylRodrigo, we actually have docs for GenRL. You can find the link in the README. They aren't very complete yet, and some things, like an explanation of how the classes fit together, are still missing. We'll be tackling that soon.

I think it would be good to post our plans on Projects so everyone can take a look and see what the plans are. Probably even take up issues and projects.

As a side note - having worked with quite a few RL libraries over the last few years, I found it comparatively challenging to get started with the current example setup.

Given your experience, it would be really helpful if you could help us in planning and making changes in our library. As of now, we're only at v0.0.2, so we're open to big and bold ideas as well. Needless to say, we're studying other projects at the same time too, so that we know what's the best way to create such a library.

Sharad24 commented 4 years ago

Also, you might need the statement agent.env = environment if you run into AssertionError: Cannot call env.step() before calling reset().

This is a known issue at the moment #268

DarylRodrigo commented 4 years ago

@sampreet-arthi

Thanks for the explanation and for pointing me to the documentation - I had read it, but I still found it not particularly beginner-friendly, especially if you're coming from an "I'm interested in DRL but I've never really done anything other than writing a neural net in PyTorch" perspective. For example, the intro to policy and value functions is covered in one sentence, and there is no reference to on- vs off-policy methods or how they interact with the larger system (just my opinion though). Then again, I am new to this open-source project and unsure of the exact priorities/objectives, and as you said, it's still a work in progress :)

(By the way, please don't take this in any way as negative feedback or criticism - I really like what you're working on and it's just my opinion 😁)

@Sharad24

Can you try running python deep_cb.py --download? That should download the dataset. I think that should resolve it.

This worked - it downloaded the required dataset 👍.

Regarding the get_env_properties issue - after pulling from the main repo again, that particular issue seems to be resolved. Thanks for checking though!


I think it would be good to post our plans on Projects so everyone can take a look and see what the plans are. Probably even take up issues and projects.

This is a great idea :)

Given your experience, it would be really helpful if you could help us in planning and making changes in our library. As of now, we're only at v0.0.2, so we're open to big and bold ideas as well.

I suggest moving this conversation to Slack, but as I said before, I'm keen to help out. I'm especially interested in seeing whether this library is the right fit for tutorials going from the Bellman equation, to MDPs, to Q-learning.

Sharad24 commented 4 years ago

@sampreet-arthi

Thanks for the explanation and for pointing me to the documentation - I had read it, but I still found it not particularly beginner-friendly, especially if you're coming from an "I'm interested in DRL but I've never really done anything other than writing a neural net in PyTorch" perspective. For example, the intro to policy and value functions is covered in one sentence, and there is no reference to on- vs off-policy methods or how they interact with the larger system (just my opinion though). Then again, I am new to this open-source project and unsure of the exact priorities/objectives, and as you said, it's still a work in progress :)

(By the way, please don't take this in any way as negative feedback or criticism - I really like what you're working on and it's just my opinion 😁)

One of our prime goals is to improve accessibility, so feedback regarding that is always welcome. Thanks! Raising an issue for this here: #296

I suggest moving this conversation to Slack, but as I said before, I'm keen to help out. I'm especially interested in seeing whether this library is the right fit for tutorials going from the Bellman equation, to MDPs, to Q-learning.

Great. But since you pointed out that the Bellman equation material is missing, tracking it here: #297

Sharad24 commented 4 years ago

I think we might need a better way to track things on the accessibility end, as there are some gaps that need to be filled. I've created a dedicated issue label for it, but may create a project for it too, as it needs one.

We could follow and dig up things from Sutton+Barto, this, etc.

DarylRodrigo commented 4 years ago

@Sharad24 Please advise

Context: I'm trying to debug what's happening in the example genetic_rl.py.
Problem: The program crashes saying that the env must be reset before calling step.

Currently: genetic_rl calls generate -> train_population, which attempts to run the trainer via trainer.train().

In the train function in onPolicy.py, the environment is reset through state = self.env.reset() (commented out in the code below). Collecting the rollouts from the agent requires stepping the environment directly. However, the env that the agent references isn't the same env being reset in the OnPolicy trainer object.

Below is a simple solution (reset the agent's env instead), but I don't know if this interferes with other bits of code.

def train(self) -> None:
    """Main training method"""
    if self.load_model is not None:
        self.load()

    for epoch in range(self.epochs):
        self.agent.epoch_reward = np.zeros(self.env.n_envs)

        self.agent.rollout.reset()

        # state = self.env.reset()
        state = self.agent.env.reset()

        values, done = self.agent.collect_rollouts(state)

        self.agent.get_traj_loss(values, done)

        self.agent.update_params()

LMKWYT

Sharad24 commented 4 years ago

It's great that you found this :) We have an issue tracking this: #268

# state = self.env.reset()
state = self.agent.env.reset()

This fix could potentially interfere with the rest of the code. At some point, we'd want the agent to be independent of any env dependency.

I'd say the easiest fix for now is to just set agent.env = environment (the environment object) in the train_population function.
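
Roughly something like this (the surrounding function signature is just illustrative, not the exact one in genetic_rl.py; only the agent.env assignment is the actual suggestion):

# Illustrative sketch of the workaround; everything except the agent.env
# assignment is hypothetical and should be adapted to the real train_population.
def train_population(agents, environment, trainer_cls, **trainer_kwargs):
    for agent in agents:
        agent.env = environment  # point the agent at the same env the trainer resets (#268)
        trainer = trainer_cls(agent, environment, **trainer_kwargs)
        trainer.train()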

github-actions[bot] commented 3 years ago

Stale issue message