SforAiDl / genrl

A PyTorch reinforcement learning library for generalizable and reproducible algorithm implementations with an aim to improve accessibility in RL
https://genrl.readthedocs.io
MIT License

VPG not working after #294 #316

Closed Devanshu24 closed 4 years ago

Devanshu24 commented 4 years ago

After the refactoring that was done, I updated the tutorial code accordingly to this:

import gym  # OpenAI Gym

from genrl.agents import VPG
from genrl.trainers import OnPolicyTrainer
from genrl.environments import VectorEnv

env = VectorEnv("CartPole-v1")
agent = VPG('mlp', env)
trainer = OnPolicyTrainer(agent, env, epochs=200)
trainer.train()

Upon running this code I get the following error

Traceback (most recent call last):
  File "/home/think__tech/Desktop/genrl/temp.py", line 10, in <module>
    trainer.train()
  File "/home/think__tech/Desktop/genrl/genrl/trainers/onpolicy.py", line 46, in train
    self.agent.get_traj_loss(values, done)
  File "/home/think__tech/Desktop/genrl/genrl/agents/deep/vpg/vpg.py", line 114, in get_traj_loss
    self.rollout.compute_returns_and_advantage(values.detach().cpu().numpy(), dones)
  File "/home/think__tech/Desktop/genrl/genrl/core/rollout_storage.py", line 233, in compute_returns_and_advantage
    last_return = self.rewards[step] + next_non_terminal * next_value
TypeError: mul(): argument 'other' (position 1) must be Tensor, not numpy.ndarray

The reason I found is this:

https://github.com/SforAiDl/genrl/blob/a1f0f730371582306373a45920b2fa4c53daae56/genrl/agents/deep/vpg/vpg.py#L114

Here we pass the values as a numpy array, but here:

https://github.com/SforAiDl/genrl/blob/a1f0f730371582306373a45920b2fa4c53daae56/genrl/core/rollout_storage.py#L190

it expects a torch tensor.

Hence, changing values.detach().cpu().numpy() to values does solve the problem.

I don't, however, know where torch is being explicitly used in this function (compute_returns_and_advantage).
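For reference, the failing line (last_return = self.rewards[step] + next_non_terminal * next_value) is part of a standard backward discounted-return pass over the rollout. Here is a minimal numpy-only sketch of that idea (the function name and signature are illustrative, not genrl's actual API; the real method also computes advantages), which shows why every operand in that line needs to share a type:

```python
import numpy as np

def compute_returns(rewards, dones, last_value, gamma=0.99):
    """Illustrative backward pass computing discounted returns for a rollout.

    This mirrors the structure of compute_returns_and_advantage: walk the
    rollout backwards, bootstrapping from last_value. If last_value arrived
    as a torch.Tensor while rewards were numpy (or vice versa), the mixed
    multiply below is exactly the kind of operation that raises the
    TypeError reported in this issue on older torch versions.
    """
    returns = np.zeros(len(rewards))
    last_return = last_value
    for step in reversed(range(len(rewards))):
        next_non_terminal = 1.0 - dones[step]  # zero out bootstrap at episode ends
        last_return = rewards[step] + gamma * next_non_terminal * last_return
        returns[step] = last_return
    return returns

returns = compute_returns(
    rewards=np.array([1.0, 1.0]),
    dones=np.array([0.0, 0.0]),
    last_value=4.0,
    gamma=0.5,
)
print(returns)  # returns[0] == 2.5, returns[1] == 3.0
```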

Devanshu24 commented 4 years ago

Also, on a side note: I was checking the CI file for VPG, and it exercises this very function, yet the CI did pass on that merge. Why is that the case?

UPD: Come to think of it, something could be wrong with my local machine? I did reinstall the library though

Sharad24 commented 4 years ago

After the refactoring that was done, I updated the tutorial code accordingly to this:

import gym  # OpenAI Gym

from genrl.agents import VPG
from genrl.trainers import OnPolicyTrainer
from genrl.environments import VectorEnv

env = VectorEnv("CartPole-v1")
agent = VPG('mlp', env)
trainer = OnPolicyTrainer(agent, env, epochs=200)
trainer.train()

This runs for me and gives the following expected output:

timestep         Epoch            loss             mean_reward      
0                0                7.5798           19.50480079650879          
10240            10               10.2152          25.315200805664062          
20480            20               12.8218          34.076499938964844          
30720            30               15.2794          44.61869812011719          
40960            40               18.8412          59.5349006652832          
51200            50               21.3708          74.2029037475586          
61440            60               24.8012          97.06159973144531          
71680            70               26.8724          114.41339874267578         
81920            80               30.5441          144.22540283203125         
92160            90               32.3745          166.5041046142578         
102400           100              34.4109          186.18179321289062 
112640           110              35.6278          206.8686981201172         
122880           120              36.009           222.60870361328125         
133120           130              35.3222          220.21499633789062          
143360           140              36.9469          249.756103515625         
153600           150              36.5153          240.94119262695312         
163840           160              38.5735          292.5714111328125         
174080           170              36.6223          265.9739990234375          
184320           180              39.6573          330.3226013183594         
194560           190              39.0522          325.07940673828125  
Sharad24 commented 4 years ago

UPD: Come to think of it, something could be wrong with my local machine? I did reinstall the library though

Are you installing from source?

Devanshu24 commented 4 years ago

UPD: Come to think of it, something could be wrong with my local machine? I did reinstall the library though

Are you installing from source?

Yes; to be precise, this:

sudo python setup.py install
Sharad24 commented 4 years ago
sudo python setup.py install

After this, do you run python or sudo python?

Also, just to be careful, can you try python3 / sudo python3?

Devanshu24 commented 4 years ago

Steps:

  1. python setup.py install does not work, apparently due to insufficient permissions
  2. sudo python setup.py install installs the dependencies and things
  3. python temp.py or sudo python temp.py or python3 temp.py or sudo python3 temp.py
  4. I get the error

Just python temp.py used to work before.

OS: Ubuntu 18.04

UPD: I don't think it would, but is a reboot necessary?

Sharad24 commented 4 years ago

Can you try running this and check if the path of the library is where your source installation is?

import genrl
genrl.__file__
Devanshu24 commented 4 years ago

Can you try running this and check if the path of the library is where your source installation is?

import genrl
genrl.__file__

Yes, it is where the git repo is (if that's what you mean by source installation; I couldn't understand clearly).
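One way to make that check unambiguous is to resolve the directory a module is actually imported from. This is a generic sketch (the helper name is hypothetical, not a genrl utility); in the scenario above one would compare module_location("genrl") against the path of the git checkout:

```python
import importlib
import os

def module_location(name):
    """Return the directory a module is imported from (hypothetical helper)."""
    mod = importlib.import_module(name)
    return os.path.dirname(os.path.abspath(mod.__file__))

# Demonstrated here with a stdlib package; for this issue one would pass
# "genrl" and check the result matches the git repo, not site-packages.
print(module_location("json"))
```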

Sharad24 commented 4 years ago

https://github.com/SforAiDl/genrl/blob/a1f0f730371582306373a45920b2fa4c53daae56/genrl/core/rollout_storage.py#L190

It expects a Torch tensor

Hence changing values.detach().cpu().numpy() to values does solve the problem

This is still a bug though, feel free to raise a PR for this.

Devanshu24 commented 4 years ago

I might be making some very silly mistake too, so if you're certain that the code is fine, we can close this issue :)

Sharad24 commented 4 years ago

It should be values.detach().clone() just to be careful though.

Once you make the change can you try training it?
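The reason clone() matters: detach() alone returns a view that still shares storage with the original tensor, so later in-place writes to the rollout buffer would silently change the detached values; clone() breaks that link. The same aliasing behavior can be illustrated with numpy's copy semantics (an analogy, since torch tensors behave the same way here):

```python
import numpy as np

buffer = np.array([1.0, 2.0, 3.0])

view = buffer            # analogous to tensor.detach(): another name for the same storage
copy = buffer.copy()     # analogous to tensor.detach().clone(): owns its own storage

buffer[0] = 99.0         # a later in-place update of the rollout buffer

print(view[0])  # 99.0 -- the aliased values silently changed
print(copy[0])  # 1.0  -- the copied values are unaffected
```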

Devanshu24 commented 4 years ago

https://github.com/SforAiDl/genrl/blob/a1f0f730371582306373a45920b2fa4c53daae56/genrl/core/rollout_storage.py#L190

It expects a torch tensor. Hence changing values.detach().cpu().numpy() to values does solve the problem.

This is still a bug though, feel free to raise a PR for this.

Also, I actually tested after making this change, and it does work.

Sharad24 commented 4 years ago

I might be making some very silly mistake too, so if you're certain that the code is fine, we can close this issue :)

Regardless of whether I'm able to reproduce it, the compute_returns function should not receive a numpy array after #294.

So feel free to raise a PR.

Devanshu24 commented 4 years ago

It should be values.detach().clone() just to be careful though.

Once you make the change can you try training it?

Yup, it works after this too.

Devanshu24 commented 4 years ago

Just out of curiosity, is there any reason it passed the CI and worked for you but not for me?

Sharad24 commented 4 years ago

Not sure. You could try and debug that though :)

Devanshu24 commented 4 years ago

Wanted to give an update: earlier I was using torch 1.3.0, so I manually updated to the latest 1.6.0, and now it works (even the old code, i.e. with values.detach().cpu().numpy()). I was operating under the assumption that running python setup.py install makes sure the correct versions are present. Is that not true, or was it a glitch?

Sharad24 commented 4 years ago

requirements.txt does specify 1.4.0

Devanshu24 commented 4 years ago

requirements.txt does specify 1.4.0

Exactly, so when I ran python setup.py install, it should have checked/updated that, right? Or does that not happen?
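For what it's worth, a quick way to see why torch 1.3.0 would fail a 1.4.0 minimum while 1.6.0 passes is a plain tuple comparison of version components (a simplified sketch; real tools like pip use the full PEP 440 version-specifier logic, and the exact pin in requirements.txt is not shown in this thread):

```python
def parse_version(v):
    """Parse a simple dotted version string into a comparable tuple of ints."""
    return tuple(int(part) for part in v.split("."))

# Tuples compare element-wise, so (1, 3, 0) < (1, 4, 0) < (1, 6, 0),
# matching the symptom in this thread: 1.3.0 misbehaves, 1.6.0 works.
print(parse_version("1.3.0") >= parse_version("1.4.0"))  # False
print(parse_version("1.6.0") >= parse_version("1.4.0"))  # True
```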