MishaLaskin / curl

CURL: Contrastive Unsupervised Representation Learning for Sample-Efficient Reinforcement Learning
MIT License

Cannot train on the Google Football environment #16

Closed. Atharvavp closed this issue 3 years ago.

Atharvavp commented 3 years ago

I've been trying to use CURL with an environment other than the DeepMind Control Suite, namely the Google Football environment. However, I keep getting errors related to action_shape, obs_shape and channels.

1) Issue with channels:

RuntimeError: Given groups=1, weight of size 32 6 3 3, expected input[1, 144, 84, 3] to have 6 channels, but got 144 channels instead.
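The weight of size 32 6 3 3 means the encoder's first Conv2d expects 6 input channels in dimension 1. Under the assumption that CURL stacks 3 RGB channels per frame, that is 2 stacked frames. The [1, 144, 84, 3] input, however, looks channels-last, so PyTorch reads 144 as the channel count. Below is a minimal preprocessing sketch, assuming raw H x W x 3 uint8 frames from the environment. `to_chw` and `FrameStacker` are hypothetical helpers, not part of CURL. They resize with nearest-neighbour sampling, move channels first, and concatenate k frames along the channel axis. If your train.py random-crops from a larger --pre_transform_image_size, pass that value as `size` instead of 84.

```python
from collections import deque

import numpy as np


def to_chw(frame_hwc: np.ndarray, size: int = 84) -> np.ndarray:
    """Nearest-neighbour resize of an H x W x 3 frame to size x size, then move channels first."""
    h, w, _ = frame_hwc.shape
    rows = np.arange(size) * h // size  # source row for each output row
    cols = np.arange(size) * w // size  # source column for each output column
    resized = frame_hwc[rows[:, None], cols[None, :]]  # (size, size, 3)
    return np.ascontiguousarray(resized.transpose(2, 0, 1))  # (3, size, size)


class FrameStacker:
    """Keeps the last k preprocessed frames and concatenates them along the channel axis."""

    def __init__(self, k: int, size: int = 84):
        self.k = k
        self.size = size
        self.frames = deque(maxlen=k)  # the oldest frame drops out automatically

    def reset(self, frame_hwc: np.ndarray) -> np.ndarray:
        first = to_chw(frame_hwc, self.size)
        for _ in range(self.k):  # fill the stack with copies of the first frame
            self.frames.append(first)
        return self.observation()

    def step(self, frame_hwc: np.ndarray) -> np.ndarray:
        self.frames.append(to_chw(frame_hwc, self.size))
        return self.observation()

    def observation(self) -> np.ndarray:
        return np.concatenate(list(self.frames), axis=0)  # (3 * k, size, size)


# Example: a 72 x 96 RGB frame (size chosen only for illustration) and k=2 stacked frames.
stacker = FrameStacker(k=2)
obs = stacker.reset(np.zeros((72, 96, 3), dtype=np.uint8))
print(obs.shape)  # (6, 84, 84)
```

With k=2 the result has the 6 channels the 32 6 3 3 weight expects, in the (N, C, H, W) order Conv2d reads once a batch dimension is added.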

2) Issue when taking the action shape from the environment:

Traceback (most recent call last):
  File "train.py", line 291, in <module>
    main()
  File "train.py", line 226, in main
    device=device
  File "train.py", line 148, in make_agent
    curl_latent_dim=args.curl_latent_dim
  File "/home/atharva/CURL/curl/curl_sac.py", line 285, in __init__
    num_layers, num_filters
  File "/home/atharva/CURL/curl/curl_sac.py", line 73, in __init__
    nn.Linear(hidden_dim, 2 * action_shape[0])
IndexError: tuple index out of range
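Google Football's default single-agent setup uses a discrete action space (the action-set lookup in issue 5 points the same way). In gym, a Discrete(n) space reports shape (). Assuming train.py takes action_shape from env.action_space.shape, as it does for DMC, indexing action_shape[0] then raises IndexError. The hypothetical helper below mirrors the 2 * action_shape[0] term (one mean and one log-std per action dimension) but fails with a readable message for non-1-D shapes.

```python
from typing import Sequence


def actor_head_size(action_shape: Sequence[int]) -> int:
    """Output size of a Gaussian policy head: one mean and one log-std per action dimension.

    Fails with a readable message when the action space is not 1-D continuous (Box-like).
    """
    shape = tuple(action_shape)
    if len(shape) != 1:
        raise ValueError(
            f"expected a 1-D continuous action shape such as (n,), got {shape!r}; "
            "a discrete space reports shape () and has no action_shape[0]"
        )
    return 2 * shape[0]


print(actor_head_size((6,)))  # 12
try:
    actor_head_size(())  # the shape gym's Discrete(n) reports
except ValueError as err:
    print(err)
```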

3) Issue with PixelEncoder:

Traceback (most recent call last):
  File "train.py", line 292, in <module>
    main()
  File "train.py", line 240, in main
    evaluate(env, agent, video, args.num_eval_episodes, L, step,args)
  File "train.py", line 116, in evaluate
    run_eval_loop(sample_stochastically=False)
  File "train.py", line 101, in run_eval_loop
    action = agent.select_action(obs)
  File "/home/atharva/CURL/curl/curl_sac.py", line 355, in select_action
    obs, compute_pi=False, compute_log_pi=False
  File "/home/atharva/anaconda3/envs/curl/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/atharva/CURL/curl/curl_sac.py", line 82, in forward
    obs = self.encoder(obs, detach=detach_encoder)
  File "/home/atharva/anaconda3/envs/curl/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/atharva/CURL/curl/encoder.py", line 67, in forward
    h_fc = self.fc(h)
  File "/home/atharva/anaconda3/envs/curl/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/atharva/anaconda3/envs/curl/lib/python3.6/site-packages/torch/nn/modules/linear.py", line 87, in forward
    return F.linear(input, self.weight, self.bias)
  File "/home/atharva/anaconda3/envs/curl/lib/python3.6/site-packages/torch/nn/functional.py", line 1370, in linear
    ret = torch.addmm(bias, input, weight.t())
RuntimeError: size mismatch, m1: [1 x 672], m2: [39200 x 50] at /tmp/pip-req-build-ocx5vxk7/aten/src/THC/generic/THCTensorMathBlas.cu:290
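The 39200 in m2 is the flattened conv output that the encoder's fc layer was built for. The count below assumes the default PixelEncoder layout (4 conv layers, 32 filters, 3 x 3 kernels, stride 2 in the first layer and 1 afterwards, no padding) and 84 x 84 inputs. The 50 is assumed to be the encoder feature dimension.

```latex
H_{\text{out}} = \left\lfloor \frac{H_{\text{in}} - k}{s} \right\rfloor + 1

84 \xrightarrow{k=3,\ s=2} 41 \xrightarrow{k=3,\ s=1} 39 \xrightarrow{k=3,\ s=1} 37 \xrightarrow{k=3,\ s=1} 35

32 \times 35 \times 35 = 39200
```

By the same count, m1's 672 = 32 × 21, which could correspond to a 21 × 1 feature map like the one issue 4 reports. Either way, the observation reaching the encoder was not the 84 × 84 image the fc layer was sized for.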

4) Issue with padded input:

Traceback (most recent call last):
  File "train.py", line 292, in <module>
    main()
  File "train.py", line 240, in main
    evaluate(env, agent, video, args.num_eval_episodes, L, step,args)
  File "train.py", line 116, in evaluate
    run_eval_loop(sample_stochastically=False)
  File "train.py", line 101, in run_eval_loop
    action = agent.select_action(obs)
  File "/home/atharva/CURL/curl/curl_sac.py", line 355, in select_action
    obs, compute_pi=False, compute_log_pi=False
  File "/home/atharva/anaconda3/envs/curl/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/atharva/CURL/curl/curl_sac.py", line 82, in forward
    obs = self.encoder(obs, detach=detach_encoder)
  File "/home/atharva/anaconda3/envs/curl/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/atharva/CURL/curl/encoder.py", line 62, in forward
    h = self.forward_conv(obs)
  File "/home/atharva/CURL/curl/encoder.py", line 55, in forward_conv
    conv = torch.relu(self.convs[i](conv))
  File "/home/atharva/anaconda3/envs/curl/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/atharva/anaconda3/envs/curl/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 345, in forward
    return self.conv2d_forward(input, self.weight)
  File "/home/atharva/anaconda3/envs/curl/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 342, in conv2d_forward
    self.padding, self.dilation, self.groups)
RuntimeError: Calculated padded input size per channel: (21 x 1). Kernel size: (3 x 3). Kernel size can't be greater than actual input size.
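One plausible reading of this error: a width of 3 can reach the conv stack, for example when the RGB axis of a channels-last frame such as the [1, 144, 84, 3] input in issue 1 is read as the image width. The first stride-2 3 x 3 layer collapses a width of 3 to 1, and the next 3 x 3 kernel then no longer fits. The hypothetical helper below traces the spatial size layer by layer, assuming the same default layout as above, and reports the first layer where the kernel does not fit. The input height of 44 is chosen only so that the first layer outputs 21, as in the error.

```python
def trace_conv_dims(h: int, w: int, num_layers: int = 4, kernel: int = 3):
    """Spatial size after each layer of a CURL-style conv stack
    (first layer stride 2, the rest stride 1, no padding)."""
    dims = []
    for layer in range(1, num_layers + 1):
        stride = 2 if layer == 1 else 1
        if h < kernel or w < kernel:
            raise ValueError(
                f"layer {layer}: input {h} x {w} is smaller than the {kernel} x {kernel} kernel"
            )
        h = (h - kernel) // stride + 1
        w = (w - kernel) // stride + 1
        dims.append((h, w))
    return dims


print(trace_conv_dims(84, 84))  # [(41, 41), (39, 39), (37, 37), (35, 35)]
try:
    trace_conv_dims(44, 3)  # width 3, e.g. an RGB axis misread as the image width
except ValueError as err:
    print(err)  # layer 2: input 21 x 1 is smaller than the 3 x 3 kernel
```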

5) Issue with action not found in action set:

Traceback (most recent call last):
  File "train.py", line 292, in <module>
    main()
  File "train.py", line 240, in main
    evaluate(env, agent, video, args.num_eval_episodes, L, step,args)
  File "train.py", line 116, in evaluate
    run_eval_loop(sample_stochastically=False)
  File "train.py", line 102, in run_eval_loop
    obs, reward, done, _ = env.step(action)
  File "/home/atharva/CURL/curl/utils.py", line 226, in step
    obs, reward, done, info = self.env.step(action)
  File "/home/atharva/anaconda3/envs/curl/lib/python3.6/site-packages/gym/core.py", line 234, in step
    return self.env.step(action)
  File "/home/atharva/anaconda3/envs/curl/lib/python3.6/site-packages/gym/core.py", line 280, in step
    observation, reward, done, info = self.env.step(action)
  File "/home/atharva/anaconda3/envs/curl/lib/python3.6/site-packages/gym/core.py", line 268, in step
    observation, reward, done, info = self.env.step(action)
  File "/home/atharva/anaconda3/envs/curl/lib/python3.6/site-packages/gym/core.py", line 268, in step
    observation, reward, done, info = self.env.step(action)
  File "/home/atharva/anaconda3/envs/curl/lib/python3.6/site-packages/gfootball/env/football_env.py", line 177, in step
    _, reward, done, info = self._env.step(self._get_actions())
  File "/home/atharva/anaconda3/envs/curl/lib/python3.6/site-packages/gfootball/env/football_env_core.py", line 160, in step
    for a in action
  File "/home/atharva/anaconda3/envs/curl/lib/python3.6/site-packages/gfootball/env/football_env_core.py", line 160, in <listcomp>
    for a in action
  File "/home/atharva/anaconda3/envs/curl/lib/python3.6/site-packages/gfootball/env/football_action_set.py", line 217, in named_action_from_action_set
    assert False, "Action {} not found in action set".format(action)
AssertionError: Action -0.049828674644231796 not found in action set
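gfootball looks each action up in its action set, so it needs a discrete action (an integer index for the default action space). SAC's actor, by contrast, emits real-valued vectors such as the -0.0498 above. One hedged workaround, not something CURL provides: give the actor action_shape=(n,) with n = env.action_space.n, and map its output to an index with argmax, for example inside a gym.ActionWrapper. `continuous_to_discrete` is a hypothetical name. SAC's critic is still trained on the continuous vector, so learning quality with this mapping is not guaranteed.

```python
import numpy as np


def continuous_to_discrete(action, num_actions: int) -> int:
    """Hypothetical adapter: read a length-num_actions continuous vector as one score
    per discrete action and return the index of the highest score."""
    scores = np.asarray(action, dtype=np.float64).reshape(-1)
    if scores.shape[0] != num_actions:
        raise ValueError(f"expected {num_actions} scores, got {scores.shape[0]}")
    return int(np.argmax(scores))


print(continuous_to_discrete([-0.05, 0.7, 0.1], num_actions=3))  # 1
```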

It seems that solving one issue just gives rise to another. Could you please let me know how these issues can be resolved?

Thank you.

MishaLaskin commented 3 years ago

Hi @Atharvavp, this codebase has not been tested with the Google Football environment. It runs on single-agent environments with (84, 84, 3) image observations and continuous actions.
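One way to check an environment against that description before training is a quick shape check. The sketch below is a minimal version that makes one assumption: the (84, 84, 3) frames are stacked channels-first as (3·k, 84, 84), the layout the encoder's first Conv2d consumes. `curl_shape_problems` is a hypothetical helper, not part of CURL.

```python
from typing import List, Sequence


def curl_shape_problems(obs_shape: Sequence[int], action_shape: Sequence[int],
                        image_size: int = 84) -> List[str]:
    """List mismatches against stacked, channels-first RGB frames of
    image_size x image_size and a 1-D continuous action vector (empty list = OK)."""
    problems = []
    obs_shape, action_shape = tuple(obs_shape), tuple(action_shape)
    if len(obs_shape) != 3:
        problems.append(f"observation should be (C, H, W), got {obs_shape}")
    else:
        c, h, w = obs_shape
        if c % 3 != 0:
            problems.append(f"{c} channels is not a multiple of 3 (stacked RGB frames)")
        if (h, w) != (image_size, image_size):
            problems.append(f"spatial size {h} x {w}, expected {image_size} x {image_size}")
    if len(action_shape) != 1:
        problems.append(f"action shape should be (n,) for continuous actions, got {action_shape}")
    return problems


print(curl_shape_problems((9, 84, 84), (6,)))  # []
for p in curl_shape_problems((144, 84, 3), ()):  # shapes resembling this issue
    print(p)
```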