Closed: 0xsamgreen closed this issue 5 years ago.
That would be much appreciated! Unfortunately I'm still waiting for the latest results from Danijar (fixing the bug in the RNN would improve upon the original results), and in order to check that this code is fine we'd want to compare results, which means the DeepMind Control Suite.
If you're interested in getting baseline results for your own work, then perhaps it would be interesting to have some results on Pendulum-v0 and MountainCarContinuous-v0 (both symbolic and visual observations) anyway? We currently have no idea what appropriate hyperparameters are, and as one of those people who doesn't have easy access to MuJoCo licenses, I know it'd be nice to have some completely open-source reference results.
Hi @Kaixhin,
I'm running headless, and I'm able to run in symbolic mode fine, but I can't run in non-symbolic mode. I have `render` set to False, but I get the following error:
```
Options
seed: 1
disable_cuda: False
env: Pendulum-v0
symbolic_env: False
max_episode_length: 1000
experience_size: 1000000
activation_function: relu
embedding_size: 1024
hidden_size: 200
belief_size: 200
state_size: 30
action_repeat: 2
action_noise: 0.3
episodes: 2000
seed_episodes: 5
collect_interval: 100
batch_size: 50
chunk_size: 50
overshooting_distance: 50
overshooting_kl_beta: 1
overshooting_reward_scale: 1
global_kl_beta: 0.1
free_nats: 2
learning_rate: 0.001
grad_clip_norm: 1000
planning_horizon: 12
optimisation_iters: 10
candidates: 1000
top_candidates: 100
test_interval: 25
test_episodes: 10
checkpoint_interval: 25
checkpoint_experience: False
load_experience: False
load_checkpoint: 0
render: False
```
```
Traceback (most recent call last):
  File "main.py", line 85, in <module>
    observation, done, t = env.reset(), False, 0
  File "/home/sgreen/working/planet.pt/env.py", line 87, in reset
    return torch.tensor(cv2.resize(self._env.render(mode='rgb_array'), (64, 64), interpolation=cv2.INTER_LINEAR).transpose(2, 0, 1), dtype=torch.float32).div_(255).unsqueeze(dim=0)
  File "/ascldap/users/sgreen/anaconda3/envs/planet.pt/lib/python3.7/site-packages/gym/core.py", line 249, in render
    return self.env.render(mode, **kwargs)
  File "/ascldap/users/sgreen/anaconda3/envs/planet.pt/lib/python3.7/site-packages/gym/envs/classic_control/pendulum.py", line 61, in render
    from gym.envs.classic_control import rendering
  File "/ascldap/users/sgreen/anaconda3/envs/planet.pt/lib/python3.7/site-packages/gym/envs/classic_control/rendering.py", line 27, in <module>
    from pyglet.gl import *
  File "/ascldap/users/sgreen/anaconda3/envs/planet.pt/lib/python3.7/site-packages/pyglet/gl/__init__.py", line 239, in <module>
    import pyglet.window
  File "/ascldap/users/sgreen/anaconda3/envs/planet.pt/lib/python3.7/site-packages/pyglet/window/__init__.py", line 1896, in <module>
    gl._create_shadow_window()
  File "/ascldap/users/sgreen/anaconda3/envs/planet.pt/lib/python3.7/site-packages/pyglet/gl/__init__.py", line 208, in _create_shadow_window
    _shadow_window = Window(width=1, height=1, visible=False)
  File "/ascldap/users/sgreen/anaconda3/envs/planet.pt/lib/python3.7/site-packages/pyglet/window/xlib/__init__.py", line 166, in __init__
    super(XlibWindow, self).__init__(*args, **kwargs)
  File "/ascldap/users/sgreen/anaconda3/envs/planet.pt/lib/python3.7/site-packages/pyglet/window/__init__.py", line 501, in __init__
    display = get_platform().get_default_display()
  File "/ascldap/users/sgreen/anaconda3/envs/planet.pt/lib/python3.7/site-packages/pyglet/window/__init__.py", line 1845, in get_default_display
    return pyglet.canvas.get_display()
  File "/ascldap/users/sgreen/anaconda3/envs/planet.pt/lib/python3.7/site-packages/pyglet/canvas/__init__.py", line 82, in get_display
    return Display()
  File "/ascldap/users/sgreen/anaconda3/envs/planet.pt/lib/python3.7/site-packages/pyglet/canvas/xlib.py", line 86, in __init__
    raise NoSuchDisplayException('Cannot connect to "%s"' % name)
pyglet.canvas.xlib.NoSuchDisplayException: Cannot connect to "None"
```
Are you also running headless?
I am not running headless, as I'm using `gym`'s `render` functionality to get image-based observations. I'm afraid you'll need to run with either a real or fake display.
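For what it's worth, pyglet tries to open an X display even for `rgb_array` rendering, which is why the render call fails on a headless node regardless of the `render` flag. A quick heuristic check before training (a hypothetical helper, not part of this repo) might look like:

```python
import os

def has_display():
    """Rough headless detection: pyglet looks up the X server via $DISPLAY,
    so an empty/missing value usually means rendering will fail."""
    return bool(os.environ.get('DISPLAY'))

if not has_display():
    print('No X display found; image-based observations will likely fail.')
```

A common workaround on headless machines is to launch under a virtual framebuffer, e.g. `xvfb-run -s "-screen 0 1024x768x24" python main.py ...`, which gives pyglet a display to connect to without any physical screen.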
I have a MuJoCo license now! Is there a sweep of MuJoCo environment tests you would like run?
Great news! I've started a run on walker-walk, so any of the others.
Will do. Other than the environment, should I use the default parser arguments of your latest commit?
The latest commit should be the same settings as the PlaNet camera-ready, so yes - the only other change is the recommended action repeat per environment, which you can see in env.py.
Thanks, I started a run on cartpole-swingup, finger-spin, cheetah-run, and ball_in_cup-catch.
Hi @Kaixhin, things are looking good! I will continue to train for another day and then update, but it seems that scores are meeting or approaching Danijar's. Thanks again for making this port.
cartpole-swingup
cheetah-run
cup-catch
finger-spin
Awesome! If possible, do you mind finding a way to send me all the data once done (checkpoints and results)? Perhaps via a file sharing service, and I'll let you know once I've downloaded it all because it would take a lot of space. I'll also take the rewards and final model and make them available as a release. I've got results for walker-walk, so just kicked off a run for reacher-easy.
Here are the final test result plots! I'm looking into sharing the checkpoints and result logs.
cartpole-swingup
cheetah-run
cup-catch
finger-spin
Awesome! Do let me know if you do something else with PlaNet, but I think the results are good. I'll close this once you get me all the data.
@sg2 if you still have some capacity would you be able to run the same environments again with the latest commit? Among various improvements I've made changes to the image processing to match what was actually done in the original (I missed some of this originally). Should only take half the time now since I've set the default number of episodes to 1000 like in the camera ready.
Hi @Kaixhin, I'm on it!
Would you also be able to get results for walker-walk and reacher-easy? I've got just about enough space on Google Drive to get the results for all 6 tasks, so just email me and I'll share a folder with you that you can put everything into.
Here are my test results for commit ee9b996.
cartpole-swingup
cartpole-balance
cheetah-run
cup-catch
finger-spin
reacher-easy
walker-walk
Thanks a lot! Uploaded all figures for releases v1.0 and v1.1. Unfortunately walker_walk doesn't look that good either. Added notes on the discrepancies to v1.0 - it would be good if you can pass on the data from both sets of experiments; I'll upload final trained models for both.
No problem, thanks again for the port! Yes, I'll work on getting all the results from v1.0 and v1.1 to you. (I also trained all six agents on v1.0, before doing v1.1.)
Out of curiosity: are these results all comparable to the original PlaNet implementation? Can you explain why the cup-catch performance collapses during the middle of training and then recovers?
Apart from the high variance in cup-catch, which makes it hard to tell without more seeds if it's the same or a bit worse, results with tag 1.0 seem to be comparable. 1.1, which adds the 5-bit quantisation, noise and observation normalisation/centering, and is hence closer to the original, unfortunately seems to be worse on walker-walk and cup-catch. Have now noted this with the releases. Not sure about cup-catch collapse, but one thing in terms of the task is that it either gets the ball in and gets reward, or not, so the score can vary a lot based on success on this precise task.
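For context on what the v1.1 preprocessing changes involve, the bit-depth reduction plus dequantisation noise used by PlaNet typically looks something like the sketch below. Exact constants and ordering in this repo may differ, so treat it as illustrative rather than the repo's actual implementation:

```python
import numpy as np

def preprocess_observation(image, bit_depth=5):
    """Quantise an 8-bit image (values in [0, 255]) down to `bit_depth` bits,
    centre it to [-0.5, 0.5), and add uniform dequantisation noise that fills
    each quantisation bin - the scheme described in the PlaNet paper."""
    image = np.floor(image / 2 ** (8 - bit_depth))      # keep the top `bit_depth` bits
    image = image / 2 ** bit_depth - 0.5                # scale and centre to [-0.5, 0.5)
    image = image + np.random.uniform(0, 2 ** -bit_depth, image.shape)  # dequantise
    return image
```

The added noise prevents the model from wasting capacity on the hard quantisation boundaries, and the 5-bit depth matches the original implementation's reduced observation precision.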
@sg2 It's great to know that you have successfully run the entire code. However, when I run in non-symbolic mode, training always collapses at around episode 300 or 750 (out of 1000 episodes in total). I don't know the reason. Have you met this problem? Thank you very much.
@longfeizhang617 I'm sorry to hear that. What environment are you training? Also did you continue to let it run? You can see in my last result plots that it collapses for cup catch and then recovers. I never saw it collapse and then stay collapsed forever.
@sg2 Thanks for your attention. The default training environment is Pendulum-v0. I suspect that the planner caused a memory overflow and thus the error, but I'm not sure. I have decreased experience_size to 100000 (the original is 1000000) and changed batch_size/chunk_size/overshooting_distance from 50 to 30, but the issue remains. Is my experimental setup limiting the result? It has 16 GB of memory and a GeForce GTX 1080 Ti GPU.
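As a rough sanity check on the memory question: with visual observations the replay buffer tends to dominate RAM. A back-of-envelope estimate (a hypothetical helper; the actual storage format depends on the implementation) is:

```python
def buffer_gigabytes(capacity, height=64, width=64, channels=3, bytes_per_value=4):
    """Approximate replay-buffer size in GB for image observations.
    bytes_per_value=4 assumes float32 storage; use 1 for uint8."""
    return capacity * height * width * channels * bytes_per_value / 1e9

print(buffer_gigabytes(1_000_000))                     # ~49 GB if frames are float32
print(buffer_gigabytes(1_000_000, bytes_per_value=1))  # ~12 GB if frames are uint8
```

So the default experience_size of 1,000,000 can plausibly exhaust 16 GB of RAM if frames are kept in float32, which makes both reducing the buffer size (as you did) and storing frames as uint8 reasonable mitigations.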
@longfeizhang617 I added support for Gym environments so people can try PlaNet without needing MuJoCo. However, the original paper only includes experiments on the DeepMind Control Suite, so you would have to tune hyperparameters for any other environment. I'll make a note of this in the README.
@Kaixhin Thank you sincerely. I really haven't got a MuJoCo license, so I'm just trying PlaNet in Gym environments. You have inspired me: it may have collapsed because of mismatched hyperparameters, so I will try to tune them. These days I have also been communicating with others; at first I suspected there was something wrong in the training loop of the code, but @sg2 has got it working, so I have to try more. It's a real headache.
@Kaixhin @xsamgreen Do you have approximate training-time stats for single/multi-GPU setups on any of the symbolic environments? It'd be really helpful to have training stats for a few envs (symbolic or otherwise) in the README. By the way, thank you for the code and experiments - they're really helpful!
Hi, thank you for making this port :)
In your conversation with Danijar, I read that you're limited in your ability to test because of GPU availability. I'm interested in building on your code, and I'd be happy to help run tests for you. I have four Titan Xps I could dedicate to it for a bit. My limitation is that I don't have a MuJoCo license (I'm working on it), so testing would be limited to Gym environments.