Kaixhin / PlaNet

Deep Planning Network: Control from pixels by latent planning with learned dynamics
MIT License

Need testing? #9

Closed 0xsamgreen closed 5 years ago

0xsamgreen commented 5 years ago

Hi, thank you for making this port :)

In your conversation with Danijar, I read that your ability to test is limited by GPU availability. I'm interested in building on your code, and I'd be happy to help run tests for you. I have four Titan Xps I could dedicate to it for a bit. My limitation is that I don't have a MuJoCo license (I'm working on it), so testing would be limited to Gym environments.

Kaixhin commented 5 years ago

That would be much appreciated! Unfortunately I'm still waiting for the latest results from Danijar (fixing the bug in the RNN would improve upon the original results), and to check that this code is correct we'd want to compare results, which means using the DeepMind Control Suite.

If you're interested in getting baseline results for your own work, then perhaps it would be useful to have some results on Pendulum-v0 and MountainCarContinuous-v0 (both symbolic and visual observations) anyway? We currently have no idea what appropriate hyperparameters are, and as someone who also doesn't have easy access to MuJoCo licenses, I know it'd be nice to have some completely open-source reference results.

0xsamgreen commented 5 years ago

Hi @Kaixhin,

I'm running headless, and symbolic mode runs fine, but I can't run in non-symbolic mode. I have render set to False, but I get the following error:

                          Options
                          seed: 1
                          disable_cuda: False
                          env: Pendulum-v0
                          symbolic_env: False
                          max_episode_length: 1000
                          experience_size: 1000000
                          activation_function: relu
                          embedding_size: 1024
                          hidden_size: 200
                          belief_size: 200
                          state_size: 30
                          action_repeat: 2
                          action_noise: 0.3
                          episodes: 2000
                          seed_episodes: 5
                          collect_interval: 100
                          batch_size: 50
                          chunk_size: 50
                          overshooting_distance: 50
                          overshooting_kl_beta: 1
                          overshooting_reward_scale: 1
                          global_kl_beta: 0.1
                          free_nats: 2
                          learning_rate: 0.001
                          grad_clip_norm: 1000
                          planning_horizon: 12
                          optimisation_iters: 10
                          candidates: 1000
                          top_candidates: 100
                          test_interval: 25
                          test_episodes: 10
                          checkpoint_interval: 25
                          checkpoint_experience: False
                          load_experience: False
                          load_checkpoint: 0
                          render: False
Traceback (most recent call last):
  File "main.py", line 85, in <module>
    observation, done, t = env.reset(), False, 0
  File "/home/sgreen/working/planet.pt/env.py", line 87, in reset
    return torch.tensor(cv2.resize(self._env.render(mode='rgb_array'), (64, 64), interpolation=cv2.INTER_LINEAR).transpose(2, 0, 1), dtype=torch.float32).div_(255).unsqueeze(dim=0)
  File "/ascldap/users/sgreen/anaconda3/envs/planet.pt/lib/python3.7/site-packages/gym/core.py", line 249, in render
    return self.env.render(mode, **kwargs)
  File "/ascldap/users/sgreen/anaconda3/envs/planet.pt/lib/python3.7/site-packages/gym/envs/classic_control/pendulum.py", line 61, in render
    from gym.envs.classic_control import rendering
  File "/ascldap/users/sgreen/anaconda3/envs/planet.pt/lib/python3.7/site-packages/gym/envs/classic_control/rendering.py", line 27, in <module>
    from pyglet.gl import *
  File "/ascldap/users/sgreen/anaconda3/envs/planet.pt/lib/python3.7/site-packages/pyglet/gl/__init__.py", line 239, in <module>
    import pyglet.window
  File "/ascldap/users/sgreen/anaconda3/envs/planet.pt/lib/python3.7/site-packages/pyglet/window/__init__.py", line 1896, in <module>
    gl._create_shadow_window()
  File "/ascldap/users/sgreen/anaconda3/envs/planet.pt/lib/python3.7/site-packages/pyglet/gl/__init__.py", line 208, in _create_shadow_window
    _shadow_window = Window(width=1, height=1, visible=False)
  File "/ascldap/users/sgreen/anaconda3/envs/planet.pt/lib/python3.7/site-packages/pyglet/window/xlib/__init__.py", line 166, in __init__
    super(XlibWindow, self).__init__(*args, **kwargs)
  File "/ascldap/users/sgreen/anaconda3/envs/planet.pt/lib/python3.7/site-packages/pyglet/window/__init__.py", line 501, in __init__
    display = get_platform().get_default_display()
  File "/ascldap/users/sgreen/anaconda3/envs/planet.pt/lib/python3.7/site-packages/pyglet/window/__init__.py", line 1845, in get_default_display
    return pyglet.canvas.get_display()
  File "/ascldap/users/sgreen/anaconda3/envs/planet.pt/lib/python3.7/site-packages/pyglet/canvas/__init__.py", line 82, in get_display
    return Display()
  File "/ascldap/users/sgreen/anaconda3/envs/planet.pt/lib/python3.7/site-packages/pyglet/canvas/xlib.py", line 86, in __init__
    raise NoSuchDisplayException('Cannot connect to "%s"' % name)
pyglet.canvas.xlib.NoSuchDisplayException: Cannot connect to "None"

Are you also running headless?

Kaixhin commented 5 years ago

I am not running headless, as I'm using gym's render functionality to get image-based observations. I'm afraid you'll need to run with either a real or fake display.
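
For example, a minimal sketch of the fake-display route (assuming the third-party pyvirtualdisplay package and an Xvfb install; this is not part of the repo):

```python
# Hypothetical workaround: start a virtual X display before creating the
# environment, so pyglet can render offscreen on a headless machine.
# Requires `pip install pyvirtualdisplay` plus the Xvfb system package.
from pyvirtualdisplay import Display

virtual_display = Display(visible=0, size=(1400, 900))
virtual_display.start()

# ... now construct the Gym environment and run training as usual ...
```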

0xsamgreen commented 5 years ago

I have a MuJoCo license now! Is there a sweep of MuJoCo environment tests you would like run?

Kaixhin commented 5 years ago

Great news! I've started a run on walker-walk, so any of the others would be great.

0xsamgreen commented 5 years ago

Will do. Other than the environment, should I use the default parser arguments of your latest commit?

Kaixhin commented 5 years ago

The latest commit should have the same settings as the PlaNet camera-ready, so yes. The only other change is the recommended action repeat per environment, which you can see in env.py.
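
For reference, a sketch of that per-environment lookup (dictionary name hypothetical; repeats as reported in the PlaNet camera-ready, but see env.py in the repo for the authoritative values):

```python
# Illustrative action-repeat table; keys are DeepMind Control Suite domains.
ACTION_REPEATS = {
    'cartpole': 8,      # cartpole-swingup / cartpole-balance
    'reacher': 4,
    'cheetah': 4,
    'finger': 2,
    'ball_in_cup': 4,
    'walker': 2,
}
```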

0xsamgreen commented 5 years ago

Thanks, I started a run on cartpole-swingup, finger-spin, cheetah-run, and ball_in_cup-catch.

0xsamgreen commented 5 years ago

Hi @Kaixhin, things are looking good! I will continue to train for another day and then update, but it seems that scores are meeting or approaching Danijar's. Thanks again for making this port.

[Test reward plots: cartpole-swingup, cheetah-run, cup-catch, finger-spin]

Kaixhin commented 5 years ago

Awesome! If possible, do you mind finding a way to send me all the data (checkpoints and results) once done? Perhaps via a file-sharing service; I'll let you know once I've downloaded it all, since it will take a lot of space. I'll also take the rewards and final model and make them available as a release. I've got results for walker-walk, so I've just kicked off a run for reacher-easy.

0xsamgreen commented 5 years ago

Here are the final test result plots! I'm looking into sharing the checkpoints and result logs.

[Final test result plots: cartpole-swingup, cheetah-run, cup-catch, finger-spin]

Kaixhin commented 5 years ago

Awesome! Do let me know if you do something else with PlaNet, but I think the results are good. I'll close this once you get me all the data.

Kaixhin commented 5 years ago

@sg2 if you still have some capacity, would you be able to run the same environments again with the latest commit? Among various improvements, I've made changes to the image processing to match what was actually done in the original (I missed some of this at first). It should only take half the time now, since I've set the default number of episodes to 1000 as in the camera-ready.

0xsamgreen commented 5 years ago

Hi @Kaixhin, I'm on it!

Kaixhin commented 5 years ago

Would you also be able to get results for walker-walk and reacher-easy? I've got just about enough space on Google Drive to get the results for all 6 tasks, so just email me and I'll share a folder with you that you can put everything into.

0xsamgreen commented 5 years ago

Here are my test results for commit ee9b996.

[Test result plots: cartpole-swingup, cartpole-balance, cheetah-run, cup-catch, finger-spin, reacher-easy, walker-walk]

Kaixhin commented 5 years ago

Thanks a lot! I've uploaded all figures for releases v1.0 and v1.1. Unfortunately walker-walk doesn't look that good either. I've added notes on the discrepancies to v1.0; it would be good if you could pass on the data from both sets of experiments, and I'll upload the final trained models for both.

0xsamgreen commented 5 years ago

No problem, thanks again for the port! Yes, I'll work on getting all the results from v1.0 and v1.1 to you. (I also trained all six agents on v1.0, before doing v1.1.)

maximecb commented 5 years ago

Out of curiosity: are these results all comparable to the original PlaNet implementation? Can you explain why the cup-catch performance collapses during the middle of training and then recovers?

Kaixhin commented 5 years ago

Apart from the high variance in cup-catch, which makes it hard to tell without more seeds whether it's the same or a bit worse, results with tag v1.0 seem to be comparable. v1.1, which adds the 5-bit quantisation, noise, and observation normalisation/centring, and is hence closer to the original, unfortunately seems to be worse on walker-walk and cup-catch. I've now noted this with the releases. I'm not sure about the cup-catch collapse, but the task is nearly all-or-nothing: the agent either gets the ball into the cup and receives reward or it doesn't, so the score can vary a lot with success on this precise task.
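
For context, a minimal sketch of that preprocessing step (function name hypothetical; the scheme follows the paper: quantise pixels to 5 bits, centre around zero, and add uniform dequantisation noise):

```python
import torch

def preprocess_observation(obs: torch.Tensor, bit_depth: int = 5) -> torch.Tensor:
    # obs holds uint8 pixel values in [0, 255].
    obs = obs.float()
    # Quantise to `bit_depth` bits and rescale/centre to [-0.5, 0.5).
    obs = torch.floor(obs / 2 ** (8 - bit_depth)) / 2 ** bit_depth - 0.5
    # Add uniform noise to dequantise (matches the continuous model
    # likelihood to the discrete pixel distribution).
    obs = obs + torch.rand_like(obs) / 2 ** bit_depth
    return obs
```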

longfeizhang617 commented 5 years ago

@sg2 I'm very glad to know that you have successfully run the entire code. However, when I run in non-symbolic mode, training always collapses at around episode 300 or 750 (out of 1000 total). I don't know the reason. Have you encountered this problem? Thank you very much.

0xsamgreen commented 5 years ago

@longfeizhang617 I'm sorry to hear that. Which environment are you training in? Also, did you let it continue to run? You can see in my last result plots that it collapses for cup-catch and then recovers; I never saw it collapse and stay collapsed forever.

longfeizhang617 commented 5 years ago

@sg2 Thanks for your attention. The default training environment is Pendulum-v0. I suspect the planner caused a memory overflow and thus the error, but I'm not sure. I have decreased experience_size to 100000 (the original is 1000000) and changed batch_size/chunk_size/overshooting_distance from 50 to 30, but the issue remains. Could my hardware be the limiting factor? My machine has 16 GB of memory and a GeForce GTX 1080 Ti GPU.

Kaixhin commented 5 years ago

@longfeizhang617 I added support for Gym environments so that people can try PlaNet without needing MuJoCo. However, the original paper only includes experiments on the DeepMind Control Suite, so you would have to tune hyperparameters for any other environment. I'll make a note in the README.

longfeizhang617 commented 5 years ago

@Kaixhin Thank you sincerely. I really don't have a MuJoCo license, so I have just been trying PlaNet in Gym environments. You have inspired me: it may have collapsed because of mismatched hyperparameters, so I will try tuning them. These days I have also been discussing this with others; at first I suspected there was something wrong in the training loop of the code, but @sg2 has run it successfully, so I have to keep trying. It's a real headache.

vballoli commented 4 years ago

@Kaixhin @0xsamgreen Do you have approximate training-time stats for a single/multi-GPU setup on any of the symbolic environments? It'd be really helpful to have training stats for a few envs (symbolic or otherwise) in the README. By the way, thank you for the code and experiments, they're really helpful!