dgriff777 / rl_a3c_pytorch

A3C LSTM Atari with Pytorch plus A3G design
Apache License 2.0

Solving time #4

Closed · hugemicrobe closed this 7 years ago

hugemicrobe commented 7 years ago

Thank you for the nice implementation. I'm curious about the running time on your machine. In https://github.com/ikostrikov/pytorch-a3c, it is reported that PongDeterministic-v3 is solved in around 15 minutes. Did you reproduce similar results on any version of Pong?

Thank you

dgriff777 commented 7 years ago

It's going to be slower, since that model takes 42x42 input frames while mine takes 80x80, but I just ran it on 16 threads and it solved in around 45 minutes (see the preprocessing sketch after the log below). I expected it to be slower anyway, as https://github.com/ikostrikov/pytorch-a3c is a model highly optimized for PongDeterministic, whereas overall performance is greater with my implementation on Pong-v0.

2017-06-13 01:42:43,618 : load: False
2017-06-13 01:42:43,619 : tau: 1.0
2017-06-13 01:42:43,619 : save_score_level: 20
2017-06-13 01:42:43,619 : optimizer: Adam
2017-06-13 01:42:43,619 : no_shared: False
2017-06-13 01:42:43,619 : max_episode_length: 10000
2017-06-13 01:42:43,619 : count_lives: False
2017-06-13 01:42:43,620 : num_processes: 16
2017-06-13 01:42:43,620 : num_steps: 20
2017-06-13 01:42:43,620 : env_config: conf.json
2017-06-13 01:42:43,620 : save_model_dir: trained_models/
2017-06-13 01:42:43,620 : seed: 5
2017-06-13 01:42:43,621 : lr: 0.0001
2017-06-13 01:42:43,621 : log_dir: logs/
2017-06-13 01:42:43,621 : env_name: PongDeterministic-v3
2017-06-13 01:42:43,621 : load_model_dir: trained_models/
2017-06-13 01:42:43,621 : gamma: 0.95
2017-06-13 01:42:57,205 : Time 00h 00m 13s, episode reward -21.0, episode length 764, reward mean -21.0000
2017-06-13 01:44:09,937 : Time 00h 01m 26s, episode reward -21.0, episode length 764, reward mean -21.0000
2017-06-13 01:45:22,630 : Time 00h 02m 38s, episode reward -21.0, episode length 764, reward mean -21.0000
2017-06-13 01:46:35,193 : Time 00h 03m 51s, episode reward -21.0, episode length 764, reward mean -21.0000
2017-06-13 01:47:47,897 : Time 00h 05m 04s, episode reward -21.0, episode length 764, reward mean -21.0000
2017-06-13 01:49:20,615 : Time 00h 06m 36s, episode reward -21.0, episode length 2004, reward mean -21.0000
2017-06-13 01:50:53,241 : Time 00h 08m 09s, episode reward -21.0, episode length 2004, reward mean -21.0000
2017-06-13 01:52:06,839 : Time 00h 09m 23s, episode reward -21.0, episode length 824, reward mean -21.0000
2017-06-13 01:53:33,569 : Time 00h 10m 49s, episode reward -21.0, episode length 1644, reward mean -21.0000
2017-06-13 01:55:00,189 : Time 00h 12m 16s, episode reward -21.0, episode length 1644, reward mean -21.0000
2017-06-13 01:56:53,771 : Time 00h 14m 10s, episode reward -21.0, episode length 3303, reward mean -21.0000
2017-06-13 01:58:36,746 : Time 00h 15m 53s, episode reward -21.0, episode length 2564, reward mean -21.0000
2017-06-13 02:00:15,742 : Time 00h 17m 32s, episode reward -21.0, episode length 2325, reward mean -21.0000
2017-06-13 02:01:51,951 : Time 00h 19m 08s, episode reward -19.0, episode length 2165, reward mean -20.8571
2017-06-13 02:03:27,816 : Time 00h 20m 44s, episode reward -21.0, episode length 2146, reward mean -20.8667
2017-06-13 02:05:28,641 : Time 00h 22m 44s, episode reward -18.0, episode length 3644, reward mean -20.6875
2017-06-13 02:07:40,858 : Time 00h 24m 57s, episode reward -21.0, episode length 4364, reward mean -20.7059
2017-06-13 02:10:15,474 : Time 00h 27m 31s, episode reward -21.0, episode length 5807, reward mean -20.7222
2017-06-13 02:12:08,481 : Time 00h 29m 24s, episode reward 20.0, episode length 3235, reward mean -18.5789
2017-06-13 02:15:52,043 : Time 00h 33m 08s, episode reward -7.0, episode length 10000, reward mean -18.0000
2017-06-13 02:17:31,052 : Time 00h 34m 47s, episode reward 20.0, episode length 2366, reward mean -16.1905
2017-06-13 02:19:33,792 : Time 00h 36m 50s, episode reward -21.0, episode length 3812, reward mean -16.4091
2017-06-13 02:21:07,639 : Time 00h 38m 23s, episode reward 21.0, episode length 2039, reward mean -14.7826
2017-06-13 02:22:51,138 : Time 00h 40m 07s, episode reward -21.0, episode length 2674, reward mean -15.0417
2017-06-13 02:24:42,875 : Time 00h 41m 59s, episode reward 19.0, episode length 3170, reward mean -13.6800
2017-06-13 02:26:44,391 : Time 00h 44m 00s, episode reward 12.0, episode length 3776, reward mean -12.6923
2017-06-13 02:28:16,747 : Time 00h 45m 33s, episode reward 21.0, episode length 1977, reward mean -11.4444
2017-06-13 02:29:52,019 : Time 00h 47m 08s, episode reward 20.0, episode length 2150, reward mean -10.3214
2017-06-13 02:31:24,275 : Time 00h 48m 40s, episode reward 21.0, episode length 1977, reward mean -9.2414
2017-06-13 02:32:56,790 : Time 00h 50m 13s, episode reward 21.0, episode length 1977, reward mean -8.2333
2017-06-13 02:34:30,256 : Time 00h 51m 46s, episode reward 21.0, episode length 1995, reward mean -7.2903
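For context, here is a minimal sketch of what the input-size difference means in practice, assuming a standard grayscale-and-resize style of Atari preprocessing; the exact preprocessing code in either repo may differ:

```python
import cv2
import numpy as np

def preprocess(frame, size):
    # Rough sketch (not the repo's exact code): convert an RGB Atari frame to
    # grayscale and resize it to a square observation of side `size`.
    gray = cv2.cvtColor(frame, cv2.COLOR_RGB2GRAY)
    resized = cv2.resize(gray, (size, size), interpolation=cv2.INTER_AREA)
    return resized.astype(np.float32) / 255.0

frame = np.zeros((210, 160, 3), dtype=np.uint8)  # raw Atari frame shape
small = preprocess(frame, 42)  # ikostrikov/pytorch-a3c style input (1,764 pixels)
large = preprocess(frame, 80)  # this repo's input (6,400 pixels, ~3.6x more per frame)
```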
hugemicrobe commented 7 years ago

Thank you for the reply. I ran your code on my machine; the logs are below. The environment I used is PongDeterministic-v4. (It seems that v3 is not available on my machine, and I haven't found a way to switch to v3.)

I used PyTorch 0.1.12 (py27_2cu75) on an 8-core machine (Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz). However, it took about 2 hours to solve the game. Did you use similar settings?

2017-06-15 13:30:05,209 : load: False
2017-06-15 13:30:05,209 : tau: 1.0
2017-06-15 13:30:05,209 : save_score_level: 20
2017-06-15 13:30:05,209 : optimizer: Adam
2017-06-15 13:30:05,209 : max_episode_length: 10000
2017-06-15 13:30:05,209 : count_lives: False
2017-06-15 13:30:05,209 : num_processes: 16
2017-06-15 13:30:05,210 : num_steps: 20
2017-06-15 13:30:05,210 : env_config: config.json
2017-06-15 13:30:05,210 : save_model_dir: trained_models/
2017-06-15 13:30:05,210 : seed: 1
2017-06-15 13:30:05,210 : lr: 0.0001
2017-06-15 13:30:05,210 : log_dir: logs/
2017-06-15 13:30:05,210 : env_name: PongDeterministic-v4
2017-06-15 13:30:05,210 : load_model_dir: trained_models/
2017-06-15 13:30:05,210 : shared_optimizer: True
2017-06-15 13:30:05,210 : gamma: 0.95
2017-06-15 13:30:55,703 : Time 00h 00m 50s, episode reward -21.0, episode length 764, reward mean -21.0000
2017-06-15 13:32:51,184 : Time 00h 02m 45s, episode reward -21.0, episode length 764, reward mean -21.0000
2017-06-15 13:34:48,254 : Time 00h 04m 42s, episode reward -21.0, episode length 764, reward mean -21.0000
2017-06-15 13:36:45,617 : Time 00h 06m 40s, episode reward -21.0, episode length 764, reward mean -21.0000
2017-06-15 13:38:41,247 : Time 00h 08m 35s, episode reward -21.0, episode length 764, reward mean -21.0000

(------skipped------)

2017-06-15 15:04:36,247 : Time 01h 34m 30s, episode reward -21.0, episode length 3764, reward mean -21.0000
2017-06-15 15:08:42,671 : Time 01h 38m 37s, episode reward -21.0, episode length 2564, reward mean -21.0000
2017-06-15 15:13:58,441 : Time 01h 43m 53s, episode reward -18.0, episode length 3453, reward mean -20.9118
2017-06-15 15:18:22,764 : Time 01h 48m 17s, episode reward 19.0, episode length 2733, reward mean -19.7714
2017-06-15 15:23:53,017 : Time 01h 53m 47s, episode reward -18.0, episode length 3513, reward mean -19.7222
2017-06-15 15:27:28,772 : Time 01h 57m 23s, episode reward 20.0, episode length 2103, reward mean -18.6486
2017-06-15 15:32:43,694 : Time 02h 02m 38s, episode reward 8.0, episode length 3472, reward mean -17.9474
2017-06-15 15:36:19,923 : Time 02h 06m 14s, episode reward 20.0, episode length 2103, reward mean -16.9744

dgriff777 commented 7 years ago

Oh, you probably can't run my trained models either, since the discrete action space changed as well and the upgrade affected all environment versions. To run them you need to keep gym <= 0.8.2 and atari-py <= 0.0.21.
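One way to sanity-check this locally is a small version check; this is only a sketch (not part of the repo) and just inspects the installed package versions:

```python
import pkg_resources
from distutils.version import LooseVersion

# Sketch only: the pretrained models predate the gym/atari-py upgrade that
# changed the Atari discrete action spaces, so warn if the installed versions
# are newer than the ones noted above.
for pkg, max_ver in [("gym", "0.8.2"), ("atari-py", "0.0.21")]:
    installed = pkg_resources.get_distribution(pkg).version
    if LooseVersion(installed) > LooseVersion(max_ver):
        print("{} {} is newer than {}; the pretrained models may not load or "
              "act correctly".format(pkg, installed, max_ver))
```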

As for training, you should keep the number of threads no larger than the number of cores you have; otherwise it can actually be detrimental to training. The quick run I posted above was on a 64-core machine, so it had well more than enough cores for the 16 threads I was running.
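A quick way to apply this on your own machine is sketched below. It assumes the training entry point is main.py and that the flag names match the arg names in the logs above (num_processes, env_name), which may not be exact:

```python
import multiprocessing
import subprocess

# Sketch only: cap the number of A3C worker processes at the number of CPU
# cores the OS reports, since oversubscribing cores slows training down.
n_cores = multiprocessing.cpu_count()
n_workers = min(16, n_cores)

# Flag names are inferred from the logged args and may differ from the
# script's actual argparse options.
subprocess.call([
    "python", "main.py",
    "--env-name", "PongDeterministic-v4",
    "--num-processes", str(n_workers),
])
```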

dgriff777 commented 7 years ago

Yeah, from the log I can see that, because of too many threads, the model was running nearly 4 times slower in its calculations.

hugemicrobe commented 7 years ago

Thank you, this really helps a lot.

dgriff777 commented 7 years ago

No problem. I'm going to add a note about these issues to the README, as they're important to keep in mind.