astooke / rlpyt

Reinforcement Learning in PyTorch
MIT License

Working example code for R2D1 #87


DavidMChan commented 4 years ago

It seems like the code in experiments/ is outdated. I've managed to fix the import errors and get most of the code running, but I hit the following issue when running the atari_r2d1_gpu.py script:

Traceback (most recent call last):
  File "rlpyt/experiments/scripts/atari/dqn/train/atari_r2d1_gpu.py", line 50, in <module>
    build_and_train(*sys.argv[1:])
  File "rlpyt/experiments/scripts/atari/dqn/train/atari_r2d1_gpu.py", line 46, in build_and_train
    runner.train()
  File "/home/david/Repos/rlpyt/rlpyt/runners/minibatch_rl.py", line 240, in train
    opt_info = self.algo.optimize_agent(itr, samples)
  File "/home/david/Repos/rlpyt/rlpyt/algos/dqn/r2d1.py", line 133, in optimize_agent
    self.replay_buffer.append_samples(samples_to_buffer)
  File "/home/david/Repos/rlpyt/rlpyt/replays/sequence/prioritized.py", line 52, in append_samples
    T, idxs = super().append_samples(samples)
  File "/home/david/Repos/rlpyt/rlpyt/replays/frame.py", line 46, in append_samples
    buffer_samples = BufferSamples(*(v for k, v in samples.items()
TypeError: __new__() missing 2 required positional arguments: 'done' and 'prev_rnn_state'
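
For context, the error itself is just a namedtuple field mismatch: the buffer's namedtuple expects more fields than the samples provide. A minimal, purely illustrative reproduction (stand-in classes, not rlpyt's actual ones):

from collections import namedtuple

# Hypothetical stand-in for the replay buffer's samples namedtuple.
BufferSamples = namedtuple("BufferSamples",
                           ["observation", "action", "reward", "done", "prev_rnn_state"])

# Samples missing the last two fields, as in the traceback above.
samples = {"observation": None, "action": None, "reward": None}

BufferSamples(*(v for k, v in samples.items()))
# TypeError: __new__() missing 2 required positional arguments: 'done' and 'prev_rnn_state'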

Any tips on making the required modifications? If not, is there any place in the codebase which has working R2D1 example code?

astooke commented 4 years ago

Yes, sorry about that, I should clean that up. In the meantime, this one should be fresh, with all the bells and whistles:

rlpyt/experiments/scripts/atari/dqn/launch/pabti/launch_atari_r2d1_async_alt_gravitar.py

manesourabh commented 4 years ago

rlpyt/experiments/scripts/atari/dqn/launch/pabti/launch_atari_r2d1_async_alt_gravitar.py

Hi @astooke, the link above is not working. Can you provide a fresh example?

astooke commented 4 years ago

Sorry! That was just a bad link; here it is:

https://github.com/astooke/rlpyt/blob/master/rlpyt/experiments/scripts/atari/dqn/launch/pabti/launch_atari_r2d1_async_alt_gravitar.py

crizCraig commented 4 years ago

Thanks for the great repo @astooke! Super excited to use it!

I was trying to train R2D1 on Pong on a single machine and was able to get this working, but I only have 1 GPU despite needing to set n_gpu=2 in the affinity there. I was also unable to get the serial sampler working with R2D1 (plain DQN via example_1.py works though :) ). Do you happen to have any recommended settings, or perhaps a fix for the 'done' and 'prev_rnn_state' error in launch_atari_r2d1_gpu_basic.py? Thanks again for providing this great resource, Adam!

astooke commented 4 years ago

@crizCraig Thanks for the kind words!

Fixed the gpu_basic case in commit 229f4bf1a1b9eb274dfa19d858dd3c0443939c05 (it had bad, leftover code for putting samples into the replay buffer, which was used in the non-async modes such as serial; I just changed it to the newer method).
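
For anyone hitting the same thing before pulling, the shape of the change is roughly the following (a sketch of my reading of it, not the literal diff; the field paths into the samples structure are assumptions):

from rlpyt.utils.collections import namedarraytuple

# R2D1's buffer samples need 'done' and 'prev_rnn_state' alongside the usual fields,
# so the sequence replay buffer's namedtuple can be filled completely.
SamplesToBuffer = namedarraytuple("SamplesToBuffer",
    ["observation", "action", "reward", "done", "prev_rnn_state"])

def samples_to_buffer(samples):
    # Pull each field from the sampler's output (env fields vs. agent fields).
    return SamplesToBuffer(
        observation=samples.env.observation,
        action=samples.agent.action,
        reward=samples.env.reward,
        done=samples.env.done,
        prev_rnn_state=samples.agent.agent_info.prev_rnn_state,
    )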

astooke commented 4 years ago

Not sure about the problem with the async_alt code, though. What errors are you hitting? Or could you print out the affinities that are being made? The affinity generation for the async case might not be perfectly general.

crizCraig commented 4 years ago

Thanks @astooke! Actually, n_gpu=1 works with the r2d1_test config, so I think I was wrong about n_gpu=2 being the problem. There was no error, but training hangs with the async_alt_pabti config after:

/home/c2/anaconda3/envs/rlpyt/bin/python /home/c2/src/rlpyt/rlpyt/experiments/scripts/atari/dqn/launch/pabti/launch_atari_r2d1_async_alt_gravitar.py

call string:
 taskset -c 0,1,2,3 python rlpyt/experiments/scripts/atari/dqn/train/atari_r2d1_async_alt.py 0slt_4cpu_1gpu_0hto_1ass_1oss_1alt /home/c2/src/rlpyt/data/local/2020_03-03_11-01.36/atari_r2d1_async_alt/pong 0 async_alt_pabti
my config:
{
  "agent": {
    "eps_final": 0.1,
    "eps_final_min": 0.0005
  },
  "model": {
    "dueling": true
  },
  "algo": {
    "discount": 0.997,
    "batch_T": 80,
    "batch_B": 64,
    "warmup_T": 40,
    "store_rnn_state_interval": 40,
    "replay_ratio": 1,
    "learning_rate": 0.0001,
    "clip_grad_norm": 80.0,
    "min_steps_learn": 100000,
    "double_dqn": true,
    "prioritized_replay": true,
    "n_step_return": 5,
    "pri_alpha": 0.9,
    "pri_beta_init": 0.6,
    "pri_beta_final": 0.6,
    "input_priority_shift": 2,
    "replay_size": 4000000
  },
  "optim": {},
  "env": {
    "game": "pong",
    "episodic_lives": false,
    "clip_reward": false,
    "horizon": 27000,
    "num_img_obs": 4
  },
  "eval_env": {
    "game": "pong",
    "episodic_lives": false,
    "horizon": 27000,
    "clip_reward": false,
    "num_img_obs": 4
  },
  "runner": {
    "n_steps": 20000000.0,
    "log_interval_steps": 1000.0
  },
  "sampler": {
    "batch_T": 40,
    "batch_B": 264,
    "max_decorrelation_steps": 1000,
    "eval_n_envs": 44,
    "eval_max_steps": 1232000,
    "eval_max_trajectories": 120
  }
}
using seed 7446
2020-03-03 11:01:37.324067  | async_alt_pong_0 Running 20000 sampler iterations.
2020-03-03 11:01:37.471631  | async_alt_pong_0 Frame-based buffer using 4-frame sequences.

Looks like the affinity code is 0slt_4cpu_1gpu_0hto_1ass_1oss_1alt. Perhaps I just need to wait longer? I see low compute usage and little growth, but I haven't delved much more into it. Thanks again for the fix!
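
In case it helps with the affinity question, one way to see exactly what that code decodes to (a sketch; it assumes the affinity_from_code helper in rlpyt.utils.launching.affinity, which the train scripts import to parse this same string):

from rlpyt.utils.launching.affinity import affinity_from_code

# The code string below is the one from the call string above.
affinity = affinity_from_code("0slt_4cpu_1gpu_0hto_1ass_1oss_1alt")
print(affinity)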

tesslerc commented 4 years ago

@crizCraig, were you able to fix this? I tried a quick run here and it also hung. Simple non-async stuff (DQN) did work well.

Edit: It seems I had some leftover Python processes in the background after killing the program several times. I killed them all, ran again, and waited... it works :)
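
For anyone else who hits the hang, a quick check for stray worker processes before relaunching (just a sketch; the pattern to match is an assumption, adjust it to whatever your train script is called):

import subprocess

# List any leftover processes from earlier runs that may still be holding resources.
out = subprocess.run(["pgrep", "-af", "atari_r2d1"], capture_output=True, text=True)
print(out.stdout or "no leftover processes")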