Kaixhin / Rainbow

Rainbow: Combining Improvements in Deep Reinforcement Learning
MIT License

Data-Efficient Rainbow with Skiing does not work #86

Closed Somjit77 closed 1 year ago

Somjit77 commented 1 year ago

There is a problem with sampling from the replay buffer when using the data-efficient Rainbow hyper-parameters on 'skiing': sampling goes into an infinite loop.

Kaixhin commented 1 year ago

Checking the paper, it seems that the algorithm was only tested on 26 games, which doesn't include "skiing", so there's no guarantee in the original work that those hyperparameters are valid for the entire Atari suite.

Somjit77 commented 1 year ago

Hi, thanks for pointing this out. However, I still do not understand why it would affect prioritized experience replay sampling. If you run the algorithm on skiing with seed=123, it simply goes into an infinite loop while sampling from the buffer. Uniform sampling works perfectly, so I am unable to figure out why this would be the case.

Kaixhin commented 1 year ago

Looks like a duplicate of https://github.com/Kaixhin/Rainbow/issues/41. PER can get stuck because of the way its segment-based sampling works - which is consistent with uniform sampling working fine.
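
For anyone landing on this later, here is a simplified sketch of the failure mode (the `sample_segment` helper and `is_valid` check are illustrative, not this repo's exact code): segment-based PER draws a point in a slice of the cumulative priority mass and retries until it hits a valid transition, so if every transition carrying mass in that slice is invalid, the retry loop never exits.

```python
import numpy as np

def sample_segment(priorities, seg_lo, seg_hi, is_valid):
    """Rejection-sample one index from the [seg_lo, seg_hi) slice of the
    cumulative priority mass: draw a point in the segment, map it to the
    transition owning that mass, retry if the transition is invalid."""
    cumsum = np.cumsum(priorities)  # a sum tree in a real implementation
    while True:  # no retry limit -- this is where sampling can hang
        u = np.random.uniform(seg_lo, seg_hi)
        idx = int(np.searchsorted(cumsum, u))
        if is_valid(idx):
            return idx
        # If no index with mass inside this segment is valid, we spin forever.

# Toy demonstration: all priority mass in the segment belongs to a
# transition the validity check rejects, so sampling never terminates.
priorities = np.array([0.0, 0.0, 5.0, 0.0])  # mass concentrated on index 2
# sample_segment(priorities, 0.0, 5.0, lambda i: i != 2)  # would hang
```

Uniform sampling has no priority mass to get trapped in, which is why it sidesteps the problem.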

Somjit77 commented 1 year ago

Ah I see, I think the fact that we use 20-step TD compounds this problem as well. Thank you so much. It makes sense now.
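
To spell out why large n compounds it (a sketch with illustrative numbers, not the repo's exact check): a transition is only sampleable if the buffer already holds the n future steps its n-step return needs, plus the history frames before it, so a window of roughly n + history indices around the write head is always rejected. At n = 20 that window is over three times wider than at the default n = 3, leaving more priority mass on unsampleable transitions.

```python
def is_valid(idx, write_head, capacity, n=20, history=4):
    """Validity check in the spirit of PER with n-step returns: the index
    must trail the write head by more than `n` steps (its n-step return
    needs n future transitions) and lead it by at least `history` steps
    (its state needs `history` past frames). Illustrative only."""
    ahead = (write_head - idx) % capacity   # steps behind the write head
    behind = (idx - write_head) % capacity  # steps past the write head
    return ahead > n and behind >= history

capacity, write_head = 100_000, 50_000
for n in (3, 20):  # default multi-step vs data-efficient Rainbow's 20
    forbidden = sum(not is_valid(i, write_head, capacity, n=n)
                    for i in range(capacity))
    print(f"n={n:2d}: {forbidden} unsampleable indices")  # 7 vs 24
```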

k4ntz commented 1 month ago

Just as a heads-up, @Somjit77: Skiing is a difficult credit-assignment problem, as you get a pseudo-random reward at each step, and the game counts the flag pairs you have passed through. Solving such environments is not something Rainbow or similar algorithms address. Check these papers for more info:

- Wu et al., Read and Reap the Rewards, NeurIPS 2023
- Delfosse et al., Interpretable Concept Bottlenecks to Align Reinforcement Learning Agents, NeurIPS 2024