A lot of simulators have implemented frame-skipping, because one frame is often not enough time for an action's effect to be noticeable. It's a key parameter in getting RL algorithms to work:
RL with very frequent actions: RL algorithms are very sensitive to the frequency of taking actions,
which is why the frame-skip technique is usually used on Atari (Mnih et al., 2015). In continuous control
domains, performance goes to zero as the frequency of taking actions goes to infinity, which is
caused by two factors: inconsistent exploration and the necessity to bootstrap more times to propagate
information about returns backward in time. How can we design a sample-efficient RL algorithm that
retains its performance even as the frequency of taking actions goes to infinity? The problem
of exploration can be addressed by using parameter noise for exploration (Plappert et al., 2017), and
faster information propagation could be achieved by employing multi-step returns. Another approach
could be an adaptive, learnable frame skip.
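As a side note, the multi-step returns mentioned above can be sketched in a few lines. This is a generic illustration, not code from any particular library; the function name and arguments are made up for the example:

```python
def n_step_return(rewards, bootstrap_value, gamma=0.99, n=5):
    """Compute the n-step return:
        G_t = r_t + gamma * r_{t+1} + ... + gamma^(n-1) * r_{t+n-1}
              + gamma^n * V(s_{t+n})
    Each update then propagates reward information n steps backward
    instead of one, which helps when actions are taken very frequently.
    """
    g = bootstrap_value
    # Fold rewards in from the last step back to the first.
    for r in reversed(rewards[:n]):
        g = r + gamma * g
    return g
```

With gamma = 0.5, n = 3, rewards [1, 0, 0] and a bootstrap value of 10, this gives 1 + 0.5^3 * 10 = 2.25.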
I think we should implement this, run a vanilla continuous-control algorithm (maybe DDPG, from what @nithin127 is working on?), and find a "best" value that we can ship as the default with the simulator (and expose it, so people can test out what works best for their method).
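For concreteness, here's a minimal sketch of what such a wrapper could look like, assuming a gym-style env whose step(action) returns (obs, reward, done, info); the class and parameter names are placeholders, not part of any existing API:

```python
class FrameSkip:
    """Repeat each agent action for `skip` simulator frames,
    accumulating the reward. `skip` is the knob we'd expose so
    users can tune it for their method."""

    def __init__(self, env, skip=4):
        self.env = env
        self.skip = skip

    def reset(self):
        return self.env.reset()

    def step(self, action):
        total_reward = 0.0
        obs, done, info = None, False, {}
        for _ in range(self.skip):
            obs, reward, done, info = self.env.step(action)
            total_reward += reward
            if done:
                break  # stop repeating once the episode ends
        return obs, total_reward, done, info
```

The adaptive/learnable variant from the excerpt would replace the fixed `skip` with a value the agent outputs alongside its action, but the fixed version above is probably the right thing to ship as a default first.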