A lot of simulators have implemented frame-skipping, because one frame is often not enough time for an action's effect to be noticeable. It's a key parameter in getting RL algorithms to work:
RL with very frequent actions: RL algorithms are very sensitive to the frequency of taking actions,
which is why the frame-skip technique is usually used on Atari (Mnih et al., 2015). In continuous control
domains, performance goes to zero as the frequency of taking actions goes to infinity, which is
caused by two factors: inconsistent exploration and the necessity to bootstrap more times to propagate
information about returns backward in time. How can we design a sample-efficient RL algorithm that
retains its performance even as the frequency of taking actions goes to infinity? The problem
of exploration can be addressed by using parameter noise for exploration (Plappert et al., 2017), and
faster information propagation could be achieved by employing multi-step returns. Another approach
could be an adaptive, learnable frame skip.
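As a side note, the multi-step returns mentioned above can be sketched in a few lines. This is a generic illustration, not code from any particular library; the function name and arguments are made up for the example:

```python
def n_step_return(rewards, bootstrap_value, gamma=0.99, n=5):
    """Compute the n-step return:
        G_t = r_t + gamma * r_{t+1} + ... + gamma^(n-1) * r_{t+n-1}
              + gamma^n * V(s_{t+n})
    Each update then propagates reward information n steps backward
    instead of one, which helps when actions are taken very frequently.
    """
    g = bootstrap_value
    # Fold rewards in from the last step back to the first.
    for r in reversed(rewards[:n]):
        g = r + gamma * g
    return g
```

With gamma = 0.5, n = 3, rewards [1, 0, 0] and a bootstrap value of 10, this gives 1 + 0.5^3 * 10 = 2.25.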
I think we should implement this, run a vanilla continuous-control algorithm (maybe DDPG, from what @nithin127 is working on?), and find a "best" value that we can ship as the default with the simulator (and expose it, so people can test out what works best for their method).
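For concreteness, here's a minimal sketch of what such a wrapper could look like, assuming a gym-style env whose step(action) returns (obs, reward, done, info); the class and parameter names are placeholders, not part of any existing API:

```python
class FrameSkip:
    """Repeat each agent action for `skip` simulator frames,
    accumulating the reward. `skip` is the knob we'd expose so
    users can tune it for their method."""

    def __init__(self, env, skip=4):
        self.env = env
        self.skip = skip

    def reset(self):
        return self.env.reset()

    def step(self, action):
        total_reward = 0.0
        obs, done, info = None, False, {}
        for _ in range(self.skip):
            obs, reward, done, info = self.env.step(action)
            total_reward += reward
            if done:
                break  # stop repeating once the episode ends
        return obs, total_reward, done, info
```

The adaptive/learnable variant from the excerpt would replace the fixed `skip` with a value the agent outputs alongside its action, but the fixed version above is probably the right thing to ship as a default first.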