MishaLaskin / curl

CURL: Contrastive Unsupervised Representation Learning for Sample-Efficient Reinforcement Learning
MIT License

Environment step count with frame-skip #3

Closed by rlbeaverton 4 years ago

rlbeaverton commented 4 years ago

Great work and thanks a lot for releasing the code! It’s awesome to see this simple contrastive loss term performing so well without the need for reconstruction.

Quick question regarding the environment step count: if we consider a DMC episode of standard length 1000 steps and we use a frameskip of 4, do the reported results consider the episode to have 1000 steps or 250 steps? Put differently, do the 100k step results mean 100k “low-level DMC” steps or 100k “agent-applying-an-action” steps?

MishaLaskin commented 4 years ago

Reported results are in low-level DMC environment steps (1k per episode).
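To make the distinction in the question concrete, here is a minimal sketch of an action-repeat wrapper that counts both kinds of steps. It is not CURL's code. `ToyEnv` and `ActionRepeat` are hypothetical, and the gym-style `step()` returning `(obs, reward, done, info)` is an assumption. With a 1000-step episode and an action repeat of 4, the simulator advances 1000 times while the policy chooses only 250 actions.

```python
class ToyEnv:
    """Hypothetical stand-in for a DMC task (not the dm_control API):
    every episode lasts exactly `episode_length` low-level steps."""

    def __init__(self, episode_length: int = 1000):
        self.episode_length = episode_length
        self.t = 0

    def reset(self):
        self.t = 0
        return 0  # dummy observation

    def step(self, action):
        self.t += 1
        done = self.t >= self.episode_length
        return 0, 1.0, done, {}  # obs, reward, done, info (assumed gym-style tuple)


class ActionRepeat:
    """Hypothetical wrapper: repeats each policy action `repeat` times
    and tracks low-level env steps and agent steps separately."""

    def __init__(self, env, repeat: int = 4):
        assert repeat >= 1
        self.env = env
        self.repeat = repeat
        self.env_steps = 0    # low-level simulator steps
        self.agent_steps = 0  # actions chosen by the policy

    def reset(self):
        return self.env.reset()

    def step(self, action):
        self.agent_steps += 1
        total_reward = 0.0
        for _ in range(self.repeat):
            obs, reward, done, info = self.env.step(action)
            self.env_steps += 1
            total_reward += reward
            if done:  # stop repeating if the episode ends mid-repeat
                break
        return obs, total_reward, done, info


env = ActionRepeat(ToyEnv(episode_length=1000), repeat=4)
env.reset()
done = False
while not done:
    _, _, done, _ = env.step(action=None)

print(env.env_steps, env.agent_steps)  # 1000 250
```

Under this counting, "1k steps per episode" refers to `env_steps`, not `agent_steps`.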

rlbeaverton commented 4 years ago

Quick follow-up after reading "Image Augmentation Is All You Need: Regularizing Deep Reinforcement Learning from Pixels" by Kostrikov et al., in which they state:

In contrast to prior work, CURL [42] plots returns as a function of modified environment steps, i.e. true environment steps divided by the action-repeat hyper-parameter.

Is their assertion then wrong? Thanks!

MishaLaskin commented 4 years ago

We count environment steps (100k env steps = 25k agent steps with an action repeat of 4); please refer to Section 5.1: https://arxiv.org/pdf/2004.04136.pdf
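The conversion described in this reply is plain division by the action repeat. The "modified environment steps" in the quoted sentence (true environment steps divided by the action-repeat hyper-parameter) are therefore the same quantity as agent steps. A minimal sketch follows. The function names are hypothetical, and the action repeat of 4 is taken from the example in this thread, not from any specific task.

```python
def env_to_agent_steps(env_steps: int, action_repeat: int) -> int:
    """Low-level environment steps -> agent (policy) steps.

    This equals the "modified environment steps" quantity:
    true env steps divided by the action repeat.
    """
    if env_steps % action_repeat != 0:
        raise ValueError("env_steps should be a multiple of action_repeat")
    return env_steps // action_repeat


def agent_to_env_steps(agent_steps: int, action_repeat: int) -> int:
    """Agent (policy) steps -> low-level environment steps."""
    return agent_steps * action_repeat


print(env_to_agent_steps(100_000, 4))  # 25000
print(agent_to_env_steps(25_000, 4))   # 100000
```

When comparing results across papers, the x-axis units only match if both use the same side of this conversion.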