Closed by swsychen 4 months ago
Yes, it's because of action repeat. The slight increase is just to make sure the agent not only reaches the budget but also writes logs afterwards. For reporting results, of course, we cut off any steps beyond the budget.
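To make the arithmetic concrete, here is a minimal sketch. It assumes an action repeat of 4 (implied by 100K agent steps matching the 400K env-step budget) and that the configured `steps` value of 1.1e5 counts agent steps. The helper names and the example log entries are hypothetical and are not taken from the repository.

```python
# Sketch only: ACTION_REPEAT = 4 and "steps counts agent steps" are assumptions.
ACTION_REPEAT = 4             # assumed Atari100K action repeat
BUDGET_ENV_STEPS = 400_000    # benchmark budget quoted from the paper
CONFIG_STEPS = int(1.1e5)     # default steps for atari100k in configs.yaml


def env_steps(agent_steps, action_repeat=ACTION_REPEAT):
    """Convert agent steps to environment steps under action repeat."""
    return agent_steps * action_repeat


def truncate_to_budget(log, budget=BUDGET_ENV_STEPS):
    """Keep only (env_step, score) entries at or below the budget."""
    return [(step, score) for step, score in log if step <= budget]


total_env_steps = env_steps(CONFIG_STEPS)                # 440_000 under these assumptions
margin = total_env_steps - BUDGET_ENV_STEPS              # extra steps that leave room for final logging
budget_agent_steps = BUDGET_ENV_STEPS // ACTION_REPEAT   # agent steps that exactly hit the budget

# Hypothetical log entries (placeholder scores), only to show the cutoff.
example_log = [(390_000, 1.0), (400_000, 2.0), (410_000, 3.0)]
reported = truncate_to_budget(example_log)
```

Under these assumptions the run covers 440K env steps, and anything logged past 400K is dropped before results are reported.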
Hi, when I went through the paper, it said that
"We also use a single environment instance for Atari100K because the benchmark has a budget of 400K env steps..."
But in configs.yaml, the default steps for atari100k is 1.1e5.
Does the difference come from the action repeat? And if so, why is it not 1e5, to exactly match the 400K?
Thank you very much.