Hi all,
Thanks for releasing the code. Could you provide some additional information on the exact settings you are using for the Atari environment? From the code, it seems that you are using the NoFrameskip-v4 version of the gym env, which, as far as I can tell, implies:
- You are taking an action every frame, whereas the standard evaluation protocol uses a frameskip of 4, i.e. an action is taken only every fourth frame.
- Your environment is fully deterministic; in particular, there are no sticky actions (repeat_action_probability=0). As far as I can tell, some of the methods you compare against, such as SGI, do use sticky actions.
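To make the two points above concrete, here is a minimal, self-contained sketch of what I mean by frameskip and sticky actions. `ToyEnv`, `FrameSkip`, `StickyActions`, and `run` are hypothetical names made up for illustration; they are not taken from your code or from gym, and 0.25 is only the sticky-action probability that I believe is commonly used.

```python
import random


class ToyEnv:
    """Hypothetical stand-in for the emulator: logs the action applied on each frame."""

    def __init__(self):
        self.frames = 0
        self.applied = []

    def step(self, action):
        self.frames += 1
        self.applied.append(action)
        return self.frames  # dummy observation


class FrameSkip:
    """Repeat each agent action for `skip` emulator frames."""

    def __init__(self, env, skip=4):
        self.env = env
        self.skip = skip

    def step(self, action):
        obs = None
        for _ in range(self.skip):
            obs = self.env.step(action)
        return obs


class StickyActions:
    """On each frame, with probability `p`, repeat the previous action instead of the new one."""

    def __init__(self, env, p=0.25, seed=0):
        self.env = env
        self.p = p
        self.rng = random.Random(seed)
        self.prev = None

    def step(self, action):
        if self.prev is not None and self.rng.random() < self.p:
            action = self.prev
        self.prev = action
        return self.env.step(action)


def run(env, base, actions):
    """Feed the agent's actions to `env` and report what the emulator actually saw."""
    for a in actions:
        env.step(a)
    return base.frames, list(base.applied)


agent_actions = [0, 1, 2, 3]

# Setting I think the code uses: one decision per frame, no sticky actions.
base_a = ToyEnv()
frames_a, applied_a = run(base_a, base_a, agent_actions)

# Setting I understand to be standard: frameskip 4 with per-frame sticky actions.
base_b = ToyEnv()
env_b = FrameSkip(StickyActions(base_b, p=0.25, seed=0), skip=4)
frames_b, applied_b = run(env_b, base_b, agent_actions)
```

In the first setting the four agent decisions span 4 emulator frames and are applied exactly as chosen; in the second they span 16 frames, and some frames may replay the previous action. This is only a toy model of the two settings, not a claim about how your wrappers or the emulator are actually implemented.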
Could you please clarify which frameskip and sticky-action settings you are using?
Thanks in advance.