Open ankeshanand opened 4 years ago
Hi, I think you're right! Sorry about that! When scaling up to bigger batch sizes and more parallel environments, I think I also often used a setting like `batch_size=256`, `sampler.batch_B=16`, `sampler.batch_T=2`, and to keep the same 1e4 env steps would use `target_update_interval=312`, but possibly that number is a coincidence.
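For what it's worth, the number may not be a coincidence. A rough check of that scaled-up setting (assuming rlpyt's default `replay_ratio=8`; the variable names below are illustrative, not the rlpyt API):

```python
# Rough check of the scaled-up setting above (assumes the default
# replay_ratio=8; variable names are illustrative, not the rlpyt API).
batch_size = 256
batch_B, batch_T = 16, 2   # parallel envs * time steps per sampler iteration
replay_ratio = 8

env_steps_per_itr = batch_B * batch_T                            # 32
updates_per_itr = replay_ratio * env_steps_per_itr / batch_size  # 1.0

# Gradient steps that span 1e4 env steps at this update rate:
interval = 1e4 / env_steps_per_itr * updates_per_itr

print(interval)  # 312.5
```

So at one optimizer update per 32 env steps, 312 gradient steps land almost exactly on 1e4 env steps.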
The default `target_update_interval` for DQN-based algorithms is set to 312 and is not changed for any of the variants in the configs (except for R2D1, which seems to be correctly set to 2500). I don't think the default matches the DQN / Rainbow papers, however.
This comment here says the default of 312 should correspond to 1e4 env steps. But with a replay ratio of 8 and a batch size of 32, we do an optimizer update every 4 env steps (32 / 8 = 4). That corresponds to updating the target network every 4 * 312 = 1248 steps in the environment.
Is my understanding correct? And if so, shouldn't the default value of `target_update_interval` be 2500 gradient steps (2500 * 4 = 1e4 env steps) for DQN and 2000 for Rainbow (32K frames, as stated in the paper)?
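A sketch of where those proposed numbers come from (assuming `replay_ratio=8`, `batch_size=32`, and the standard Atari frame skip of 4; names are illustrative, not the rlpyt API):

```python
# Deriving the proposed defaults (assumes replay_ratio=8, batch_size=32,
# and the standard Atari frame skip of 4; names are illustrative).
env_steps_per_update = 32 / 8   # batch_size / replay_ratio = 4.0

# DQN paper: refresh the target network every 1e4 env steps.
dqn_interval = 1e4 / env_steps_per_update               # 2500.0 gradient steps

# Rainbow paper: every 32K frames = 32_000 / 4 = 8000 env steps.
rainbow_interval = (32_000 / 4) / env_steps_per_update  # 2000.0 gradient steps

print(dqn_interval, rainbow_interval)  # 2500.0 2000.0
```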