Closed mcmachado closed 9 years ago
Good point. That's actually a bug, since it prevents planning with the emulator unless repeat_action_probability = 0.0.
I think Mersenne Twister is enough. We would have to stick with C++11 but it would solve this problem (at least make it easier).
Two points: 1) I think Marlos wasn't fully clear at first, but what we were actually wishing for was direct access to to the ALE RNG serialized state to be saved to (and restored from) checkpoint files (in order to restore long jobs that get killed for whatever reason). The latest changes embed the RNG state in an ALEState, but still do not allow public access and mutation of the RNG state so it still can't be easily written to disk. For what it's worth, our particular checkpointing system only creates checkpoints between episodes, so we don't actually save the ALEState -- we'd rather the RNG state and ALEState be separate (though could work with it if they are coupled).
2) There's another good reason not to couple ALEState and the RNG state. Marc, you said that restoring the RNG state along with the ALEState is important for planning, but that doesn't seem right to me. If you are doing some kind of MC planning, you want to do a rollout to get a sample, restore the original state, and then do another rollout to get a different sample. If the RNG state is restored along with the ALEState, every rollout will use the same random sequence.
Simple recommendations:
Yes, you're right, my fix breaks planning in stochastic environments.
I would argue against returning the internal RNG for serialization unless there's you can think of a use case where you would want to serialize the RNG but not the emulator state. Here's what I propose: cloneSystemState / restoreSystemState methods which serialize both RNG and "Markov" state, with explicit documentation on its intended purpose.
Technically one could want a separate RNG for ease of implementing common random numbers-type algorithms, but I think the added complexity doesn't warrant it.
Well, for what it's worth our use case is exactly one where we want to serialize the RNG state but not the ALEState. Because we only checkpoint between episodes, we don't need to store the ALEState (when you restore, it should just be the initial state). It's not a big deal for us to serialize the whole thing (so I don't object to your proposal), just unnecessary.
Also, for what it's worth, I can imagine someone wanting to serialize the ALEState, but not the RNG state -- specifically if they want to create a mid-episode checkpoint, but aren't using stochastic actions. So I do think there's an argument for separate serialization methods.
I believe the PR #84 solved this issue, didn't it? Shouldn't we close this issue?
I guess a useful feature we can look at adding is to provide a way to save (and restore) the internal emulator state. We do not have even have to support it during an episode, but it would be nice if someone could save the TinyMT state to later restore his job. Right now, if one wants to checkpoint his code, he will have to restart TinyMT.