I'm not sure I understand the plots for Q. In main.py, val_mem is filled with states only once, but in test.py Q is evaluated at every evaluation interval. So is this plot effectively graphing the Q-values assigned to the random states that are initially put into val_mem? Sorry if this is an obvious question; I'm brand new to deep RL research, and this re-implementation of Rainbow is really fascinating.
Yes, you understood correctly. The upside is that the Q-values are always evaluated against the same fixed dataset; the downside is that this dataset, gathered with random actions, doesn't contain a representative set of the states the agent actually visits as it learns.
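A minimal plain-Python sketch of the evaluation scheme discussed above, assuming a Q-function that returns one value per action. `collect_val_mem`, `average_max_q`, `random_state` and the one-parameter toy "network" are hypothetical names for illustration, not the repository's actual main.py/test.py code. The validation states are gathered once, and each evaluation interval re-scores those same states with the current network, so the plotted curve tracks how the network's estimates for that fixed set change over training.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence

State = List[float]
QFunction = Callable[[State], Sequence[float]]  # state -> one Q-value per action


def collect_val_mem(sample_state: Callable[[random.Random], State],
                    size: int, seed: int = 0) -> List[State]:
    """Fill the (hypothetical) validation memory once, before training.

    Here the states come from random sampling, standing in for random play,
    so they need not resemble the states a trained agent visits.
    """
    rng = random.Random(seed)
    return [sample_state(rng) for _ in range(size)]


def average_max_q(q_fn: QFunction, val_mem: List[State]) -> float:
    """Mean over the fixed states of max_a Q(s, a) under the current network."""
    return mean(max(q_fn(s)) for s in val_mem)


# Toy usage: a 2-action "network" whose only parameter is a scale factor w.
def random_state(rng: random.Random) -> State:
    return [rng.random(), rng.random()]


val_mem = collect_val_mem(random_state, size=100)  # collected once, never refreshed
q_history = []
for w in [0.0, 0.5, 1.0]:  # stand-in for successive evaluation intervals
    def q_fn(s: State, w: float = w) -> List[float]:
        return [w * s[0], w * s[1]]
    q_history.append(average_max_q(q_fn, val_mem))  # same states every time

print(q_history)
```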