we're working to reproduce some of the results in the original paper. It is stated that the rainbow agent: "is feedforward and does not use any observation stacking outside of the last action, which is included in the current observation".
However, in the code the rainbow agent appears to stack the last 4 observations by default. Empirically (at least in early iterations) this doesn't seem to affect cumulative return much either way. Could someone clarify if obs stacking was used for the results in the paper?
we're working to reproduce some of the results in the original paper. It is stated that the rainbow agent: "is feedforward and does not use any observation stacking outside of the last action, which is included in the current observation".
However, in the code the rainbow agent appears to stack the last 4 observations by default. Empirically (at least in early iterations) this doesn't seem to affect cumulative return much either way. Could someone clarify if obs stacking was used for the results in the paper?