In a2c_common.py, in the play_steps_rnn function, after collecting 16 eval observations and feeding them into the experience buffer, why do we need to pass swap_and_flatten01 to the get_transformed_list function to modify the experience buffer? I am confused about what this operation does here.
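To make the question concrete, here is a minimal sketch of what a swap-then-flatten of the first two axes generally does. It assumes the buffer is laid out as [steps][actors]. The helper name `swap_and_flatten_first_two` is hypothetical, and plain lists stand in for tensors, so this is not the library's own code.

```python
def swap_and_flatten_first_two(buf):
    """Hypothetical stand-in: turn a [steps][actors] nesting into one flat
    list ordered actor by actor (all steps of actor 0, then actor 1, ...)."""
    num_steps = len(buf)
    num_actors = len(buf[0])
    # Iterating actors in the outer loop is the "swap"; emitting a single
    # list instead of a nested one is the "flatten".
    return [buf[t][a] for a in range(num_actors) for t in range(num_steps)]


# 3 steps collected from 2 actors; each entry is labelled "s<step>a<actor>".
buffer = [[f"s{t}a{a}" for a in range(2)] for t in range(3)]
flat = swap_and_flatten_first_two(buffer)
print(flat)
# ['s0a0', 's1a0', 's2a0', 's0a1', 's1a1', 's2a1']
```

In this sketch the flattened result keeps each actor's consecutive steps next to each other, instead of interleaving actors step by step.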