Write a class that collects trajectories and returns a NamedTuple of the collected data. This should include a buffer of state-transition tuples (s_t, a_t, s_t_1, r_t, d_t). Problem: how can it be made general enough that additional stats (e.g. log_prob) can also be stored? Should the agent return these in actor_step?
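One possible answer to the generality question, as a minimal sketch rather than a settled design: keep the five transition fields fixed and carry any additional per-step stats in a dict-valued `extras` field. The names `Trajectory`, `TrajectoryCollector`, `add`, `collect`, and `extras` are hypothetical, and the `actor_step` stand-in below assumes the agent returns an `(action, stats)` pair, which the note only raises as a question.

```python
from typing import Any, Dict, List, NamedTuple


class Trajectory(NamedTuple):
    """Collected data; field names mirror (s_t, a_t, s_t_1, r_t, d_t)."""
    s_t: List[Any]
    a_t: List[Any]
    s_t_1: List[Any]
    r_t: List[Any]
    d_t: List[Any]
    extras: Dict[str, List[Any]]  # optional per-step stats, e.g. log_prob


class TrajectoryCollector:
    """Hypothetical collector: buffers transitions plus arbitrary per-step stats."""

    def __init__(self) -> None:
        self._steps: List[tuple] = []
        self._extras: Dict[str, List[Any]] = {}

    def add(self, s_t, a_t, s_t_1, r_t, d_t, **extras) -> None:
        # Require the same stat names at every step so all columns stay aligned.
        if self._steps and set(extras) != set(self._extras):
            raise ValueError("extras keys must be the same at every step")
        self._steps.append((s_t, a_t, s_t_1, r_t, d_t))
        for key, value in extras.items():
            self._extras.setdefault(key, []).append(value)

    def collect(self) -> Trajectory:
        # Transpose the list of step tuples into one list per field.
        columns = [list(c) for c in zip(*self._steps)] or [[] for _ in range(5)]
        data = Trajectory(*columns, extras={k: list(v) for k, v in self._extras.items()})
        # Reset the buffer so the next trajectory starts empty.
        self._steps.clear()
        self._extras.clear()
        return data


def actor_step(state):
    """Stand-in for the agent's actor_step (assumed to return action and stats)."""
    return 0, {"log_prob": -0.5}


collector = TrajectoryCollector()
action, stats = actor_step(1.0)
collector.add(1.0, action, 2.0, 0.1, False, **stats)
trajectory = collector.collect()
```

With this layout the collector never needs to know which stats an agent produces: whatever dict `actor_step` returns is stored under the same keys. The trade-off is that `extras` is untyped, so a typo in a key only surfaces when the data is read back.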