RobertTLange / gymnax

RL Environments in JAX 🌍
Apache License 2.0

`TrajectoryCollector` with discount masking if terminal #10

Closed RobertTLange closed 2 years ago

RobertTLange commented 3 years ago

Write a class that collects trajectories and returns a NamedTuple of the collected data. This should include a buffer of state transition tuples (s_t, a_t, s_{t+1}, r_t, d_t). Open problem: how to make this general enough that additional statistics (e.g. log_prob) can also be stored. One option: have the agent return these from actor_step?
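A minimal sketch of what such a collector could look like, not a proposal for the final API. Everything here is an assumption made for illustration: the `Transition` tuple and its field names, the `collect` function, the toy counter environment (`toy_reset`, `toy_step`), and the shape of `actor_step`, which here returns `(action, extras)` so that agent-specific stats such as `log_prob` travel in a free-form dict. The discount is masked to zero whenever `d_t` is true, so bootstrapped targets do not leak across episode boundaries.

```python
from typing import Any, NamedTuple

import jax
import jax.numpy as jnp


class Transition(NamedTuple):
    """One stacked transition (hypothetical field names)."""
    obs: jnp.ndarray       # s_t
    action: jnp.ndarray    # a_t
    next_obs: jnp.ndarray  # s_{t+1}
    reward: jnp.ndarray    # r_t
    done: jnp.ndarray      # d_t
    discount: jnp.ndarray  # gamma * (1 - d_t), zero at terminal steps
    extras: Any            # agent-defined stats, e.g. {"log_prob": ...}


def toy_reset(key):
    # Toy stand-in environment: the state is a step counter starting at 0.
    return jnp.zeros(())


def toy_step(key, state, action):
    # The episode ends every 3 steps; the reward simply echoes the action.
    next_obs = state + 1.0
    done = next_obs >= 3.0
    reward = action
    # Auto-reset so collection can continue past a terminal step.
    next_state = jnp.where(done, 0.0, next_obs)
    return next_obs, next_state, reward, done


def actor_step(key, obs):
    # Toy agent: uniform policy over {0, 1}. Extra stats go into a dict,
    # so the collector does not need to know which keys an agent emits.
    action = jax.random.bernoulli(key, 0.5).astype(jnp.float32)
    extras = {"log_prob": jnp.log(jnp.float32(0.5))}
    return action, extras


def collect(key, num_steps, gamma=0.99):
    """Roll out num_steps transitions with lax.scan and stack them."""

    def body(state, step_key):
        key_act, key_env = jax.random.split(step_key)
        obs = state
        action, extras = actor_step(key_act, obs)
        next_obs, next_state, reward, done = toy_step(key_env, state, action)
        # Discount masking: no bootstrapping through a terminal transition.
        discount = gamma * (1.0 - done.astype(jnp.float32))
        transition = Transition(
            obs, action, next_obs, reward, done, discount, extras
        )
        return next_state, transition

    key_reset, key_scan = jax.random.split(key)
    init_state = toy_reset(key_reset)
    step_keys = jax.random.split(key_scan, num_steps)
    _, trajectory = jax.lax.scan(body, init_state, step_keys)
    return trajectory


traj = collect(jax.random.PRNGKey(0), num_steps=7)
print(traj.done)
print(traj.discount)
print(traj.extras["log_prob"].shape)
```

Because `Transition` is a NamedTuple and `extras` is a dict, both are pytrees, so `jax.lax.scan` stacks every leaf along a leading time axis without the collector listing the extra keys explicitly. That is one way to address the generality question; a fixed set of optional fields would be an alternative.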