Hi,
It seems that for vectorized environments, this library (and others) is designed to sample as follows: if n is the number of environments, one sample is stored in the replay buffer as an (n x obs_size) tuple, and the model consumes the whole n-tuple. Why is it done this way, as opposed to storing the n-tuple as n separate tuples and having the model consume one observation at a time? Thanks!
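To make the question concrete, here is a minimal NumPy sketch of the two layouts I mean. The sizes, variable names, and list-based "buffers" are made up for illustration; this is not the library's actual replay buffer API.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
n_envs, obs_size = 4, 3

# One step from a vectorized environment: a batch of n_envs observations.
obs_batch = np.arange(n_envs * obs_size, dtype=np.float32).reshape(n_envs, obs_size)

# Layout A (what I observe): one buffer entry per step,
# holding all n_envs observations together.
buffer_a = [obs_batch]

# Layout B (what I expected): n_envs separate entries,
# one observation of length obs_size per environment.
buffer_b = [obs_batch[i] for i in range(n_envs)]

shape_a = buffer_a[0].shape  # shape of a single entry in layout A
shape_b = buffer_b[0].shape  # shape of a single entry in layout B
```

Both layouts hold the same numbers; the only difference is whether one buffer entry is the full (n x obs_size) batch or a single observation.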
Bryan