Closed CRLqinliang closed 1 year ago
Hi, our agents gather 40 transitions from each of 32 environments in parallel. During the first collection (32 × 40 = 1280 transitions), it is very likely that none of the 32 environments has reached the end of its episode. Hence we cannot compute the average success rate yet: we must wait for at least one environment to finish its episode to get the return.
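To make the reasoning concrete, here is a minimal sketch of why the first logged point cannot sit at frame zero. It is not the actual training code: the function name `first_logged_frame` and the fixed `episode_length` are assumptions made purely for illustration (real episodes vary in length). Only the 32 environments and 40 transitions per environment come from the reply above.

```python
NUM_ENVS = 32        # parallel environments (from the reply above)
STEPS_PER_ENV = 40   # transitions gathered per environment in one collection


def first_logged_frame(num_envs, steps_per_env, episode_length):
    """Return the frame count at which a success rate can first be logged.

    Hypothetical simplification: every episode lasts exactly
    `episode_length` steps. A success rate is only available at the end
    of a collection in which at least one episode has finished.
    """
    frames = 0
    steps_in_episode = [0] * num_envs
    while True:
        finished = 0
        for _ in range(steps_per_env):
            for i in range(num_envs):
                steps_in_episode[i] += 1
                frames += 1
                if steps_in_episode[i] == episode_length:
                    finished += 1
                    steps_in_episode[i] = 0  # environment resets
        if finished > 0:
            return frames  # first collection with a usable return


frames_per_collection = NUM_ENVS * STEPS_PER_ENV
print(frames_per_collection)                               # 1280
print(first_logged_frame(NUM_ENVS, STEPS_PER_ENV, 64))     # 2560
print(first_logged_frame(NUM_ENVS, STEPS_PER_ENV, 30))     # 1280
```

Under these assumptions, even when episodes are short enough to finish within the first collection, the earliest possible point is at one full collection of frames, and with longer episodes it moves further to the right.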
Hi guys, I have another question. I noticed that in Fig. 13 to Fig. 16 of the paper, the curves do not seem to start at frame zero. Is something missing there?