Robust termination Condition for equal horizon episodes

Bug description

Currently, the termination condition that is supposed to equalize the horizon length across rollouts does not work properly, so rollouts end up with different lengths and a variable horizon error is raised. I have not identified the cause.

Steps to reproduce

Run any environment with num_envs set to 16 and generate 2000 or more rollouts. Use environments that tend to terminate early, well before truncation.
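To confirm that the collected rollouts really have unequal horizons, a check along the following lines may help. This is only a minimal sketch in plain Python, not the library's own check: the function names `horizon_counts` and `has_variable_horizon` and the example lengths are hypothetical, chosen for illustration.

```python
from collections import Counter
from typing import Sequence


def horizon_counts(episode_lengths: Sequence[int]) -> Counter:
    """Count how many episodes ended at each length."""
    return Counter(episode_lengths)


def has_variable_horizon(episode_lengths: Sequence[int]) -> bool:
    """Return True if the episodes do not all share a single length."""
    return len(set(episode_lengths)) > 1


# Hypothetical episode lengths: most episodes run to a 500-step truncation,
# while a few terminate early.
lengths = [500, 500, 37, 500, 112, 500]
print(horizon_counts(lengths))
print(has_variable_horizon(lengths))
```

With real data, the lengths would come from the generated rollouts, for example `[len(t) for t in trajectories]`, assuming each trajectory object supports `len()`.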

Environment

I tried a couple of environments and observed the same issue in each.

HumanCompatibleAI / imitation

Robust termination Condition for equal horizon episodes #856
