I wanted to implement the comparison of the expert trajectories and the agent trajectories. While looking for a way to get the actual agent trajectories during evaluation, I found that tf2rl does not expose them.
So I forked tf2rl and implemented some small changes so that the evaluate_policy function now returns:
- the average return (as originally implemented)
- the trajectories generated during evaluation
- the average length of the trajectories (average step count)
You can review the changes to tf2rl and see that it is only a minor change to the evaluate_policy function: instead of clearing the replay buffer after every episode, it now saves the generated trajectory from it. Finally, the average step count is calculated and written to TensorBoard. A sketch of what the modified loop looks like is shown below.
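To make the change concrete, here is a minimal sketch of such an evaluation loop. It is not the exact diff from the fork: the standalone signature, the gym-style env API, and the TensorBoard tag name are assumptions for illustration. Only the idea of keeping the per-episode transitions instead of discarding them mirrors the actual change.

```python
import numpy as np
import tensorflow as tf


def evaluate_policy(policy, test_env, test_episodes=5,
                    episode_max_steps=1000, total_steps=0):
    """Run evaluation episodes and keep every trajectory instead of discarding it."""
    avg_test_return = 0.0
    total_test_steps = 0
    trajectories = []  # one dict of arrays per evaluation episode

    for _ in range(test_episodes):
        episode_return = 0.0
        obs_list, act_list, rew_list = [], [], []
        obs = test_env.reset()
        for _ in range(episode_max_steps):
            action = policy.get_action(obs, test=True)  # deterministic eval action
            next_obs, reward, done, _ = test_env.step(action)
            obs_list.append(obs)
            act_list.append(action)
            rew_list.append(reward)
            episode_return += reward
            obs = next_obs
            if done:
                break
        # Keep the finished trajectory instead of clearing it after the episode.
        trajectories.append({
            "obs": np.asarray(obs_list),
            "act": np.asarray(act_list),
            "rew": np.asarray(rew_list),
        })
        total_test_steps += len(obs_list)
        avg_test_return += episode_return

    avg_test_return /= test_episodes
    avg_test_steps = total_test_steps / test_episodes
    # Write the average episode length to TensorBoard (tag name is illustrative).
    tf.summary.scalar(name="Common/average_test_episode_length",
                      data=avg_test_steps, step=total_steps)
    return avg_test_return, trajectories, avg_test_steps
```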
What we can now do:
- Train the agent and get its trajectories during evaluation, to compare them to the expert trajectories (see the sketch after this list).
- Change code in the tf2rl repository for our needs so that we can evaluate more easily.
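The concrete comparison metric is still open; as one illustration, here is a sketch that compares agent and expert trajectories by average episode length and by the distance between their mean observations. The trajectory dict format is the one from the sketch above, and compare_trajectories is a hypothetical helper, not part of tf2rl.

```python
import numpy as np


def mean_observation(trajectories):
    """Average observation over all steps of all trajectories."""
    all_obs = np.concatenate([t["obs"] for t in trajectories], axis=0)
    return all_obs.mean(axis=0)


def compare_trajectories(agent_trajs, expert_trajs):
    """Return a few crude similarity statistics between agent and expert rollouts."""
    return {
        "agent_avg_len": float(np.mean([len(t["obs"]) for t in agent_trajs])),
        "expert_avg_len": float(np.mean([len(t["obs"]) for t in expert_trajs])),
        # Euclidean distance between mean observations as a rough
        # state-visitation proxy.
        "mean_obs_distance": float(np.linalg.norm(
            mean_observation(agent_trajs) - mean_observation(expert_trajs))),
    }
```

With the modified evaluation loop, this would be used as: avg_return, agent_trajs, avg_steps = evaluate_policy(policy, test_env), followed by compare_trajectories(agent_trajs, expert_trajs).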