Adding an evaluation script

When they run our model on the server, they will use 50 runs for 5 different seeds, so I suggest that we re-evaluate the best trained model for each hyperparameter configuration in the same way. I also suggest that, for each of these models, we re-train for only 100_000 time steps on 5 different seeds and calculate the AUC (area under the learning curve), so that we have a better idea of which model is the most sample efficient. I've already added some code, but it needs more work: https://github.com/cesare-spinoso/GROUP_013/blob/vpg_train_cesare/evaluate_agent.py
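To make the AUC idea concrete, here is a minimal sketch of how the per-seed comparison could look. It is not the code in `evaluate_agent.py`: the names `learning_curve_auc`, `evaluate_over_seeds`, and `fake_training_run` are hypothetical, and the fake run only stands in for a real 100_000-step training run that logs (time step, average return) pairs.

```python
import random
from statistics import mean, stdev
from typing import Callable, Iterable, List, Sequence, Tuple

Curve = Tuple[Sequence[float], Sequence[float]]  # (time steps, returns)


def learning_curve_auc(timesteps: Sequence[float], returns: Sequence[float]) -> float:
    """Trapezoidal area under a learning curve (return vs. time steps)."""
    if len(timesteps) != len(returns) or len(timesteps) < 2:
        raise ValueError("need at least two (time step, return) points of equal length")
    area = 0.0
    for i in range(1, len(timesteps)):
        width = timesteps[i] - timesteps[i - 1]
        area += 0.5 * width * (returns[i] + returns[i - 1])
    return area


def evaluate_over_seeds(run_fn: Callable[[int], Curve], seeds: Iterable[int]) -> dict:
    """Run `run_fn` once per seed and summarise the AUCs.

    `run_fn(seed)` is a hypothetical callable that trains (or evaluates)
    an agent with the given seed and returns its learning curve.
    """
    aucs: List[float] = [learning_curve_auc(*run_fn(seed)) for seed in seeds]
    return {
        "aucs": aucs,
        "mean_auc": mean(aucs),
        "std_auc": stdev(aucs) if len(aucs) > 1 else 0.0,
    }


def fake_training_run(seed: int) -> Curve:
    """Placeholder for a real training run: a noisy, seed-dependent curve."""
    rng = random.Random(seed)  # local RNG so each seed is reproducible
    timesteps = list(range(0, 100_001, 10_000))
    returns = [t / 1000 + rng.uniform(-5, 5) for t in timesteps]
    return timesteps, returns


summary = evaluate_over_seeds(fake_training_run, seeds=range(5))
print(f"mean AUC over 5 seeds: {summary['mean_auc']:.1f} +/- {summary['std_auc']:.1f}")
```

A higher mean AUC over the same time-step budget suggests the agent reaches good returns earlier, which is what we want when comparing sample efficiency. Swapping `fake_training_run` for the real training loop should be the only change needed, assuming it can return its logged curve.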

Copy over https://github.com/cesare-spinoso/GROUP_013/projects/1#card-80068217 and make sure randomness is handled correctly, i.e. that each seed is actually applied.

cesare-spinoso / HopperMujoco

Adding an evaluation script #17