Fixed an issue where model loading did not work correctly: the training and testing environments weren't wrapped and normalized in the same way, so evaluating a loaded model gave results that differed from those observed during training.
Changes:
Unified the wrapping and normalization of the training and testing environments.
Added an evaluation callback that checks model performance at a specified frequency during training.
The test environment is now used only by the evaluation callback; a separate eval environment is used for evaluation after training ends or when loading a previously trained model.
Added the ability to save a video of the evaluation.
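The PR doesn't show the framework-specific code, but the core of the fix is that observation-normalization statistics accumulated during training must be saved with the model and restored into the evaluation environment, rather than letting the eval environment start from fresh statistics. A minimal stdlib-only sketch of that principle (all names here are hypothetical, not from this repo):

```python
class RunningNorm:
    """Tracks a running mean/variance of scalar observations (Welford's algorithm)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0

    def update(self, x):
        # Online update of mean and sum of squared deviations.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    def normalize(self, x):
        var = self.m2 / self.count if self.count > 1 else 1.0
        return (x - self.mean) / (var ** 0.5 + 1e-8)

    def state(self):
        # Saved alongside the model so evaluation uses identical statistics.
        return {"count": self.count, "mean": self.mean, "m2": self.m2}

    @classmethod
    def from_state(cls, s):
        norm = cls()
        norm.count, norm.mean, norm.m2 = s["count"], s["mean"], s["m2"]
        return norm


# Training: statistics accumulate over observed values.
train_norm = RunningNorm()
for obs in [1.0, 2.0, 3.0, 4.0]:
    train_norm.update(obs)

# Evaluation: restore the SAME statistics instead of starting fresh.
# Otherwise the loaded model sees differently scaled inputs, which is
# exactly the mismatch this PR fixes.
eval_norm = RunningNorm.from_state(train_norm.state())
assert eval_norm.normalize(2.5) == train_norm.normalize(2.5)
```

In a real project this is typically handled by the RL library's normalization wrapper (e.g. serializing the wrapper's statistics next to the model checkpoint and reloading both together), which is what "unified wrapping and normalizing" amounts to in practice.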
Ok, will try to do this in future PRs.
I already made a PR to update some stuff in the README, I could make another commit to it to reflect the changes from this PR.