First of all, thank you for the great work you've done! The code in this reproduction is very clear.
Here's my problem.
I added a best_model_save_path parameter to the EvalCallback call in script.py so I can get the best model after training. But when I tried to evaluate that model using evaluate_policy from stable_baselines3.common.evaluation, I got really confused. The reward I get from this evaluation is negative, which is far from the episode_reward in the logs; it's even worse than the first eval result during training. Why is this the case? I looked at the stable-baselines3 docs, and EvalCallback also uses evaluate_policy internally to compute the reward values, so the results should be close.
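For reference, here is roughly the change I made to script.py (the paths and eval_freq below are illustrative, not my exact values):

from stable_baselines3.common.callbacks import EvalCallback

# Sketch of my modification; eval_env is built the same way as in script.py
eval_callback = EvalCallback(
    eval_env,
    best_model_save_path="./testing/evaluation/model",  # the parameter I added
    log_path="./testing/evaluation/logs",
    eval_freq=10_000,
)
model.learn(total_timesteps=100_000, callback=eval_callback)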
In my test code, I load the env the same way as script.py, and here's my evaluation process:
import stable_baselines3
from stable_baselines3.common.evaluation import evaluate_policy

Agent = getattr(stable_baselines3, args.agent)
model = Agent.load("./testing/evaluation/model/best_model")
print(evaluate_policy(model, env))  # returns (mean_reward, std_reward)
Actually, I discovered this because I was trying to tune the hyperparameters with Optuna: the objective value reported by Optuna is negative, while the episode reward in the logs is positive and pretty large. I'm really confused by this result.
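For context, my Optuna objective looks roughly like this (the hyperparameter range, policy, and timestep counts are illustrative, not my exact setup):

import optuna
import stable_baselines3
from stable_baselines3.common.evaluation import evaluate_policy

def objective(trial):
    # Sample one hyperparameter as an example; my real search space is larger
    learning_rate = trial.suggest_float("learning_rate", 1e-5, 1e-2, log=True)
    Agent = getattr(stable_baselines3, args.agent)
    model = Agent("MlpPolicy", env, learning_rate=learning_rate)
    model.learn(total_timesteps=50_000)
    mean_reward, _ = evaluate_policy(model, env)  # this is the negative value I see
    return mean_reward

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)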
Thanks again!