When I run your code, the scores from the test script are always a little higher than the scores from the evaluation stage of the training script (during training, the model is evaluated every 10k steps).
Here are some results I got from the scripts. The left score is from the training script and the right score is from the test script.
CrazyClimber 7246 9603
BankHeist 419 454
I have looked through both bash scripts and the code. As far as I understand, both scripts evaluate agents in exactly the same way: each agent is run with 32 seeds and the mean of the 32 scores is reported.
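To make my understanding concrete, here is a minimal sketch of the protocol I think both scripts follow. The names `run_episode` and `evaluate` are hypothetical and are not taken from your repository, and the scores are random placeholders rather than real game returns.

```python
import random
from statistics import mean

# Number of evaluation seeds, as I understand it from both scripts.
NUM_EVAL_SEEDS = 32


def run_episode(seed: int) -> float:
    """Hypothetical stand-in for one evaluation episode.

    Returns a placeholder score that depends only on the seed.
    """
    rng = random.Random(seed)
    return rng.uniform(0.0, 100.0)


def evaluate(base_seed: int = 0, num_seeds: int = NUM_EVAL_SEEDS) -> float:
    """Run one episode per seed and return the mean score."""
    scores = [run_episode(base_seed + i) for i in range(num_seeds)]
    return mean(scores)


# Same (placeholder) agent, same seeds, evaluated twice.
train_time_score = evaluate(base_seed=0)
test_time_score = evaluate(base_seed=0)
print(train_time_score == test_time_score)
```

If both scripts really did this with the same checkpoint and the same seeds, I would expect the two means to match, which is why the gap confuses me.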
So I have two questions:
Why are the scores from the test script always a little higher than the scores from the evaluation stage of the training script?
Which script did you use to get the results in the paper?
Thanks for your great work!
Looking forward to your reply.