Closed cesare-spinoso closed 2 years ago
So they use a very dumb way of calculating sample efficiency: the average reward observed, not AUC
@sabina-elkins we might need to update the evaluation script to add this
oof dummies. I will add this to my todo list
wait, don't we already calculate this? it's just the mean reward during training
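For reference, here is a rough sketch of the two metrics being compared, to make the difference concrete. The function names (`mean_reward`, `reward_auc`, `normalized_auc`) and the idea of passing a plain list of per-step rewards are assumptions for illustration; this is not the actual evaluation script. The AUC here uses the trapezoidal rule, which is one common choice and not necessarily what anyone else uses.

```python
from typing import Optional, Sequence


def mean_reward(rewards: Sequence[float]) -> float:
    """Average reward observed during training (the metric discussed above)."""
    return sum(rewards) / len(rewards)


def reward_auc(rewards: Sequence[float],
               steps: Optional[Sequence[float]] = None) -> float:
    """Area under the reward curve via the trapezoidal rule.

    `steps` holds the x-coordinates (e.g. environment steps); if omitted,
    rewards are assumed to be evenly spaced one step apart.
    """
    if steps is None:
        steps = list(range(len(rewards)))
    area = 0.0
    for i in range(1, len(rewards)):
        width = steps[i] - steps[i - 1]
        area += 0.5 * (rewards[i] + rewards[i - 1]) * width
    return area


def normalized_auc(rewards: Sequence[float],
                   steps: Optional[Sequence[float]] = None) -> float:
    """AUC divided by the length of the x-range, so it is on the reward scale."""
    if steps is None:
        steps = list(range(len(rewards)))
    return reward_auc(rewards, steps) / (steps[-1] - steps[0])


if __name__ == "__main__":
    curve = [0.0, 0.0, 0.0, 4.0]  # made-up learning curve, late improvement
    print(mean_reward(curve), normalized_auc(curve))
```

With evenly spaced logging, the two differ only in how the first and last points are weighted, so the mean reward during training is close to a normalized AUC. They diverge more when rewards are logged at uneven intervals.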