pchliu closed 4 years ago
Hmm, we already kind of had these baselines? They're probably not adapted to the latest rewards and such, but they might be useful to look at for how best to use Ray (they were written by Clement but ended up in my PR for some reason).
https://github.com/MKorablyov/LambdaZero/pull/107/files#diff-49d45def5f05354546bdbffe1e55823d
Clement's original branch: https://github.com/MKorablyov/LambdaZero/tree/boltzmann/LambdaZero/examples/baselines
Both the greedy and Boltzmann baselines are added.
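
For anyone skimming the thread, here is a rough sketch of how the two selection rules usually differ. This is not the code from the linked branch: the function names, the `temperature` parameter, and the plain list of reward estimates are all assumptions made for illustration, and the Ray parallelism is left out entirely.

```python
import math
import random


def greedy_action(rewards):
    """Pick the index of the highest estimated reward (first index on ties)."""
    return max(range(len(rewards)), key=lambda i: rewards[i])


def boltzmann_probs(rewards, temperature=1.0):
    """Softmax over rewards / temperature.

    `temperature` is a hypothetical knob: small values approach greedy
    selection, large values approach uniform sampling.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [r / temperature for r in rewards]
    shift = max(scaled)  # subtract the max so exp() cannot overflow
    weights = [math.exp(s - shift) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def boltzmann_action(rewards, temperature=1.0, rng=random):
    """Sample an index with probability given by boltzmann_probs."""
    probs = boltzmann_probs(rewards, temperature)
    return rng.choices(range(len(rewards)), weights=probs, k=1)[0]
```

Greedy always takes the current best candidate, while Boltzmann keeps some probability on lower-scoring candidates, so it explores more at the cost of a noisier baseline.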