TimZaman / dotaclient

distributed RL spaghetti all'arrabbiata

Why PPO? #37

Closed by Nostrademous 5 years ago

Nostrademous commented 5 years ago

1) I've honestly wanted to ask this for a while: why did you select PPO as "the" algorithm to implement?

2) Did you consider anything else?

TimZaman commented 5 years ago

It's the easiest to implement, and it lets me iterate over the same data a few times while staying stable.

TimZaman commented 5 years ago

Running PPO over multiple epochs is great because the GPU can handle a ton with ease.
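
For readers who want to see why reusing one batch for several epochs can stay stable, here is a minimal sketch of PPO's clipped surrogate objective. It is not taken from dotaclient: the one-parameter Bernoulli policy, the names `log_prob`, `ppo_clip_objective`, and `train_on_batch`, the hyperparameters, and the finite-difference gradient (standing in for autograd on a GPU) are all assumptions chosen to keep the example dependency-free.

```python
import math


def log_prob(theta, action):
    """Log-probability of a binary action under a Bernoulli policy with logit theta."""
    p1 = 1.0 / (1.0 + math.exp(-theta))
    return math.log(p1 if action == 1 else 1.0 - p1)


def ppo_clip_objective(theta, batch, eps=0.2):
    """Mean clipped surrogate over a batch of (action, old_log_prob, advantage)."""
    total = 0.0
    for action, old_logp, adv in batch:
        # Probability ratio between the current policy and the data-collecting policy.
        ratio = math.exp(log_prob(theta, action) - old_logp)
        clipped = max(1.0 - eps, min(1.0 + eps, ratio))
        # Pessimistic choice: never reward moving the ratio outside the clip range.
        total += min(ratio * adv, clipped * adv)
    return total / len(batch)


def train_on_batch(theta_old, samples, epochs=4, lr=0.5, eps=0.2):
    """Reuse one batch of (action, advantage) pairs for several epochs of gradient ascent."""
    # Old log-probabilities are frozen once, when the data is collected.
    batch = [(a, log_prob(theta_old, a), adv) for a, adv in samples]
    theta = theta_old
    h = 1e-5  # step for the central finite-difference gradient (hypothetical stand-in for autograd)
    for _ in range(epochs):
        grad = (ppo_clip_objective(theta + h, batch, eps)
                - ppo_clip_objective(theta - h, batch, eps)) / (2 * h)
        theta += lr * grad
    return theta, batch


if __name__ == "__main__":
    # Toy batch: action 1 had positive advantage, action 0 negative.
    samples = [(1, 1.0), (0, -1.0), (1, 0.5), (0, -0.5)]
    for n in (1, 4, 20):
        theta, _ = train_on_batch(0.0, samples, epochs=n)
        print(n, "epochs -> theta =", round(theta, 4))
```

In this toy setup, once every sample's ratio has left the `[1 - eps, 1 + eps]` range in the direction its advantage favors, the gradient of the clipped objective is zero, so additional epochs on the same batch stop moving `theta`. That is the mechanism that makes several passes over the same data reasonably safe.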