VinF / deer

DEEp Reinforcement learning framework

More information about the LongerExplorationPolicy #63

Closed dynamik1703 closed 6 years ago

dynamik1703 commented 6 years ago

Hey VinF,

do you have more information about the LongerExplorationPolicy?

I'm wondering whether this policy is suitable for my environment. How should the length parameter be chosen?

Thanks!

Best wishes

VinF commented 6 years ago

Hi dynamik,

Basically, the idea is that with purely random exploration, all possible ordered sequences of actions have uniform probability. E.g., with a set of two possible actions {1,2} and two time steps, the sequences {11,12,21,22} all have the same probability 0.25 of being tried out. With the LongerExplorationPolicy, it is the unordered sequences that have uniform probabilities. So in the example, {11} has probability 0.33, {22} has 0.33, and {12, 21} together have 0.33 (0.17 each). That can be useful in environments such as a grid world, where the order of the actions does not matter in most situations.
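
To make the arithmetic in the two-action example concrete, here is a minimal sketch that enumerates the sequences and computes both distributions. It only illustrates the probabilities described above; it is not the code of LongerExplorationPolicy in deer, and the function names `ordered_uniform` and `unordered_uniform` are hypothetical helpers introduced for this illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def ordered_uniform(actions, length):
    """Pure random exploration: every ordered sequence is equally likely."""
    seqs = list(product(actions, repeat=length))
    p = Fraction(1, len(seqs))
    return {s: p for s in seqs}


def unordered_uniform(actions, length):
    """Illustrative assumption: every unordered sequence (multiset of
    actions) is equally likely, and its probability is split evenly
    among the ordered sequences that map to it."""
    seqs = list(product(actions, repeat=length))
    # Count how many ordered sequences share the same multiset of actions.
    group_sizes = Counter(tuple(sorted(s)) for s in seqs)
    p_group = Fraction(1, len(group_sizes))
    return {s: p_group / group_sizes[tuple(sorted(s))] for s in seqs}


if __name__ == "__main__":
    for name, dist in [("ordered", ordered_uniform([1, 2], 2)),
                       ("unordered", unordered_uniform([1, 2], 2))]:
        print(name)
        for seq, prob in sorted(dist.items()):
            print("  ", seq, prob)
```

For actions {1,2} and two time steps, the second function gives 1/3 to 11, 1/3 to 22, and 1/6 each to 12 and 21, which matches the rounded 0.33 and 0.17 quoted above.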

The length parameter should be chosen depending on your environment. Usually you'll have to try a few possibilities empirically and see what works.

Best, Vincent

dynamik1703 commented 6 years ago

Hi VinF,

thanks for your quick response!

How do you evaluate the Ornstein-Uhlenbeck-Process in comparison?

Best, Roman

VinF commented 6 years ago

Hi Roman,

Indeed, depending on the setting considered, you could possibly find parallels with the nomenclature used in the domain of stochastic processes.