JuliaPOMDP / BasicPOMCP.jl

The PO-UCT algorithm (aka POMCP) implemented in Julia

Average reward after learning a strategy #15

Open kubicon opened 4 years ago

kubicon commented 4 years ago

Hello, I used BasicPOMCP to find an optimal strategy in a fairly large game. Following the example, I run 10,000 tree queries, and even though I can see the tree, I am mostly interested in the average reward. I know there is the simulate function, but I feel the results from that method vary more than I expect (although maybe running n simulations and averaging them is a reasonable solution). Simply put, is it possible to get the average reward immediately after the solver solves the game?

Thank you for your response.

zsunberg commented 4 years ago

For online solvers like POMCP, unfortunately I don't think there is a better way to evaluate than through Monte Carlo simulations (the parallel simulator is the best for that: https://juliapomdp.github.io/POMDPSimulators.jl/latest/parallel/#Parallel-1).
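For concreteness, here is a minimal sketch of that approach (TigerPOMDP is only a stand-in for your game, and the numbers of simulations and steps are arbitrary):

```julia
using POMDPs, POMDPModels, POMDPSimulators, BasicPOMCP
using Statistics

pomdp = TigerPOMDP()                       # stand-in problem; replace with your game
planner = solve(POMCPSolver(tree_queries=10_000), pomdp)

# Queue up many independent simulations; run_parallel distributes them over
# any worker processes added with Distributed.addprocs.
queue = [Sim(pomdp, planner, max_steps=100) for _ in 1:500]
data = run_parallel(queue)                 # DataFrame with one row per simulation

# The :reward column holds the discounted return of each run.
m = mean(data[!, :reward])
se = std(data[!, :reward]) / sqrt(length(queue))
println("mean reward: $m ± $se (standard error)")
```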

"I feel the results from that method vary more than I expect"

If the solver is not tuned properly, it can give highly variable results (or the game may just be too big for POMCP to handle reliably). Did you tune the exploration parameter and the value estimate/rollout policy to something reasonable for the game?
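For example (the values below are only illustrative; the right settings depend on your reward scale and problem size, and `pomdp` stands for your game):

```julia
using POMDPs, BasicPOMCP
using POMDPPolicies: RandomPolicy

# Illustrative tuning sketch; assumes `pomdp` is your problem instance.
solver = POMCPSolver(
    tree_queries   = 10_000,
    c              = 10.0,  # UCB exploration constant, roughly on the scale of the rewards
    max_depth      = 20,    # depth of the search tree and rollouts
    estimate_value = FORollout(RandomPolicy(pomdp))  # rollout policy used to estimate leaf values
)
planner = solve(solver, pomdp)
```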

zsunberg commented 4 years ago

Also, I would strongly advocate for trying to solve the simplest and smallest possible version of your problem first before trying to move to a realistic size.

zsunberg commented 4 years ago

Also, how do I construct an OriginalGame with reasonable values? Do you have a default constructor written yet?

kubicon commented 4 years ago

I'll try to explain how I use BasicPOMCP. I have a pursuit-evasion game for 2 players, where one player can fully observe the game and the other only partially. I also have an algorithm which should solve the game for the imperfect-information player (player 1) against the perfect-information player (player 2). My goal now is to check, given the strategy for player 1, whether the strategy taken by player 2 was really the best while player 1's strategy is held fixed (basically, to check whether the current algorithm really works as intended). So in BasicPOMCP, I use information from the original game (hence the name originalGame) and the strategy calculated by the solving algorithm.

At the beginning I wanted to make 3 structs: one for the original game, a second for the result of the algorithm, and a third combining only the parameters the solver needs. But along the way I decided to use just one struct and didn't bother changing the name; a rough sketch follows below. I hope this is explanatory enough, and I will attach what I have right now (in a workable state). My Julia skills are not great; this is my first project in Julia, and it is not even the main area of my work at the moment.
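Roughly, the single struct looks something like this (field names here are just placeholders, not the real code):

```julia
using POMDPs

# Rough placeholder sketch: one struct bundling the original game data with the
# strategy computed by the external algorithm, implementing the POMDPs.jl
# interface so it can be handed to the POMCP solver.
struct OriginalGame{S,A,O} <: POMDP{S,A,O}
    game_parameters::Dict{Symbol,Any}  # data describing the original pursuit-evasion game
    fixed_strategy::Dict{S,A}          # strategy for player 1 from the solving algorithm
    discount::Float64
end

POMDPs.discount(g::OriginalGame) = g.discount
```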

Also, I did not tune the parameters of the solver; I'll try that. Thank you for all of your advice, it helped a lot. PS: I added a small README to explain what every file does and commented on how I expect the game to work. As before, it is still a work in progress, so there are a lot of things that don't work optimally yet. BasicPOMCPSolving.zip