LARG / HFO

Half Field Offense in Robocup 2D Soccer
MIT License

Data re SARSA weights? #43

Open drallensmith opened 6 years ago

drallensmith commented 6 years ago

Is there any information available about the SARSA weights discussed in the main HFO paper? I'm not necessarily asking whether the weights themselves are available, but rather for any statistical information that would help determine the degree to which HFO is a "fractured" learning domain. (I'm pretty certain it is to some degree, given that keepaway soccer is known to be one, but would like confirmation.)

mhauskn commented 6 years ago

The SARSA experiments were run by collaborators, so unfortunately I don't have any information about the SARSA weights, or even the original weights themselves. However, the code is included in the examples, so perhaps you could re-train your own.

Are you referring to fractured domain in the sense of different states requiring different actions?

drallensmith commented 6 years ago

Very close-together states requiring significantly different actions; the sort of thing that multi-modular-NEAT and RBF-NEAT are meant to help handle.
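As a rough illustration of that notion, here is a minimal sketch of a crude proxy: the fraction of sampled states whose action changes under a small perturbation. It is not a formal fracture measure, and the two toy policies, the 2D state space, and the name `disagreement_rate` are all hypothetical stand-ins rather than anything taken from HFO.

```python
import random


def disagreement_rate(policy, states, radius, rng):
    """Fraction of states whose action differs from that of a nearby state."""
    differing = 0
    for state in states:
        # Perturb every coordinate by at most `radius` to get a close neighbor.
        neighbor = [x + rng.uniform(-radius, radius) for x in state]
        if policy(state) != policy(neighbor):
            differing += 1
    return differing / len(states)


def smooth_policy(state):
    # Hypothetical policy with a single decision boundary at x = 0.5.
    return "left" if state[0] < 0.5 else "right"


def fractured_policy(state):
    # Hypothetical policy whose action alternates across narrow bands of x.
    return "left" if int(state[0] * 50) % 2 == 0 else "right"


rng = random.Random(0)
states = [[rng.random(), rng.random()] for _ in range(2000)]

smooth_rate = disagreement_rate(smooth_policy, states, 0.01, rng)
fractured_rate = disagreement_rate(fractured_policy, states, 0.01, rng)
print(f"smooth: {smooth_rate:.3f}  fractured: {fractured_rate:.3f}")
```

The same probe could in principle be pointed at a trained policy in place of the toy ones, with states sampled from real episodes instead of a uniform grid.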

mhauskn commented 6 years ago

I think that it will depend on the HFO scenario. In a 1v0 with low-level states and actions, I personally think similar states will have similar actions. However, with more players I think it's possible to have a much more fractured decision space.

drallensmith commented 6 years ago

I agree, although of course it depends on what one considers "similar". A feature such as "kickable", taken alone, gives a very different result for a ball just within kickable distance than for one just outside it, for instance. (I found an interesting student paper on keepaway that found HyperNEAT doing better with minimal numbers of players, but NEAT doing better than HyperNEAT with more players involved.)
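The "kickable" point can be made concrete with a toy threshold feature. This is only a sketch under stated assumptions: `KICKABLE_RADIUS` is a made-up value rather than the simulator's actual kickable margin, and the ±1 encoding of the boolean is assumed for illustration.

```python
# Hypothetical threshold; NOT the simulator's real kickable margin.
KICKABLE_RADIUS = 1.0


def kickable_feature(ball_distance):
    """Assumed boolean feature: +1 if the ball is within reach, else -1."""
    return 1.0 if ball_distance <= KICKABLE_RADIUS else -1.0


just_inside = KICKABLE_RADIUS - 0.01
just_outside = KICKABLE_RADIUS + 0.01

# The two states are nearly identical in raw distance...
distance_gap = just_outside - just_inside
# ...but sit at opposite extremes of the thresholded feature.
feature_gap = abs(kickable_feature(just_inside) - kickable_feature(just_outside))

print(f"distance gap: {distance_gap:.2f}, feature gap: {feature_gap:.1f}")
```

Whether two such states count as "similar" therefore depends on whether similarity is measured on the raw distance or on the thresholded feature.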

One reason I'm interested in starting with high-level actions, however, even though they almost certainly introduce more fractures in the space, is that they allow a fair match vs. random as a feasible-infeasible test.