Partial Observability in High Level Features

LARG / HFO

Half Field Offense in Robocup 2D Soccer

MIT License


Partial Observability in High Level Features #58

Closed: DurgeshSamant closed this 7 years ago

DurgeshSamant commented 7 years ago

In line with the convention adopted by the low-level feature set, a teammate or opponent should be either completely observable or not observable at all [even when the fullstate flag is OFF].

Currently, an agent is quite often able to observe the position and other information of an opponent or teammate but not its uniform number. In effect, it can sense [albeit noisily] where a player is and yet be completely unaware of who that is. This should not be logically possible.

This PR fixes the issue in a way that is consistent with the approach adopted by the low-level feature extractor.

mhauskn commented 7 years ago

Since this PR may change the behavior of the high-level feature set, I'd like to get some feedback from @drallensmith and others using high-level features. Is it useful to detect a player without being able to see their unum? The low-level feature set already filters out players whose unums are < 0.
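For readers less familiar with the extractor, the low-level filtering rule mentioned above can be sketched as follows. This is a minimal illustration, not the HFO code (the real extractor is C++): `DetectedPlayer` and `filter_identified` are hypothetical names, and the sketch only assumes that an unknown uniform number shows up as a negative `unum`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DetectedPlayer:
    # Hypothetical record for a player the agent currently perceives.
    unum: int  # uniform number; negative when it was not reported
    x: float
    y: float


def filter_identified(players: List[DetectedPlayer]) -> List[DetectedPlayer]:
    """Keep only players whose uniform number is known (unum >= 0)."""
    return [p for p in players if p.unum >= 0]
```

With `seen = [DetectedPlayer(7, 10.0, 5.0), DetectedPlayer(-1, 12.0, -3.0)]`, `filter_identified(seen)` keeps only the player wearing number 7. The unidentified player is dropped entirely rather than reported with a placeholder unum.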

DurgeshSamant commented 7 years ago

Alright. Here is a full description of the behaviour as I understand it.

When the fullstate flag is OFF, agents can be:

1. Completely observable [i.e. all features, including the uniform number, have valid values].
2. Partially observable [i.e. all features except the uniform number have valid values; only the uniform number gets a value of -1].
3. Completely unobservable [i.e. all features, including the uniform number, have a value of -2].

I propose that we keep 1 and 3 and do away with 2. In a game of soccer, it seems rather odd to be able to perceive a player's position and shooting information without being able to perceive their identity.
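The three cases and the proposed change can be sketched on a single player's slice of the high-level feature vector. This sketch assumes the sentinel values described in this thread (-1 for an unknown uniform number, -2 for every feature of an unseen player) and a hypothetical `unum_index` giving the position of the uniform-number entry. It is not the PR's actual C++ change.

```python
from enum import Enum
from typing import List

UNKNOWN_UNUM = -1.0  # assumed value when only the uniform number is missing (case 2)
UNOBSERVABLE = -2.0  # assumed value of every feature for an unseen player (case 3)


class Observability(Enum):
    FULL = 1     # case 1: every feature valid
    PARTIAL = 2  # case 2: everything valid except the uniform number
    NONE = 3     # case 3: nothing observed


def classify(features: List[float], unum_index: int) -> Observability:
    """Map one player's feature slice onto the three cases listed above."""
    if all(v == UNOBSERVABLE for v in features):
        return Observability.NONE
    if features[unum_index] == UNKNOWN_UNUM:
        return Observability.PARTIAL
    return Observability.FULL


def apply_proposal(features: List[float], unum_index: int) -> List[float]:
    """Collapse case 2 into case 3: a player without a known unum is reported as unseen."""
    if classify(features, unum_index) is Observability.PARTIAL:
        return [UNOBSERVABLE] * len(features)
    return list(features)
```

Under this proposal a learner only ever sees cases 1 and 3, so it no longer has to handle a player whose position is known but whose identity is not.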

DurgeshSamant commented 7 years ago

Here is a performance comparison of the average 'cumulative' goal-scoring percentage obtained by the default SARSA agent with and without the proposed changes. Each number is averaged over 25 runs of 5000 episodes.

Case | Without changes (%) | With proposed changes (%)
--- | --- | ---
1v1 | 88.48 | 88.38
1v2 | 52.21 | 52.37
2v1 | 90.11 | 89.78
2v2 | 61.15 | 61.26
3v3 | 27.00 | 26.46
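To make the comparison easier to read, the per-case change (with minus without, in percentage points) can be computed directly from the reported means. The dictionary below only restates the table; `deltas` is a helper introduced for illustration.

```python
from typing import Dict, Tuple

# Average cumulative goal-scoring percentage per case: (without, with proposed changes).
RESULTS: Dict[str, Tuple[float, float]] = {
    "1v1": (88.48, 88.38),
    "1v2": (52.21, 52.37),
    "2v1": (90.11, 89.78),
    "2v2": (61.15, 61.26),
    "3v3": (27.00, 26.46),
}


def deltas(results: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Change in percentage points for each case, rounded to two decimals."""
    return {case: round(new - old, 2) for case, (old, new) in results.items()}


for case, d in deltas(RESULTS).items():
    print(f"{case}: {d:+.2f}")
```

This prints -0.10, +0.16, -0.33, +0.11 and -0.54 for 1v1 through 3v3. The largest change is 0.54 percentage points (3v3), and the sign varies across cases. Means alone cannot establish significance; that would also need the run-to-run spread across the 25 runs.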

So, at least for the default SARSA offense agent, the proposed changes do not seem to significantly affect performance: no case changes by more than 0.54 percentage points.

Can others kindly replicate and confirm?