Performance Degradation observed in PR #51 with partial observability

PR #51 demonstrates how to write thin Python wrappers over the existing SARSA libraries. Such wrappers let one write high-level agents in Python that directly call the fast C++ SARSA library functions.

As an example, PR #51 also contains a high-level SARSA agent written in Python. This agent is a Python port of the existing high-level SARSA offense agent written in C++. A script that runs this agent is also included.

When the fullstate flag is ON

The Python agent performs as well as the C++ agent: the difference in mean goal scoring percentage, averaged over 20 runs, is less than 0.5% for each of the 1v1, 1v2, 2v1, 2v2, and 3v3 cases.

When the fullstate flag is OFF

In this case, the Python agent's average goal scoring percentage is 0 to 5% lower than the C++ agent's in each of the 1v1, 1v2, 2v1, 2v2, and 3v3 cases.
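For anyone trying to reproduce the comparison, the gap per configuration can be computed along these lines. This is only a sketch: the function name `mean_gap` and the numbers in the usage lines are hypothetical placeholders, not results from PR #51.

```python
from statistics import mean

def mean_gap(cpp_runs, py_runs):
    """Return mean(C++ goal %) minus mean(Python goal %) for one configuration.

    Each argument is a list with one goal scoring percentage per run.
    A positive result means the Python agent scored less on average.
    """
    return mean(cpp_runs) - mean(py_runs)

# Placeholder numbers only, to show the call shape (not measured data).
example_cpp = [30.0, 32.0, 31.0]
example_py = [29.0, 31.0, 30.0]
print("gap in goal scoring percentage:", mean_gap(example_cpp, example_py))
```

Running this once per configuration (1v1, 1v2, 2v1, 2v2, 3v3) and per fullstate setting would give numbers directly comparable to the ones reported above.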

Despite my best efforts, I have been unable to pin down the exact reason for this. I speculate that passing the state vector from the underlying C++ code to the Python code and back introduces rounding errors due to a datatype mismatch.
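One way to check whether this kind of rounding is even plausible is to measure how much a value changes when it is squeezed through a 32-bit float and read back as a Python float (which is 64-bit). The sketch below uses only the standard library. It assumes, without having confirmed it, that a float/double mismatch exists somewhere in the wrapper; the sample state values are made up for illustration.

```python
import struct

def roundtrip_float32(values):
    """Pack each value as a 32-bit float and unpack it back to a Python float."""
    return [struct.unpack("f", struct.pack("f", v))[0] for v in values]

def max_roundtrip_error(values):
    """Largest absolute change caused by the 32-bit round trip."""
    return max(abs(a - b) for a, b in zip(values, roundtrip_float32(values)))

# Hypothetical state features, roughly in the [-1, 1] range.
state = [0.1, -0.73, 0.333333333333, 1.0]
print("max round-trip error:", max_roundtrip_error(state))
```

If the error reported for real state vectors is tiny compared with the resolution of the function approximator, rounding alone is unlikely to explain a gap of several percent, and the cause probably lies elsewhere in the partially observable code path.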

It would be great if the maintainers and the broader community could replicate this issue and find ways to overcome it.
