I found that the reverse_team_processing setting can strongly affect RL training performance, especially for methods such as PPO that usually run multiple rollouts at the same time. Setting reverse_team_processing to False usually improves performance, while setting it to True seems to add some randomness to the environment and may produce odd states or actions in the game. For example, in the academy_3_vs_1_with_keeper scenario, I once observed that the long pass action kept turning into a high pass when the environment was initialized with reverse_team_processing=True.
By default, reverse_team_processing is decided by the random seed of the game engine when deterministic=False; this logic can be found in env/scenario_builder.py.
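To make the default behaviour concrete, here is a minimal sketch of how I understand it. It is not the code from env/scenario_builder.py: the helper name resolve_reverse_team_processing and the seed-parity rule are my own assumptions for illustration, and it only models the deterministic=False case. The point is just that an explicitly set value is kept, while an unset value ends up varying with the seed from one environment instance to the next.

```python
import random


def resolve_reverse_team_processing(values):
    """Hypothetical helper (not part of gfootball) mimicking the default logic.

    `values` is a plain dict of scenario options. Anything the caller set
    explicitly is left untouched; missing entries are filled in.
    """
    resolved = dict(values)
    if "game_engine_random_seed" not in resolved:
        # Non-deterministic case: every environment draws a fresh seed.
        resolved["game_engine_random_seed"] = random.randint(0, 2_000_000_000)
    if "reverse_team_processing" not in resolved:
        # Assumption for illustration: the flag follows the parity of the seed.
        resolved["reverse_team_processing"] = bool(
            resolved["game_engine_random_seed"] % 2
        )
    return resolved


# Pinning the flag: every rollout worker sees the same value.
pinned = [
    resolve_reverse_team_processing({"reverse_team_processing": False})
    for _ in range(8)
]
print([cfg["reverse_team_processing"] for cfg in pinned])

# Leaving it unset: the value may differ between workers.
unpinned = [resolve_reverse_team_processing({}) for _ in range(8)]
print([cfg["reverse_team_processing"] for cfg in unpinned])
```

If the real logic is similar, parallel rollouts (as in PPO) would mix environments with and without reversed team processing unless the flag is pinned, which might explain the effect I see.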
In third_party/gfootball_engine/src/main.hpp, the comment says that reverse_team_processing is used to "Reverse order of teams' processing, used for symmetry testing," but it seems to do more than that and to be closely tied to the environment's randomness.
Could anyone let me know what exactly reverse_team_processing is used for, and whether I can just set it to False all the time during training? Thank you!