Open baraahsidahmed opened 6 months ago
Take a look at the observation function for the scenario.
On line 247 we see that if the agent is a good agent you will observe:
rel_goal_x, rel_goal_y, (rel_lm_x, rel_lm_y) * n_landmarks, (rel_other_agent_x, rel_other_agent_y) * n_agents
In the case of N=2 there are N+1 = 3 agents and N = 2 landmarks (see line 95), so a good agent observes 2 goal values, 2 values for each of the 2 landmarks, and 2 values for each of the 2 other agents: 2 + 2*2 + 2*2 = 2 + 4 + 4 = 10.
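If it helps, here is a minimal sketch of how those 10 values could be unpacked for a good agent with N=2. The function and field names are purely illustrative (not part of the PettingZoo API), assuming the layout quoted above:

```python
import numpy as np

def decode_good_agent_obs(obs, n_landmarks=2, n_other_agents=2):
    """Illustrative unpacking of a good agent's 10-value observation (N=2).

    Assumed layout, per the scenario's observation function discussed above:
    [rel_goal_x, rel_goal_y,
     (rel_lm_x, rel_lm_y) * n_landmarks,
     (rel_other_agent_x, rel_other_agent_y) * n_other_agents]
    """
    obs = np.asarray(obs)
    rel_goal = obs[0:2]  # relative goal position
    rel_landmarks = obs[2:2 + 2 * n_landmarks].reshape(n_landmarks, 2)
    rel_others = obs[2 + 2 * n_landmarks:].reshape(n_other_agents, 2)
    return rel_goal, rel_landmarks, rel_others
```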
On line 250 we see that if the agent is an adversarial agent you will observe:
(rel_lm_x, rel_lm_y) * n_landmarks, (rel_other_agent_x, rel_other_agent_y) * n_agents
In the case of N=2 that's 2*2 + 2*2 = 4 + 4 = 8.
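If you want to double-check these sizes against whichever PettingZoo version you have installed, a quick sanity check along these lines should work (N=2 is the documented default for this scenario):

```python
from pettingzoo.mpe import simple_adversary_v3

env = simple_adversary_v3.parallel_env(N=2)
observations, infos = env.reset(seed=42)

for agent in env.agents:
    print(agent, env.observation_space(agent).shape, observations[agent].shape)
# Expected, per the code walked through above: (8,) for the adversary
# and (10,) for each good agent.
env.close()
```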
Just an added comment: the documentation on the website defaults to the most recently released version of PettingZoo. If you're using the current master branch, things may have changed since then. The documentation for this game was updated a couple of months ago (after the last release). You can switch to the current master by selecting master from the dropdown in the lower right of the doc page. IMO it's still unclear, because it doesn't explain that some values are arrays, but it does correctly match the code now. (Credit to Elliot for pointing this out.)
Thank you for all the in-depth clarifications!
On line 247 we see that if the agent is a good agent you will observe:
rel_goal_x, rel_goal_y, (rel_lm_x, rel_lm_y) * n_landmarks, (rel_other_agent_x, rel_other_agent_y) * n_agents
I think the problem with the current documentation was that 'self_vel' is not returned in the observation array, which is why I was confused.
The documentation on the website defaults to the most recently released version of PettingZoo.
Thank you for pointing this out! For my part, I just clicked the GitHub link on the documentation page and assumed it was the corresponding code, without realizing the version difference. Sorry about that.
Question
Hi, I am working on simple_adversary_v3 along with AgileRL to train the agents. I am using parallel_env and want to monitor the agents' positions during training. I found that the step function only returns the next-state observation as a dictionary of arrays (8 elements for the adversary, 10 for good agents), and I can't quite work out what each element of the array is, because the documentation says it's [self_pos, self_vel, goal_rel_position, landmark_rel_position, other_agent_rel_positions], which is only five entries, while the code comments say it should be [goal_rel_position, landmark_rel_position, other_agent_rel_positions]. I am working with N=2, so I interpreted the ten values for a good agent as: [goal_rel_pos_x, goal_rel_pos_y, 1st_landmark_x, 1st_landmark_y, 2nd_landmark_x (same as goal), 2nd_landmark_y (same as goal), other_good_agent_x, other_good_agent_y, adversary_x, adversary_y]. Can you please confirm whether this is the right value mapping, or otherwise explain what the returned values are exactly?
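For anyone landing here later, a rough sketch of how the relative-position slices could be pulled out during a parallel_env rollout, following the layout worked out above; the slicing indices, variable names, and the assumption that the adversary's name starts with "adversary" are mine, not guaranteed by the library:

```python
from pettingzoo.mpe import simple_adversary_v3

env = simple_adversary_v3.parallel_env(N=2)
observations, infos = env.reset(seed=0)

while env.agents:
    # Random actions stand in for whatever the trained policies would output.
    actions = {agent: env.action_space(agent).sample() for agent in env.agents}
    observations, rewards, terminations, truncations, infos = env.step(actions)

    for agent, obs in observations.items():
        if agent.startswith("adversary"):
            # Adversary (8 values): landmark rel positions, then other-agent rel positions.
            rel_landmarks, rel_others = obs[:4].reshape(2, 2), obs[4:].reshape(2, 2)
        else:
            # Good agent (10 values): goal rel position, landmark rel positions, other-agent rel positions.
            rel_goal = obs[:2]
            rel_landmarks, rel_others = obs[2:6].reshape(2, 2), obs[6:].reshape(2, 2)
        # ...log or plot the slices here...
env.close()
```

Note that all of these values are relative to the observing agent, so they track offsets to the goal, landmarks, and other agents rather than absolute world coordinates.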