Unity-Technologies / ml-agents

The Unity Machine Learning Agents Toolkit (ML-Agents) is an open-source project that enables games and simulations to serve as environments for training intelligent agents using deep reinforcement learning and imitation learning.
https://unity.com/products/machine-learning-agents

The continuous action space of PettingZoo Wrapper API should use `np.float32` #5976

Closed 25349023 closed 11 months ago

25349023 commented 1 year ago

Describe the bug The action space for continuous actions in the PettingZoo Wrapper API currently uses np.int32, leading to unexpected, out-of-control agent behavior.

In ml-agents/ml-agents-envs/mlagents_envs/envs/unity_pettingzoo_base_env.py, line 133, the code reads:

if act_spec.continuous_size > 0:
    c_space = spaces.Box(
        -1, 1, (act_spec.continuous_size,), dtype=np.int32
    )
....

However, for a continuous action space the actions should be real numbers, i.e. np.float32, not integers. The corrected code should be:

if act_spec.continuous_size > 0:
    c_space = spaces.Box(
        -1, 1, (act_spec.continuous_size,), dtype=np.float32
    )
....
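To illustrate why the dtype matters, here is a minimal numpy-only sketch (not the wrapper's actual code path, and the variable names are illustrative) showing how casting a real-valued action to np.int32 destroys it:

```python
import numpy as np

# Hypothetical continuous action in [-1, 1] sampled by a policy.
action = np.array([0.3, -0.7, 0.9])

# With the buggy dtype, coercing the action into the space's dtype
# truncates every value toward zero, collapsing the whole action.
as_int = action.astype(np.int32)      # becomes [0, 0, 0]

# With the proposed fix, the real-valued action is preserved.
as_float = action.astype(np.float32)  # stays [0.3, -0.7, 0.9]
```

Any action whose components lie strictly between -1 and 1 therefore degenerates to the zero vector under the int32 space, which matches the "out-of-control" behavior described above.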

To Reproduce Steps to reproduce the behavior: Use any multi-agent environment in which agents have a continuous action space, then wrap it with the PettingZoo Wrapper Python API (for example, UnityAECEnv).

Console logs / stack traces No.

Screenshots No.

Environment (please complete the following information):

NOTE: We are unable to help reproduce bugs with custom environments. Please attempt to reproduce your issue with one of the example environments, or provide a minimal patch to one of the environments needed to reproduce the issue.

25349023 commented 1 year ago

In addition, I found what might be another bug in ml-agents/ml-agents-envs/mlagents_envs/envs/unity_pettingzoo_base_env.py (lines 168-171):

if action.continuous is not None:
    self._current_action[current_behavior].continuous[
        current_index
    ] = action.continuous[0]
....

This sets every element of the agent's action to the first element of the given action. Assume the number of continuous actions is 3 and the given action is [0.3, 0.9, 0.1]; the final action for the current agent then becomes [0.3, 0.3, 0.3], which is incorrect.

I'm not sure whether it is just a typo and I could delete the [0] directly, or whether there is a missing dimension in the given action (e.g. [[0.3, 0.9, 0.1]] instead of [0.3, 0.9, 0.1]).

In my case, I simply deleted the [0] and it works like a charm.
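The broadcasting effect described above can be reproduced with a small numpy sketch (the buffer and array names here are illustrative stand-ins, not the wrapper's internals):

```python
import numpy as np

# Stand-in for the per-behavior action buffer: one agent, 3 continuous actions.
current = np.zeros((1, 3), dtype=np.float32)
given = np.array([0.3, 0.9, 0.1], dtype=np.float32)

# Buggy: indexing [0] yields a scalar, which numpy broadcasts across the row.
buggy = current.copy()
buggy[0] = given[0]   # row becomes [0.3, 0.3, 0.3]

# Fixed (the [0] removed): the full action vector is assigned.
fixed = current.copy()
fixed[0] = given      # row becomes [0.3, 0.9, 0.1]
```

Because numpy silently broadcasts the scalar, the bug raises no error and only shows up as wrong agent behavior at runtime.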

miguelalonsojr commented 1 year ago

Please upgrade to the latest version of ML-Agents available on the develop branch of this repo and try again.

github-actions[bot] commented 11 months ago

This issue is stale because it has been open for 30 days with no activity.

github-actions[bot] commented 11 months ago

This issue was closed because it has been inactive for 14 days since being marked as stale. Please open a new issue for related bugs.

kin-7777777 commented 5 months ago

This issue is still not resolved, even on the latest develop branch.

AmineAndam04 commented 2 months ago

Hi, how did you circumvent this issue?

TTMead commented 1 month ago

Hi, how did you circumvent this issue?

To avoid forking the repo and patching it locally, I directly reassigned the dtype of each agent's action space after creating the environment:

for agent in env.possible_agents:
    env.action_space(agent).dtype = np.float32