Unity-Technologies / ml-agents

The Unity Machine Learning Agents Toolkit (ML-Agents) is an open-source project that enables games and simulations to serve as environments for training intelligent agents using deep reinforcement learning and imitation learning.
https://unity.com/products/machine-learning-agents

Release 11 base_env._validate_action results in attribute error #4802

Closed 18goldr closed 2 years ago

18goldr commented 3 years ago

Describe the bug

When I call env.set_action_for_agent(behavior_name, mlagent_id, action), where action is a numpy.ndarray, I get the following AttributeError, which is raised from base_env._validate_action:

  File "C:\...\mlagents_envs\environment.py", line 356, in set_action_for_agent
    action = action_spec._validate_action(action, None, behavior_name)
  File "C:\...\mlagents_envs\base_env.py", line 404, in _validate_action
    if actions.continuous.shape != _expected_shape:
AttributeError: 'numpy.ndarray' object has no attribute 'continuous'

It looks as though action should be converted to a base_env.ActionTuple at some point, yet it isn't.

Rolling back to release 7 fixes the problem.

To Reproduce

Call env.set_action_for_agent with the appropriate parameters (behavior name, mlagent id, and the action in the form of a numpy.ndarray).


NOTE: I do not currently have time to try this in an example environment. If and when I have time, I will do so, but it seems pretty obvious from the code what is happening, and that it would happen in all environments.

andrewcoh commented 3 years ago

Hi @18goldr

Are you using a custom trainer, policy architecture or env_manager? The LLAPI expects a base_env.ActionTuple as an input to set_action/set_action_for_agent which is new to this release.
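To illustrate why a bare array fails here, the following is a minimal sketch that needs only NumPy. FakeActionTuple and validate_action are hypothetical stand-ins written for this example; they only mimic the attribute access shown in the traceback and are not the ML-Agents implementation.

```python
import numpy as np
from typing import NamedTuple


class FakeActionTuple(NamedTuple):
    """Hypothetical stand-in for base_env.ActionTuple; not the real class."""
    continuous: np.ndarray
    discrete: np.ndarray


def validate_action(actions, expected_shape):
    """Simplified version of the shape check seen in the traceback."""
    if actions.continuous.shape != expected_shape:
        raise ValueError(
            f"expected shape {expected_shape}, got {actions.continuous.shape}"
        )
    return actions


raw = np.array([[1.0, 0.0]], dtype=np.float32)
wrapped = FakeActionTuple(
    continuous=raw,
    discrete=np.zeros((1, 0), dtype=np.int32),  # no discrete actions in this example
)

# A bare ndarray fails on the attribute access, as in the report.
try:
    validate_action(raw, (1, 2))
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

# The wrapped action passes the same check.
validated = validate_action(wrapped, (1, 2))
print(error_message)
```

The check never gets as far as comparing shapes for the raw array: the failure is the missing continuous attribute, which is why wrapping the array resolves it.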

yzexeter commented 3 years ago

@andrewcoh Hi, thank you for your advice. I have encountered the same problem. Following your advice, I looked it up in the API description and found this:

"Set Actions: env.set_actions(behavior_name: str, action: ActionTuple) Sets the actions for a whole agent group. action is an ActionTuple, which is made up of a 2D np.array of dtype=np.int32 for discrete actions, and dtype=np.float32 for continuous actions. The first dimension of np.array in the tuple is the number of agents that requested a decision since the last call to env.step(). The second dimension is the number of discrete or continuous actions for the corresponding array."

However, there is no further information. Could you give some examples of how to define the tuple? I cannot work it out.

I have some code that worked with a previous version of the ML-Agents Toolkit:

if i < 10:
    action = np.array([[1.0,0.0]], dtype=np.float32)
    env.set_actions(single, action)
if i == 15:
    action = np.array([[0.0,1.0]], dtype=np.float32)
    env.set_actions(single, action)
env.step()

Could you help me modify it so that it works?

yzexeter commented 3 years ago

@andrewcoh OK, I checked the source code and figured it out. Thank you for the information. @18goldr Hi, based on my code above, I tried the following to fix it.

It should be:

import numpy as np
from mlagents_envs.base_env import ActionTuple

# Wrap the continuous-action array in an ActionTuple before passing it to set_actions.
if i < 10:
    action = ActionTuple(continuous=np.array([[1.0,0.0]], dtype=np.float32))
    env.set_actions(single, action)
if i == 15:
    action = ActionTuple(continuous=np.array([[0.0,1.0]], dtype=np.float32))
    env.set_actions(single, action)
env.step()
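
For behaviors with more than one agent, or with discrete actions, the arrays follow the shapes given in the API description quoted earlier. The sketch below builds them with NumPy only; build_group_actions and its parameter names are made up for this example, and the final wrapping step is left as a comment because it needs ML-Agents and a running environment.

```python
import numpy as np


def build_group_actions(n_agents, n_continuous, n_discrete):
    """Return (continuous, discrete) arrays shaped (n_agents, n_actions)."""
    continuous = np.zeros((n_agents, n_continuous), dtype=np.float32)
    discrete = np.zeros((n_agents, n_discrete), dtype=np.int32)
    return continuous, discrete


# Three agents requested a decision; each has 2 continuous and 1 discrete action.
continuous, discrete = build_group_actions(n_agents=3, n_continuous=2, n_discrete=1)
continuous[0] = [1.0, 0.0]  # set the first agent's continuous action

# With ML-Agents installed, the arrays would then be wrapped and sent, e.g.:
#   action = ActionTuple(continuous=continuous, discrete=discrete)
#   env.set_actions(behavior_name, action)
print(continuous.shape, discrete.shape)
```

The first dimension has to match the number of agents that requested a decision in the current step, so in practice it would be taken from the decision steps rather than hard-coded.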

jinPrelude commented 3 years ago

I have the same problem when I run unity3d_env_local.py (the RLlib example for Unity3D environments).

Describe the bug

If I run unity3d_env_local.py (the RLlib example for Unity3D environments), it fails with the error below:

Failure # 1 (occurred at 2020-12-28_15-57-15)
Traceback (most recent call last):
  File "/home/jinprelude/anaconda3/envs/rllib/lib/python3.7/site-packages/ray/tune/trial_runner.py", line 519, in _process_trial
    result = self.trial_executor.fetch_result(trial)
  File "/home/jinprelude/anaconda3/envs/rllib/lib/python3.7/site-packages/ray/tune/ray_trial_executor.py", line 497, in fetch_result
    result = ray.get(trial_future[0], timeout=DEFAULT_GET_TIMEOUT)
  File "/home/jinprelude/anaconda3/envs/rllib/lib/python3.7/site-packages/ray/_private/client_mode_hook.py", line 47, in wrapper
    return func(*args, **kwargs)
  File "/home/jinprelude/anaconda3/envs/rllib/lib/python3.7/site-packages/ray/worker.py", line 1391, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(AttributeError): ray::PPO.train() (pid=24483, ip=192.168.0.176)
  File "python/ray/_raylet.pyx", line 479, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 431, in ray._raylet.execute_task.function_executor
  File "/home/jinprelude/anaconda3/envs/rllib/lib/python3.7/site-packages/ray/rllib/agents/trainer.py", line 523, in train
    raise e
  File "/home/jinprelude/anaconda3/envs/rllib/lib/python3.7/site-packages/ray/rllib/agents/trainer.py", line 509, in train
    result = Trainable.train(self)
  File "/home/jinprelude/anaconda3/envs/rllib/lib/python3.7/site-packages/ray/tune/trainable.py", line 183, in train
    result = self.step()
  File "/home/jinprelude/anaconda3/envs/rllib/lib/python3.7/site-packages/ray/rllib/agents/trainer_template.py", line 148, in step
    res = next(self.train_exec_impl)
  File "/home/jinprelude/anaconda3/envs/rllib/lib/python3.7/site-packages/ray/util/iter.py", line 756, in __next__
    return next(self.built_iterator)
  File "/home/jinprelude/anaconda3/envs/rllib/lib/python3.7/site-packages/ray/util/iter.py", line 783, in apply_foreach
    for item in it:
  File "/home/jinprelude/anaconda3/envs/rllib/lib/python3.7/site-packages/ray/util/iter.py", line 783, in apply_foreach
    for item in it:
  File "/home/jinprelude/anaconda3/envs/rllib/lib/python3.7/site-packages/ray/util/iter.py", line 843, in apply_filter
    for item in it:
  File "/home/jinprelude/anaconda3/envs/rllib/lib/python3.7/site-packages/ray/util/iter.py", line 843, in apply_filter
    for item in it:
  File "/home/jinprelude/anaconda3/envs/rllib/lib/python3.7/site-packages/ray/util/iter.py", line 783, in apply_foreach
    for item in it:
  File "/home/jinprelude/anaconda3/envs/rllib/lib/python3.7/site-packages/ray/util/iter.py", line 783, in apply_foreach
    for item in it:
  File "/home/jinprelude/anaconda3/envs/rllib/lib/python3.7/site-packages/ray/util/iter.py", line 783, in apply_foreach
    for item in it:
  [Previous line repeated 1 more time]
  File "/home/jinprelude/anaconda3/envs/rllib/lib/python3.7/site-packages/ray/util/iter.py", line 876, in apply_flatten
    for item in it:
  File "/home/jinprelude/anaconda3/envs/rllib/lib/python3.7/site-packages/ray/util/iter.py", line 828, in add_wait_hooks
    item = next(it)
  File "/home/jinprelude/anaconda3/envs/rllib/lib/python3.7/site-packages/ray/util/iter.py", line 783, in apply_foreach
    for item in it:
  File "/home/jinprelude/anaconda3/envs/rllib/lib/python3.7/site-packages/ray/util/iter.py", line 783, in apply_foreach
    for item in it:
  File "/home/jinprelude/anaconda3/envs/rllib/lib/python3.7/site-packages/ray/util/iter.py", line 783, in apply_foreach
    for item in it:
  File "/home/jinprelude/anaconda3/envs/rllib/lib/python3.7/site-packages/ray/rllib/execution/rollout_ops.py", line 69, in sampler
    yield workers.local_worker().sample()
  File "/home/jinprelude/anaconda3/envs/rllib/lib/python3.7/site-packages/ray/rllib/evaluation/rollout_worker.py", line 645, in sample
    batches = [self.input_reader.next()]
  File "/home/jinprelude/anaconda3/envs/rllib/lib/python3.7/site-packages/ray/rllib/evaluation/sampler.py", line 94, in next
    batches = [self.get_data()]
  File "/home/jinprelude/anaconda3/envs/rllib/lib/python3.7/site-packages/ray/rllib/evaluation/sampler.py", line 216, in get_data
    item = next(self.rollout_provider)
  File "/home/jinprelude/anaconda3/envs/rllib/lib/python3.7/site-packages/ray/rllib/evaluation/sampler.py", line 663, in _env_runner
    base_env.send_actions(actions_to_send)
  File "/home/jinprelude/anaconda3/envs/rllib/lib/python3.7/site-packages/ray/rllib/env/base_env.py", line 399, in send_actions
    obs, rewards, dones, infos = env.step(agent_dict)
  File "/home/jinprelude/anaconda3/envs/rllib/lib/python3.7/site-packages/ray/rllib/env/unity3d_env.py", line 129, in step
    action_dict[key])
  File "/home/jinprelude/anaconda3/envs/rllib/lib/python3.7/site-packages/mlagents_envs/environment.py", line 356, in set_action_for_agent
    action = action_spec._validate_action(action, None, behavior_name)
  File "/home/jinprelude/anaconda3/envs/rllib/lib/python3.7/site-packages/mlagents_envs/base_env.py", line 404, in _validate_action
    if actions.continuous.shape != _expected_shape:
AttributeError: 'numpy.ndarray' object has no attribute 'continuous'

To Reproduce

I followed the steps in the comments in unity3d_env_local.py. I ran the script with the torch framework, but tf returns the same error.

1) Install Unity3D and pip install mlagents.
2) Open the Unity3D Editor and load an example scene from the following ml-agents pip package location: .../ml-agents/Project/Assets/ML-Agents/Examples/
3) Change the default framework from tf to torch.
4) Run the script (3DBall).

Console logs / stack traces

cd /home/jinprelude/Documents/rllib ; /usr/bin/env /home/jinprelude/anaconda3/envs/rllib/bin/python /home/jinprelude/.vscode/extensions/ms-python.python-2020.12.424452561/pythonFiles/lib/python/debugpy/launcher 44291 -- /home/jinprelude/Documents/rllib/run_unity3d.py 
WARNING:tensorflow:From /home/jinprelude/anaconda3/envs/rllib/lib/python3.7/site-packages/tensorflow/python/compat/v2_compat.py:96: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
2020-12-28 15:56:50,417 INFO services.py:1171 -- View the Ray dashboard at http://127.0.0.1:8265
== Status ==
Memory usage on this node: 8.3/62.8 GiB
Using FIFO scheduling algorithm.
Resources requested: 1/8 CPUs, 0/1 GPUs, 0.0/33.59 GiB heap, 0.0/11.57 GiB objects (0/1.0 accelerator_type:GTX)
Result logdir: /home/jinprelude/ray_results/PPO
Number of trials: 1/1 (1 RUNNING)

(pid=24483) WARNING:tensorflow:From /home/jinprelude/anaconda3/envs/rllib/lib/python3.7/site-packages/tensorflow/python/compat/v2_compat.py:96: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
(pid=24483) Instructions for updating:
(pid=24483) non-resource variables are not supported in the long term
(pid=24483) 2020-12-28 15:56:54,184     INFO trainer.py:633 -- Current log_level is WARN. For more information, set 'log_level': 'INFO' / 'DEBUG' or use the -v and -vv flags.
(pid=24483) No game binary provided, will use a running Unity editor instead.
(pid=24483) Make sure you are pressing the Play (|>) button in your editor to start.
(pid=24483) 2020-12-28 15:57:14,621     INFO trainable.py:102 -- Trainable.setup took 20.438 seconds. If your trainable is slow to initialize, consider setting reuse_actors=True to reduce actor creation overheads.
(pid=24483) 2020-12-28 15:57:14,621     WARNING util.py:43 -- Install gputil for GPU system monitoring.
(pid=24483) Created UnityEnvironment for port 5004
(pid=24483) 2020-12-28 15:57:14,710     WARNING deprecation.py:30 -- DeprecationWarning: `env_index` has been deprecated. Use `episode.env_id` instead. This will raise an error in the future!
2020-12-28 15:57:15,231 ERROR trial_runner.py:607 -- Trial PPO_unity3d_dd8d1_00000: Error processing event.
== Status ==
Memory usage on this node: 8.5/62.8 GiB
Using FIFO scheduling algorithm.
Resources requested: 0/8 CPUs, 0/1 GPUs, 0.0/33.59 GiB heap, 0.0/11.57 GiB objects (0/1.0 accelerator_type:GTX)
Result logdir: /home/jinprelude/ray_results/PPO
Number of trials: 1/1 (1 ERROR)
Number of errored trials: 1
+-------------------------+--------------+------------------------------------------------------------------------------------------+
| Trial name              |   # failures | error file                                                                               |
|-------------------------+--------------+------------------------------------------------------------------------------------------|
| PPO_unity3d_dd8d1_00000 |            1 | /home/jinprelude/ray_results/PPO/PPO_unity3d_dd8d1_00000_0_2020-12-28_15-56-52/error.txt |
+-------------------------+--------------+------------------------------------------------------------------------------------------+

Thanks for your hard work!

budbreaker commented 3 years ago

I have the same issue as @jinPrelude when trying to run 3DBall and Tennis

sven1977 commented 3 years ago

Ok, I can reproduce the error on the Ray RLlib side now (yes, it's a change in the ML-Agents API for set_action_for_agent calls). I'll provide a fix in RLlib. @budbreaker @jinPrelude

sven1977 commented 3 years ago

Fix for RLlib (will check ML-Agents API version; backward-compatible): https://github.com/ray-project/ray/pull/14569
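
The pull request itself is not reproduced here. As a rough illustration of what a backward-compatible call site could look like, the sketch below detects ActionTuple by import rather than by the API version check mentioned above. to_env_action is a hypothetical helper name, and only continuous actions are handled.

```python
import numpy as np

try:
    # Available in ML-Agents releases that use the ActionTuple API.
    from mlagents_envs.base_env import ActionTuple
except ImportError:
    # Older releases (or no ML-Agents installed): raw arrays are passed directly.
    ActionTuple = None


def to_env_action(action):
    """Return a continuous action in the form the installed API expects."""
    action = np.asarray(action, dtype=np.float32)
    if ActionTuple is None:
        return action
    return ActionTuple(continuous=action)


env_action = to_env_action([[1.0, 0.0]])
# The result is then passed on as before, e.g.:
#   env.set_actions(behavior_name, env_action)
```

Detecting the class at import time keeps the call site unchanged across releases, at the cost of not distinguishing between versions that share the same import path.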

jinPrelude commented 3 years ago

Thank you SO MUCH for your hard work!! @sven1977. You and RLLib are awesome.

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had activity in the last 28 days. It will be closed in the next 14 days if no further activity occurs. Thank you for your contributions.

stale[bot] commented 2 years ago

This issue has been automatically closed because it has not had activity in the last 42 days. If this issue is still valid, please ping a maintainer. Thank you for your contributions.