Unity-Technologies / ml-agents

The Unity Machine Learning Agents Toolkit (ML-Agents) is an open-source project that enables games and simulations to serve as environments for training intelligent agents using deep reinforcement learning and imitation learning.
https://unity.com/products/machine-learning-agents

How do I fix this? => RuntimeError: cannot sample n_sample > prob_dist.size(-1) samples without replacement #4952

Closed · preuschhi closed 3 years ago

preuschhi commented 3 years ago

I do not think this is a bug; rather, I think I made a mistake and do not really know how to fix it.

When I set up my training in the cmd console and start it in Unity, it seems to start normally but then stops immediately. Here is the cmd output:

```
(venv) D:\Unity\Projects\EnemyTrainingNewTry>mlagents-learn config/trainer_config.yaml --run-id=Test5

Version information: ml-agents: 0.23.0, ml-agents-envs: 0.23.0, Communicator API: 1.3.0, PyTorch: 1.7.1+cu110
2021-02-17 13:34:02 INFO [learn.py:275] run_seed set to 8747
2021-02-17 13:34:03 INFO [environment.py:205] Listening on port 5004. Start training by pressing the Play button in the Unity Editor.
2021-02-17 13:34:10 INFO [environment.py:111] Connected to Unity environment with package version 1.7.2-preview and communication version 1.5.0
2021-02-17 13:34:11 INFO [environment.py:271] Connected new brain: EnemyMovement?team=0
2021-02-17 13:34:11 INFO [stats.py:147] Hyperparameters for behavior name EnemyMovement:
        trainer_type:   ppo
        hyperparameters:
          batch_size:   2048
          buffer_size:  20480
          learning_rate:        0.0003
          beta: 0.005
          epsilon:      0.2
          lambd:        0.95
          num_epoch:    3
          learning_rate_schedule:       linear
        network_settings:
          normalize:    False
          hidden_units: 256
          num_layers:   2
          vis_encode_type:      simple
          memory:       None
        reward_signals:
          extrinsic:
            gamma:      0.99
            strength:   1.0
        init_path:      None
        keep_checkpoints:       5
        checkpoint_interval:    500000
        max_steps:      5000000
        time_horizon:   128
        summary_freq:   10000
        threaded:       True
        self_play:      None
        behavioral_cloning:     None
        framework:      pytorch
2021-02-17 13:34:36 INFO [model_serialization.py:104] Converting to results\Test5\EnemyMovement\EnemyMovement-0.onnx
2021-02-17 13:34:38 INFO [model_serialization.py:116] Exported results\Test5\EnemyMovement\EnemyMovement-0.onnx
2021-02-17 13:34:38 INFO [torch_model_saver.py:116] Copied results\Test5\EnemyMovement\EnemyMovement-0.onnx to results\Test5\EnemyMovement.onnx.
2021-02-17 13:34:38 INFO [trainer_controller.py:85] Saved Model
Traceback (most recent call last):
  File "C:\Users\bmxle\AppData\Local\Programs\Python\Python37\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Users\bmxle\AppData\Local\Programs\Python\Python37\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "D:\Unity\Projects\EnemyTrainingNewTry\venv\Scripts\mlagents-learn.exe\__main__.py", line 7, in <module>
  File "d:\unity\projects\enemytrainingnewtry\venv\lib\site-packages\mlagents\trainers\learn.py", line 280, in main
    run_cli(parse_command_line())
  File "d:\unity\projects\enemytrainingnewtry\venv\lib\site-packages\mlagents\trainers\learn.py", line 276, in run_cli
    run_training(run_seed, options)
  File "d:\unity\projects\enemytrainingnewtry\venv\lib\site-packages\mlagents\trainers\learn.py", line 153, in run_training
    tc.start_learning(env_manager)
  File "d:\unity\projects\enemytrainingnewtry\venv\lib\site-packages\mlagents_envs\timers.py", line 305, in wrapped
    return func(*args, **kwargs)
  File "d:\unity\projects\enemytrainingnewtry\venv\lib\site-packages\mlagents\trainers\trainer_controller.py", line 176, in start_learning
    n_steps = self.advance(env_manager)
  File "d:\unity\projects\enemytrainingnewtry\venv\lib\site-packages\mlagents_envs\timers.py", line 305, in wrapped
    return func(*args, **kwargs)
  File "d:\unity\projects\enemytrainingnewtry\venv\lib\site-packages\mlagents\trainers\trainer_controller.py", line 234, in advance
    new_step_infos = env_manager.get_steps()
  File "d:\unity\projects\enemytrainingnewtry\venv\lib\site-packages\mlagents\trainers\env_manager.py", line 113, in get_steps
    new_step_infos = self._step()
  File "d:\unity\projects\enemytrainingnewtry\venv\lib\site-packages\mlagents\trainers\subprocess_env_manager.py", line 264, in _step
    self._queue_steps()
  File "d:\unity\projects\enemytrainingnewtry\venv\lib\site-packages\mlagents\trainers\subprocess_env_manager.py", line 257, in _queue_steps
    env_action_info = self._take_step(env_worker.previous_step)
  File "d:\unity\projects\enemytrainingnewtry\venv\lib\site-packages\mlagents_envs\timers.py", line 305, in wrapped
    return func(*args, **kwargs)
  File "d:\unity\projects\enemytrainingnewtry\venv\lib\site-packages\mlagents\trainers\subprocess_env_manager.py", line 379, in _take_step
    step_tuple[0], last_step.worker_id
  File "d:\unity\projects\enemytrainingnewtry\venv\lib\site-packages\mlagents\trainers\policy\torch_policy.py", line 213, in get_action
    decision_requests, global_agent_ids
  File "d:\unity\projects\enemytrainingnewtry\venv\lib\site-packages\mlagents_envs\timers.py", line 305, in wrapped
    return func(*args, **kwargs)
  File "d:\unity\projects\enemytrainingnewtry\venv\lib\site-packages\mlagents\trainers\policy\torch_policy.py", line 179, in evaluate
    vec_obs, vis_obs, masks=masks, memories=memories
  File "d:\unity\projects\enemytrainingnewtry\venv\lib\site-packages\mlagents_envs\timers.py", line 305, in wrapped
    return func(*args, **kwargs)
  File "d:\unity\projects\enemytrainingnewtry\venv\lib\site-packages\mlagents\trainers\policy\torch_policy.py", line 139, in sample_actions
    vec_obs, vis_obs, masks, memories, seq_len
  File "d:\unity\projects\enemytrainingnewtry\venv\lib\site-packages\mlagents\trainers\torch\networks.py", line 514, in get_action_stats
    sequence_length=sequence_length,
  File "d:\unity\projects\enemytrainingnewtry\venv\lib\site-packages\mlagents\trainers\torch\networks.py", line 318, in get_action_stats
    action, log_probs, entropies = self.action_model(encoding, masks)
  File "d:\unity\projects\enemytrainingnewtry\venv\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "d:\unity\projects\enemytrainingnewtry\venv\lib\site-packages\mlagents\trainers\torch\action_model.py", line 194, in forward
    actions = self._sample_action(dists)
  File "d:\unity\projects\enemytrainingnewtry\venv\lib\site-packages\mlagents\trainers\torch\action_model.py", line 84, in _sample_action
    discrete_action.append(discrete_dist.sample())
  File "d:\unity\projects\enemytrainingnewtry\venv\lib\site-packages\mlagents\trainers\torch\distributions.py", line 114, in sample
    return torch.multinomial(self.probs, 1)
RuntimeError: cannot sample n_sample > prob_dist.size(-1) samples without replacement
```

preuschhi commented 3 years ago

I fixed it by increasing the continuous actions size.

andrewcoh commented 3 years ago

Hi @preuschhi

Did you modify any python code? It's a bit concerning that the fix was to increase the continuous action size since it looks like the error occurred with the discrete distribution. Can you explain your intuition for increasing the continuous action size?
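For context on why the discrete side is the suspect: ML-Agents samples each discrete branch from its own categorical distribution, roughly as in this simplified sketch of the loop in `action_model.py`/`distributions.py` (names condensed; not the exact source). A branch of size 0 produces exactly the empty probability row that `torch.multinomial` rejects.

```python
import torch

def sample_discrete_actions(branch_logits):
    # One multinomial draw per discrete branch, mirroring _sample_action's
    # loop over per-branch distributions (simplified sketch).
    actions = []
    for logits in branch_logits:
        probs = torch.softmax(logits, dim=-1)
        actions.append(torch.multinomial(probs, 1))
    return torch.cat(actions, dim=-1)

# Two healthy branches (sizes 3 and 2) for a single agent:
print(sample_discrete_actions([torch.randn(1, 3), torch.randn(1, 2)]))

# A branch of size 0 reproduces the reported RuntimeError:
# sample_discrete_actions([torch.randn(1, 0)])
```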

preuschhi commented 3 years ago

I did not change any Python code. I looked at this piece of the error output => RuntimeError: cannot sample n_sample > prob_dist.size(-1) samples without replacement <= and increased the continuous actions size by 1, and then everything was okay. My intuition for increasing the continuous action size was more trial and error, because I am still very new to this package.
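Rather than tuning by trial and error, one way to see what action spec the trainer actually receives is to query it through the `mlagents_envs` low-level API. A sketch against the 0.23-era API (press Play in the Editor when it connects):

```python
from mlagents_envs.environment import UnityEnvironment

# file_name=None attaches to the running Unity Editor instead of a built player.
env = UnityEnvironment(file_name=None)
env.reset()  # populates behavior_specs after the first handshake

for name, spec in env.behavior_specs.items():
    action_spec = spec.action_spec
    print(name,
          "continuous size:", action_spec.continuous_size,
          "discrete branches:", action_spec.discrete_branches)

env.close()
```

If any discrete branch prints as 0 here while the inspector shows something else, the mismatch between the Behavior Parameters and the trained model is the likely cause.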

andrewcoh commented 3 years ago

Would you mind sharing the rest of the values in your behavior parameters script? E.g. the discrete action sizes/branches?

preuschhi commented 3 years ago

(screenshot of the Behavior Parameters settings attached)

github-actions[bot] commented 3 years ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.