Closed: preuschhi closed this issue 3 years ago
I fixed it by increasing the continuous action size.
Hi @preuschhi
Did you modify any python code? It's a bit concerning that the fix was to increase the continuous action size since it looks like the error occurred with the discrete distribution. Can you explain your intuition for increasing the continuous action size?
I did not change any Python code. I looked at this part of the error output => RuntimeError: cannot sample n_sample > prob_dist.size(-1) samples without replacement <= and increased the continuous action size by 1, and then everything was okay. My intuition for increasing the continuous action size was more trial and error, because I am still very new to this package.
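For reference, the quoted RuntimeError can be reproduced in isolation. The sketch below (assuming PyTorch is installed; the variable names are illustrative and not taken from ML-Agents) shows that `torch.multinomial` raises exactly this kind of error when asked to draw one sample without replacement from a probability tensor whose last dimension is 0, which is what a discrete action branch of size 0 would produce:

```python
import torch

# A discrete branch of size 0 yields a probability tensor with no
# categories in its last dimension.
probs = torch.ones(1, 0)

try:
    # replacement defaults to False, so num_samples (1) must not exceed
    # probs.size(-1) (0) -- sampling therefore fails with a RuntimeError.
    torch.multinomial(probs, 1)
except RuntimeError as err:
    print(type(err).__name__)
```

This suggests the crash comes from an action-space configuration that leaves one distribution with zero entries, which would explain why nudging the action sizes made the error disappear.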
Would you mind sharing the rest of the values in your behavior parameters script? E.g. the discrete action sizes/branches?
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
I do not think this is a bug; I rather think I made a mistake and do not really know how to fix it.
When I set up my training in the cmd console and start it in Unity, it seems to start normally but then stops immediately. Here is the cmd output:
(venv) D:\Unity\Projects\EnemyTrainingNewTry>mlagents-learn config/trainer_config.yaml --run-id=Test5
Version information: ml-agents: 0.23.0, ml-agents-envs: 0.23.0, Communicator API: 1.3.0, PyTorch: 1.7.1+cu110
2021-02-17 13:34:02 INFO [learn.py:275] run_seed set to 8747
2021-02-17 13:34:03 INFO [environment.py:205] Listening on port 5004. Start training by pressing the Play button in the Unity Editor.
2021-02-17 13:34:10 INFO [environment.py:111] Connected to Unity environment with package version 1.7.2-preview and communication version 1.5.0
2021-02-17 13:34:11 INFO [environment.py:271] Connected new brain: EnemyMovement?team=0
2021-02-17 13:34:11 INFO [stats.py:147] Hyperparameters for behavior name EnemyMovement:
        trainer_type: ppo
        hyperparameters:
          batch_size: 2048
          buffer_size: 20480
          learning_rate: 0.0003
          beta: 0.005
          epsilon: 0.2
          lambd: 0.95
          num_epoch: 3
          learning_rate_schedule: linear
        network_settings:
          normalize: False
          hidden_units: 256
          num_layers: 2
          vis_encode_type: simple
          memory: None
        reward_signals:
          extrinsic:
            gamma: 0.99
            strength: 1.0
        init_path: None
        keep_checkpoints: 5
        checkpoint_interval: 500000
        max_steps: 5000000
        time_horizon: 128
        summary_freq: 10000
        threaded: True
        self_play: None
        behavioral_cloning: None
        framework: pytorch
2021-02-17 13:34:36 INFO [model_serialization.py:104] Converting to results\Test5\EnemyMovement\EnemyMovement-0.onnx
2021-02-17 13:34:38 INFO [model_serialization.py:116] Exported results\Test5\EnemyMovement\EnemyMovement-0.onnx
2021-02-17 13:34:38 INFO [torch_model_saver.py:116] Copied results\Test5\EnemyMovement\EnemyMovement-0.onnx to results\Test5\EnemyMovement.onnx.
2021-02-17 13:34:38 INFO [trainer_controller.py:85] Saved Model
Traceback (most recent call last):
  File "C:\Users\bmxle\AppData\Local\Programs\Python\Python37\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Users\bmxle\AppData\Local\Programs\Python\Python37\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "D:\Unity\Projects\EnemyTrainingNewTry\venv\Scripts\mlagents-learn.exe\__main__.py", line 7, in <module>
  File "d:\unity\projects\enemytrainingnewtry\venv\lib\site-packages\mlagents\trainers\learn.py", line 280, in main
    run_cli(parse_command_line())
  File "d:\unity\projects\enemytrainingnewtry\venv\lib\site-packages\mlagents\trainers\learn.py", line 276, in run_cli
    run_training(run_seed, options)
  File "d:\unity\projects\enemytrainingnewtry\venv\lib\site-packages\mlagents\trainers\learn.py", line 153, in run_training
    tc.start_learning(env_manager)
  File "d:\unity\projects\enemytrainingnewtry\venv\lib\site-packages\mlagents_envs\timers.py", line 305, in wrapped
    return func(*args, **kwargs)
  File "d:\unity\projects\enemytrainingnewtry\venv\lib\site-packages\mlagents\trainers\trainer_controller.py", line 176, in start_learning
    n_steps = self.advance(env_manager)
  File "d:\unity\projects\enemytrainingnewtry\venv\lib\site-packages\mlagents_envs\timers.py", line 305, in wrapped
    return func(*args, **kwargs)
  File "d:\unity\projects\enemytrainingnewtry\venv\lib\site-packages\mlagents\trainers\trainer_controller.py", line 234, in advance
    new_step_infos = env_manager.get_steps()
  File "d:\unity\projects\enemytrainingnewtry\venv\lib\site-packages\mlagents\trainers\env_manager.py", line 113, in get_steps
    new_step_infos = self._step()
  File "d:\unity\projects\enemytrainingnewtry\venv\lib\site-packages\mlagents\trainers\subprocess_env_manager.py", line 264, in _step
    self._queue_steps()
  File "d:\unity\projects\enemytrainingnewtry\venv\lib\site-packages\mlagents\trainers\subprocess_env_manager.py", line 257, in _queue_steps
    env_action_info = self._take_step(env_worker.previous_step)
  File "d:\unity\projects\enemytrainingnewtry\venv\lib\site-packages\mlagents_envs\timers.py", line 305, in wrapped
    return func(*args, **kwargs)
  File "d:\unity\projects\enemytrainingnewtry\venv\lib\site-packages\mlagents\trainers\subprocess_env_manager.py", line 379, in _take_step
    step_tuple[0], last_step.worker_id
  File "d:\unity\projects\enemytrainingnewtry\venv\lib\site-packages\mlagents\trainers\policy\torch_policy.py", line 213, in get_action
    decision_requests, global_agent_ids
  File "d:\unity\projects\enemytrainingnewtry\venv\lib\site-packages\mlagents_envs\timers.py", line 305, in wrapped
    return func(*args, **kwargs)
  File "d:\unity\projects\enemytrainingnewtry\venv\lib\site-packages\mlagents\trainers\policy\torch_policy.py", line 179, in evaluate
    vec_obs, vis_obs, masks=masks, memories=memories
  File "d:\unity\projects\enemytrainingnewtry\venv\lib\site-packages\mlagents_envs\timers.py", line 305, in wrapped
    return func(*args, **kwargs)
  File "d:\unity\projects\enemytrainingnewtry\venv\lib\site-packages\mlagents\trainers\policy\torch_policy.py", line 139, in sample_actions
    vec_obs, vis_obs, masks, memories, seq_len
  File "d:\unity\projects\enemytrainingnewtry\venv\lib\site-packages\mlagents\trainers\torch\networks.py", line 514, in get_action_stats
    sequence_length=sequence_length,
  File "d:\unity\projects\enemytrainingnewtry\venv\lib\site-packages\mlagents\trainers\torch\networks.py", line 318, in get_action_stats
    action, log_probs, entropies = self.action_model(encoding, masks)
  File "d:\unity\projects\enemytrainingnewtry\venv\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "d:\unity\projects\enemytrainingnewtry\venv\lib\site-packages\mlagents\trainers\torch\action_model.py", line 194, in forward
    actions = self._sample_action(dists)
  File "d:\unity\projects\enemytrainingnewtry\venv\lib\site-packages\mlagents\trainers\torch\action_model.py", line 84, in _sample_action
    discrete_action.append(discrete_dist.sample())
  File "d:\unity\projects\enemytrainingnewtry\venv\lib\site-packages\mlagents\trainers\torch\distributions.py", line 114, in sample
    return torch.multinomial(self.probs, 1)
RuntimeError: cannot sample n_sample > prob_dist.size(-1) samples without replacement
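For context, the bottom of the traceback shows that each discrete branch is sampled with `torch.multinomial(self.probs, 1)` (distributions.py, line 114). The sketch below is a hypothetical simplification (the function and variable names are illustrative, not the library's API) of per-branch sampling, showing why correctly sized branches work while a branch of size 0 reproduces the failure:

```python
import torch

def sample_branches(branch_logits):
    """Draw one action per discrete branch via torch.multinomial,
    mirroring the call at the bottom of the traceback (illustrative only)."""
    actions = []
    for logits in branch_logits:
        probs = torch.softmax(logits, dim=-1)   # shape: (batch, branch_size)
        actions.append(torch.multinomial(probs, 1))
    return actions

# Two branches with 3 and 2 actions each: sampling succeeds.
actions = sample_branches([torch.zeros(1, 3), torch.zeros(1, 2)])

# A branch configured with size 0 yields a (1, 0) probability tensor, and
# sampling it would raise:
#   RuntimeError: cannot sample n_sample > prob_dist.size(-1)
#   samples without replacement
```

If that reading is right, the real fix would be to check the discrete branch sizes in the agent's Behavior Parameters rather than to enlarge the continuous action size.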