ValueError in training

Jay-Vim-Lv commented 1 year ago

Hi, when I tried to replicate your code, I ran into an issue. I cannot find where the problem is or how to solve it; could you help me? My environment is built as you recommend, and the system is Ubuntu 18.04 LTS with 2 GPUs (a 1080 Ti and a Titan X). In the code, I only modified 'num_workers' and 'batch_size' in the YAML file to match my hardware. When I run python light_malib/main_pbt.py --config light_malib/expr/gr_football/expr_10_vs_10_psro.yaml, it generates the following error message:

(/media/lxd/880AA9210AA90CEE/anaconda_envs/lzw_GRF) lxd@lxd-T630:/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football$ python light_malib/main_pbt.py --config light_malib/expr/gr_football/expr_10_vs_10_psro.yaml
[2023-09-28 09:18:34,036][WARNING] No active cluster detected, will create local ray instance.
[2023-09-28 09:18:44,991][WARNING] ============== Cluster Info ==============
{'node_ip_address': '192.168.1.109', 'raylet_ip_address': '192.168.1.109', 'redis_address': '192.168.1.109:6379', 'object_store_address': '/tmp/ray/session_2023-09-28_09-18-34_037912_47469/sockets/plasma_store', 'raylet_socket_name': '/tmp/ray/session_2023-09-28_09-18-34_037912_47469/sockets/raylet', 'webui_url': None, 'session_dir': '/tmp/ray/session_2023-09-28_09-18-34_037912_47469', 'metrics_export_port': 55494, 'node_id': 'a8211a7e16deb107246a6dfd4b68c7d43f1a31ddb9fdba7c482c3b64'}
[2023-09-28 09:18:44,993][WARNING] * cluster resources:
{'accelerator_type:G': 1.0, 'GPU': 2.0, 'object_store_memory': 17054784307.0, 'memory': 34109568615.0, 'node:192.168.1.109': 1.0, 'CPU': 48.0}
[2023-09-28 09:18:44,993][WARNING] this worker ip: 192.168.1.109
[2023-09-28 09:18:44,994][WARNING] Automatically set master ip to local ip address: 192.168.1.109
[2023-09-28 09:18:46,480][INFO] AgentManager initialized
[2023-09-28 09:18:46,514][WARNING] use meta solver type: nash
[2023-09-28 09:18:46,991][INFO] PBTRunner psro initialized
[2023-09-28 09:18:46,991][INFO] PolicyFactory_agent_0_default new policy ctr starts at -1
[2023-09-28 09:18:46,995][WARNING] use model type: gr_football.built_in_11
(pid=47592) [2023-09-28 09:18:49,787][INFO] DataServer initialized
(pid=47595) [2023-09-28 09:18:49,798][INFO] PolicyServer initialized
[2023-09-28 09:18:50,411][INFO] Load initial policy built_in_11 from light_malib/trained_models/gr_football/11_vs_11/built_in
[2023-09-28 09:18:50,426][WARNING] use model type: gr_football.basic_11
[2023-09-28 09:18:50,479][WARNING] agent_0: agent_0-default-0 is initialized from random
[2023-09-28 09:18:50,479][WARNING] policy agent_0-default-0 uses custom_config: {'gamma': 1.0, 'use_cuda': False, 'use_dueling': False, 'preprocess_mode': 'flatten', 'use_q_head': False, 'ppo_epoch': 5, 'num_mini_batch': 1, 'return_mode': 'new_gae', 'gae': {'gae_lambda': 0.95}, 'vtrace': {'clip_rho_threshold': 1.0, 'clip_pg_rho_threshold': 100.0}, 'use_rnn': False, 'rnn_layer_num': 1, 'rnn_data_chunk_length': 16, 'use_feature_normalization': True, 'use_popart': True, 'popart_beta': 0.99999, 'entropy_coef': 0.0, 'clip_param': 0.2, 'use_modified_mappo': False}
[2023-09-28 09:18:50,523][WARNING] after initialization:

policy_ids: ['built_in_11', 'agent_0-default-0']
populations:

policy_ids:['built_in_11', 'agent_0-default-0']

policy_ids:['built_in_11', 'agent_0-default-0']

YanSong97 commented 1 year ago

Hi Jay:

What worker_num and batch size did you use? Have you tried different values?

Jay-Vim-Lv commented 1 year ago

num_workers=20 or 30, batch_size=8 or 32 (or other values). Nothing else has been changed.

qyh-stbz commented 1 year ago

I'm also getting the same error.

ZHQ-air commented 12 months ago

Hi, I have encountered a similar problem to Jay-Vim-Lv's. Do you know how to solve it? The error output is as follows:

(light-malib) zhq@zhq-Taitan:~/Doctor/AI_Innovation/DecisionMaking/DB-Football_v1.1$ python main_pbt.py --config light_malib/expr/gr_football/expr_5_vs_5_psro.yaml
[2023-11-07 16:02:59,921][WARNING] No active cluster detected, will create local ray instance.
[2023-11-07 16:03:01,223][WARNING] ============== Cluster Info ==============
{'node_ip_address': '10.1.80.147', 'raylet_ip_address': '10.1.80.147', 'redis_address': '10.1.80.147:6379', 'object_store_address': '/tmp/ray/session_2023-11-07_16-02-59_922304_687030/sockets/plasma_store', 'raylet_socket_name': '/tmp/ray/session_2023-11-07_16-02-59_922304_687030/sockets/raylet', 'webui_url': None, 'session_dir': '/tmp/ray/session_2023-11-07_16-02-59_922304_687030', 'metrics_export_port': 60763, 'node_id': '2841381510c4b1ba545ad7dcb7719998de2b9228147bcb839aa9b7d0'}
[2023-11-07 16:03:01,227][WARNING] * cluster resources:
{'accelerator_type:G': 1.0, 'memory': 37538726708.0, 'GPU': 1.0, 'CPU': 12.0, 'object_store_memory': 18769363353.0, 'node:10.1.80.147': 1.0}
[2023-11-07 16:03:01,228][WARNING] this worker ip: 10.1.80.147
[2023-11-07 16:03:01,232][WARNING] Automatically set master ip to local ip address: 10.1.80.147
[2023-11-07 16:03:01,747][INFO] AgentManager initialized
[2023-11-07 16:03:01,754][WARNING] use meta solver type: nash
[2023-11-07 16:03:01,839][INFO] PBTRunner psro initialized
[2023-11-07 16:03:01,839][INFO] PolicyFactory_agent_0_default new policy ctr starts at -1
[2023-11-07 16:03:01,840][WARNING] use model type: gr_football.built_in_5
(pid=687111) [2023-11-07 16:03:02,545][INFO] DataServer initialized
(pid=687117) [2023-11-07 16:03:02,596][INFO] PolicyServer initialized
[2023-11-07 16:03:02,694][INFO] Load initial policy built_in_5 from light_malib/trained_models/gr_football/5_vs_5/built_in
[2023-11-07 16:03:02,696][WARNING] use model type: gr_football.basic_5
[2023-11-07 16:03:02,704][WARNING] agent_0: agent_0-default-0 is initialized from random
[2023-11-07 16:03:02,704][WARNING] policy agent_0-default-0 uses custom_config: {'gamma': 1.0, 'use_cuda': False, 'use_dueling': False, 'preprocess_mode': 'flatten', 'use_q_head': False, 'ppo_epoch': 5, 'num_mini_batch': 1, 'return_mode': 'new_gae', 'gae': {'gae_lambda': 0.95}, 'vtrace': {'clip_rho_threshold': 1.0, 'clip_pg_rho_threshold': 100.0}, 'use_rnn': False, 'rnn_layer_num': 1, 'rnn_data_chunk_length': 16, 'use_feature_normalization': True, 'use_popart': True, 'popart_beta': 0.99999, 'entropy_coef': 0.0, 'clip_param': 0.2, 'use_modified_mappo': False}
[2023-11-07 16:03:02,716][WARNING] after initialization:

<A agent_0>
policy_ids:
['built_in_5', 'agent_0-default-0']
populations:
<P __all__> policy_ids:['built_in_5', 'agent_0-default-0']<P default> policy_ids:['built_in_5', 'agent_0-default-0']

[2023-11-07 16:03:02,716][WARNING] Evaluation rollouts (num: 5) for 3 policy combinations: [{'agent_0': {'built_in_5': 1.0}, 'agent_1': {'built_in_5': 1.0}}, {'agent_0': {'built_in_5': 1.0}, 'agent_1': {'agent_0-default-0': 1.0}}, {'agent_0': {'agent_0-default-0': 1.0}, 'agent_1': {'agent_0-default-0': 1.0}}]
(pid=687118) [2023-11-07 16:03:02,857][INFO] RolloutManager initialized
(pid=687114) [2023-11-07 16:03:03,011][INFO] TrainingManager initialized
(pid=687108) [2023-11-07 16:03:04,067][INFO] trainer_0 (local rank: 0) initialized
(pid=687107) [2023-11-07 16:03:04,142][INFO] DataPrefetcher initialized
Elo = dict_items([('agent_0-default-0', 984.368153396761), ('built_in_5', 1015.631846603239)])
[2023-11-07 16:04:22,723][INFO] policy_data: [('built_in_5', 'built_in_5'):{'payoff': 0.0, 'score': 0.5, 'win': 0.1, 'lose': 0.1, 'my_goal': 0.2, 'goal_diff': 0.0}],[('built_in_5', 'agent_0-default-0'):{'payoff': 1.0, 'score': 1.0, 'win': 1.0, 'lose': 0.0, 'my_goal': 1.5, 'goal_diff': 1.5}],[('agent_0-default-0', 'built_in_5'):{'payoff': -1.0, 'score': 0.0, 'win': 0.0, 'lose': 1.0, 'my_goal': 0.0, 'goal_diff': -1.5}],[('agent_0-default-0', 'agent_0-default-0'):{'payoff': 0.0, 'score': 0.5, 'win': 0.4, 'lose': 0.4, 'my_goal': 0.5, 'goal_diff': 0.0}],
(pid=687113) /home/zhq/Doctor/AI_Innovation/DecisionMaking/DB-Football_v1.1/light_malib/monitor/monitor.py:66: UserWarning: FixedFormatter should only be used together with FixedLocator
(pid=687113)   ax.set_xticklabels([""] + xpid, rotation=90)
(pid=687113) /home/zhq/Doctor/AI_Innovation/DecisionMaking/DB-Football_v1.1/light_malib/monitor/monitor.py:67: UserWarning: FixedFormatter should only be used together with FixedLocator
(pid=687113)   ax.set_yticklabels([""] + ypid)
[2023-11-07 16:04:22,839][INFO] payoff table:
+------------+--------------+-------------+
|            |   built_in_5 |   default-0 |
+============+==============+=============+
| built_in_5 |           +0 |        +100 |
+------------+--------------+-------------+
| default-0  |         -100 |          +0 |
+------------+--------------+-------------+
[2023-11-07 16:04:22,839][INFO] default-0's top 10 worst opponents are:
+-------------+----------+
| policy_id   |   payoff |
+=============+==========+
| built_in_5  |  -100.00 |
+-------------+----------+
| default-0   |    +0.00 |
+-------------+----------+
[2023-11-07 16:04:28,836][WARNING] agent_0: agent_0-default-1 is initialized from last best policy agent_0-default-0
[2023-11-07 16:04:28,836][WARNING] policy agent_0-default-1 uses custom_config: {'gamma': 1.0, 'use_cuda': False, 'use_dueling': False, 'preprocess_mode': 'flatten', 'use_q_head': False, 'ppo_epoch': 5, 'num_mini_batch': 1, 'return_mode': 'new_gae', 'gae': {'gae_lambda': 0.95}, 'vtrace': {'clip_rho_threshold': 1.0, 'clip_pg_rho_threshold': 100.0}, 'use_rnn': False, 'rnn_layer_num': 1, 'rnn_data_chunk_length': 16, 'use_feature_normalization': True, 'use_popart': True, 'popart_beta': 0.99999, 'entropy_coef': 0.0, 'clip_param': 0.2, 'use_modified_mappo': False}
[2023-11-07 16:04:28,843][WARNING] ********** Generation[0] Agent[agent_0] START **********
[2023-11-07 16:04:28,843][INFO] training_desc: TrainingDesc(agent_id='agent_0', policy_id='agent_0-default-1', policy_distributions={'agent_0': {'agent_0-default-1': 1.0}, 'agent_1': OrderedDict([('built_in_5', 0.99999), ('agent_0-default-0', 1e-05)])}, share_policies=True, sync=False, stopper=<light_malib.framework.scheduler.stopper.common.win_rate_stopper.WinRateStopper object at 0x7fe0f9dfe970>, kwargs={})
(pid=687111) [2023-11-07 16:04:28,870][WARNING] table_cfgs:DataServer uses {'capacity': 1000, 'sampler_type': 'lumrf', 'sample_max_usage': 10000, 'rate_limiter_cfg': {'min_size': 24}}
(pid=687111) [2023-11-07 16:04:28,872][INFO] DataServer created data table agent_0-default-1
(pid=687118) [2023-11-07 16:04:28,879][INFO] Rollout 1
(pid=687108) [2023-11-07 16:04:28,890][INFO] local_rank: 0 cuda_visible_devices:0
(pid=687108) [2023-11-07 16:04:30,187][WARNING] trainer_0 reset to training_task TrainingDesc(agent_id='agent_0', policy_id='agent_0-default-1', policy_distributions={'agent_0': {'agent_0-default-1': 1.0}, 'agent_1': OrderedDict([('built_in_5', 0.99999), ('agent_0-default-0', 1e-05)])}, share_policies=True, sync=False, stopper=<light_malib.framework.scheduler.stopper.common.win_rate_stopper.WinRateStopper object at 0x7f135e001dc0>, kwargs={'cfg': {'distributed': {'resources': {'num_cpus': 1, 'num_gpus': 1, 'resources': {'node:10.1.80.147': 0.01}}}, 'optimizer': 'Adam', 'actor_lr': 0.0005, 'critic_lr': 0.0005, 'opti_eps': 1e-05, 'weight_decay': 0.0, 'lr_decay': False, 'lr_decay_epoch': 2000}})
(pid=687118) [2023-11-07 16:04:55,473][WARNING] save the best model(average reward:-5020.0,average win:0.0)
(pid=687118) [2023-11-07 16:04:55,495][INFO] Rollout 2
(pid=687118) [2023-11-07 16:05:21,904][WARNING] save the best model(average reward:-3370.6666666666665,average win:0.0)
(pid=687118) [2023-11-07 16:05:21,925][INFO] Rollout 3
(pid=687118) [2023-11-07 16:05:53,902][WARNING] save the best model(average reward:-2539.0,average win:0.0)
(pid=687118) [2023-11-07 16:05:53,923][INFO] Rollout 4
(pid=687118) [2023-11-07 16:06:24,837][WARNING] save the best model(average reward:-2048.0,average win:0.0)
(pid=687118) [2023-11-07 16:06:24,856][INFO] Rollout 5
(pid=687118) [2023-11-07 16:06:55,875][WARNING] save the best model(average reward:-1714.0,average win:0.0)
(pid=687118) [2023-11-07 16:06:55,894][INFO] Rollout 6
(pid=687118) [2023-11-07 16:07:26,569][WARNING] save the best model(average reward:-1477.142857142857,average win:0.0)
(pid=687118) [2023-11-07 16:07:26,605][INFO] Rollout 7
(pid=687118) [2023-11-07 16:07:57,201][WARNING] save the best model(average reward:-1300.5,average win:0.0)
(pid=687118) [2023-11-07 16:07:57,220][INFO] Rollout 8
(pid=687118) [2023-11-07 16:08:28,308][WARNING] save the best model(average reward:-1162.6666666666667,average win:0.0)
(pid=687118) [2023-11-07 16:08:28,330][INFO] Rollout 9
(pid=687118) [2023-11-07 16:08:58,670][WARNING] save the best model(average reward:-1054.0,average win:0.0)
(pid=687118) [2023-11-07 16:08:58,688][INFO] Rollout 10
(pid=687118) [2023-11-07 16:09:30,212][WARNING] save the best model(average reward:-960.7272727272727,average win:0.0)
(pid=687118) [2023-11-07 16:09:30,234][INFO] Rollout 11
(pid=687118) [2023-11-07 16:10:00,340][WARNING] save the best model(average reward:-883.6666666666666,average win:0.0)
(pid=687118) [2023-11-07 16:10:00,362][INFO] Rollout 12
(pid=687118) [2023-11-07 16:10:32,308][WARNING] save the best model(average reward:-818.1538461538462,average win:0.0)
(pid=687118) [2023-11-07 16:10:32,333][INFO] Rollout 13
(pid=687118) [2023-11-07 16:11:04,471][WARNING] save the best model(average reward:-762.2857142857143,average win:0.0)
(pid=687118) [2023-11-07 16:11:04,495][INFO] Rollout 14
(pid=687118) [2023-11-07 16:11:34,548][WARNING] save the best model(average reward:-713.6,average win:0.0)
(pid=687118) [2023-11-07 16:11:34,572][INFO] Rollout 15
(pid=687118) [2023-11-07 16:12:05,414][WARNING] save the best model(average reward:-672.25,average win:0.0)
(pid=687118) [2023-11-07 16:12:05,435][INFO] Rollout 16
(pid=687118) [2023-11-07 16:12:35,794][WARNING] save the best model(average reward:-635.5294117647059,average win:0.0)
(pid=687118) [2023-11-07 16:12:35,812][INFO] Rollout 17
(pid=687118) [2023-11-07 16:13:05,773][WARNING] save the best model(average reward:-602.8888888888889,average win:0.0)
(pid=687118) [2023-11-07 16:13:05,796][INFO] Rollout 18
(pid=687118) [2023-11-07 16:13:36,861][WARNING] save the best model(average reward:-574.3157894736842,average win:0.0)
(pid=687118) [2023-11-07 16:13:36,877][INFO] Rollout 19
(pid=687118) [2023-11-07 16:14:07,636][WARNING] save the best model(average reward:-547.4,average win:0.0)
(pid=687118) [2023-11-07 16:14:07,653][INFO] Rollout 20
(pid=687118) [2023-11-07 16:14:38,884][WARNING] save the best model(average reward:-48.8,average win:0.0)
(pid=687118) [2023-11-07 16:14:38,905][INFO] Rollout 21
(pid=687118) [2023-11-07 16:15:08,913][WARNING] save the best model(average reward:-48.6,average win:0.0)
(pid=687118) [2023-11-07 16:15:08,931][INFO] Rollout 22
(pid=687118) [2023-11-07 16:15:38,800][WARNING] save the best model(average reward:-48.2,average win:0.0)
(pid=687118) [2023-11-07 16:15:38,820][INFO] Rollout 23
(pid=687118) [2023-11-07 16:16:07,657][INFO] Rollout 24
(pid=687118) [2023-11-07 16:16:38,027][WARNING] save the best model(average reward:-47.0,average win:0.0)
(pid=687118) [2023-11-07 16:16:38,044][INFO] Rollout 25
(pid=687118) [2023-11-07 16:17:07,691][INFO] Rollout 26
.
.
.
(pid=687118) [2023-11-07 18:14:16,224][INFO] Rollout 264
(pid=687118) [2023-11-07 18:14:47,594][INFO] Rollout 265
Traceback (most recent call last):
  File "main_pbt.py", line 126, in <module>
    main()
  File "main_pbt.py", line 114, in main
    runner.run()
  File "/home/zhq/Doctor/AI_Innovation/DecisionMaking/DB-Football_v1.1/light_malib/framework/pbt_runner.py", line 111, in run
    ray.get(training_task_ref)
  File "/home/zhq/software_tools/anaconda3/envs/light-malib/lib/python3.8/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
    return func(*args, **kwargs)
  File "/home/zhq/software_tools/anaconda3/envs/light-malib/lib/python3.8/site-packages/ray/worker.py", line 1625, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(ValueError): ray::TrainingManager.train() (pid=687114, ip=10.1.80.147, repr=<light_malib.training.training_manager.TrainingManager object at 0x7fd1fa036340>)
  File "/home/zhq/Doctor/AI_Innovation/DecisionMaking/DB-Football_v1.1/light_malib/utils/decorator.py", line 22, in wrapper
    return func(self, *args, **kwargs)
  File "/home/zhq/Doctor/AI_Innovation/DecisionMaking/DB-Football_v1.1/light_malib/training/training_manager.py", line 146, in train
    statistics_list = ray.get(
ray.exceptions.RayTaskError(ValueError): ray::DistributedTrainer.optimize() (pid=687108, ip=10.1.80.147, repr=<light_malib.training.distributed_trainer.DistributedTrainer object at 0x7f135e001b80>)
  File "/home/zhq/Doctor/AI_Innovation/DecisionMaking/DB-Football_v1.1/light_malib/training/distributed_trainer.py", line 200, in optimize
    training_info = self.trainer.optimize(batch)
  File "/home/zhq/Doctor/AI_Innovation/DecisionMaking/DB-Football_v1.1/light_malib/algorithm/mappo/trainer.py", line 94, in optimize
    tmp_opt_result = self.loss(mini_batch)
  File "/home/zhq/Doctor/AI_Innovation/DecisionMaking/DB-Football_v1.1/light_malib/algorithm/common/loss_func.py", line 70, in __call__
    return tensor_cast(
  File "/home/zhq/Doctor/AI_Innovation/DecisionMaking/DB-Football_v1.1/light_malib/utils/general.py", line 110, in wrap
    rets = func(*new_args, **kwargs)
  File "/home/zhq/Doctor/AI_Innovation/DecisionMaking/DB-Football_v1.1/light_malib/algorithm/mappo/loss.py", line 143, in loss_compute
    values, action_log_probs, dist_entropy = self._evaluate_actions(
  File "/home/zhq/Doctor/AI_Innovation/DecisionMaking/DB-Football_v1.1/light_malib/algorithm/mappo/loss.py", line 270, in _evaluate_actions
    dist = torch.distributions.Categorical(logits=logits)
  File "/home/zhq/software_tools/anaconda3/envs/light-malib/lib/python3.8/site-packages/torch/distributions/categorical.py", line 66, in __init__
    super(Categorical, self).__init__(batch_shape, validate_args=validate_args)
  File "/home/zhq/software_tools/anaconda3/envs/light-malib/lib/python3.8/site-packages/torch/distributions/distribution.py", line 56, in __init__
    raise ValueError(
ValueError: Expected parameter logits (Tensor of shape (96000, 19)) of distribution Categorical(logits: torch.Size([96000, 19])) to satisfy the constraint IndependentConstraint(Real(), 1), but found invalid values:
tensor([[nan, nan, nan,  ..., nan, nan, nan],
        [nan, nan, nan,  ..., nan, nan, nan],
        [nan, nan, nan,  ..., nan, nan, nan],
        ...,
        [nan, nan, nan,  ..., nan, nan, nan],
        [nan, nan, nan,  ..., nan, nan, nan],
        [nan, nan, nan,  ..., nan, nan, nan]], device='cuda:0',
       grad_fn=<SubBackward0>)
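
The traceback above ends with Categorical rejecting logits in which every displayed entry is nan. As a minimal sketch of how one might narrow this down, the plain-Python snippet below (standard library only, not PyTorch and not part of DB-Football) illustrates two things: a single NaN in a row of logits makes the whole normalized row NaN, and a small guard can report the first non-finite entry before a distribution is built. The names `log_softmax` and `first_non_finite` are hypothetical helpers for illustration; the thread does not establish what actually produced the NaNs.

```python
import math


def log_softmax(row):
    """Numerically stable log-softmax over one row of logits."""
    m = max(row)
    log_sum = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_sum for x in row]


def first_non_finite(rows):
    """Return (row, col) of the first NaN/inf entry, or None if all are finite."""
    for i, row in enumerate(rows):
        for j, x in enumerate(row):
            if not math.isfinite(x):
                return (i, j)
    return None


# One NaN among otherwise valid logits contaminates the whole normalized row.
normalized = log_softmax([1.0, float("nan"), 2.0])
print(all(math.isnan(v) for v in normalized))

# The guard points at the offending position in a batch of rows.
batch = [[0.1, 0.2, 0.3], [0.4, float("nan"), 0.6]]
print(first_non_finite(batch))
```

In a PyTorch training loop the same idea would usually be expressed with tensor operations on the logits and on the model parameters; that variant is left out here because it depends on code not shown in this thread.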

YanSong97 commented 12 months ago

Hi, I have just uploaded a demo config. Feel free to try it out.

Also, my local PyTorch version is 1.13.0 and I cannot reproduce this error. Which PyTorch version are you using?
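
For anyone answering the version question, here is a small sketch that reports installed package versions using only the Python standard library (`importlib.metadata`, available from Python 3.8, which matches the interpreter in the traceback). The helper name `installed_version` is hypothetical, and `torch` and `ray` are assumed to be the distribution names in the environment.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Print the versions most relevant to this issue; None means "not installed".
for name in ("torch", "ray"):
    print(name, installed_version(name))
```

Run it inside the same conda environment that launches main_pbt.py, so the reported versions are the ones actually used for training.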

ZHQ-air commented 12 months ago

Thank you very much for your response. The error no longer occurs when I use expr_10_vs_10_psro.yaml with batch_size=100 and num_workers=5.

Shanghai-Digital-Brain-Laboratory / DB-Football

ValueError in training #6