This happens because you have installed numpy 1.24.0, in which np.int has been removed after being deprecated in earlier releases. You can solve it by installing a lower version of numpy:
pip install "numpy<1.24.0"
When I install numpy==1.19.4, this command reports an error directly. However, when the "np.int" at line 531 of "/mnt/workspace/HARL/harl/runners/off_policy_base_runner.py" is changed to "np.int64", the command runs.
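For anyone hitting the same issue, a minimal sketch of that substitution (the actual statement at line 531 may look different; the variable name and value below are made up purely to illustrate the change):

import numpy as np

# Illustrative only: stand-in for the statement at line 531 of
# harl/runners/off_policy_base_runner.py.
buffer_len = 1000  # hypothetical value

# Before: dtype=np.int, which is deprecated since numpy 1.20 and removed in 1.24
# indices = np.arange(buffer_len, dtype=np.int)

# After: np.int64 (or the builtin int) works across numpy versions
indices = np.arange(buffer_len, dtype=np.int64)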
Maybe you can try installing numpy==1.21.6; it has worked for me.
OK, numpy==1.21.6 does work, but there are other problems. Is the hasac algorithm not adapted to the pettingzoo_mpe environment? I get the following error when running the command "python train.py --algo hasac --env pettingzoo_mpe --exp_name mpe_hasac":
start warmup
finish warmup, start training
Env pettingzoo_mpe Task simple_spread_v2-continuous Algo hasac Exp mpe_hasac Evaluation at step 210000 / 20000000:
Eval average episode reward is -102.3018353956586, eval average episode length is 25.0.
Traceback (most recent call last):
File "train.py", line 95, in <module>
main()
File "train.py", line 90, in main
runner.run()
File "/mnt/workspace/HARL/harl/runners/off_policy_base_runner.py", line 278, in run
self.train()
File "/mnt/workspace/HARL/harl/runners/off_policy_ha_runner.py", line 162, in train
actions[agent_id], _ = self.actor[
File "/mnt/workspace/HARL/harl/algorithms/actors/hasac.py", line 56, in get_actions_with_logprobs
actions, logp_actions = self.actor(
File "/home/pai/envs/harl/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/mnt/workspace/HARL/harl/models/policy_models/squashed_gaussian_policy.py", line 63, in forward
pi_distribution = Normal(mu, std)
File "/home/pai/envs/harl/lib/python3.8/site-packages/torch/distributions/normal.py", line 50, in __init__
super(Normal, self).__init__(batch_shape, validate_args=validate_args)
File "/home/pai/envs/harl/lib/python3.8/site-packages/torch/distributions/distribution.py", line 55, in __init__
raise ValueError(
ValueError: Expected parameter loc (Tensor of shape (1000, 5)) of distribution Normal(loc: torch.Size([1000, 5]), scale: torch.Size([1000, 5])) to satisfy the constraint Real(), but found invalid values:
tensor([[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan],
...,
[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan]], device='cuda:0')
My environment is as follows:
# packages in environment at /home/pai/envs/harl:
#
# Name Version Build Channel
_libgcc_mutex 0.1 main defaults
_openmp_mutex 5.1 1_gnu defaults
absl-py 1.4.0 pypi_0 pypi
ca-certificates 2023.05.30 h06a4308_0 defaults
cachetools 5.3.1 pypi_0 pypi
certifi 2023.7.22 pypi_0 pypi
cffi 1.15.1 pypi_0 pypi
charset-normalizer 3.2.0 pypi_0 pypi
cloudpickle 2.2.1 pypi_0 pypi
farama-notifications 0.0.4 pypi_0 pypi
future 0.18.3 pypi_0 pypi
google-auth 2.22.0 pypi_0 pypi
google-auth-oauthlib 1.0.0 pypi_0 pypi
grpcio 1.57.0 pypi_0 pypi
gym 0.21.0 pypi_0 pypi
gymnasium 0.29.1 pypi_0 pypi
harl 1.0.0 dev_0 <develop>
idna 3.4 pypi_0 pypi
importlib-metadata 4.13.0 pypi_0 pypi
ld_impl_linux-64 2.38 h1181459_1 defaults
libffi 3.4.4 h6a678d5_0 defaults
libgcc-ng 11.2.0 h1234567_1 defaults
libgomp 11.2.0 h1234567_1 defaults
libstdcxx-ng 11.2.0 h1234567_1 defaults
markdown 3.4.4 pypi_0 pypi
markupsafe 2.1.3 pypi_0 pypi
ncurses 6.4 h6a678d5_0 defaults
numpy 1.21.6 pypi_0 pypi
oauthlib 3.2.2 pypi_0 pypi
openssl 3.0.10 h7f8727e_2 defaults
packaging 23.1 pypi_0 pypi
pettingzoo 1.22.2 pypi_0 pypi
pillow 10.0.0 pypi_0 pypi
pip 23.2.1 py38h06a4308_0 defaults
protobuf 4.24.2 pypi_0 pypi
pyasn1 0.5.0 pypi_0 pypi
pyasn1-modules 0.3.0 pypi_0 pypi
pycparser 2.21 pypi_0 pypi
pygame 2.1.2 pypi_0 pypi
pyglet 1.5.0 pypi_0 pypi
pymunk 6.2.1 pypi_0 pypi
python 3.8.17 h955ad1f_0 defaults
pyyaml 6.0.1 pypi_0 pypi
readline 8.2 h5eee18b_0 defaults
requests 2.31.0 pypi_0 pypi
requests-oauthlib 1.3.1 pypi_0 pypi
rsa 4.9 pypi_0 pypi
setproctitle 1.3.2 pypi_0 pypi
setuptools 65.5.0 pypi_0 pypi
six 1.16.0 pypi_0 pypi
sqlite 3.41.2 h5eee18b_0 defaults
supersuit 3.7.0 pypi_0 pypi
tensorboard 2.14.0 pypi_0 pypi
tensorboard-data-server 0.7.1 pypi_0 pypi
tensorboardx 2.6.2.2 pypi_0 pypi
tinyscaler 1.2.6 pypi_0 pypi
tk 8.6.12 h1ccaba5_0 defaults
torch 1.10.0+cu111 pypi_0 pypi
torchaudio 0.10.0+rocm4.1 pypi_0 pypi
torchvision 0.11.0+cu111 pypi_0 pypi
typing-extensions 4.7.1 pypi_0 pypi
urllib3 1.26.16 pypi_0 pypi
werkzeug 2.3.7 pypi_0 pypi
wheel 0.38.4 py38h06a4308_0 defaults
xz 5.4.2 h5eee18b_0 defaults
zipp 3.16.2 pypi_0 pypi
zlib 1.2.13 h5eee18b_0 defaults
This error occurs because of the hyperparameters you chose; it is a typical example of this kind of failure. Could you provide the hyperparameters you used? In the meantime, you can use the following hyperparameters, which work fine for me:
{
"algo_args": {
"algo": {
"alpha": 0.2,
"alpha_lr": 0.0003,
"auto_alpha": true,
"batch_size": 1000,
"buffer_size": 1000000,
"fixed_order": false,
"gamma": 0.99,
"huber_delta": 10.0,
"n_step": 20,
"polyak": 0.005,
"share_param": false,
"use_huber_loss": false,
"use_policy_active_masks": true
},
"device": {
"cuda": true,
"cuda_deterministic": true,
"torch_threads": 4
},
"eval": {
"eval_episodes": 40,
"n_eval_rollout_threads": 20,
"use_eval": true
},
"logger": {
"log_dir": "./results"
},
"model": {
"activation_func": "relu",
"critic_lr": 0.0005,
"final_activation_func": "tanh",
"gain": 0.01,
"hidden_sizes": [
256,
256
],
"initialization_method": "orthogonal_",
"lr": 0.0005,
"use_feature_normalization": true
},
"render": {
"render_episodes": 10,
"use_render": false
},
"seed": {
"seed": 3,
"seed_specify": true
},
"train": {
"eval_interval": 10000,
"log_interval": null,
"model_dir": null,
"n_rollout_threads": 20,
"num_env_steps": 20000000,
"train_interval": 50,
"update_per_train": 1,
"use_linear_lr_decay": false,
"use_proper_time_limits": true,
"use_valuenorm": false,
"warmup_steps": 10000
}
},
"env_args": {
"continuous_actions": true,
"scenario": "simple_spread_v2"
},
"main_args": {
"algo": "hasac",
"env": "pettingzoo_mpe",
"exp_name": "test",
"load_config": ""
}
}
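If you save the above configuration to a JSON file, you should be able to reuse it via the load_config option shown in main_args. I am assuming here that the option is exposed on the command line; the file path is just an example:
python train.py --algo hasac --env pettingzoo_mpe --exp_name mpe_hasac --load_config ./hasac_mpe_config.json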
Thank you very much for your guidance; the algorithm now runs normally. My previous configuration was as follows:
{
"algo_args": {
"algo": {
"alpha": 0.001,
"alpha_lr": 0.0003,
"auto_alpha": false,
"batch_size": 1000,
"buffer_size": 1000000,
"fixed_order": false,
"gamma": 0.99,
"huber_delta": 10.0,
"n_step": 20,
"polyak": 0.005,
"share_param": false,
"use_huber_loss": true,
"use_policy_active_masks": true
},
"device": {
"cuda": true,
"cuda_deterministic": true,
"torch_threads": 4
},
"eval": {
"eval_episodes": 40,
"n_eval_rollout_threads": 20,
"use_eval": true
},
"logger": {
"log_dir": "./results"
},
"model": {
"activation_func": "relu",
"critic_lr": 0.0005,
"final_activation_func": "tanh",
"gain": 0.01,
"hidden_sizes": [
256,
256
],
"initialization_method": "orthogonal_",
"lr": 0.0005,
"use_feature_normalization": true
},
"render": {
"render_episodes": 10,
"use_render": false
},
"seed": {
"seed": 1,
"seed_specify": true
},
"train": {
"eval_interval": 10000,
"log_interval": null,
"model_dir": null,
"n_rollout_threads": 20,
"num_env_steps": 20000000,
"train_interval": 50,
"update_per_train": 1,
"use_linear_lr_decay": false,
"use_proper_time_limits": true,
"use_valuenorm": true,
"warmup_steps": 10000
}
},
"env_args": {
"continuous_actions": true,
"scenario": "simple_spread_v2"
},
"main_args": {
"algo": "hasac",
"env": "pettingzoo_mpe",
"exp_name": "mpe_hasac",
"load_config": ""
}
}
The above hyperparameters were not set manually by me; they were generated by the following command:
python train.py --algo hasac --env pettingzoo_mpe --exp_name mpe_hasac
You're welcome. The NaN error you encountered appears to be caused by setting use_huber_loss and use_valuenorm to true. In continuous environments, I usually set both of these hyperparameters to false.
Thank you. The algorithm you've implemented is a heterogeneous-agent algorithm. If some agents use continuous action spaces while others use discrete action spaces, how should this be configured? I've found through testing that your HASAC algorithm trains very stably, which is excellent; it saves a lot of tuning effort for someone like me. I'd like to apply it, in a unified way, to integrated communication resource allocation with agents that use both discrete and continuous action spaces. I'm not sure whether this is an intended application scenario for your heterogeneous algorithm, or what configuration aspects need to be considered when using it. :)
Thank you very much for your interest in applying our algorithm to other scenarios. I'm sorry that we haven't tested our algorithm in environments with mixed discrete and continuous action spaces yet, due to the limited availability of such benchmarks. Nonetheless, I believe it should still be applicable in such settings, because I took that case into consideration during implementation. As for hyperparameter settings, apart from the two hyperparameters I suggested setting to false earlier, the primary parameter that requires tuning is alpha. Based on my past experience, setting alpha to be automatically tuned usually yields good results (see the sketch at the end of this comment).
If you encounter any further issues in the future, please feel free to reach out to me for assistance.
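To make that concrete, here is a sketch of the fields I would change relative to the working configuration above when starting in such a setting. I have not verified mixed-action environments myself, so treat it as a starting point rather than a tested recipe:
{
  "algo_args": {
    "algo": {
      "alpha": 0.2,
      "auto_alpha": true,
      "use_huber_loss": false
    },
    "train": {
      "use_valuenorm": false
    }
  }
}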
OK, good luck with your studies :)
I used the following commands to configure the Linux environment:
Then go to the appropriate path and execute the following code:
The following error information is displayed: