PKU-MARL / HARL

Official implementation of HARL algorithms based on PyTorch.

numpy version reported an error #12

Closed · zkzfor closed this issue 1 year ago

zkzfor commented 1 year ago

I used the following commands to configure the Linux environment:

pytorch:1.8PAI-gpu-py36-cu101-ubuntu18.04

conda create -n harl python=3.8
conda activate harl
pip install torch==1.10.0+cu111 torchvision==0.11.0+cu111 torchaudio==0.10.0 -f https://download.pytorch.org/whl/torch_stable.html

git clone https://github.com/PKU-MARL/HARL.git
cd HARL
pip install -e .

pip install pettingzoo==1.22.2
pip install supersuit==3.7.0

pip install setuptools==65.5.0

pip install gym==0.21.0
pip install pyglet==1.5.0
pip install importlib-metadata==4.13.0

Then, in the examples directory, I executed the following command:

(harl) /mnt/workspace/HARL/examples> python train.py --algo haddpg --env pettingzoo_mpe --exp_name mpe_haddpg

The following error was displayed:

start warmup
finish warmup, start training
Env pettingzoo_mpe Task simple_spread_v2-continuous Algo haddpg Exp mpe_haddpg Evaluation at step 110000 / 10000000:
Traceback (most recent call last):
  File "train.py", line 95, in <module>
    main()
  File "train.py", line 90, in main
    runner.run()
  File "/mnt/workspace/HARL/harl/runners/off_policy_base_runner.py", line 288, in run
    self.eval(cur_step)
  File "/home/pai/envs/harl2/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "/mnt/workspace/HARL/harl/runners/off_policy_base_runner.py", line 531, in eval
    self.algo_args["eval"]["n_eval_rollout_threads"], dtype=np.int
  File "/home/pai/envs/harl2/lib/python3.8/site-packages/numpy/__init__.py", line 305, in __getattr__
    raise AttributeError(__former_attrs__[attr])
AttributeError: module 'numpy' has no attribute 'int'.
`np.int` was a deprecated alias for the builtin `int`. To avoid this error in existing code, use `int` by itself. Doing this will not modify any behavior and is safe. When replacing `np.int`, you may wish to use e.g. `np.int64` or `np.int32` to specify the precision. If you wish to review your current use, check the release note link for additional information.
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
    https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
guazimao commented 1 year ago

This happens because you have installed the latest numpy, version 1.24.0, in which the deprecated np.int alias has been removed. You can solve it by installing a lower version of numpy:

 pip install "numpy<1.24.0"
zkzfor commented 1 year ago

When I install numpy==1.19.4, the command fails outright. However, when I change the "np.int" on line 531 of "/mnt/workspace/HARL/harl/runners/off_policy_base_runner.py" to "np.int64", the command runs.
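
For reference, the fix is a one-token change. The enclosing call below is reconstructed from the traceback, so the variable names are illustrative rather than copied from the repo:

import numpy as np

# Around line 531 of off_policy_base_runner.py (reconstructed from the traceback).
n_eval_rollout_threads = 20  # value of algo_args["eval"]["n_eval_rollout_threads"]

# Before: dtype=np.int raises AttributeError on NumPy >= 1.24 (alias removed).
# After: the builtin int or an explicit np.int64 works on all NumPy versions.
eval_episode_counts = np.zeros(n_eval_rollout_threads, dtype=np.int64)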

guazimao commented 1 year ago

Maybe you can try installing numpy==1.21.6. It has worked for me.

zkzfor commented 1 year ago

OK, numpy==1.21.6 does work, but there is another problem. Has the HASAC algorithm not been adapted to the pettingzoo_mpe environment? I get the following error when running the command "python train.py --algo hasac --env pettingzoo_mpe --exp_name mpe_hasac":

start warmup
finish warmup, start training
Env pettingzoo_mpe Task simple_spread_v2-continuous Algo hasac Exp mpe_hasac Evaluation at step 210000 / 20000000:
Eval average episode reward is -102.3018353956586, eval average episode length is 25.0.

Traceback (most recent call last):
  File "train.py", line 95, in <module>
    main()
  File "train.py", line 90, in main
    runner.run()
  File "/mnt/workspace/HARL/harl/runners/off_policy_base_runner.py", line 278, in run
    self.train()
  File "/mnt/workspace/HARL/harl/runners/off_policy_ha_runner.py", line 162, in train
    actions[agent_id], _ = self.actor[
  File "/mnt/workspace/HARL/harl/algorithms/actors/hasac.py", line 56, in get_actions_with_logprobs
    actions, logp_actions = self.actor(
  File "/home/pai/envs/harl/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/workspace/HARL/harl/models/policy_models/squashed_gaussian_policy.py", line 63, in forward
    pi_distribution = Normal(mu, std)
  File "/home/pai/envs/harl/lib/python3.8/site-packages/torch/distributions/normal.py", line 50, in __init__
    super(Normal, self).__init__(batch_shape, validate_args=validate_args)
  File "/home/pai/envs/harl/lib/python3.8/site-packages/torch/distributions/distribution.py", line 55, in __init__
    raise ValueError(
ValueError: Expected parameter loc (Tensor of shape (1000, 5)) of distribution Normal(loc: torch.Size([1000, 5]), scale: torch.Size([1000, 5])) to satisfy the constraint Real(), but found invalid values:
tensor([[nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan],
        ...,
        [nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan]], device='cuda:0')
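
As a hypothetical debugging aid (not part of HARL), a fail-fast check placed just before pi_distribution = Normal(mu, std) in squashed_gaussian_policy.py would pinpoint the first non-finite policy output instead of failing inside torch.distributions' validation:

import torch

def check_finite(mu: torch.Tensor, std: torch.Tensor) -> None:
    # Stop at the first NaN/Inf in the policy head, before
    # torch.distributions raises its less direct validation error.
    assert torch.isfinite(mu).all(), "policy mean contains NaN/Inf"
    assert torch.isfinite(std).all(), "policy std contains NaN/Inf"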

My environment is as follows:

# packages in environment at /home/pai/envs/harl:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main    defaults
_openmp_mutex             5.1                       1_gnu    defaults
absl-py                   1.4.0                    pypi_0    pypi
ca-certificates           2023.05.30           h06a4308_0    defaults
cachetools                5.3.1                    pypi_0    pypi
certifi                   2023.7.22                pypi_0    pypi
cffi                      1.15.1                   pypi_0    pypi
charset-normalizer        3.2.0                    pypi_0    pypi
cloudpickle               2.2.1                    pypi_0    pypi
farama-notifications      0.0.4                    pypi_0    pypi
future                    0.18.3                   pypi_0    pypi
google-auth               2.22.0                   pypi_0    pypi
google-auth-oauthlib      1.0.0                    pypi_0    pypi
grpcio                    1.57.0                   pypi_0    pypi
gym                       0.21.0                   pypi_0    pypi
gymnasium                 0.29.1                   pypi_0    pypi
harl                      1.0.0                     dev_0    <develop>
idna                      3.4                      pypi_0    pypi
importlib-metadata        4.13.0                   pypi_0    pypi
ld_impl_linux-64          2.38                 h1181459_1    defaults
libffi                    3.4.4                h6a678d5_0    defaults
libgcc-ng                 11.2.0               h1234567_1    defaults
libgomp                   11.2.0               h1234567_1    defaults
libstdcxx-ng              11.2.0               h1234567_1    defaults
markdown                  3.4.4                    pypi_0    pypi
markupsafe                2.1.3                    pypi_0    pypi
ncurses                   6.4                  h6a678d5_0    defaults
numpy                     1.21.6                   pypi_0    pypi
oauthlib                  3.2.2                    pypi_0    pypi
openssl                   3.0.10               h7f8727e_2    defaults
packaging                 23.1                     pypi_0    pypi
pettingzoo                1.22.2                   pypi_0    pypi
pillow                    10.0.0                   pypi_0    pypi
pip                       23.2.1           py38h06a4308_0    defaults
protobuf                  4.24.2                   pypi_0    pypi
pyasn1                    0.5.0                    pypi_0    pypi
pyasn1-modules            0.3.0                    pypi_0    pypi
pycparser                 2.21                     pypi_0    pypi
pygame                    2.1.2                    pypi_0    pypi
pyglet                    1.5.0                    pypi_0    pypi
pymunk                    6.2.1                    pypi_0    pypi
python                    3.8.17               h955ad1f_0    defaults
pyyaml                    6.0.1                    pypi_0    pypi
readline                  8.2                  h5eee18b_0    defaults
requests                  2.31.0                   pypi_0    pypi
requests-oauthlib         1.3.1                    pypi_0    pypi
rsa                       4.9                      pypi_0    pypi
setproctitle              1.3.2                    pypi_0    pypi
setuptools                65.5.0                   pypi_0    pypi
six                       1.16.0                   pypi_0    pypi
sqlite                    3.41.2               h5eee18b_0    defaults
supersuit                 3.7.0                    pypi_0    pypi
tensorboard               2.14.0                   pypi_0    pypi
tensorboard-data-server   0.7.1                    pypi_0    pypi
tensorboardx              2.6.2.2                  pypi_0    pypi
tinyscaler                1.2.6                    pypi_0    pypi
tk                        8.6.12               h1ccaba5_0    defaults
torch                     1.10.0+cu111             pypi_0    pypi
torchaudio                0.10.0+rocm4.1           pypi_0    pypi
torchvision               0.11.0+cu111             pypi_0    pypi
typing-extensions         4.7.1                    pypi_0    pypi
urllib3                   1.26.16                  pypi_0    pypi
werkzeug                  2.3.7                    pypi_0    pypi
wheel                     0.38.4           py38h06a4308_0    defaults
xz                        5.4.2                h5eee18b_0    defaults
zipp                      3.16.2                   pypi_0    pypi
zlib                      1.2.13               h5eee18b_0    defaults
guazimao commented 1 year ago

This error occurs because of your chosen hyperparameters; NaNs like this are a typical symptom of an unsuitable setting. Could you provide the hyperparameters you used? Also, you can use the following hyperparameters, which work fine for me.

{
    "algo_args":    {
        "algo": {
            "alpha":    0.2,
            "alpha_lr": 0.0003,
            "auto_alpha":   true,
            "batch_size":   1000,
            "buffer_size":  1000000,
            "fixed_order":  false,
            "gamma":    0.99,
            "huber_delta":  10.0,
            "n_step":   20,
            "polyak":   0.005,
            "share_param":  false,
            "use_huber_loss":   false,
            "use_policy_active_masks":  true
        },
        "device":   {
            "cuda": true,
            "cuda_deterministic":   true,
            "torch_threads":    4
        },
        "eval": {
            "eval_episodes":    40,
            "n_eval_rollout_threads":   20,
            "use_eval": true
        },
        "logger":   {
            "log_dir":  "./results"
        },
        "model":    {
            "activation_func":  "relu",
            "critic_lr":    0.0005,
            "final_activation_func":    "tanh",
            "gain": 0.01,
            "hidden_sizes": [
                256,
                256
            ],
            "initialization_method":    "orthogonal_",
            "lr":   0.0005,
            "use_feature_normalization":    true
        },
        "render":   {
            "render_episodes":  10,
            "use_render":   false
        },
        "seed": {
            "seed": 3,
            "seed_specify": true
        },
        "train":    {
            "eval_interval":    10000,
            "log_interval": null,
            "model_dir":    null,
            "n_rollout_threads":    20,
            "num_env_steps":    20000000,
            "train_interval":   50,
            "update_per_train": 1,
            "use_linear_lr_decay":  false,
            "use_proper_time_limits":   true,
            "use_valuenorm":    false,
            "warmup_steps": 10000
        }
    },
    "env_args": {
        "continuous_actions":   true,
        "scenario": "simple_spread_v2"
    },
    "main_args":    {
        "algo": "hasac",
        "env":  "pettingzoo_mpe",
        "exp_name": "test",
        "load_config":  ""
    }
}
zkzfor commented 1 year ago

Thank you very much for your guidance; the algorithm now works normally. My previous configuration was as follows:

{
    "algo_args":    {
        "algo": {
            "alpha":    0.001,
            "alpha_lr": 0.0003,
            "auto_alpha":   false,
            "batch_size":   1000,
            "buffer_size":  1000000,
            "fixed_order":  false,
            "gamma":    0.99,
            "huber_delta":  10.0,
            "n_step":   20,
            "polyak":   0.005,
            "share_param":  false,
            "use_huber_loss":   true,
            "use_policy_active_masks":  true
        },
        "device":   {
            "cuda": true,
            "cuda_deterministic":   true,
            "torch_threads":    4
        },
        "eval": {
            "eval_episodes":    40,
            "n_eval_rollout_threads":   20,
            "use_eval": true
        },
        "logger":   {
            "log_dir":  "./results"
        },
        "model":    {
            "activation_func":  "relu",
            "critic_lr":    0.0005,
            "final_activation_func":    "tanh",
            "gain": 0.01,
            "hidden_sizes": [
                256,
                256
            ],
            "initialization_method":    "orthogonal_",
            "lr":   0.0005,
            "use_feature_normalization":    true
        },
        "render":   {
            "render_episodes":  10,
            "use_render":   false
        },
        "seed": {
            "seed": 1,
            "seed_specify": true
        },
        "train":    {
            "eval_interval":    10000,
            "log_interval": null,
            "model_dir":    null,
            "n_rollout_threads":    20,
            "num_env_steps":    20000000,
            "train_interval":   50,
            "update_per_train": 1,
            "use_linear_lr_decay":  false,
            "use_proper_time_limits":   true,
            "use_valuenorm":    true,
            "warmup_steps": 10000
        }
    },
    "env_args": {
        "continuous_actions":   true,
        "scenario": "simple_spread_v2"
    },
    "main_args":    {
        "algo": "hasac",
        "env":  "pettingzoo_mpe",
        "exp_name": "mpe_hasac",
        "load_config":  ""
    }
}

I did not set the above hyperparameters manually; they were generated by running the following command:

python train.py --algo hasac --env pettingzoo_mpe --exp_name mpe_hasac
guazimao commented 1 year ago

You're welcome. The NaN error you encountered appears to be caused by having use_huber_loss and use_valuenorm set to true. In continuous environments, I usually set both of these hyperparameters to false.
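
Concretely, relative to the generated config, only these two keys need to change (a fragment of the same JSON structure as above, all other keys unchanged):

    "algo": {
        "use_huber_loss":   false
    },
    "train": {
        "use_valuenorm":    false
    }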

zkzfor commented 1 year ago

Thank you. The algorithms you've implemented are heterogeneous-agent algorithms. If some agents use continuous action spaces while others use discrete action spaces, how should this be configured? I've found through testing that your HASAC algorithm trains very stably, which is excellent and saves a lot of tuning effort for someone like me. I'd like to apply it to integrated communication resource allocation, handling agents with discrete and continuous action spaces in a unified way. I'm not sure whether this is a suitable application scenario for your heterogeneous-agent algorithm, or what configuration aspects need to be considered when using it. :)

guazimao commented 1 year ago

Thank you very much for your interest in applying our algorithm to other scenarios. I'm sorry we haven't tested our algorithm in environments with mixed discrete and continuous action spaces yet, due to the limited availability of such benchmarks. Nonetheless, I believe it is still applicable in such settings, because I took that case into consideration during implementation. As for hyperparameter settings, apart from the two hyperparameters I suggested setting to false earlier, the primary parameter that requires tuning is alpha. In my experience, setting alpha to be automatically tuned usually yields good results.
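
For reference, the relevant fragment of the working config I posted above, with alpha automatically tuned:

    "algo": {
        "alpha":    0.2,
        "alpha_lr": 0.0003,
        "auto_alpha":   true
    }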

If you encounter any further issues in the future, please feel free to reach out to me for assistance.

zkzfor commented 1 year ago

OK. Good luck with your studies :)