Farama-Foundation / PettingZoo

An API standard for multi-agent reinforcement learning environments, with popular reference environments and related utilities
https://pettingzoo.farama.org

Pistonball example in rllib tutorial got Nan in PPO #637

Closed hellohawaii2 closed 2 years ago

hellohawaii2 commented 2 years ago

Hi, I am running the RLlib Pistonball tutorial, but it fails with a numerical NaN error.

This is my error.txt. It seems that the model outputs NaN as the mean of the action distribution.

I checked progress.csv but did not find any NaN or Inf values. I also tried setting the 'grad_clip' option to 0.1 instead of None, but still hit the same error. Is this a hyperparameter problem, and how can I solve it?

I am using Python 3.7, PyTorch 1.10.1 with CUDA 10.2, ray 1.9.2, and pettingzoo 1.15.0. I also tried upgrading ray to 1.10.0, but it made no difference.

Thanks for your attention!

hellohawaii2 commented 2 years ago

I tried the 13-lines example using Stable-Baselines3, and it works fine.

I tried to imitate the PPO config of the 13-lines example and changed the config in the RLlib tutorial code. I tried the following config, but it does not work either (how I launch training with it is sketched after the config).

PPO_config = {
        # Environment specific
        "env": env_name,
        # General
        "log_level": "WARNING",
        "framework": "torch",
        "num_gpus": 1,
        "num_workers": 8,
        "num_envs_per_worker": 2,
        "compress_observations": False,
        "batch_mode": 'truncate_episodes',

        # 'use_critic': True,
        'use_gae': True,
        "lambda": 0.99,

        "gamma": .95,

        # "kl_coeff": 0.001,
        # "kl_target": 1000.,
        "clip_param": 0.3,
        'grad_clip': 0.9,
        "entropy_coeff": 0.0905168,
        'vf_loss_coeff': 0.042202,  # this value is quite different from the default

        "sgd_minibatch_size": 256,  # PERHAPS this is batch_size of PPO
        "num_sgd_iter": 10, # epoc
        'rollout_fragment_length': 32,
        "train_batch_size": 256,   # or is this batch_size of PPO
        'lr': 0.00062211,
        "clip_actions": True,

        # Method specific
        "multiagent": {
            "policies": policies,
            "policy_mapping_fn": (
                lambda agent_id: policy_ids[0]),
        },
    }
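
For context, this is roughly how I register the environment and launch training with the config above (a sketch of what I run, following the tutorial; the ParallelPettingZooEnv import path and the pistonball version suffix may differ across ray/pettingzoo versions):

import ray
from ray import tune
from ray.tune.registry import register_env
from ray.rllib.env.wrappers.pettingzoo_env import ParallelPettingZooEnv
from pettingzoo.butterfly import pistonball_v4
import supersuit as ss

def env_creator(args):
    # Same preprocessing as the tutorial: grayscale, downscale, frame stack.
    env = pistonball_v4.parallel_env(n_pistons=20, continuous=True, max_cycles=125)
    env = ss.color_reduction_v0(env, mode='B')
    env = ss.resize_v0(env, x_size=84, y_size=84)
    env = ss.frame_stack_v1(env, 3)
    return env

env_name = "pistonball_v4"
register_env(env_name, lambda args: ParallelPettingZooEnv(env_creator(args)))

ray.init()
tune.run(
    "PPO",
    name="PPO_pistonball",
    stop={"timesteps_total": 5_000_000},
    checkpoint_freq=10,
    # PPO_config is the dict shown above; `policies` / `policy_ids` are built
    # from the env's observation/action spaces as in the tutorial.
    config=PPO_config,
)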

I tried to render an intermediate checkpoint using the provided script, and found that the model produces actions with very large values. The rendering script produces many warnings like the following:

......
[WARNING]: Received an action [689.4896] that was outside action space Box([-1.], [1.], (1,), float32). Environment is clipping to space
[WARNING]: Received an action [-720.14795] that was outside action space Box([-1.], [1.], (1,), float32). Environment is clipping to space
[WARNING]: Received an action [901.5191] that was outside action space Box([-1.], [1.], (1,), float32). Environment is clipping to space
[WARNING]: Received an action [304.1859] that was outside action space Box([-1.], [1.], (1,), float32). Environment is clipping to space
[WARNING]: Received an action [-849.73065] that was outside action space Box([-1.], [1.], (1,), float32). Environment is clipping to space
[WARNING]: Received an action [394.2745] that was outside action space Box([-1.], [1.], (1,), float32). Environment is clipping to space
[WARNING]: Received an action [-2589.4639] that was outside action space Box([-1.], [1.], (1,), float32). Environment is clipping to space
[WARNING]: Received an action [353.90472] that was outside action space Box([-1.], [1.], (1,), float32). Environment is clipping to space
[WARNING]: Received an action [-2754.658] that was outside action space Box([-1.], [1.], (1,), float32). Environment is clipping to space
[WARNING]: Received an action [128.8433] that was outside action space Box([-1.], [1.], (1,), float32). Environment is clipping to space
......

At the same time, I inspected the action distribution produced by the Stable-Baselines3 PPO on some observations, and got a normal distribution with mean -3.34 and standard deviation (scale) 47.9. This also produces actions outside the range [-1, 1], but they are relatively small compared with the results the RLlib version produced.
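
For completeness, the Stable-Baselines3 setup I am comparing against looks roughly like this (a sketch based on the 13-lines tutorial; the pistonball version suffix and exact wrapper arguments may differ in your install):

from stable_baselines3 import PPO
from pettingzoo.butterfly import pistonball_v4
import supersuit as ss

# Preprocessing from the 13-lines tutorial: grayscale, downscale, frame stack,
# then turn the parallel PettingZoo env into a vectorized SB3 env.
env = pistonball_v4.parallel_env()
env = ss.color_reduction_v0(env, mode='B')
env = ss.resize_v0(env, x_size=84, y_size=84)
env = ss.frame_stack_v1(env, 3)
env = ss.pettingzoo_env_to_vec_env_v1(env)
env = ss.concat_vec_envs_v1(env, 8, num_cpus=4, base_class='stable_baselines3')

# Hyperparameters mirroring the values I copied into the RLlib config above.
model = PPO(
    'CnnPolicy', env, verbose=1,
    gamma=0.95, n_steps=256, ent_coef=0.0905168,
    learning_rate=0.00062211, vf_coef=0.042202, max_grad_norm=0.9,
    gae_lambda=0.99, n_epochs=5, clip_range=0.3, batch_size=256,
)
model.learn(total_timesteps=2_000_000)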

Thank you very much for your attention!

hellohawaii2 commented 2 years ago

In addition.

I am using Ubuntu 18.04 with a TITAN X (Pascal) GPU.

I also tried setting 'clip_actions' to False and 'normalize_actions' to True instead; however, this does not work for me either. I got NaN after 500 iterations.

jkterry1 commented 2 years ago

Hey, this is due to instability in the learning code; this happens fairly often in reinforcement learning. It usually means your learning code has a bug or you have bad hyperparameters (the second option is far more likely in RLlib). There are wrappers in SuperSuit that will at least suppress the error by turning NaNs into usable values.
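
For example, something like this (a minimal sketch; check the SuperSuit docs for the exact NaN wrapper variants available in your version):

import supersuit as ss
from pettingzoo.butterfly import pistonball_v4

env = pistonball_v4.parallel_env()
# Replace any NaN actions the policy emits with zeros before they reach the
# environment; nan_noop_v0 and nan_random_v0 are alternative strategies.
env = ss.nan_zeros_v0(env)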

hellohawaii2 commented 2 years ago

> Hey, this is due to instability in the learning code; this happens fairly often in reinforcement learning. It usually means your learning code has a bug or you have bad hyperparameters (the second option is far more likely in RLlib). There are wrappers in SuperSuit that will at least suppress the error by turning NaNs into usable values.

Hi, thanks for your response! I tried both the hyperparameters given in the example and the hyperparameters used in the Stable-Baselines3 version, but neither worked with RLlib. May I ask whether this example still works with the newest RLlib?

jkterry1 commented 2 years ago

Hey, does the code in this not work for you? https://towardsdatascience.com/multi-agent-deep-reinforcement-learning-in-15-lines-of-code-using-pettingzoo-e0b963c0820b If so, I'll take a look.

jkterry1 commented 2 years ago

Hey, I'm going to close this due to inactivity, please let us know if you need anything else from us

acmoral commented 11 months ago

> Hey, I'm going to close this due to inactivity, please let us know if you need anything else from us

I'm having this exact same issue: after some training iterations the actions taken are only the bounds of the continuous action space (i.e. 1 or 0), and some time later I get the same error posted here, an array of NaNs. It seems odd given that I also use the same hyperparameters in RLlib.

zzhou292 commented 7 months ago

I am running into the same issue: NaN gets propagated to the actions and training stops.

elliottower commented 7 months ago

Our RLlib Pistonball tutorial is fairly old and in need of an update. I don't have the time or expertise, but if anyone here is able to get it working, definitely post the solution here.

I would recommend using RLlib's official tutorials. I have a list of official tutorials using PettingZoo here: https://pettingzoo.farama.org/main/tutorials/rllib/#examples-using-pettingzoo

ElenaZamaraeva commented 3 months ago

> Hey, I'm going to close this due to inactivity, please let us know if you need anything else from us

> I'm having this exact same issue: after some training iterations the actions taken are only the bounds of the continuous action space (i.e. 1 or 0), and some time later I get the same error posted here, an array of NaNs. It seems odd given that I also use the same hyperparameters in RLlib.

Hi acmoral, have you managed to solve the problem? I have exactly the same problem with a continuous action space as you do.