Unity-Technologies / ml-agents

The Unity Machine Learning Agents Toolkit (ML-Agents) is an open-source project that enables games and simulations to serve as environments for training intelligent agents using deep reinforcement learning and imitation learning.
https://unity.com/products/machine-learning-agents

Continuous action - input is always zero while training #5505

Closed Nirav-Madhani closed 3 years ago

Nirav-Madhani commented 3 years ago

Describe the bug I am using ML-Agents (mlagents==0.27.0). The continuous action input is always 0 while training.

To Reproduce Use the configuration below and try training any model:

absl-py==0.13.0
attrs==21.2.0
cachetools==4.2.2
cattrs==1.5.0
certifi==2021.5.30
chardet==4.0.0
cloudpickle==1.6.0
google-auth==1.32.0
google-auth-oauthlib==0.4.4
grpcio==1.38.1
h5py==3.2.1
idna==2.10
Markdown==3.3.4
mlagents==0.27.0
mlagents-envs==0.27.0
numpy==1.20.3
oauthlib==3.1.1
Pillow==8.2.0
protobuf==3.17.3
pyasn1==0.4.8
pyasn1-modules==0.2.8
pypiwin32==223
pywin32==301
PyYAML==5.4.1
requests==2.25.1
requests-oauthlib==1.3.0
rsa==4.7.2
six==1.16.0
tensorboard==2.5.0
tensorboard-data-server==0.6.1
tensorboard-plugin-wit==1.8.0
torch==1.9.0
torchvision==0.10.0
typing-extensions==3.10.0.0
urllib3==1.26.5
Werkzeug==2.0.1

Console logs / stack traces Not Applicable

Screenshots Not Applicable


andrewcoh commented 3 years ago

Hi @Nirav-Madhani

Can you verify whether this also happens on one of our example environments, or does it only happen in yours? Are you seeing NaNs in TensorBoard?

Nirav-Madhani commented 3 years ago

It does work on the example environments.

For my environment, I checked training_status.json, timers.json, and TensorBoard as well. No NaNs anywhere.

By the way, here is my config file in case it helps identify the problem:

behaviors:
  TRC:
    trainer_type: ppo
    hyperparameters:
      batch_size: 256
      buffer_size: 2560
      learning_rate: 0.003
      beta: 0.005
      epsilon: 0.2
      lambd: 0.95
      num_epoch: 3
      learning_rate_schedule: linear
    network_settings:
      normalize: false
      hidden_units: 128
      num_layers: 2
      vis_encode_type: resnet
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
      curiosity:
        gamma: 0.99
        strength: 0.02
        network_settings:
          hidden_units: 256
        learning_rate: 0.0003
      gail:
        strength: 0.8
        demo_path: Project\Assets\T-Demos\D1.demo
    keep_checkpoints: 5
    max_steps: 400000
    time_horizon: 64
    behavioral_cloning:
      demo_path: Project\Assets\T-Demos\D1.demo
      strength: 0.7
    summary_freq: 100

The command I used:

mlagents-learn config\ppo\TRC.yaml --run-id=TRC --resume --time-scale 5

I have 4 camera inputs and 1 int variable as observations.
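
[Editor's note: a minimal sketch of how such an observation setup might look in C#, assuming the four cameras are attached via CameraSensorComponent in the Inspector and the int variable is added in CollectObservations; the class and field names here are hypothetical, not the author's actual code.]

using Unity.MLAgents;
using Unity.MLAgents.Sensors;

public class TrcAgent : Agent
{
    // Hypothetical int observation. The four camera observations come from
    // CameraSensorComponents attached to the GameObject, so only the scalar
    // needs to be added explicitly here.
    int laneIndex;

    public override void CollectObservations(VectorSensor sensor)
    {
        sensor.AddObservation(laneIndex);
    }
}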

andrewcoh commented 3 years ago

Is the behavior type in the Behavior Parameters script set to Default? Does your agent have a Decision Requester?

Nirav-Madhani commented 3 years ago

Is the behavior type in the Behavior Parameters script set to Default?

Yes.

Does your agent have a Decision Requester?

No, but I am manually calling the RequestDecision() function when required.
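
[Editor's note: for reference, a minimal sketch of this manual-decision pattern without a Decision Requester component, with a hypothetical ShouldDecide() condition standing in for the game-specific logic.]

using Unity.MLAgents;

public class TrcAgent : Agent
{
    void FixedUpdate()
    {
        // Without a Decision Requester component, the agent only acts when
        // RequestDecision() is called explicitly; it also implicitly
        // requests an action for the current step.
        if (ShouldDecide())
        {
            RequestDecision();
        }
    }

    // Hypothetical condition for when a new decision is needed.
    bool ShouldDecide()
    {
        return true;
    }
}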

While training for about 9K steps, it did once or twice send 1 and -1 as inputs. But the problem still persists: why only 0 most of the time, and 1 and -1 rarely? Why no other large values like 25 or -39?

I don't mind sharing scripts and scenes; if required, please let me know.

Nirav-Madhani commented 3 years ago

I reviewed my code and found the reason: I was converting the input to an int value. After removing that conversion, I am getting various floating-point inputs.
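
[Editor's note: a minimal sketch of the bug described above. Casting a continuous action, which lies in (-1, 1), to int truncates it to 0 on almost every step; the names here are illustrative, not the author's actual code.]

using Unity.MLAgents;
using Unity.MLAgents.Actuators;

public class TrcAgent : Agent
{
    public override void OnActionReceived(ActionBuffers actions)
    {
        float raw = actions.ContinuousActions[0]; // e.g. 0.37f or -0.82f

        // (int)0.37f == 0 and (int)-0.82f == 0: truncation maps nearly the
        // whole (-1, 1) range to 0, which made the input look stuck at zero.
        int truncated = (int)raw;

        // Keeping the float value yields the varied inputs described above.
        float usable = raw;
    }
}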

One final question remains: why are the input values so small? Are they meant to range between -1 and 1? I did check other issues and discussions, specifically https://github.com/Unity-Technologies/ml-agents/issues/319, where the author is getting values like 520, 630, and -580.

Anyway, since this is not the main topic of the issue, I am closing it.

If you can help with getting large inputs directly, rather than mathematically clamping them, that would be great!

In any case, thank you so much!

andrewcoh commented 3 years ago

It is intentional that our policies output actions in the range [-1, 1]. The reason for this is to keep the scale of the network weights small in order to make training more robust/stable. To get larger values, I recommend mapping the action from [-1, 1] to your desired range in C# through multiplication/addition.
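
[Editor's note: a minimal sketch of that mapping, assuming a hypothetical target range of [-600, 600], roughly the scale of the values mentioned in #319; the clamp is purely defensive.]

using UnityEngine;
using Unity.MLAgents;
using Unity.MLAgents.Actuators;

public class TrcAgent : Agent
{
    const float ActionScale = 600f; // hypothetical desired magnitude

    public override void OnActionReceived(ActionBuffers actions)
    {
        // Clamp defensively, then rescale: [-1, 1] -> [-600, 600].
        float a = Mathf.Clamp(actions.ContinuousActions[0], -1f, 1f);
        float scaled = a * ActionScale;

        // For an asymmetric range [lo, hi], use:
        //   lo + (a + 1f) * 0.5f * (hi - lo)
        ApplyInput(scaled); // hypothetical game-specific use of the action
    }

    void ApplyInput(float value) { /* game-specific */ }
}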

github-actions[bot] commented 3 years ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.