Unity-Technologies / ml-agents

The Unity Machine Learning Agents Toolkit (ML-Agents) is an open-source project that enables games and simulations to serve as environments for training intelligent agents using deep reinforcement learning and imitation learning.
https://unity.com/products/machine-learning-agents

Strange behaviour for SAC and GAIL #5613

Closed MrOCW closed 2 years ago

MrOCW commented 2 years ago

Hi,

I'm trying to train a car to drive in its lane using GAIL, but it seems that GAIL is disabled (?) after a while. [TensorBoard screenshots]

Also, for SAC, the environment freezes for a long time (presumably while the networks are updating), and many summary steps are then printed in one batch. [screenshot of the batched summary output] If I increase the batch size from 256 to 512, the environment freezes for as long as 5 minutes.
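For context on the batch-size effect, here is a toy timing sketch in plain PyTorch (nothing ML-Agents specific; the network size and names are purely illustrative) showing that each gradient update gets more expensive as the mini-batch grows, which would be consistent with the longer freezes:

import time
import torch
import torch.nn as nn

# Toy network standing in for the SAC networks; sizes are arbitrary.
net = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 1))
opt = torch.optim.Adam(net.parameters(), lr=3e-4)

def time_updates(batch_size: int, n_updates: int = 100) -> float:
    # Time n_updates gradient steps on random data of the given batch size.
    x = torch.randn(batch_size, 64)
    y = torch.randn(batch_size, 1)
    start = time.perf_counter()
    for _ in range(n_updates):
        opt.zero_grad()
        loss = nn.functional.mse_loss(net(x), y)
        loss.backward()
        opt.step()
    return time.perf_counter() - start

for bs in (256, 512):
    print(f"batch_size={bs}: {time_updates(bs):.2f}s for 100 updates")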


andrewcoh commented 2 years ago

Hi @MrOCW

Can you share your entire YAML file? It looks like the GAIL discriminator is breaking, but it's not clear why from these curves alone. I suspect a NaN or something similar, since GAIL appears to be working properly until that point, around ~500k timesteps.

Are you seeing other large time intervals between summaries, or is the screenshot the only time this occurs? Does this screenshot coincide with the degradation of GAIL?

Can you share the other TensorBoard curves (e.g. policy entropy)? Are you seeing any NaNs, either in C# or in Python?
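If it helps, this is the kind of check I mean on the Python side. It's only a rough sketch, not part of the ML-Agents API, and disc_output is just a placeholder name for whatever tensor you want to watch (e.g. the GAIL discriminator output):

import torch

def has_bad_values(t: torch.Tensor) -> bool:
    # True if the tensor contains any NaN or Inf entries.
    return bool(torch.isnan(t).any() or torch.isinf(t).any())

disc_output = torch.randn(256, 1)  # stand-in for the tensor you actually care about
if has_bad_values(disc_output):
    print("NaN/Inf detected in discriminator output")

Dropping a check like this (or a conditional breakpoint) into the GAIL update path would tell us whether the discriminator itself produces the bad values or whether they arrive with the observations.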

MrOCW commented 2 years ago

@andrewcoh Here is the YAML:

trainer_type: sac
hyperparameters:
  learning_rate: 0.0003
  learning_rate_schedule: constant
  batch_size: 256
  buffer_size: 58000
  buffer_init_steps: 10000
  tau: 0.005
  steps_per_update: 4.0
  save_replay_buffer: False
  init_entcoef: 0.5
  reward_signal_steps_per_update: 10.0
network_settings:
  normalize: False
  hidden_units: 256
  num_layers: 1
  vis_encode_type: simple
  memory: None
  goal_conditioning_type: none
reward_signals:
  extrinsic:
    gamma: 0.99
    strength: 1.0
    network_settings:
      normalize: False
      hidden_units: 128
      num_layers: 2
      vis_encode_type: simple
      memory: None
      goal_conditioning_type: hyper
  gail:
    gamma: 0.99
    strength: 0.5
    network_settings:
      normalize: False
      hidden_units: 128
      num_layers: 2
      vis_encode_type: simple
      memory: None
      goal_conditioning_type: none
    learning_rate: 0.0003
    encoding_size: None
    use_actions: True
    use_vail: False
    demo_path: ..................../Assets/Demonstrations
init_path: None
keep_checkpoints: 5
checkpoint_interval: 200000
max_steps: 3000000
time_horizon: 64
summary_freq: 500
threaded: True
self_play: None
behavioral_cloning: None
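As a side note, the demo_path above is truncated; it points at my Assets/Demonstrations folder, which should contain the .demo recordings. A quick plain-Python sanity check (not the ML-Agents API; the path below is a placeholder for the real one) that the folder actually contains .demo files:

from pathlib import Path

# List the .demo recordings that demo_path points at.
demo_dir = Path("Assets/Demonstrations")  # placeholder path
demo_files = sorted(demo_dir.glob("*.demo"))
print(f"{len(demo_files)} .demo file(s) found in {demo_dir}")
for f in demo_files:
    print(f" - {f.name} ({f.stat().st_size} bytes)")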

For the time intervals: yes, it happens throughout training. To elaborate, the summary steps are printed in batches after large time intervals. It does not seem to have any relation to the GAIL issue.
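For what it's worth, here is my rough reading of the update cadence implied by the config (plain Python arithmetic; this assumes steps_per_update means agent steps per SAC policy update and reward_signal_steps_per_update the same for the reward signals, which I have not verified against the trainer code):

summary_freq = 500
steps_per_update = 4.0                   # assumed: agent steps per SAC policy update
reward_signal_steps_per_update = 10.0    # assumed: agent steps per reward-signal (GAIL) update

policy_updates = summary_freq / steps_per_update                        # ~125
reward_signal_updates = summary_freq / reward_signal_steps_per_update   # ~50
print(f"~{policy_updates:.0f} policy updates and ~{reward_signal_updates:.0f} "
      f"reward-signal updates per {summary_freq}-step summary period")

If the trainer batches those updates together instead of interleaving them with environment steps, that would explain why several summary periods get printed at once after each long pause.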

[screenshots: additional TensorBoard curves]

I have not seen any NaNs anywhere.

MrOCW commented 2 years ago

@andrewcoh any updates on this issue?

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had activity in the last 28 days. It will be closed in the next 14 days if no further activity occurs. Thank you for your contributions.

stale[bot] commented 2 years ago

This issue has been automatically closed because it has not had activity in the last 42 days. If this issue is still valid, please ping a maintainer. Thank you for your contributions.

github-actions[bot] commented 2 years ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.