edbeeching / godot_rl_agents

An Open Source package that allows video game creators, AI researchers and hobbyists the opportunity to learn complex behaviors for their Non Player Characters or agents
MIT License

RL Agents Godot 3.5 CUDA support #62

Closed · visuallization closed this 1 year ago

visuallization commented 1 year ago

Hey there,

Does the RL Agents version for Godot 3.5 support CUDA out of the box? It seems to be a lot slower than Unity ML-Agents (Unity ML-Agents is 3.5 times faster during training in a similar training environment: Godot 1M steps/h vs. Unity 3.5M steps/h), and I wonder if this is related to CUDA support.

Kind Regards

yaelatletl commented 1 year ago

Hi, it does support CUDA out of the box; it's actually a requirement. Training may be faster with the Python module for Godot 4, though I haven't checked. I'm currently backporting changes there while preserving the export capability.
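
A quick way to confirm the GPU is actually visible to the PyTorch backend is a short check (a minimal sanity sketch, nothing Godot-specific):

import torch

# If this prints False, training will typically fall back to CPU (or fail,
# depending on the trainer), which would explain a large slowdown.
print(torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # e.g. the GPU model name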

edbeeching commented 1 year ago

Yes, CUDA should work. Have you tried the speed_up option? There may also be differences in the training parameters on the Python side. Do you have the config files for this comparison available? Does ML-Agents include the decision steps in the step calculation?

visuallization commented 1 year ago

Hey, thanks for the speed_up hint! I set speed_up to 8 and now it is quite a bit faster, but still 2 times slower than Unity ML-Agents (already better than 3.5 times slower!). Do you know of any other options I can tweak?

For comparison, this is the Godot config:

algorithm: PPO

stop:
    episode_reward_mean: 100
    training_iteration: 1000
    timesteps_total: 5000000

config:
    env: godot
    env_config:
        framerate: null
        action_repeat: null
        show_window: false
        seed: 0
    framework: torch  
    lambda: 0.95
    gamma: 0.99

    vf_clip_param: 100.0
    clip_param: 0.2
    entropy_coeff: 0.005
    entropy_coeff_schedule: null
    train_batch_size: 2048
    sgd_minibatch_size: 128
    num_sgd_iter: 16
    num_workers: 4
    lr: 0.0003
    num_envs_per_worker: 16
    batch_mode: truncate_episodes
    rollout_fragment_length: 32
    num_gpus: 1
    model:
        fcnet_hiddens: [256, 256] 
        framestack: 4
    no_done_at_end: true
    soft_horizon: true

And this is the Unity config:

behaviors:
    SimpleCollector:
      trainer_type: ppo
      hyperparameters:
        batch_size: 128
        buffer_size: 2048
        learning_rate: 0.0003
        beta: 0.005
        epsilon: 0.2
        lambd: 0.95
        num_epoch: 3
        learning_rate_schedule: linear
      network_settings:
        normalize: false
        hidden_units: 256
        num_layers: 2
        vis_encode_type: simple
      reward_signals:
        extrinsic:
          gamma: 0.99
          strength: 1.0
      keep_checkpoints: 5
      max_steps: 5000000
      time_horizon: 128
      summary_freq: 20000
      threaded: true

Any help is much appreciated! :)

edbeeching commented 1 year ago

You can increase num_workers in rllib if you have the CPUs available. Also, are rollout_fragment_length and time_horizon the same hyperparameter? If so, you may want to increase the fragment length to 128.
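
As a rough sanity check on how those sampling knobs interact (back-of-envelope arithmetic only, assuming classic rllib sampling semantics, so treat it as a sketch):

# Env steps gathered per sampling round across all rollout workers,
# using the values from the first config above (illustrative only).
num_workers = 4
num_envs_per_worker = 16
rollout_fragment_length = 32

steps_per_round = num_workers * num_envs_per_worker * rollout_fragment_length
print(steps_per_round)  # 4 * 16 * 32 = 2048, matching train_batch_size

With num_workers=8 and rollout_fragment_length=128 this becomes 8 * 16 * 128 = 16384 steps per round, so train_batch_size may be worth revisiting alongside those changes.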

You may also want to lower num_sgd_iter to 3; I think it is the same as num_epoch, but I am not sure. At least by changing it you can confirm whether the bottleneck is the env or the training algorithm.
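
One quick way to isolate the env side is to step it with random actions and time the raw throughput; a minimal sketch, assuming a gym-style wrapper (make_godot_env and the env path below are hypothetical placeholders, not a confirmed API):

import time

# make_godot_env stands in for however you construct the Godot env in
# your setup; the step API assumed here is the classic gym 4-tuple.
env = make_godot_env("path/to/exported_env.exe")
obs = env.reset()

n_steps = 10_000
start = time.time()
for _ in range(n_steps):
    obs, reward, done, info = env.step(env.action_space.sample())
    if done:
        obs = env.reset()
elapsed = time.time() - start
print(f"{n_steps / elapsed:.0f} env steps/s with a random policy")

If that number is far below what ML-Agents reports for its env alone, the bottleneck is on the env side rather than in the training loop.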

If you want fast training and are using Linux, I highly recommend sample-factory; it is much faster than rllib.

visuallization commented 1 year ago

@edbeeching thanks so much for your input! I updated the Godot config according to your suggestions, and training is now 6 times faster than initially (6M steps/h) and almost 2 times faster than the Unity ML-Agents env I was comparing it with. (I have to rerun that comparison, since I ran the Unity training in the Unity editor and the Godot training with an exported executable of the env, but it's still really cool! :))

Here is the updated Godot RL config, for reference:

algorithm: PPO

stop:
    episode_reward_mean: 100
    training_iteration: 1000
    timesteps_total: 5000000

config:
    env: godot
    env_config:
        framerate: null
        action_repeat: null
        show_window: false
        seed: 0
    framework: torch  
    lambda: 0.95
    gamma: 0.99

    vf_clip_param: 100.0
    clip_param: 0.2
    entropy_coeff: 0.005
    entropy_coeff_schedule: null
    train_batch_size: 2048
    sgd_minibatch_size: 128
    num_sgd_iter: 3
    num_workers: 8
    lr: 0.0003
    num_envs_per_worker: 16
    batch_mode: truncate_episodes
    rollout_fragment_length: 128
    num_gpus: 1
    model:
        fcnet_hiddens: [256, 256] 
        framestack: 4
    no_done_at_end: true
    soft_horizon: true
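
In case it helps anyone, here is a minimal sketch of launching a config like this with Ray Tune (the YAML filename is a placeholder, and it assumes the "godot" env string has been registered with tune.register_env elsewhere in the script):

import yaml
import ray
from ray import tune

# "ppo_config.yaml" is a placeholder for wherever the config above lives.
with open("ppo_config.yaml") as f:
    experiment = yaml.safe_load(f)

ray.init()
tune.run(
    experiment["algorithm"],      # "PPO"
    config=experiment["config"],  # assumes the "godot" env is registered
    stop=experiment["stop"],
)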

edbeeching commented 1 year ago

That is great news, closing.