Closed — visuallization closed this 1 year ago
Hi, it does support CUDA out of the box; in fact, it's a requirement. Training may be faster with the Python module for Godot 4, though I haven't checked. I'm currently backporting changes from there while preserving the export capability.
Yes, CUDA should work. Have you tried the speedup option? There may also be differences in the training parameters on the Python side. Do you have the config files for this comparison available? Does ML-Agents include the decision steps in the step calculation?
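For anyone finding this later: the speed-up is typically configured alongside the other env settings. A sketch of what that might look like in the RLlib config below, assuming the env wrapper accepts a `speed_up` key (the exact key name may differ between versions, so check your wrapper):

```yaml
# Hypothetical env_config fragment -- verify the key name in your
# godot_rl_agents version before relying on it.
env_config:
  show_window: false   # headless rendering is usually faster
  speed_up: 8          # run the simulation N times faster than real time
```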
Hey, thanks for the speed_up hint! I set speed_up to 8 and now training is quite a bit faster, but still 2 times slower than Unity ML-Agents (already better than 3.5 times slower!). Do you know of any other options I can tweak?
For comparison, this is the godot config:
```yaml
algorithm: PPO
stop:
  episode_reward_mean: 100
  training_iteration: 1000
  timesteps_total: 5000000
config:
  env: godot
  env_config:
    framerate: null
    action_repeat: null
    show_window: false
    seed: 0
  framework: torch
  lambda: 0.95
  gamma: 0.99
  vf_clip_param: 100.0
  clip_param: 0.2
  entropy_coeff: 0.005
  entropy_coeff_schedule: null
  train_batch_size: 2048
  sgd_minibatch_size: 128
  num_sgd_iter: 16
  num_workers: 4
  lr: 0.0003
  num_envs_per_worker: 16
  batch_mode: truncate_episodes
  rollout_fragment_length: 32
  num_gpus: 1
  model:
    fcnet_hiddens: [256, 256]
    framestack: 4
  no_done_at_end: true
  soft_horizon: true
```
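As a quick sanity check on this config, the number of samples collected per training iteration in synchronous RLlib PPO follows from the worker settings (plain arithmetic, not an RLlib API call):

```python
# Samples gathered per rollout round: each of the
# num_workers * num_envs_per_worker environments contributes
# rollout_fragment_length steps before a train batch is assembled.
num_workers = 4
num_envs_per_worker = 16
rollout_fragment_length = 32

samples_per_rollout = num_workers * num_envs_per_worker * rollout_fragment_length
print(samples_per_rollout)  # 2048, which matches train_batch_size above
```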
And this is the unity config:
```yaml
behaviors:
  SimpleCollector:
    trainer_type: ppo
    hyperparameters:
      batch_size: 128
      buffer_size: 2048
      learning_rate: 0.0003
      beta: 0.005
      epsilon: 0.2
      lambd: 0.95
      num_epoch: 3
      learning_rate_schedule: linear
    network_settings:
      normalize: false
      hidden_units: 256
      num_layers: 2
      vis_encode_type: simple
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
    keep_checkpoints: 5
    max_steps: 5000000
    time_horizon: 128
    summary_freq: 20000
    threaded: true
```
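The two trainers name their hyperparameters differently, which makes the configs hard to compare at a glance. A rough correspondence based on my reading of the two configs in this thread (the pairings are my assumption; the semantics are not exactly identical on the two sides):

```python
# Approximate RLlib <-> ML-Agents hyperparameter correspondence.
# Values in comments are from the two configs posted above.
rllib_to_mlagents = {
    "sgd_minibatch_size": "batch_size",         # 128 in both
    "train_batch_size": "buffer_size",          # 2048 in both
    "lr": "learning_rate",                      # 0.0003 in both
    "entropy_coeff": "beta",                    # 0.005 in both
    "clip_param": "epsilon",                    # 0.2 in both
    "lambda": "lambd",                          # 0.95 in both
    "num_sgd_iter": "num_epoch",                # 16 vs 3 -- the big mismatch
    "rollout_fragment_length": "time_horizon",  # 32 vs 128 -- also differs
}
```

Under this mapping, the two configs agree on everything except the epoch count and the rollout length, which is why those two knobs are discussed below.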
Any help much appreciated! :)
You can increase num_workers in RLlib if you have the CPUs available. Also, are rollout_fragment_length and time_horizon the same hyperparameter? If so, you may want to increase the fragment length to 128.
You may also want to lower num_sgd_iter to 3; I think this is the same as num_epoch, but I am not sure. At least by changing it you can confirm whether the bottleneck is the env or the training algorithm.
If you want fast training and are on Linux, I highly recommend sample-factory; it is much faster than RLlib.
@edbeeching thanks so much for your input! I updated the Godot config according to your suggestions and now training is 6 times faster than initially (6M steps/h), and even almost 2 times faster than the Unity ML-Agents env I was comparing it with. (I have to rerun that comparison though, since I was training Unity in the editor but Godot with an exported exe of the env. Still really cool! :))
here is the updated godot rl config, for reference:
```yaml
algorithm: PPO
stop:
  episode_reward_mean: 100
  training_iteration: 1000
  timesteps_total: 5000000
config:
  env: godot
  env_config:
    framerate: null
    action_repeat: null
    show_window: false
    seed: 0
  framework: torch
  lambda: 0.95
  gamma: 0.99
  vf_clip_param: 100.0
  clip_param: 0.2
  entropy_coeff: 0.005
  entropy_coeff_schedule: null
  train_batch_size: 2048
  sgd_minibatch_size: 128
  num_sgd_iter: 3
  num_workers: 8
  lr: 0.0003
  num_envs_per_worker: 16
  batch_mode: truncate_episodes
  rollout_fragment_length: 128
  num_gpus: 1
  model:
    fcnet_hiddens: [256, 256]
    framestack: 4
  no_done_at_end: true
  soft_horizon: true
```
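One thing worth double-checking in this updated config: the per-rollout sample count no longer matches train_batch_size. Depending on the RLlib version, it may warn about this and adjust the fragment length internally, so it's worth keeping an eye on the startup logs (plain arithmetic again, not an RLlib API call):

```python
# Per-rollout sample count with the updated settings:
num_workers = 8
num_envs_per_worker = 16
rollout_fragment_length = 128

samples_per_rollout = num_workers * num_envs_per_worker * rollout_fragment_length
print(samples_per_rollout)  # 16384, well above train_batch_size = 2048
```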
That is great news, closing.
Hey there,
Does the rl-agents version for Godot 3.5 support CUDA out of the box? It seems to be a lot slower than Unity ML-Agents (Unity being 3.5 times faster during training in a similar training environment: Godot 1M steps/h, Unity 3.5M steps/h), and I wonder if this is related to CUDA support?
Kind Regards