Ivan-267 opened 5 months ago
Updated to the latest experimental multiagent plugin (so this should not be merged before that one), and the ONNX model was re-trained using RLlib.
Tensorboard stats (smoothing: 0)
Hyperparams used (stopped manually with CTRL + C):
```yaml
algorithm: PPO

# Multi-agent-env setting:
# If true:
# - Any AIController with done = true will receive zeroes as action values until all AIControllers are done, an episode ends at that point.
# - ai_controller.needs_reset will also be set to true every time a new episode begins (but you can ignore it in your env if needed).
# If false:
# - AIControllers auto-reset in Godot and will receive actions after setting done = true.
# - Each AIController has its own episodes that can end/reset at any point.
# Set to false if you have a single policy name for all agents set in AIControllers.
env_is_multiagent: false

checkpoint_frequency: 20

# You can set one or more stopping criteria
stop:
    #episode_reward_mean: 0
    #training_iteration: 1000
    #timesteps_total: 10000
    time_total_s: 10000000

config:
    env: godot
    env_config:
        env_path: 'JumperHard.console.exe' # Set your env path here (exported executable from Godot) - e.g. 'env_path.exe' on Windows
        action_repeat: null # Doesn't need to be set here; you can set this in the sync node in the Godot editor as well
        show_window: true # Displays the game window while training. Might be faster when false in some cases; turning it off also reduces GPU usage if you don't need rendering.
        speedup: 30 # Speeds up Godot physics

    framework: torch # ONNX models exported with torch are compatible with the current Godot RL Agents plugin

    lr: 0.0003
    lambda: 0.95
    gamma: 0.99

    vf_loss_coeff: 0.5
    vf_clip_param: .inf
    #clip_param: 0.2
    entropy_coeff: 0.0001
    entropy_coeff_schedule: null
    #grad_clip: 0.5

    normalize_actions: False
    clip_actions: True # During onnx inference we simply clip the actions to the [-1.0, 1.0] range; set here to match

    rollout_fragment_length: 32
    sgd_minibatch_size: 128
    num_workers: 4
    num_envs_per_worker: 16
    train_batch_size: 2048
    num_sgd_iter: 4
    batch_mode: truncate_episodes

    num_gpus: 0
    model:
        vf_share_layers: False
        fcnet_hiddens: [64, 64]
```
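
For context, here is roughly how the entries above map onto RLlib's builder-style Python config API (a minimal sketch assuming a Ray 2.x install with the older API stack; this is not the training script from this PR, and the YAML file name plus the env registration are assumptions):

```python
import yaml
from ray.rllib.algorithms.ppo import PPOConfig

# Assumed file name for this sketch; the plugin's training script is what
# actually parses the YAML and registers the "godot" env with Ray Tune.
with open("rllib_config.yaml") as f:
    cfg = yaml.safe_load(f)["config"]

ppo_config = (
    PPOConfig()
    .environment(
        env="godot",  # must be registered beforehand, e.g. via tune.register_env
        env_config=cfg["env_config"],
        normalize_actions=False,
        clip_actions=True,
    )
    .framework("torch")
    .rollouts(
        num_rollout_workers=4,  # "num_workers" in the YAML above
        num_envs_per_worker=16,
        rollout_fragment_length=32,
        batch_mode="truncate_episodes",
    )
    .training(
        lr=0.0003,
        lambda_=0.95,
        gamma=0.99,
        vf_loss_coeff=0.5,
        vf_clip_param=float("inf"),
        entropy_coeff=0.0001,
        train_batch_size=2048,
        sgd_minibatch_size=128,
        num_sgd_iter=4,
        model={"vf_share_layers": False, "fcnet_hiddens": [64, 64]},
    )
    .resources(num_gpus=0)
)
```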
Smaller discrete actions training session (just for testing discrete actions, onnx not included):

Relevant env code changes (AIController3D.gd):
```gdscript
func set_action(action):
    # Discrete action values arrive as indices; shift move/turn from {0, 1, 2} to {-1, 0, 1}.
    _player.move_action = action.move - 1
    _player.turn_action = action.turn - 1
    _player.jump_action = action.jump


func get_action_space():
    return {
        "jump": {"size": 2, "action_type": "discrete"},
        "move": {"size": 3, "action_type": "discrete"},
        "turn": {"size": 3, "action_type": "discrete"}
    }
```
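
On the Python side, if I'm reading the action dict conventions right, this should correspond to a gymnasium `Dict` space along these lines (an illustrative sketch, not code from the plugin; the comments reflect how `set_action()` interprets the indices):

```python
from gymnasium import spaces

# Hypothetical mirror of get_action_space() above: each "discrete" entry of
# size N becomes a Discrete(N), and sampled indices are what set_action() gets.
action_space = spaces.Dict({
    "jump": spaces.Discrete(2),  # 0 = no jump, 1 = jump
    "move": spaces.Discrete(3),  # shifted by -1 in set_action() -> {-1, 0, 1}
    "turn": spaces.Discrete(3),  # likewise shifted to {-1, 0, 1}
})

print(action_space.sample())  # e.g. {'jump': 1, 'move': 0, 'turn': 2}
```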
Results:

I was able to train this with relatively good results using RLlib, but I ~don't have the trained onnx yet~ (added, see below), since I found out that the RLlib script exports a different output shape than our SB3 export.
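
For anyone comparing the two exports, the output-shape difference is easy to inspect by listing each ONNX graph's outputs with onnxruntime (a small sketch; the file names are placeholders):

```python
import onnxruntime as ort

# Print every graph output's name and shape for both exports (placeholder paths).
for path in ["model_sb3.onnx", "model_rllib.onnx"]:
    session = ort.InferenceSession(path)
    print(path)
    for out in session.get_outputs():
        print("  ", out.name, out.shape)
```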