> [!IMPORTANT]
> This uses a mix of plugin updates that are not yet in the main branch, including https://github.com/edbeeching/godot_rl_agents_plugin/pull/37 and https://github.com/edbeeching/godot_rl_agents_plugin/pull/40, as well as the following modifications:

- Removed the negative step reward (the agent seems to learn much more quickly without it; this might be related to the reward's relatively large magnitude).
- Changed the image formatting to channels-last for Rllib (I think sb3 can work with either layout, but it would need a larger image size).
- Changed the image size to 10x10, as it is one of the CNN input sizes Rllib supports and is more lightweight.

- Manually changed which observation is read from the obs dict during inference in the Sync node code:

  ```gdscript
  var action = model.run_inference(
      obs[agent_id]["camera_2d"], 1.0
  )
  ```

- Added a training mode inspector property to RGBCameraSensor3D. If true, the sensor sends hex-encoded image data; otherwise it sends the image data without hex encoding (needed for inference). Currently this property needs to be set to false in Player.tscn when running ONNX inference, and to true when training. A better solution for these manual changes is needed in the future, but this is a quick way to get things working for this example.
- Added the trained ONNX model. It was trained for just 68 seconds with a single env on my PC before I stopped training manually, using the following Rllib example config (from the multi-agent Godot RL branch):
```yaml
algorithm: PPO

# Multi-agent-env setting:
# If true:
# - Any AIController with done = true will receive zeroes as action values until all AIControllers are done, an episode ends at that point.
# - ai_controller.needs_reset will also be set to true every time a new episode begins (but you can ignore it in your env if needed).
# If false:
# - AIControllers auto-reset in Godot and will receive actions after setting done = true.
# - Each AIController has its own episodes that can end/reset at any point.
# Set to false if you have a single policy name for all agents set in AIControllers
env_is_multiagent: false

checkpoint_frequency: 30

# You can set one or more stopping criteria
stop:
    # episode_reward_mean: 0
    #training_iteration: 1000
    #timesteps_total: 10000
    time_total_s: 10000000

config:
    env: godot
    env_config:
        env_path: "virtualcamera.console.exe" # Set your env path here (exported executable from Godot) - e.g. 'env_path.exe' on Windows
        action_repeat: null # Doesn't need to be set here, you can set this in sync node in Godot editor as well
        show_window: true # Displays game window while training. Might be faster when false in some cases, turning off also reduces GPU usage if you don't need rendering.
        speedup: 30 # Speeds up Godot physics

    framework: torch # ONNX models exported with torch are compatible with the current Godot RL Agents Plugin

    lr: 0.0003
    lambda: 0.95
    #gamma: 0.99

    #vf_loss_coeff: 0.5
    vf_clip_param: .inf
    #clip_param: 0.2
    entropy_coeff: 0.0001
    entropy_coeff_schedule: null
    #grad_clip: 0.5

    normalize_actions: False
    clip_actions: True # During onnx inference we simply clip the actions to [-1.0, 1.0] range, set here to match

    rollout_fragment_length: 32
    sgd_minibatch_size: 64
    num_workers: 1
    num_envs_per_worker: 1 # This will be set automatically if not multi-agent. If multi-agent, changing this changes how many envs to launch per worker.
    # The value below needs changing per env
    train_batch_size: 512 # Basic calculation for this value can be rollout_fragment_length * num_workers * num_envs_per_worker (how many AIControllers you have if not multi_agent, otherwise the value you set)

    num_sgd_iter: 4
    batch_mode: truncate_episodes

    num_gpus: 0
    model:
        vf_share_layers: False
        fcnet_hiddens: [64, 64]
```
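The `train_batch_size` comment in the config gives a rule of thumb; the short Python sketch below works through that arithmetic for these settings. It assumes Rllib keeps sampling rounds of `rollout_fragment_length * num_workers * num_envs_per_worker` steps until `train_batch_size` steps are collected, then runs `num_sgd_iter` passes over the batch in minibatches of `sgd_minibatch_size`. The `batch_plan` helper is hypothetical and only models that bookkeeping; it is not part of Rllib.

```python
import math


def batch_plan(rollout_fragment_length, num_workers, num_envs_per_worker,
               train_batch_size, sgd_minibatch_size, num_sgd_iter):
    """Hypothetical helper: estimate the sampling/SGD schedule implied by the config."""
    # Steps collected in one sampling round across all workers and envs.
    per_round = rollout_fragment_length * num_workers * num_envs_per_worker
    # Rounds needed to reach train_batch_size (rounded up if it is not an exact multiple).
    rounds = math.ceil(train_batch_size / per_round)
    # Full minibatches per SGD pass over the train batch.
    minibatches_per_pass = train_batch_size // sgd_minibatch_size
    return {
        "per_round": per_round,
        "rounds": rounds,
        "minibatches_per_pass": minibatches_per_pass,
        "gradient_steps": minibatches_per_pass * num_sgd_iter,
    }


# Values from the config: 32 * 1 * 1 = 32 steps per round.
plan = batch_plan(rollout_fragment_length=32, num_workers=1, num_envs_per_worker=1,
                  train_batch_size=512, sgd_minibatch_size=64, num_sgd_iter=4)
print(plan)
# {'per_round': 32, 'rounds': 16, 'minibatches_per_pass': 8, 'gradient_steps': 32}
```

With a single env, 512 corresponds to 16 rounds of 32 steps; adding workers or envs per worker reduces the number of rounds without changing the batch size.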

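The `stop` section of the config can hold several criteria. As far as I understand Ray Tune's dict-style `stop`, a trial ends as soon as any listed metric reaches (is greater than or equal to) its threshold. The plain-Python sketch below models that rule; `should_stop` is a hypothetical name, not a Tune function. With only `time_total_s: 10000000` active, for example, a 68-second run is nowhere near the limit.

```python
def should_stop(result: dict, stop: dict) -> bool:
    """Hypothetical model of dict-style stopping: stop once any metric reaches its threshold."""
    for metric, threshold in stop.items():
        value = result.get(metric)
        # Metrics missing from the result cannot trigger a stop.
        if value is not None and value >= threshold:
            return True
    return False


time_only = {"time_total_s": 10_000_000}
print(should_stop({"time_total_s": 68.0}, time_only))  # False

# Adding a reward threshold: whichever criterion is met first ends training.
with_reward = {"episode_reward_mean": 0, "time_total_s": 10_000_000}
print(should_stop({"time_total_s": 68.0, "episode_reward_mean": 0.4}, with_reward))  # True
```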

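The `env_is_multiagent` comments in the config describe two ways of handling `done` across AIControllers. The Python sketch below models them side by side: with the flag set to true, finished controllers receive zero actions and the shared episode ends only once every controller is done; with it set to false, each controller's episode ends on its own. This is only an illustration of the described behavior, assuming one float action per controller; `route_actions` and `episodes_ended` are hypothetical names, not plugin functions.

```python
from typing import List


def route_actions(done: List[bool], policy_actions: List[float], multiagent: bool) -> List[float]:
    """Actions each controller receives this step (hypothetical model, one float per controller)."""
    if multiagent:
        # Controllers that are already done get zeroes until the whole episode ends.
        return [0.0 if is_done else action for is_done, action in zip(done, policy_actions)]
    # Auto-reset mode: every controller keeps receiving policy actions.
    return list(policy_actions)


def episodes_ended(done: List[bool], multiagent: bool) -> List[bool]:
    """Whether each controller's episode ends on this step."""
    if multiagent:
        # One shared episode that ends only when all controllers are done.
        return [all(done)] * len(done)
    # Independent episodes: each controller ends (and auto-resets) on its own.
    return list(done)


done = [True, False]
print(route_actions(done, [0.5, -0.2], multiagent=True))  # [0.0, -0.2]
print(episodes_ended(done, multiagent=True))              # [False, False]
print(episodes_ended(done, multiagent=False))             # [True, False]
```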

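In training mode the camera sensor sends each frame as a hex string, while Rllib is given the image in channels-last (height, width, channels) layout. The stdlib-only sketch below shows the hex decoding and the index arithmetic that distinguishes the two layouts. It assumes a 10x10 frame with 3 uint8 channels (the channel count is an assumption) and an observation dict keyed by `camera_2d`, mirroring `obs[agent_id]["camera_2d"]` in the Sync node. `decode_hex_obs`, `pixel_hwc` and `chw_to_hwc` are hypothetical helpers, not plugin or Rllib functions; real code would more likely use NumPy.

```python
HEIGHT, WIDTH, CHANNELS = 10, 10, 3  # assumed 10x10 frame with 3 channels


def decode_hex_obs(hex_string: str) -> bytes:
    """Turn a hex-encoded camera observation back into raw uint8 pixel bytes."""
    data = bytes.fromhex(hex_string)
    expected = HEIGHT * WIDTH * CHANNELS
    if len(data) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(data)}")
    return data


def pixel_hwc(data: bytes, row: int, col: int, ch: int) -> int:
    """Read one value from a channels-last (H, W, C) buffer."""
    return data[(row * WIDTH + col) * CHANNELS + ch]


def chw_to_hwc(data: bytes) -> bytes:
    """Reorder a channels-first (C, H, W) buffer into channels-last (H, W, C)."""
    out = bytearray(len(data))
    for ch in range(CHANNELS):
        for row in range(HEIGHT):
            for col in range(WIDTH):
                out[(row * WIDTH + col) * CHANNELS + ch] = data[(ch * HEIGHT + row) * WIDTH + col]
    return bytes(out)


# Hypothetical observation dict with a synthetic frame (values 0..255 repeating).
obs = {"camera_2d": bytes(i % 256 for i in range(HEIGHT * WIDTH * CHANNELS)).hex()}
frame = decode_hex_obs(obs["camera_2d"])
print(len(frame), pixel_hwc(frame, 0, 1, 2))  # 300 5
```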
ONNX inference test video:

https://github.com/edbeeching/godot_rl_agents_examples/assets/61947090/16750152-048b-4de5-b5b3-2933afc58258

Virtual Camera update: Rllib and onnx support, added onnx #32