edbeeching / godot_rl_agents

An Open Source package that allows video game creators, AI researchers and hobbyists the opportunity to learn complex behaviors for their Non Player Characters or agents
MIT License
942 stars 69 forks source link

Changes to enable discrete actions > 2 #121

Closed Ivan-267 closed 5 months ago

Ivan-267 commented 1 year ago

This is my attempt to get discrete only actions to work properly on onnx exported models with SB3 and also discrete actions > 2 to work with SB3. As I'm not very familiar with the code, libraries used and ML/RL, I will describe the approach and changes below, however in case of merging a review is requested. I'm marking this as draft as well, so any potential changes can be made before merging. I've only tested this with Stable Baselines and CleanRL example so far.

Update: After a quick test with rllib (using the yaml file from this repo and small file changes to make training and export work for me) with discrete actions only used in the env, the onnx model outputted floats, which wouldn't work with this update (to make it compatible with the update, the onnx model would need to output integers for discrete actions in discrete-only action environments). It's also not likely to work well with the current version as the Godot plugin does not handle logits for discrete-only action spaces (adding the conversion there instead of integrating it into the models may be a potential alternative approach).

This change is connected to the draft PR for plugin (https://github.com/edbeeching/godot_rl_agents_plugin/pull/16/files#) - changes to both repositories are necessary.

Changes to the main repository:

Main changes to godot_rl/core/utils.py: When training using SB3 on environments with only discrete actions, currently SB3 will output integer values for an action (e.g. 0,1,2,3,4), when using CleanRL example, action values will always be continuous/float values, and when using mixed spaces with SB3 the values will be continuous/float also.

Main changes to godot_rl/wrappers/onnx/stable_baselines_export.py Currently, when exporting models that have only discrete actions using the sb3 example, the .onnx model will produce logits for the discrete actions, while sb3 model produces integers for each discrete action. This produces an error on the verify_onnx_export assertion.

With that change, if the exported model uses discrete actions only (the example from the image has 2 discrete actions), the following operations are added which should select the action with the highest probability from the logits: Netron_ZXj1dVNn7V

Changes to the plugin repository: ONNXInference.cs:

Previously, the exported onnx model would output float values for actions in case of either continuous or discrete actions used. After the changes above, onnx will output discrete actions as integers if only discrete actions are used in the environment. To handle that, I've made a few changes:

sync.gd:

Simple test of output values when only discrete actions are used with the sb3 example for training after the changes (printing the received actions directly in Godot):

image

Ivan-267 commented 1 year ago

Thanks for the review. I don't have a custom environment for this as I was testing it on a modified example environment from the examples repo. Also since I had to check all 3 cases (continuous only / mixed / discrete only), one environment wouldn't cover checking everything.

However, here are the steps that should get it working quickly:

func get_action_space() -> Dictionary:
    return {
        "discrete_size_2" : {
            "size": 2,
            "action_type": "discrete"
        },
        "continuous_size_2" : {
            "size": 2,
            "action_type": "continuous"
        },
        }
func get_action_space() -> Dictionary:
    # Should throw an error in python, discrete size 2 is current 
    # maximum supported for mixed spaces
    return {
        "discrete_size_10" : {
            "size": 10,
            "action_type": "discrete"
        },
        "continuous_size_1" : {
            "size": 1,
            "action_type": "continuous"
        },
        }

You can use the same setting for testing training and inference on the exported .onnx file.

Finally, if you want to train the agent with e.g. discrete actions only to see if it learns a good policy, set_action will need to be modified accordingly based on the example, but the modification should be simple for RingPong or BallChase.

Additional notes / considerations:

Ivan-267 commented 1 year ago

I've made a small test environment which only outputs action values (includes the updated compatible plugin already). godot-rl libray still needs to be installed from the PR branch.

https://github.com/Ivan-267/godot_rl_agents_examples/tree/test_environments/test_environments/ActionSpacesTestEnvironment

I've made a separate branch in the fork for it since it's a test only environment. I'm not sure to what branch and whether we should merge it, but it might be useful to have some test environments for checking regression later on.

Usage: From Godot, open the "AiController2D" node's script and use any of the predefined (or a custom) dictionary for testing, to validate the output, check the value of actions returned in the Godot console after starting training with SB3 or other libraries (or after exporting and using an .onnx file for inference):

Godot_v4 0 3-stable_mono_win64_MeELTbpqek

In the current PR, the second dictionary mixed_types_larger_than_2 should fail with a descriptive error message in the console from which training was launched, all others should produce valid outputs.

Ivan-267 commented 1 year ago

Please hold off on the review as an important issue was spotted by @yaelatletl during his review that needs to be addressed in my experimental branch of the plugin before potential merging.

To keep things synchronized and avoid merge conflicts, once the PR https://github.com/edbeeching/godot_rl_agents_plugin/pull/18 is merged, I will sync my plugin fork and then re-implement the changes necessary to make this PR work in my experimental branch.

visuallization commented 1 year ago

@Ivan-267 What is the current state of the PR? Any chance we can clean up the mess I have made? :D

Ivan-267 commented 1 year ago

@Ivan-267 What is the current state of the PR? Any chance we can clean up the mess I have made? :D

I am still using a variant of this branch for experimenting with my discrete only scenes locally (without using onnx), but it will need to be reworked before a potential merge (specifically the plugin side, but potentially something here as well).

Since it alters the onnx model to output discrete actions directly when only discrete actions are used, separate handling is necessary for discrete and continuous actions on the plugin side, in both the C# code and the gdscript code. I made a mistake in my implementation of the plugin changes initially, and there are pending changes that are being worked on in the PR I linked above. After that PR is merged I could try to re-apply the changes that handle discrete only actions.

There are also different possible alternative methods to the one used here for certain things, e.g.:

Edit: The plugin was updated. As is, this would still not work with discrete only actions in onnx models exported using rllib (if they provide logits as output), but should work with sb3.

Ivan-267 commented 5 months ago

Superseded by https://github.com/edbeeching/godot_rl_agents/pull/181.

The branch can still be kept for reference since this implementation has an advantage, it sends only action values for discrete actions (not logits), so it could be more efficient with large spaces.