Closed. Ivan-267 closed this 5 months ago.
Thanks for the review. I don't have a custom environment for this, as I was testing it on a modified example environment from the examples repo. Also, since I had to check all three cases (continuous only / mixed / discrete only), a single environment wouldn't cover everything.
However, here are the steps that should get it working quickly:
```gdscript
func get_action_space() -> Dictionary:
    return {
        "discrete_size_10": {
            "size": 10,
            "action_type": "discrete"
        },
        "discrete_size_5": {
            "size": 5,
            "action_type": "discrete"
        },
    }
```
```gdscript
func get_action_space() -> Dictionary:
    return {
        "discrete_size_2": {
            "size": 2,
            "action_type": "discrete"
        },
        "continuous_size_2": {
            "size": 2,
            "action_type": "continuous"
        },
    }
```
```gdscript
func get_action_space() -> Dictionary:
    # Should throw an error in Python: a discrete size of 2 is the
    # current maximum supported for mixed spaces
    return {
        "discrete_size_10": {
            "size": 10,
            "action_type": "discrete"
        },
        "continuous_size_1": {
            "size": 1,
            "action_type": "continuous"
        },
    }
```
```gdscript
func set_action(action) -> void:
    print(action)
```
You can use the same setup for testing both training and inference on the exported .onnx file.
Finally, if you want to train the agent with, e.g., discrete actions only to see whether it learns a good policy, set_action will need to be modified accordingly based on the example, but the modification should be simple for RingPong or BallChase.
Additional notes / considerations:
If you try to export to onnx with discrete actions only on the current main branch, you will receive an action-mismatch error, because the onnx file produces logits while the sb3 model produces a single integer value for each discrete action. With the PR, the outputs should match.
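The mismatch can be sketched in a few lines (hypothetical values, not the actual verification code from the repo):

```python
import numpy as np

# Hypothetical logits for a single discrete action with 5 choices,
# as an un-patched onnx export would produce them.
logits = np.array([[0.1, 2.3, -0.5, 0.7, 1.1]], dtype=np.float32)

# The sb3 model instead returns the chosen action as a single integer,
# so a naive comparison of the two outputs fails with a mismatch.
sb3_action = np.array([1], dtype=np.int64)

# Applying argmax to the logits recovers a matching integer action.
onnx_action = np.argmax(logits, axis=-1)
assert np.array_equal(onnx_action, sb3_action)
```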
Optionally, if you have e.g. Netron, you can also inspect the sb3-exported onnx file to verify what kind of output the onnx model produces: the split/softmax/argmax operations are added to the output when the environment has discrete actions only, and are not added for mixed or continuous actions (my example image in the PR is from an onnx model with 2 discrete actions). The onnx model exported from rllib did not have these added operations, so we may need to adjust that in the future (I'm not yet sure of the best approach, as I hadn't tried rllib before).
Discrete actions are selected deterministically (using argmax), and continuous action values are deterministic as well. Further modification may be needed if we want to add non-deterministic actions at inference time in the future (this may help prevent the agent from getting stuck in some cases, but I'm not yet sure about the best approach, or whether it's a needed feature).
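If stochastic inference were added later, the difference could look roughly like this sketch (hypothetical, not part of the PR):

```python
import numpy as np

rng = np.random.default_rng(0)


def softmax(x):
    # Numerically stable softmax over a 1-D logits array.
    z = np.exp(x - x.max())
    return z / z.sum()


logits = np.array([2.0, 1.0, 0.1])

# Deterministic selection (current behavior): always the same action.
deterministic = int(np.argmax(logits))

# Stochastic selection (possible future option): sample an action from
# the categorical distribution defined by the logits.
probs = softmax(logits)
stochastic = int(rng.choice(len(logits), p=probs))
```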
If we want to have mixed actions where discrete actions are larger than 2, a different approach might be needed (I don't currently know the best way to approach that case).
I've made a small test environment which only outputs action values (it already includes the updated, compatible plugin). The godot-rl library still needs to be installed from the PR branch.
I've made a separate branch in the fork for it, since it's a test-only environment. I'm not sure which branch it should target, or whether we should merge it at all, but it might be useful to have some test environments for catching regressions later on.
Usage: from Godot, open the "AiController2D" node's script and use any of the predefined (or a custom) dictionaries for testing. To validate the output, check the action values printed in the Godot console after starting training with SB3 or other libraries (or after exporting an .onnx file and using it for inference).
In the current PR, the second dictionary (`mixed_types_larger_than_2`) should fail with a descriptive error message in the console from which training was launched; all others should produce valid outputs.
Please hold off on the review: an important issue spotted by @yaelatletl during his review needs to be addressed in my experimental branch of the plugin before a potential merge.
To keep things synchronized and avoid merge conflicts, once the PR https://github.com/edbeeching/godot_rl_agents_plugin/pull/18 is merged, I will sync my plugin fork and then re-implement the changes necessary to make this PR work in my experimental branch.
@Ivan-267 What is the current state of the PR? Any chance we can clean up the mess I have made? :D
I am still using a variant of this branch for experimenting with my discrete only scenes locally (without using onnx), but it will need to be reworked before a potential merge (specifically the plugin side, but potentially something here as well).
Since it alters the onnx model to output discrete actions directly when only discrete actions are used, separate handling is necessary for discrete and continuous actions on the plugin side, in both the C# code and the gdscript code. I made a mistake in my implementation of the plugin changes initially, and there are pending changes that are being worked on in the PR I linked above. After that PR is merged I could try to re-apply the changes that handle discrete only actions.
There are also possible alternatives to some of the methods used here.
Edit: The plugin was updated. As is, this still would not work with discrete-only actions in onnx models exported using rllib (if they provide logits as output), but it should work with sb3.
Superseded by https://github.com/edbeeching/godot_rl_agents/pull/181.
The branch can still be kept for reference, since this implementation has one advantage: it sends only action values for discrete actions (not logits), so it could be more efficient with large action spaces.
This is my attempt to get discrete-only actions working properly with onnx-exported models in SB3, and to get discrete actions with sizes > 2 working with SB3. As I'm not very familiar with the code, the libraries used, and ML/RL, I'll describe the approach and changes below; in case of merging, a review is requested. I'm marking this as a draft as well, so any changes can be made before merging. I've only tested this with Stable Baselines and the CleanRL example so far.
Update: After a quick test with rllib (using the yaml file from this repo, plus small file changes to make training and export work for me) on an env with discrete actions only, the onnx model output floats, which wouldn't work with this update (to be compatible, the onnx model would need to output integers for discrete actions in discrete-only action environments). It's also unlikely to work well with the current version, as the Godot plugin does not handle logits for discrete-only action spaces (adding the conversion there, instead of integrating it into the models, could be a potential alternative approach).
This change is connected to the draft PR for plugin (https://github.com/edbeeching/godot_rl_agents_plugin/pull/16/files#) - changes to both repositories are necessary.
Changes to the main repository:
Main changes to `godot_rl/core/utils.py`:
When training with SB3 on environments with only discrete actions, SB3 currently outputs integer values for an action (e.g. 0, 1, 2, 3, 4); with the CleanRL example, action values are always continuous/float values; and with mixed spaces in SB3, the values are continuous/float as well. If the received action value is an `np.int64`, it is now added to the actions without modification; otherwise, the previous logic remains, which converts the discrete action from the received float value to 1 or 0 based on its value.
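Based on that description, the pass-through logic can be sketched roughly as follows (hypothetical function name and threshold, not the exact patch):

```python
import numpy as np


def convert_discrete_action(value):
    # If the trainer already sends an integer (e.g. SB3 with a
    # discrete-only space after the change), pass it through unchanged.
    if isinstance(value, np.int64):
        return int(value)
    # Otherwise keep the previous behavior: convert the received float
    # to 1 or 0 based on its value (threshold assumed here).
    return 1 if value > 0 else 0
```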
Main changes to `godot_rl/wrappers/onnx/stable_baselines_export.py`:
Currently, when exporting models that have only discrete actions using the sb3 example, the .onnx model will produce logits for the discrete actions, while the sb3 model produces integers for each discrete action. This produces an error on the `verify_onnx_export` assertion. I've modified the `forward` method so that, when the action space is `MultiDiscrete`, it selects the best action from the distribution (deterministic actions). Otherwise (for continuous or mixed actions), the old behavior is used. I implemented this after checking the sb3 policies file to see how discrete actions are processed, but I cannot guarantee I've implemented it correctly. In the few small test cases I've tried, this makes the `verify_onnx_export` tests pass, and the model works in Godot.
With that change, if the exported model uses discrete actions only (the example from the image has 2 discrete actions), the following operations are added, which should select the action with the highest probability from the logits:
Changes to the plugin repository:
ONNXInference.cs:
Previously, the exported onnx model would output float values for actions whether continuous or discrete actions were used. After the changes above, onnx will output discrete actions as integers if only discrete actions are used in the environment. To handle that, I've made a few changes, including disposing of the `session` and `results` disposable values after use.
sync.gd:
Simple test of output values when only discrete actions are used with the sb3 example for training after the changes (printing the received actions directly in Godot):