edbeeching / godot_rl_agents

An Open Source package that allows video game creators, AI researchers and hobbyists the opportunity to learn complex behaviors for their Non Player Characters or agents
MIT License
942 stars 69 forks source link

Fix: Discrete actions with size > 2 showing only binary values in Godot #117

Closed Ivan-267 closed 1 year ago

Ivan-267 commented 1 year ago

Discrete actions with size > 2 would display only 0 or 1 even when using the multi discrete spaces.

This fix addresses that by removing the binary conversion (np.greater) in case of using multi discrete spaces.

A user-friendly error message is also added when trying to mix a discrete action larger than 2 with continuous actions.

I have quickly tested this on modified "RingPong" example code and it works in cases of:
1) 1 discrete action with size > 2. 2) 2 discrete actions with size > 2. 2) Discrete action with size up to 2 + continuous action.

image

However, as I'm not familiar with the code I don't know if this is the right solution, or if it might break something else.

Ivan-267 commented 1 year ago

This was deleted by accident, I will have to create a new fork later, I'm still new to Github collaboration. At least the changes are still visible so I will be able to re-implement them easily.

edbeeching commented 1 year ago

@Ivan-267 , I am adding you as a contributor to the lib so you can create branches etc directly here without having to fork.

Ivan-267 commented 1 year ago

As a note, there is more to look into when working with discrete actions than these changes alone. From sb3, exporting to onnx using only discrete actions (either size 2 or larger with changes to enable larger actions) also currently doesn't work (as the exported model produces logits, but Godot expects a discrete action as a number). In case of environments with only discrete actions, enabling discrete larger actions than 2 may affect the output from CleanRL example which outputs continuous actions (currently converted into binary actions in case of discrete actions). I haven't tried the other libraries yet so I'm not sure what the output for different action spaces is for those.

visuallization commented 1 year ago

@Ivan-267 Shouldn't we merge this to have at least discrete actions spaces which return more than just 0 and 1? I mean even if it is not working with onnx it seems to improve what we currently have.

Ivan-267 commented 1 year ago

Is this PR affected by / connected with rllib changes? I haven't tested it much with rllib, but I did more work in the later PR (https://github.com/edbeeching/godot_rl_agents/pull/121).

Specifically this part might be important:

https://github.com/edbeeching/godot_rl_agents/blob/1f718b31f5ac852510dbcba3edfea0ce019c2b0c/godot_rl/core/utils.py#L94-L120

On the commented line I noticed that just sending the direct action as is does not solve the issue in all cases.

In that PR I mentioned some things that still need to be considered (a plugin update is also pending before readding the changes), onnx inference side is affected by changes to this repository, although even with the current implementation there's an issue, onnx model will behave differently with discrete only actions (causing an action mismatch on export), as the forward method will output the logits during inference, while the sb3 model outputs selected discrete actions on training.

With the linked PR, the onnx model would output discrete actions (e.g. 0,1,2,3) rather than logits, but it would cause an error on the plugin without updating it, as it expects float data type (and discrete actions are added as integers, converting them to float is possible in e.g. the forward method of the onnx model, but would make it difficult to detect if the sent action is truly discrete or actually a continuous action that needs conversion on the godot side during inference without some kind of additional information exchange). Rllib would still output logits however with exported onnx.

With the changes to the exported onnx model from sb3, also "true" discrete actions and "mixed discrete actions" (which are continuous actions) need to be handled differently, so I added some changes to the plugin:

https://github.com/edbeeching/godot_rl_agents_plugin/blob/fc67722fc780f10f9ca1b3f259185f4e12b5b423/addons/godot_rl_agents/sync.gd#L130

There might be some other details I would have to recheck my implementation to see.

I'd like to mention an alternative solution as well:

If we look at the current implementation: https://github.com/edbeeching/godot_rl_agents_plugin/blob/main/addons/godot_rl_agents/sync.gd#L135C3

This might not be ideal for mixed actions (I think the action index should move by 1 for each discrete action), however, it's not a simple matter of just changing it to 1 because currently the onnx model will produce logits if discrete only actions are used. In that case, you do get as many values (logits) as the size of the action, but a single logit does not represent the value of the action (e.g. argmax from the logits could be used to select the most probable action for deterministic inference, or the probability distribution can be used after softmax to select actions using their respective probabilities).

The code could check whether or not all actions are discrete and handle the index differently as one possible approach.

For my PR I tried adding the action selection to be handled by the model in the forward method instead of post-processing the output from inference, as it may be more optimized, but also more in-line with what the actual sb3 model does when it outputs an action, however I'm not sure which is the optimal solution as that approach has complexities as well such as checking for type of action during inference (whether it's float or int), and e.g. for adding rllib's exported onnx support some modifications to the current scripts would be needed (it would need to output a selected discrete action rather than logits as well, if possible).