edbeeching / godot_rl_agents

An Open Source package that allows video game creators, AI researchers and hobbyists the opportunity to learn complex behaviors for their Non Player Characters or agents
MIT License

Imitation learning support #169

Closed Ivan-267 closed 8 months ago

Ivan-267 commented 8 months ago

Motivation:

Supporting pre-training with GAIL could be helpful in some environments where it might be difficult to define a good dense reward function that results in the desired behavior. One use case could be to do some pre-training with GAIL followed by standard PPO RL training with a sparse reward function. Of course, recording quality demos and tuning the additional hyperparameters are added challenges. The intention of this PR is only to add basic support.

TODO:

Description:

Together with the demo recorder (https://github.com/edbeeching/godot_rl_agents_plugin/pull/35), this adds basic support and an example script for training or pre-training using imitation learning.

Uses the imitation library, which is compatible with SB3: https://imitation.readthedocs.io/en/latest/

> Imitation provides clean implementations of imitation and reward learning algorithms, under a unified and user-friendly API. Currently, we have implementations of Behavioral Cloning, DAgger (with synthetic examples), density-based reward modeling, Maximum Causal Entropy Inverse Reinforcement Learning, Adversarial Inverse Reinforcement Learning, Generative Adversarial Imitation Learning, and Deep RL from Human Preferences.

For now, the script uses GAIL, although BC and AIRL will also work with small changes to the script. Some of the other algorithms may need changes to the env code, or data formats different from what the current demo recorder produces.
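
For context, the training side roughly follows the standard imitation + SB3 GAIL recipe. A simplified sketch is shown below; it is not the exact script in this PR, and the env factory and demo loader are placeholders for the Godot RL Agents wrapper and the recorded demos:

from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv
from imitation.algorithms.adversarial.gail import GAIL
from imitation.rewards.reward_nets import BasicRewardNet
from imitation.util.networks import RunningNorm

# Placeholders: in practice these come from the Godot RL Agents env wrapper
# and from the demos produced by the demo recorder plugin.
venv = DummyVecEnv([make_godot_env])               # hypothetical env factory
demos = load_recorded_trajectories("demos.json")   # hypothetical demo loader

# Generator policy that GAIL trains against the discriminator's reward.
learner = PPO("MlpPolicy", venv, verbose=1)

# Discriminator / reward network over (observation, action) pairs.
reward_net = BasicRewardNet(
    venv.observation_space,
    venv.action_space,
    normalize_input_layer=RunningNorm,
)

gail_trainer = GAIL(
    demonstrations=demos,
    demo_batch_size=64,
    venv=venv,
    gen_algo=learner,
    reward_net=reward_net,
)
gail_trainer.train(total_timesteps=1_000_000)

# The pre-trained PPO policy can then be saved, exported, or fine-tuned
# further with standard RL on the environment's own (e.g. sparse) reward.
learner.save("gail_pretrained_policy")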

Ivan-267 commented 8 months ago

I have added a basic tutorial here: https://github.com/edbeeching/godot_rl_agents/blob/imitation_learning_experimental/docs/IMITATION_LEARNING.md

As there are no example envs built specifically for imitation learning yet, it shows how to modify an existing env to record demos and train an agent from the demos using GAIL. Even though it doesn't improve performance in that specific case, it explains how to set up an env for demo recording and the rest of the process, which can be applied to any existing or new / custom env.
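
As a side note, the imitation library consumes the demos as Trajectory objects; the conversion is roughly along the lines of the sketch below (illustrative only: the per-episode format here is assumed, and the actual conversion code may differ):

import numpy as np
from imitation.data.types import Trajectory

def episodes_to_trajectories(episodes):
    """Convert parsed demo episodes into imitation Trajectory objects.

    `episodes` is assumed to be a list of (observations, actions) pairs,
    one per recorded episode, where `observations` contains one more entry
    than `actions` (it includes the final observation).
    """
    trajectories = []
    for observations, actions in episodes:
        trajectories.append(
            Trajectory(
                obs=np.asarray(observations, dtype=np.float32),
                acts=np.asarray(actions, dtype=np.float32),
                infos=None,
                terminal=True,
            )
        )
    return trajectories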

edbeeching commented 8 months ago

hey @Ivan-267 thanks for adding this! Is this good to review now?

Ivan-267 commented 8 months ago

> hey @Ivan-267 thanks for adding this! Is this good to review now?

Yes, it should be.

The tutorial may receive another small update or two before merge, but the code should be functional.

Ivan-267 commented 8 months ago

Thank you for the review.

For reference and easier testing, I have pushed the RL env with the modifications I used in the tutorial to a temporary branch on my fork of the examples repo: https://github.com/Ivan-267/godot_rl_agents_examples/tree/il_test_env/il_test_env (as a note, switching from SyncOverride.gd back to Sync.gd on the sync nodes in the testing/training scenes makes the recording process noticeably smoother with action repeat 10).

The modified env should be ready to record in the "testing scene".

edbeeching commented 8 months ago

Hey @Ivan-267, I just tried it with JumperHard. The recording of trajectories works fine with the modification from the guide, but the trained model just runs backwards and forwards. I am wondering if this is because the action space in JumperHard is a dict with three keys rather than a single vector? Do you have any ideas?

Ivan-267 commented 8 months ago

> Hey @Ivan-267, I just tried it with JumperHard. The recording of trajectories works fine with the modification from the guide, but the trained model just runs backwards and forwards. I am wondering if this is because the action space in JumperHard is a dict with three keys rather than a single vector? Do you have any ideas?

Unless it's simply not learning well, it could have something to do with the action space. I will have to try the same env and get back to you.

It did previously seem to work with CarParkingEnv as well, where the action space has two keys:

func get_action_space() -> Dictionary:
    return {
        "acceleration" : {
            "size": 1,
            "action_type": "continuous"
        },
        "steering" : {
            "size": 1,
            "action_type": "continuous"
        },
    }

I don't have the modified version open right now, but I think I treated this as just two values in the array when getting the action values.

Ivan-267 commented 8 months ago

@edbeeching Thanks for spotting this, it seems to be related to my plugin modifications not resetting done in inference and human modes (I need to bring that back).

Edit: I've pushed the update to the plugin. I have yet to test the changes properly, but hopefully it should work now: https://github.com/edbeeching/godot_rl_agents_plugin/pull/35

In this example, done affects the selected action, so it didn't allow the agent to e.g. jump. During training, done is reset after being sent, and when recording obs, done is also reset after the episode's trajectory is recorded, so in those two modes it worked OK.

As for some other modifications: since recording is based on episodes, I changed the AIController reset to 1000 steps for easier recording. Here are some results of ONNX inference after 3 million steps of IL training, with action repeat 4 for recording / training / inference.

https://github.com/edbeeching/godot_rl_agents/assets/61947090/33367bdf-3386-4b10-bca5-256683ef5d4d

Not great yet, but working. Once I finish the modifications to the plugin, I'll push the update so you can try it too.

Some AIController code changes:

# Returns the current action values (read back when recording demos)
func get_action():
    return [float(_player.jump_action), _player.move_action, _player.turn_action]

func set_action(action = null):
    # Action received from the agent / training side
    if action:
        _player.move_action = action["move"][0]
        _player.turn_action = action["turn"][0]
        _player.jump_action = action["jump"][0] > 0
    # No action given: read human input (used in human / demo recording mode)
    else:
        _player.move_action = clamp(
            (
                Input.get_action_strength("move_backwards")
                - Input.get_action_strength("move_forwards")
            ),
            -1.0,
            0.5
        )
        _player.turn_action = (
            Input.get_action_strength("turn_left") - Input.get_action_strength("turn_right")
        )
        _player.jump_action = Input.is_action_pressed("jump")

func get_action_space():
    return {
        "jump": {"size": 1, "action_type": "continuous"},
        "move": {"size": 1, "action_type": "continuous"},
        "turn": {"size": 1, "action_type": "continuous"}
    }

Note: I changed the jump key input to is_action_pressed instead of just_pressed, since with e.g. action repeat 8 and the AIController set to demo recording mode, a key press often won't get detected: the action is only read once every 8 steps, so we would have to press the key at the exact step when the action is being set.

Some Player.gd changes:

# In human (demo recording) mode, forward the player's input as the action
func _physics_process(_delta):
    if ai_controller.heuristic == "human":
        ai_controller.set_action()

func get_move_vec() -> Vector3:
    if ai_controller.done:
        return Vector3.ZERO

    return Vector3(0, 0, clamp(move_action, -1.0, 0.5))

func get_turn_vec() -> float:
    return turn_action

func get_jump_action() -> bool:
    if ai_controller.done:
        jump_action = false

    return jump_action

Ivan-267 commented 8 months ago

With the updated plugin as above, 33 recorded successful episodes, action repeat 8, n_steps=128 in the PPO settings, and 1 million steps of IL training:

https://github.com/edbeeching/godot_rl_agents/assets/61947090/b37bd59f-ad04-4334-8a47-5ea860142adb

(The video shows ONNX inference of the exported model.)

edbeeching commented 8 months ago

Hey @Ivan-267, thanks for those comments. I managed to get it working; I think it was a combination of the done flag issue, not enough recorded trajectories, and not enough IL training steps.

I will review https://github.com/edbeeching/godot_rl_agents_plugin/pull/35 (tomorrow) and then we can get this merged.