edbeeching / godot_rl_agents_examples

Example Environments for the Godot RL Agents library
MIT License

Adds multilevel robot env #19

Closed. Ivan-267 closed this 7 months ago.

Ivan-267 commented 7 months ago

A minigame environment with multiple mini-levels for the robot to pass. The robot needs to avoid falling down, pick up all coins on levels that have coins, and avoid enemy robots in the final level.

https://github.com/edbeeching/godot_rl_agents_examples/assets/61947090/ff73533e-97c4-4426-8a7c-d3c2936d90c8

edbeeching commented 7 months ago

Hey @Ivan-267, cool env! Is the agent in the video being controlled by a RL agent or a human?

Ivan-267 commented 7 months ago

Thank you, the video is from ONNX inference of the trained agent.

edbeeching commented 7 months ago

It performs really well, great job.

Ivan-267 commented 7 months ago

Thanks for the review and comments. I made some adjustments based on your feedback.

> Did you have to tune the hyper-parameters much for this env?

I think these were the settings used for the ONNX model (some are the defaults, but kept there for easier tweaking):

    # Imports needed by this snippet (the full script also defines env and args):
    from math import log
    from stable_baselines3 import PPO

    policy_kwargs = dict(log_std_init=log(1.0))
    model: PPO = PPO(
        "MultiInputPolicy", env, verbose=1, n_epochs=10, learning_rate=0.0003,
        clip_range=0.2, ent_coef=0.00695, n_steps=480, batch_size=450,
        policy_kwargs=policy_kwargs, tensorboard_log=args.experiment_dir,
    )

with: --timesteps=8_000_000 --n_parallel=6 --speedup=15

The training only took a couple of hours on my PC with an older CPU. Those aren't necessarily the best parameters, just what I had set.
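To complete the picture, here is a minimal sketch of how the rest of the run might look when continuing from the snippet above. Only the timesteps value comes from the flags quoted here; the save path is a hypothetical name, and the construction of env (the Godot RL environment wrapper) isn't part of this excerpt:

    # Continuing from the PPO setup above; env is the Godot RL environment
    # wrapper created elsewhere in the training script.
    model.learn(total_timesteps=8_000_000)  # corresponds to --timesteps=8_000_000
    model.save("multilevel_robot_ppo")      # hypothetical save path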

[image attachment: training]

> I notice there is a seemingly random StaticBody3D in the training_scene Scene. I am not sure if you intended to remove it.

Renamed it to clarify the purpose. It stores the WorldBoundaryShape3D used to detect when the Robot falls down. Only one such shape is needed in the training and testing scenes regardless of how many GameScene instances there are (so the shape is not instanced with GameScene), with the limitation that all GameScene instances must sit at the same height and on the same plane.

> The training scene is missing a WorldEnvironment node

Added a basic WorldEnvironment node. It was previously omitted in an attempt to simplify the training scene rendering as much as possible (no directional light shadows, etc.), but a basic one without the effects will probably not have much of an impact.

> I see you are importing a lot from Blender; are you manually adding the StaticBodies and collision shapes after import? In case you do not know, you can append -col to an object name in Blender and Godot will automatically create the StaticBody and collision shape for it. More details here

I used a semi-manual approach here; it remained from experimenting with different approaches. I'm aware of the feature and agree it is simpler to use in this case. I've just modified the levels to use it where possible (keeping a few shapes in Godot so that the geometry doesn't change from what the model was trained with), which makes it simpler to add new levels.

> I noticed in the obs you have nearest coin, enemy, etc. Did you consider using a grid sensor instead?

More complex levels might require more info (e.g. multiple coins), but for this layout, adding just one Vector3 per category to the observations keeps the env calculations lighter and training faster. I started with this simplest case and would add more data if needed to solve the levels. The vectors-to-n-closest-objects approach also gives more precise data about the objects' locations, although I can't say how much precision is necessary for this specific env. The GridSensor might be slightly simpler to use on the Godot side, but it would likely have decreased the training speed.
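For illustration, here is a rough Python-side sketch of what a dict observation space of that shape could look like (using Gymnasium spaces; the entry names and exact contents are hypothetical, not taken from this env's code):

    import numpy as np
    from gymnasium import spaces

    # One Vector3 (relative position) per tracked object category, as described
    # above; any extra sensor data would simply be additional entries in the dict.
    observation_space = spaces.Dict({
        "nearest_coin_vector": spaces.Box(-np.inf, np.inf, shape=(3,), dtype=np.float32),
        "nearest_enemy_vector": spaces.Box(-np.inf, np.inf, shape=(3,), dtype=np.float32),
    })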

edbeeching commented 7 months ago

Thanks for all the details. Perhaps we can think of a better way to store the training parameters so that new users can have a rough idea of where to start.

Ivan-267 commented 7 months ago

> Thanks for all the details. Perhaps we can think of a better way to store the training parameters so that new users can have a rough idea of where to start.

That's a good idea. I don't have the hyperparameters saved for all envs, but all were trained with standard SB3 PPO, mostly by modifying n_steps (if changed, usually increased from 32 depending on the env), n_epochs (if changed, usually decreased from 10), learning_rate, and such. For some I may have tweaked other things, but those few are the ones I changed most often.
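Written down as a rough starting point (a sketch only, reusing PPO and env from the earlier snippet; these values mirror the ones mentioned above and are not a tuned config for any particular env):

    # Starting values implied by the comment above; tune per environment.
    starting_params = dict(
        n_steps=32,             # often increased, depending on the env
        n_epochs=10,            # often decreased
        learning_rate=0.0003,   # sometimes adjusted
    )
    model = PPO("MultiInputPolicy", env, verbose=1, **starting_params)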

I added an issue so we can track this independently of this env: https://github.com/edbeeching/godot_rl_agents_examples/issues/20

Should I merge this env now as is (we can reference the parameters for this env from this PR in the future), or should we wait until we have a system for storing the hyperparameters?

edbeeching commented 7 months ago

@Ivan-267, go ahead and merge and we can think about hyperparam saving later.