facebookresearch / mbrl-lib

Library for Model Based RL
MIT License
952 stars 154 forks source link

Gymnasium migration #177

Closed Rohan138 closed 1 year ago

Rohan138 commented 1 year ago

Types of changes

Motivation and Context / Related issue

Migrating mbrl-lib from gym=0.17 all the way to a combination of gym=0.26.2 and gymnasium=0.26.3.

How Has This Been Tested (if it applies)

Will need thorough testing to ensure I didn't break anything, and also to make sure that performance is retained after refactoring the step API to use terminated and truncated.

Checklist

Rohan138 commented 1 year ago

Update: Most files have been changed and fixed, tests with tests/core, tests/pybullet, tests/mujoco, tests/dmcontrol. Still fixing tests/algorithms.

Rohan138 commented 1 year ago

@luisenp @natolambert this should be ready for review.

raghavauppuluri13 commented 1 year ago

Benchmarking the PR against main. Here are the reward curves for some select envs. The planet algorithms are still training, but you can see they are generally similar. - EDIT: All are finished training image

natolambert commented 1 year ago

@raghavauppuluri13 lmk if you're compute limited, shouldn't be too hard for me to run some too.

raghavauppuluri13 commented 1 year ago

@raghavauppuluri13 lmk if you're compute limited, shouldn't be too hard for me to run some too.

Do we want to benchmark all the env/algo combos in the examples? Or is this good enough? If so, that would be great. I've got a helper script.

#!/bin/bash

main_envs=("pets_reacher" "pets_pusher" "planet_cartpole_swingup" "planet_finger_spin" "mbpo_halfcheetah" "mbpo_inv_pendulum")
diff_envs=("pets_reacher" "pets_pusher" "planet_cartpole_swingup" "planet_finger_spin" "mbpo_half_cheetah_v4" "mbpo_inv_pendulum_v4")
algos=("pets" "pets" "planet" "planet" "mbpo" "mbpo")
device=("cuda:0" "cuda:0" "cuda:0" "cuda:0" "cuda:0" "cuda:0")
envs=$main_envs
tsp -S 6

for i in ${!envs[@]}; do
    if [[ "${algos[$i]}" == "planet" ]]; then
        tsp python -m mbrl.examples.main algorithm=${algos[$i]} dynamics_model=${algos[$i]} overrides=${envs[$i]} device=${device[$i]}
    else
        tsp python -m mbrl.examples.main algorithm=${algos[$i]} overrides=${envs[$i]} device=${device[$i]}
    fi
done
raghavauppuluri13 commented 1 year ago

@natolambert @luisenp Any other validation steps before the pr is ready to merge? (I updated the benchmarks with the finished training curves)

natolambert commented 1 year ago

Awesome work all, have been following along happily.

Rohan138 commented 1 year ago

@luisenp Could I get another review? Should be good to merge now, let me know if there's anything else I can change