RobotLocomotion / drake

Model-based design and verification for robotics.
https://drake.mit.edu

Provide DrakeGym API in pydrake to make the easy simulations more familiar #15508

Closed RussTedrake closed 1 year ago

RussTedrake commented 3 years ago

From an office-hours discussion: many people have written small gym-like interfaces around Drake. We should provide one in Drake master, plus a tutorial showing how to use it.

Here is the relevant documentation from gym. I'm not worried about the repo name / file structure advice, but simply aim to provide the familiar spellings for the interactions with the simulator. Specifically, I would provide

class DrakeGymEnv():
  metadata = {'render.modes': ['human']}

  def __init__(self):
    ...
  def step(self, action):
    ...
  def reset(self):
    ...
  def render(self, mode='human'):
    ...
  def close(self):
    ...

Note: We don't actually derive from gym.Env (to avoid the dependency).
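To make the shape of the proposal concrete, here is a minimal, self-contained sketch of that surface. The internal dynamics below are a toy stand-in of my own invention; the real wrapper would advance a pydrake Simulator and read observations out of its Context.

```python
import numpy as np


class DrakeGymEnv:
    """Sketch of the proposed gym-style wrapper (illustrative only; the
    real version would step a pydrake Simulator, not this toy state)."""
    metadata = {'render.modes': ['human']}

    def __init__(self, time_step=0.1):
        self.time_step = time_step
        self.state = None

    def reset(self):
        # In Drake, this would reset the simulator's Context.
        self.state = np.zeros(2)
        return self.state.copy()

    def step(self, action):
        # Toy stand-in dynamics; the real env calls simulator.AdvanceTo().
        self.state = self.state + self.time_step * np.asarray(action)
        observation = self.state.copy()
        reward = -float(np.sum(self.state ** 2))  # placeholder cost
        done = False
        info = {}
        return observation, reward, done, info

    def render(self, mode='human'):
        print(self.state)

    def close(self):
        pass
```

A rollout then reads exactly like any gym environment: `obs = env.reset()`, then repeated `obs, reward, done, info = env.step(action)`.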

I'd recommend we offer (at least) the following constructors:

Comments/feedback/recommendations welcome!

cc @EricCousineau-TRI @BenBurchfiel-TRI @ludwigschmidt-tri @calderpg-tri @kevinzakka-tri

RussTedrake commented 3 years ago

A few relevant projects:

abhishekunique commented 2 years ago

I think it would be quite helpful to have some of the following functionality (not all are crucial):

  1. I think the most important thing is to be able to quickly go from URDF/SDF/XML to some default environment where we can reset, step, and collect random trajectories. If we can abstract away the explicit plant, scene graph, and controller creation and wiring into a simple interface where we just specify a URDF, and internally get a default position + velocity state space (in a get_obs function), torque/position actuators (a step function), and a reset function (that resets to some default pose), that would already make the barrier to entry much lower. The typical RL user usually wants minimal control over this machinery and wants to plug the environment into their RL algorithm as quickly as possible.
  2. Be able to port over XML files from the MuJoCo gym environments (e.g. hopper, halfcheetah, etc.) to use the infrastructure here. A tool to convert MuJoCo XML files into a Drake-readable format (SDF or URDF) would help people more familiar with gym get started: the barrier to entry is lower if we can take an XML file and have an environment made. Reward functions could be ported for these particular environments, though that generally seems left to user design. Just having the familiar MuJoCo gym envs ported as a starting point would increase immediate adoption.
  3. DrakeGymEnv(urdf_or_sdf_filename) seems pretty useful. It might also be useful to allow some other simple functionality, like a mode for simple PD control on the joints instead of just torque control. MuJoCo uses its actuator element for this; it may be useful to allow that functionality too.
  4. Since many people use images with RL, it would be helpful to have both an onscreen renderer and an offscreen renderer that just returns an image array. Headless rendering and such would eventually be useful too.
  5. It would be useful to expose a number of different elements of a context, like positions, velocities, and different representations of rotations, for easy construction of a state space. Having just positions and velocities directly accessible in a get_obs() function would be great, but options for constructing a state from the other elements of a context might be useful at some point in the future.
  6. Being able to set the position and velocity of the simulator (in MuJoCo this would be setting qpos, qvel) to user-provided values would be helpful. This would probably be implemented in the reset function anyway, but being able to specify a position and velocity and set the system to that state would be helpful. It seems like SetPositions and SetVelocities already do that in Drake.
  7. For the sim-to-real enthusiasts, programmatically changing the positions of various bodies in the environment, the position of the robot, and mass/color/friction/other physical properties of the environment would be a plus, but definitely not a necessity in a first iteration.
  8. Many RL algorithms use multithreading of these gym envs and create multiple instances, so it would be helpful to allow that too if possible (although it may be more involved).
  9. Doing the standard registration thing that gym envs do, where you can use gym.make to make the environments, makes it a bit easier to drop them in anywhere (but this is really not that important).
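Point 1's "reset, step, and collect random trajectories" loop is worth pinning down, since it is the entire surface a first-pass RL user touches. A minimal sketch against any gym-style env (the function name and signature here are illustrative, not Drake API):

```python
import numpy as np


def collect_random_trajectory(env, horizon, action_dim, seed=0):
    """Roll out one trajectory with uniform random actions.

    `env` is any object exposing the classic gym-style reset()/step(action)
    interface; this helper is a sketch, not part of any proposed API.
    """
    rng = np.random.default_rng(seed)
    obs = env.reset()
    observations, actions, rewards = [obs], [], []
    for _ in range(horizon):
        action = rng.uniform(-1.0, 1.0, size=action_dim)
        obs, reward, done, info = env.step(action)
        observations.append(obs)
        actions.append(action)
        rewards.append(reward)
        if done:
            break
    return observations, actions, rewards
```

With a default environment built straight from a URDF, this loop is all a user would need to write before handing data to their RL algorithm.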
RussTedrake commented 2 years ago

As I've taken a few passes on this, I've realized that there are two separable ideas here: (1) Wrapping a system / simulator properly and easily in gym, and (2) Making some of the common diagram workflows very easy.

I think (1) is the first important step. I've put a draft for comments here: https://deepnote.com/project/DrakeGym-DfQOBtuAQeWJL3S3vMdlWg/%2Fnotebook.ipynb

I have some sugar methods that will build a gym straight from URDF/SDF and add cameras, etc. But I'd like to get this part to a point where we're happy with it first.

RussTedrake commented 2 years ago

I've added documentation to the proposal above.

Another important question: Where should this functionality live? I assume we'll want RobotLocomotion/drake-gym with some github actions CI and a pip setup workflow?

EricCousineau-TRI commented 2 years ago

(1) Looks good, but may not span all simulators desired (e.g. retrieving depth images, label images, or having peeling textures as part of environment state and/or observation); it may be hard to do (1) in a super general way, but it's certainly good for most use cases. Perhaps it'd be good to state the subset of simulations the sugar could support, and suggest how to split off?

Is it possible to extend to something akin to RobotLocomotion/gym, where you can specify multiple cameras + images as part of direct observation space?

(2) For diagram utilities, I like what's in RobotLocomotion/gym; it's similar to what we have in Anzu. (We can publish it, but may not be ready soon.)

Re: gym.VectorEnv, and just parallelization in general, some gotchas (that maybe you've already solved):

EricCousineau-TRI commented 2 years ago

Where should this functionality live? I assume we'll want RobotLocomotion/drake-gym with some github actions CI and a pip setup workflow?

Ah, forgot to answer this one - yup, something like drake-gym sounds excellent! (parallels drake-external-examples, what should soon be drake-ros, etc.). Would this be written as a pure Python package, possibly to be deployed on pip? (hopefully a simpler process than w/ Drake, given there are no direct C++ dependencies?)

RussTedrake commented 2 years ago

yes. that's my plan.

RussTedrake commented 2 years ago

My most up-to-date version is currently in https://github.com/RussTedrake/manipulation/blob/master/manipulation/gym.py , with the example I used in lecture in https://github.com/RussTedrake/manipulation/tree/master/manipulation/envs .

RussTedrake commented 2 years ago

If we want to enable c++ parallelism, this issue is relevant: https://github.com/RobotLocomotion/drake/issues/17363

RussTedrake commented 2 years ago

Update: @ggould-tri has been working towards upstreaming this to Drake (he has a newer version with some fixes and some unit tests in Anzu). It's a collaboration with @JoseBarreiros-TRI .

@ggould-tri -- would you like to take ownership of the issue?

I plan to use it again this November: https://manipulation.csail.mit.edu/Fall2022/schedule.html, and will likely help push it through before then if it hasn't landed sooner.

ggould-tri commented 2 years ago

Sorry for the delay -- I was on vacation.

Status: The code is currently stabilizing in our downstream repository. It is unit tested, but the tests rely on stable-baselines3 which is too heavy a dependency tree for a peripheral Drake feature.

(Drake Gym implements the OpenAI Gym API, and SB3 provides a test of sufficient API compliance.)

Once it is stable downstream and has a good set of non-SB3 tests, I will PR it to Drake; at current pace that is likely to be in a couple of weeks. The non-SB3 tests will go into the Drake repository directly, the SB3-reliant ones will have to go into a notebook or somewhere similar.

Once this has happened, we will be exposed to the version shear between Gym and SB3. In particular, SB3 lags far behind Gym's oldest documented API and its API-stability guarantee window; this is the motivation for using the SB3 tests, since code correctly implemented against the Gym docs will be unusable with SB3. (SB3 plans to resolve this after Gym ships 1.0.)

Someone else will need to take ownership of that dependency management process.

FYI, notebooks working from drake dependencies + SB3 will likely also have to manage version shear around scipy/numpy.

EricCousineau-TRI commented 2 years ago

Nice!

Quick clarifying question - by "our upstream repository" do you mean "our downstream repository" (Anzu)?

ggould-tri commented 2 years ago

Oops, yes. "Downstream" in the sense that it depends on Drake; "upstream" in the sense that code flows thence to drake. So... cross-stream? :-)

jwnimmer-tri commented 2 years ago

Something about the plan as captured in the discussions above is not internally consistent.

Russ wrote (edited to collapse the ping-pong replies):

We'll want RobotLocomotion/drake-gym with some GitHub Actions CI and a pip setup workflow. This would be written as a pure Python package, to be deployed on pip.

Grant wrote:

Once it is stable downstream and has a good set of non-SB3 tests, I will PR it to Drake.

Perhaps Grant meant "Drake" to mean RobotLocomotion/drake-gym but given the surrounding text, I think he probably meant drake literally.

Is the plan to merge this feature into the drake repository directly, or to make a separate repository, or some combination of the two?

ggould-tri commented 2 years ago

In my more recent discussions with Russ, he'd been saying drake directly, e.g. "Update: @ggould-tri has been working towards upstreaming this to Drake" above.

I have no strong opinion about which approach is better; the dependency problem is both better (possible to grab sb3) and worse (have to manage all the other dependencies) in a separate repo.

jwnimmer-tri commented 2 years ago

Aha, I missed that subtlety. Sounds like drake directly is the plan, in that case.

RussTedrake commented 2 years ago

Confirmed. I think the new target is Drake with non-SB3 tests. Thanks!

ggould-tri commented 1 year ago

Update: Cutting drake-gym at the SB3-dependency seam was not practical, as (1) very little running or testable code remains and (2) cutting SB3 itself at the acceptance-test line is impractical because its internal dependencies change with each revision. As such I've started the cut at https://github.com/ggould-tri/drake-gym where it is so far working quite nicely (and teaching me a lot about bazel python rules, for better and worse). Still nailing down the README/demo material, then I'll try importing it from Anzu and see what breaks.

jwnimmer-tri commented 1 year ago

Summary from a f2f chat:

A separate repository is fine for prototyping, but if we expect any users to actually use it as a library, then we need to set up the build, CI, reviewable, documentation website hosting, a release playbook, pypi shared accounts, etc. to make that happen. Since that is all a ton of effort, keeping the code within drake.git is likely to be the cheaper option.

To help evaluate the tradeoffs here, Grant will point me at a Drake branch with passing test cases for this code when run against a manually-created venv with SB3, and I'll see how difficult it is to add a relevant subset of SB3 to our source-build workspace. The goal would be to have enough to perform sufficient CI testing in Drake itself. The (possibly installed) example gym programs would only conditionally depend on SB3. Users who want to run them can pip install SB3 themselves; pydrake will not depend on it.

RussTedrake commented 1 year ago

A draft PR is available at https://github.com/RobotLocomotion/drake/pull/18178 . I've also merged updates from @ggould-tri back into my manipulation repo, and am working from there until it lands. Those links are still here: https://github.com/RobotLocomotion/drake/issues/15508#issuecomment-999221549

JoseBarreiros-TRI commented 1 year ago

In Anzu draft PR 10579, a prototype of DrakeGym using Gymnasium is available. Currently, it does not include any test coverage for training because the latest release (2.0.0a13) of SB3, which supports Gymnasium, is still a work in progress. From a conversation with @RussTedrake, we see three options for moving forward with the Drake PR:

1. wait until the current SB3 release is stable;
2. PR using the OpenAI Gym API and upgrade it later to Gymnasium; or
3. start with a pinned sha from SB3 (the master branch is already using Gymnasium) and then update to v2.0 when it's officially released.

Any thoughts?

jwnimmer-tri commented 1 year ago

As I understand it, Drake's only use of SB3 would be for our bazel test cases of our drake_gym implementation, e.g., by calling check_env in a unit test to confirm that our code is satisfactory. Assuming that's true, from a Drake build & release perspective there is no preference one way or another for which version of SB3 we pin. Since it's only used for internal tests, users are not directly exposed to our choice of SB3, because pydrake will never do from stable_baselines3 import .... The only metric is to keep our developers happy; our users are unaffected.
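For intuition, the kind of surface check that SB3's check_env performs can be sketched in a few lines. This is a toy stand-in of my own, not SB3 code; the real check_env additionally validates observation/action spaces, dtypes, reset determinism, and more.

```python
def minimal_api_check(env, sample_action):
    """Toy stand-in for stable_baselines3's check_env (illustrative only).

    Verifies the gym-style surface exists and that step() honors the
    classic 4-tuple contract (obs, reward, done, info).
    """
    # The env must expose the four core gym-style methods.
    for name in ("reset", "step", "render", "close"):
        assert callable(getattr(env, name, None)), f"env is missing {name}()"
    env.reset()
    result = env.step(sample_action)
    assert isinstance(result, tuple) and len(result) == 4, \
        "step() must return (obs, reward, done, info)"
    obs, reward, done, info = result
    assert isinstance(reward, (int, float))
    assert isinstance(done, bool)
    assert isinstance(info, dict)
    return True
```

A bazel unit test would call the real check_env the same way: construct the env, pass it to the checker, and let the assertions serve as the test.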

What probably matters more to users is which version(s) of Gym and/or Gymnasium we support. I suggest starting from that question and then working backwards to find which SB3(s) that implies. (Maybe that's what you were asking already, but it didn't quite read that way to me.)

In short: what version(s) of Gym and/or Gymnasium do users want to run alongside Drake? I assume if we can start with supporting just one newest plausible version, that would be the least maintenance burden for us.

JoseBarreiros-TRI commented 1 year ago

That is correct, but, additionally, we need SB3 for testing of training, something similar to this, and provide examples for training and rollout of the policy. To answer your question about the user preference for Gym vs Gymnasium, I think Gymnasium would be preferred since Gym is not maintained anymore and SB3 and RLlib already support Gymnasium.
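One concrete difference the migration touches: classic Gym's step() returns a 4-tuple (obs, reward, done, info), while Gymnasium's returns a 5-tuple (obs, reward, terminated, truncated, info). A hedged adapter sketch (my own helper, not part of either library), which conservatively maps any done to terminated:

```python
def classic_to_gymnasium_step(step_result):
    """Convert a classic Gym 4-tuple step result into Gymnasium's 5-tuple.

    Illustrative sketch: maps done -> terminated and never reports
    truncation, which loses the time-limit distinction Gymnasium added.
    """
    obs, reward, done, info = step_result
    terminated, truncated = bool(done), False
    return obs, reward, terminated, truncated, info
```

Any DrakeGym code written against the 4-tuple contract would need an update like this (or Gymnasium's own compatibility shims) when the PR switches APIs.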

jwnimmer-tri commented 1 year ago

... we need SB3 for testing of training ...

Right. At https://github.com/RobotLocomotion/drake/tree/master/tools/workspace/stable_baselines3_internal we've adjusted SB3 with a pytorch-ectomy. So long as we maintain that invariant for whatever training tests we need, everything is still fine. I assume that PPO training can still run without torch. The goal is that we do not depend on pytorch in our automated tests. Users could themselves still use pytorch if they wish.

RussTedrake commented 1 year ago

The initial version has landed in #19831 . 🎉