Status: Closed (RussTedrake closed this issue 1 year ago).
A few relevant projects:
I think it would be quite helpful to have some of the following functionality (not all are crucial):
As I've taken a few passes on this, I've realized that there are two separable ideas here: (1) Wrapping a system / simulator properly and easily in gym, and (2) Making some of the common diagram workflows very easy.
I think (1) is the first important step. I've put a draft for comments here: https://deepnote.com/project/DrakeGym-DfQOBtuAQeWJL3S3vMdlWg/%2Fnotebook.ipynb
I have some sugar methods that will build a gym straight from URDF/SDF and add cameras, etc. But I'd like to get this first part to a state we're happy with before moving on.
I've added documentation to the proposal above.
Another important question: where should this functionality live? I assume we'll want `RobotLocomotion/drake-gym` with some GitHub Actions CI and a pip setup workflow?
(1) Looks good, but may not span all simulators desired (e.g. retrieving depth images, label images, or having peeling textures as part of environment state and/or observation); it may be hard to do (1) in a super general way, but it's certainly good for most use cases. Perhaps it'd be good to state the subset of simulations the sugar could support, and suggest how to split off?
Is it possible to extend to something akin to `RobotLocomotion/gym`, where you can specify multiple cameras + images as part of the direct observation space?
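For concreteness, a dependency-free sketch of what such a multi-camera observation layout might look like (all key names, shapes, and the `flat_size` helper here are hypothetical, not Drake or Gym API):

```python
# Hypothetical layout for a multi-camera observation space, written as
# plain shape metadata rather than gym.spaces so the sketch stays
# dependency-free.  Camera names and shapes are illustrative only.
observation_spec = {
    "camera_0/rgb":   {"shape": (480, 640, 3), "dtype": "uint8"},
    "camera_0/depth": {"shape": (480, 640, 1), "dtype": "float32"},
    "camera_0/label": {"shape": (480, 640, 1), "dtype": "int16"},
    "camera_1/rgb":   {"shape": (480, 640, 3), "dtype": "uint8"},
    "robot/state":    {"shape": (14,),         "dtype": "float64"},
}

def flat_size(spec):
    """Total scalar count across all entries of the observation."""
    total = 0
    for entry in spec.values():
        n = 1
        for dim in entry["shape"]:
            n *= dim
        total += n
    return total
```

In a real wrapper this dict would map naturally onto a `gym.spaces.Dict` of `Box` spaces, one per camera image plus one for the robot state.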
(2) For diagram utilities, I like what's in `RobotLocomotion/gym`; it's similar to what we have in Anzu. (We can publish it, but it may not be ready soon.)
Re: `gym.VectorEnv`, and parallelization in general, some gotchas (that maybe you've already solved):
Visualization is a bit of a stickler here. It's easy to put something like `cv2.imshow()` in `render(mode='human')` at call time, but it can be nuanced to avoid placing things like Drake Visualizer / Meshcat in a diagram when you're not visualizing (e.g. when running in parallel). In our case, we explicitly configure it to not publish -- which has its own issue, where visualization can induce different timesteps and cause slightly different simulation results (which @siyuanfeng-tri ran into).
Serialization / multiprocessing. I'm unfortunately not directly familiar with how `VectorEnv` spawns workers, but I assume it may involve serialization / pickle. In Anzu, we have a Python "sibling" of the schema functionality, and it similarly defines the distributions (e.g. initial configurations / poses, which manipulands to spawn, etc.). Parallelization is done by serializing the configuration and then spawning the environment from it. I'm not sure how that parallels `gym.VectorEnv`, but I'm sure it could be made similar (e.g. by having the constructor accept only serializable entities, and ensuring it can be used in `__getstate__` / `__setstate__`).
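A minimal pure-Python sketch of that serialize-the-config pattern (all names are hypothetical; a real environment would rebuild its unpicklable diagram/simulator inside `_build`):

```python
import pickle

class PickleableEnv:
    """Env whose heavy state (diagram, simulator) is rebuilt from a
    serializable config, so worker processes can unpickle it."""

    def __init__(self, config):
        # `config` holds only plain data (e.g. which manipulands to
        # spawn, initial pose distributions), so it pickles cleanly.
        self.config = config
        self._build()

    def _build(self):
        # Stand-in for constructing the (unpicklable) diagram/simulator.
        self.simulator = object()

    def __getstate__(self):
        # Ship only the config across the process boundary.
        return {"config": self.config}

    def __setstate__(self, state):
        self.config = state["config"]
        self._build()  # Rebuild the heavy state on the worker side.

env = PickleableEnv({"manipulands": ["sugar_box"], "seed": 7})
clone = pickle.loads(pickle.dumps(env))
```

The same pattern works for process-pool spawning: only the config crosses the boundary, and each worker reconstructs its own simulator.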
> Where should this functionality live? I assume we'll want RobotLocomotion/drake-gym with some github actions CI and a pip setup workflow?
Ah, forgot to answer this one - yup, something like `drake-gym` sounds excellent! (It parallels `drake-external-examples`, what should soon be `drake-ros`, etc.)
Would this be written as a pure Python package, possibly to be deployed on pip? (Hopefully a simpler process than with Drake, given there are no direct C++ dependencies?)
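For reference, a pure-Python package like this typically needs only a minimal packaging config to be pip-deployable; a sketch (every name and version below is a placeholder, not a decision from this thread):

```toml
# Hypothetical pyproject.toml sketch for a pure-Python drake-gym package.
[build-system]
requires = ["setuptools>=61"]
build-backend = "setuptools.build_meta"

[project]
name = "drake-gym"            # placeholder name
version = "0.0.1"             # placeholder version
dependencies = [
    "drake",                  # pydrake from PyPI
    "gym",                    # the gym-API dependency under discussion
]
```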
Yes, that's my plan.
My most up-to-date version is currently in https://github.com/RussTedrake/manipulation/blob/master/manipulation/gym.py , with the example I used in lecture in https://github.com/RussTedrake/manipulation/tree/master/manipulation/envs .
If we want to enable c++ parallelism, this issue is relevant: https://github.com/RobotLocomotion/drake/issues/17363
Update: @ggould-tri has been working towards upstreaming this to Drake (he has a newer version with some fixes and some unit tests in Anzu). It's a collaboration with @JoseBarreiros-TRI .
@ggould-tri -- would you like to take ownership of the issue?
I plan to use it again this November: https://manipulation.csail.mit.edu/Fall2022/schedule.html, and will likely help push it through before then if it hasn't landed sooner.
Sorry for the delay -- I was on vacation.
Status: The code is currently stabilizing in our downstream repository. It is unit tested, but the tests rely on stable-baselines3 which is too heavy a dependency tree for a peripheral Drake feature.
(Drake Gym implements the OpenAI Gym API, and SB3 provides a test of sufficient API compliance)
Once it is stable downstream and has a good set of non-SB3 tests, I will PR it to Drake; at current pace that is likely to be in a couple of weeks. The non-SB3 tests will go into the Drake repository directly, the SB3-reliant ones will have to go into a notebook or somewhere similar.
Once this has happened, we will be exposed to the versioning shear between Gym and SB3. In particular, SB3 lags far behind even Gym's oldest documented API and its API-stability guarantee window -- this is the motivation for using the SB3 tests, since someone implementing correctly against the Gym docs will produce code that SB3 cannot use. (SB3 plans to resolve this after Gym ships 1.0.)
Someone else will need to take ownership of that dependency management process.
FYI, notebooks working from drake dependencies + SB3 will likely also have to manage version shear around scipy/numpy.
Nice!
Quick clarifying question - by "our upstream repository" do you mean "our downstream repository" (Anzu)?
Oops, yes. "Downstream" in the sense that it depends on Drake; "upstream" in the sense that code flows thence to drake. So... cross-stream? :-)
Something about the plan as captured in the discussions above is not internally consistent.
Russ wrote (edited to collapse the ping-pong replies):
> We'll want `RobotLocomotion/drake-gym` with some GitHub Actions CI and a pip setup workflow. This would be written as a pure Python package, to be deployed on pip.
Grant wrote:
> Once it is stable downstream and has a good set of non-SB3 tests, I will PR it to Drake.
Perhaps Grant meant "Drake" to mean `RobotLocomotion/drake-gym`, but given the surrounding text, I think he probably meant `drake` literally.
Is the plan to merge this feature into the `drake` repository directly, or to make a separate repository, or some combination of the two?
In my more recent discussions with Russ, he'd been saying `drake` directly, e.g. "Update: @ggould-tri has been working towards upstreaming this to Drake" above.
I have no strong opinion about which approach is better; the dependency problem is both better (possible to grab sb3) and worse (have to manage all the other dependencies) in a separate repo.
Aha, I missed that subtlety. Sounds like `drake` directly is the plan, in that case.
Confirmed. I think the new target is Drake with non-SB3 tests. Thanks!
Update: Cutting drake-gym at the SB3-dependency seam was not practical, as (1) very little running code or testable code remains and (2) cutting sb3 itself at the acceptance test line is impractical because its internal dependencies change with each revision. As such I've started the cut at https://github.com/ggould-tri/drake-gym where it is so far working quite nicely (and teaching me a lot about bazel python rules, for better and worse). Still nailing down README/demo matter, then I'll try importing it from anzu and see what breaks.
Summary from a f2f chat:
A separate repository is fine for prototyping, but if we expect any users to actually use it as a library, then we need to set up the build, CI, Reviewable, documentation website hosting, a release playbook, shared PyPI accounts, etc. to make that happen. Since that is all a ton of effort, keeping the code within `drake.git` is likely to be the cheaper option.
To help evaluate the tradeoffs here, Grant will point me at a Drake branch with passing test cases for this code when run against a manually-created venv with SB3, and I'll see how difficult it is to add a relevant subset of SB3 to our source-build workspace. The goal would be to have enough to perform sufficient CI testing in Drake itself. The (possibly installed) example gym programs would only conditionally depend on SB3: users who want to run them can `pip install` SB3 themselves; pydrake will not depend on it.
A draft PR is available at https://github.com/RobotLocomotion/drake/pull/18178 . I've also merged updates from @ggould-tri back into my manipulation repo, and am working from there until it lands. Those links are still here: https://github.com/RobotLocomotion/drake/issues/15508#issuecomment-999221549
In Anzu draft PR 10579, a prototype of DrakeGym using Gymnasium is available. It does not currently include any test coverage for training, because the latest SB3 release that supports Gymnasium (2.0.0a13) is still a work in progress. From a conversation with @RussTedrake, we see three options for moving forward with the Drake PR: (1) wait until SB3's release stabilizes; (2) PR using the OpenAI Gym API and upgrade to Gymnasium later; or (3) start from a pinned SB3 SHA (the master branch already uses Gymnasium) and then update to v2.0 when it's officially released. Any thoughts?
As I understand it, Drake's only use of SB3 would be for our bazel test cases of our `drake_gym` implementation, e.g., by calling `check_env` in a unit test to confirm that our code is satisfactory. Assuming that's true, from a Drake build & release perspective there is no preference one way or the other for which version of SB3 we pin. Since it's only used for internal tests, users are not directly exposed to our choice of SB3, because `pydrake` will never do `from stable_baselines3 import ...`. The only metric is to keep our developers happy; our users are unaffected.
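The kind of API-compliance test described here can be sketched without SB3. To be clear, the real `check_env` (from `stable_baselines3.common.env_checker`) performs far more thorough checks; this toy stand-in, with entirely hypothetical names, only illustrates the shape of the Gym-style contract being verified:

```python
def check_env_api(env):
    """Toy stand-in for an env-checker: verify that reset() returns an
    observation and that step(action) returns (obs, reward, done, info),
    as the classic Gym API requires."""
    obs = env.reset()
    assert obs is not None
    result = env.step(env.sample_action())
    assert len(result) == 4
    obs, reward, done, info = result
    assert isinstance(reward, float)
    assert isinstance(done, bool)
    assert isinstance(info, dict)
    return True

class ToyEnv:
    """Minimal environment satisfying the contract above."""
    def reset(self):
        self.t = 0
        return [0.0]

    def sample_action(self):
        return 0.0

    def step(self, action):
        self.t += 1
        return [float(self.t)], 1.0, self.t >= 3, {}
```

In Drake's bazel tests, the equivalent call against the real `drake_gym` environment would live in a unit test, keeping SB3 out of pydrake itself.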
What probably matters more to users is which version(s) of Gym and/or Gymnasium we support. I suggest starting from that question and then working backwards to find which SB3(s) that implies. (Maybe that's what you were asking already, but it didn't quite read that way to me.)
In short: what version(s) of Gym and/or Gymnasium do users want to run alongside Drake? I assume that if we can start by supporting just the single newest plausible version, that would be the least maintenance burden for us.
That is correct, but additionally we need SB3 for testing of training (something similar to this) and to provide examples for training and rollout of the policy. To answer your question about user preference for Gym vs. Gymnasium: I think Gymnasium would be preferred, since Gym is no longer maintained and both SB3 and RLlib already support Gymnasium.
> ... we need SB3 for testing of training ...
Right. At https://github.com/RobotLocomotion/drake/tree/master/tools/workspace/stable_baselines3_internal we've adjusted SB3 with a `pytorch`-ectomy. So long as we maintain that invariant for whatever training tests we need, everything is still fine. I assume that PPO training can still run without torch. The goal is that we do not depend on `pytorch` in our automated tests; users could themselves still use `pytorch` if they wish.
The initial version has landed in #19831 . 🎉
From an office-hours discussion: many people have written small gym-like interfaces around Drake. We should provide one in Drake master and a tutorial showing how to use it.
Here is the relevant documentation from gym. I'm not worried about the repo-name / file-structure advice; I simply aim to provide the familiar spellings for the interactions with the simulator. Specifically, I would provide:
Note: We don't actually derive from `gym.Env` (to avoid the dependency). I'd recommend we offer (at least) the following constructors:
- `DrakeGymEnv(urdf_or_sdf_filename)` constructs a MultibodyPlant + SceneGraph parsed from the file.
- `DrakeGymEnv(system)`, so that someone can immediately use this interface on their own diagram / system.

Comments/feedback/recommendations welcome!
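As a rough illustration of the "familiar spellings" (this stand-in class is purely hypothetical and does not reflect Drake's actual implementation; a real version would wrap a `Simulator` around the passed-in system):

```python
class DrakeGymEnvSketch:
    """Illustrative stand-in: wraps a simulator-like object behind the
    familiar gym spellings without deriving from gym.Env."""

    def __init__(self, system):
        self.system = system   # in reality, a Drake System/Diagram
        self.state = 0.0

    def reset(self):
        self.state = 0.0
        return self.state

    def step(self, action):
        # Advance the wrapped simulator by one control period.
        self.state += action
        reward = -abs(self.state)       # e.g. drive the state to zero
        done = abs(self.state) > 10.0
        return self.state, reward, done, {}

env = DrakeGymEnvSketch(system=None)
obs = env.reset()
obs, reward, done, info = env.step(1.0)
```

The point is only the spelling: `reset()` returns an observation, and `step(action)` returns the usual `(observation, reward, done, info)` tuple, so off-the-shelf RL tooling can drive a Drake simulation.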
cc @EricCousineau-TRI @BenBurchfiel-TRI @ludwigschmidt-tri @calderpg-tri @kevinzakka-tri