RobotLocomotion / drake

Model-based design and verification for robotics.
https://drake.mit.edu
Other
3.35k stars 1.27k forks source link

binder: version of Drake binary is not guaranteed to be the latest #13209

Closed EricCousineau-TRI closed 4 years ago

EricCousineau-TRI commented 4 years ago

I posted in the issue linked below today, and opened up a new Binder session. I noticed that I got an old version of drake: https://github.com/RobotLocomotion/drake/issues/11457#issuecomment-622985019

Version info (from notebook):

In [3]: !cat /opt/drake/share/doc/drake/VERSION.TXT
Out [3]: 20200420074541 b02ff6e7b75d1a9a3ae7d4cf21fe76fa558cff74

FTR, my session URL was: (not that you'll be able to visit it?) https://notebooks.gesis.org/binder/jupyter/user/robotlocomotion-drake-2q29xfvv/tree/tutorials

When posting this issue, I wanted to make sure I could do this from scratch, so I opened another new Binder session:

However, when I opened a new session, I got a different version:

In [1]: !cat /opt/drake/share/doc/drake/VERSION.TXT
Out [1]: 20200430074549 26c38c92d871567f333eb8ece2677bcfc4ab165f

FTR, this new session URL was: https://hub.gke.mybinder.org/user/robotlocomotion-drake-zp1zbpov/tree/tutorials

I think you predicted something like this might happen due to our usage of a fixed branch name. My guess is it's a combination of optimization of server-side w/ JupyterHub's side, possibly coupled with user's local browser cache. Note that the URLs come from different hosts entirely, notebooks.gesis.org vs. hub.gke.mybinder.org.

Is there a possible (easy) mitigation?

Perhaps advice to users, like Ctrl+Shift+R (fully refresh cache) when visiting the Binder provisioning page? https://mybinder.org/v2/gh/RobotLocomotion/drake/nightly-release?filepath=tutorials

EDIT: Mentioned in Drake Slack, #python channel: https://drakedevelopers.slack.com/archives/C2CK4CWE7/p1588440496027800

EricCousineau-TRI commented 4 years ago

As of the time of writing (2020/05/02, 1p EST), these are the git logs of three different commits: first notebook session's SHA, second notebook session's SHA, and the current nightly-release:

$ cd drake
$ git log -n 1 b02ff6e7b75d1a9a3ae7d4cf21fe76fa558cff74
commit b02ff6e7b75d1a9a3ae7d4cf21fe76fa558cff74 (origin/tmp, tmp)
Author: Silvio Traversaro <silvio.traversaro@iit.it>
Date:   Sun Apr 19 17:55:42 2020 +0200

    parsing: Avoid to use implicit conversion from std::filesystem::path to std::string (#13089)

$ git log -n 1 26c38c92d871567f333eb8ece2677bcfc4ab165f
commit 26c38c92d871567f333eb8ece2677bcfc4ab165f
Author: Damrong Guoy <42557859+DamrongGuoy@users.noreply.github.com>
Date:   Wed Apr 29 17:14:18 2020 -0700

    [geometry] Example for contact-surface profiling: rigid bowl, soft ball. (#13138)

    * [geometry] Example for contact-surface profiling: rigid bowl, soft ball.

$ git fetch upstream
$ git log -n 1 upstream/nightly-release
commit 69c771fff58cc62a7dff3046cf7bb8112621b4e3 (upstream/nightly-release, upstream/master, master)
Author: Eric Cousineau <eric.cousineau@tri.global>
Date:   Sat May 2 00:31:22 2020 -0400

    drake_py_unittest: Force deprecation warnings to be errors (#13191)

    This change implies stricter error checking for future tests.
    Previously, Drake deprecations that occurred at import-time in a module (not when running a unittest) did not get translated into an error. This causes them to be an error.
jamiesnape commented 4 years ago

I think you predicted something like this might happen due to our usage of a fixed branch name.

A fixed branch name and/or using the latest tag in the FROM directive of the Dockerfile.

There is not a client-side workaround of which I am aware.

EricCousineau-TRI commented 4 years ago

Gotcha. As an interim, will PR (another) caveat for this to some docs, and lowering priority.

If it does hurt someone, then we can think of other workarounds - e.g. somehow using commits only, and forwarding to a link that we can then dispatch to Binder.

EricCousineau-TRI commented 4 years ago

Noticed that a 20200420 image is still in use, so that's about 2 weeks old: https://github.com/robotlocomotion/drake/pull/13211#pullrequestreview-406283017

jamiesnape commented 4 years ago

I figure they never do the rough equivalent of docker pull so it will probably be there whenever you get "lucky" enough to get that node.

If it were me, I think I would pin the tutorials on Binder to the latest monthly release. Logistically, I think that is the best we can do in repo without excessive churn and over-complicating things. All things considered, promising an API for a month is not an excessive burden.

EricCousineau-TRI commented 4 years ago

Alternative solution is (higher churn) to do daily or weekly update of the following file to point to Docker tags: https://github.com/RobotLocomotion/drake/blob/6761809c64237d54ac8a8bb37d4daccdbe21fb55/.binder/Dockerfile#L7 https://hub.docker.com/r/robotlocomotion/drake/tags

@jwnimmer-tri Can I ask if you have any strong feelings for or against this?

Ideally, it would happen on master, so that master and nightly-release do not diverge.

EricCousineau-TRI commented 4 years ago

Alt. alternative solution is don't use Binder, or do our own hosted JupyterHub instance, etc. Seems like high effort at this point.

jwnimmer-tri commented 4 years ago

I have a note on my TODO list to come back to this issue and better understand the release engineering challenges, and all of the moving pieces. I haven't had a chance to dive in yet.

My gut reaction to "use only Drake's numbered releases" was that it was a good solution. Can you speak to why using weekly or daily re-pinning of the tutorials is helpful? The idea that tutorials use only released APIs was appealing.

EricCousineau-TRI commented 4 years ago

Can you speak to why using weekly or daily re-pinning of the tutorials is helpful?

Yup! Mainly to be able incorporate bugfixes, like https://github.com/RobotLocomotion/drake/issues/11457, but also to ensure that tutorials can use any latest API changes.

The first part (bugfixes) is probably moot for Drake's Binder tutorial hosting if it doesn't affect current tutorials. However, if tutorials are affected by API changes, I believe we should be able to bump the tag (and not hold the tutorial back to the prior release).

I'd be fine if the policy is nominally release, but handles API breakage.

One example policy:

  1. Use FROM robotlocomotion/drake:{latest_release_tag} nominally.
  2. If there is a an API deprecation, the tutorials will be updated. They will show deprecation on Binder. There is no need to update the image.
  3. If there is a new API, or a breaking API change, used by the tutorials, then after merge, the image should be updated to FROM robotlocomotion/drake:{latest_nightly_date_tag} once the relevant commit is incorporated.
  4. Upon next release, reset it to Step 1.
jwnimmer-tri commented 4 years ago

I think I need to go back and understand the moving pieces before I try to dig into this. There's README for users, but the relationships that a developer needs to understand between moving pieces and release engineering seem to not be written down anywhere. I'll do that first before we start trying to figure out how to fix them.

EricCousineau-TRI commented 4 years ago

Jeremy, any chance you can look into this in the next month?

Jamie and I are talking about doing this in that timeline, and is it seems relatively simple, pending some agreement on procedure and docs.

jwnimmer-tri commented 4 years ago

The https://drake-jenkins.csail.mit.edu/view/Packaging/job/linux-bionic-unprovisioned-gcc-bazel-nightly-snopt-packaging build has a final step "PUSHING NIGHTLY RELEASE BRANCH" where it updates the refs for nightly-release. As I understand the binder integration, that refs determines what Binder is doing.

What if we added another step just after that one, call it "PUSHING NIGHTLY RELEASE BINDER BRANCH". It would first edit and commit .binder/Dockerfile to change the FROM robotlocomotion/drake:latest line in to say instead drake:bionic-20200618 (e.g.). Then it would push that to the refs for nightly-release-binder.

We would change our README links to point to the nightly-release-binder.

It seems to me like that would resolve this problem, with minimal effort and no recurring maintenance.

WDYT?

EricCousineau-TRI commented 4 years ago

SGTM!

Just thinking through, the only caveat is that we lose some semblance of Git commit parity, but really, the .binder/* files are all that matter; all the other Git info is in the existing built docker images, and are untouched.

@jamiesnape ?

jamiesnape commented 4 years ago

How are we going to handle the history of nightly-release-binder? Sounds like we are saying replace it with master each night?

jwnimmer-tri commented 4 years ago

Correct.

BetsyMcPhail commented 4 years ago

Drake-CI PR #115