RobotLocomotion / drake

Model-based design and verification for robotics.
https://drake.mit.edu
Other
3.36k stars 1.27k forks source link

Add support for Bazel build system #3129

Closed jwnimmer-tri closed 6 years ago

jwnimmer-tri commented 8 years ago

The Bazel build system is a relatively new tool that provides correct, reproducible, fast builds. It is the open-source version of google's own build tool.

I believe that Bazel would solve many of TRI's challenges in using and supporting Drake, and thus we should experiment with adding Bazel support into Drake. For starters, it should be enough to merely add Bazel support for the newer C++ core of Drake (common, math, systems_framework).

This would start as workstation-only test, then graduate to a post-merge build only, and not in the officially supported set. If and when we demonstrate that it is reliable and useful for developers, we can consider making Bazel support official.

It would be nice to start on this soon, in order to help prepare and test the upcoming Bazel 0.5 official support for Windows in a few months, so that we can help drive it to meet our needs.

david-german-tri commented 8 years ago

I'm in.

jamiesnape commented 8 years ago
  1. CI system would need to be completely refactored.
  2. Multiple build systems are a bad idea.
  3. Adding features to CMake that you may need in future is easy, for Bazel?

Most of your problems are due to legacy code and the unusual PODs conventions therein. If you rewrote the build system from scratch in any language it would be an improvement.

mwoehlke-kitware commented 8 years ago

Some critical features that I'm not seeing offhand in Bazel:

To fully take advantage of it, we'd probably also have to port all of our dependencies.

jwnimmer-tri commented 8 years ago

Multiple build systems are a bad idea.

This is a compelling argument against attempting a port. Supporting two systems in parallel during the transition will be somewhat painful. We can perhaps localize the pain to only the bazel-porters (i.e., have a separate CI that is non-authoritative, and nobody but the porters care about).

Having a good plan of action here would be an important part of this pursuit (and I don't have one yet).

CI system would need to be completely refactored.

I'm not too worried about this. It will be effort and cost, but not risk.

Adding features to CMake that you may need in future is easy, for Bazel?

As with SCons, Bazel lets you escape into (limited) Python if you really want to, and is open-source should you need to modify the actual core.

Most of your problems are due to legacy code and the unusual PODs conventions therein. If you rewrote the build system from scratch in any language it would be an improvement.

I agree that's a big part of the problem, and that nuking the current build system code with a rewrite is a big win. Still, if we stick with CMake, the lack of reproducibility and caching for builds and tests is a big hole CMake, that Bazel and SCons both fill.

[Lacks the] Ability to locate and use external libraries. ("Use" seems to be there, at least partly, but "locate" is missing.)

Bazel often tends towards "build your libraries from source" (as we do with drake externals), which actually helps reproducibility, but you can also get required libraries from the system without much trouble.

And for "locate" (searching), I actually think that's a misfeature. There should be one way to build Drake-the-entireity on a given platform, which means that the dependency will be in one well-known, hard-coded place. Rooting around in system paths, the user's homedir, and finding something that "smells like" the right version of a library or tool only leads to confusing bug reports and wasted time. Or if the developer really wants a different path to some library, they can update their Bazel file to explicitly reference it.

[Lacks the] Ability to install things.

Hmm? Are you saying we'd have to roll your own "zip up these files into a release" logic, but in the CMake case that's already built-in? That's fair, but doesn't seem like a huge difference, compared to the effort of ongoing build system upkeep and developer downtime.

[Lacks] Any Windows support at all. (It looks like Bazel may assume a GCC-like compiler.)

This is in progress (http://bazel.io/roadmap.html), scheduled for a couple months out. Probably slips a bit more, but still relatively near-term compared to the scope of this ticket.

To fully take advantage of it, we'd probably also have to port all of our dependencies.

Yes, this is worth a bit of effort-assessment. Dependencies that are just a pile of C++ code are easy to port (I've done it). Dependencies that have bespoke -DTHIS_AND_THAT autoconf-like logic are harder, but there are ways to cope.

jwnimmer-tri commented 8 years ago

FYI, active development prototype is ongoing here https://github.com/jwnimmer-tri/drake/commits/bazelspike. It is able to build the automotive slice of the code.

The current "off the top of my head" plan is something like "allow BUILD files to be PR'd in-tree without CI yet" as we bring this up. This lets other bazel workspaces use Drake-as-a-library, without Drake-as-a-project needing to adopt bazel for its demos, tests, etc. Probably the next step after that is to get some kind of CI for Drake-as-a-library using bazel, but still rely on CMake for tests, demos, matlab, etc. I plan to iterate with David's review on a real plan offline, and then post here for comments.

jwnimmer-tri commented 8 years ago

Next-up proposal... for C++ libraries that bazel knows about, teach CMake to obtain the list of sources from the BUILD file, instead of repeating the list in two places.

WIP at https://github.com/jwnimmer-tri/drake/tree/bazel-reuse2

jamiesnape commented 8 years ago

Not sure that is going to work. I am trying to work out if you would get extra cmake re-configures when file lists have not changed or not enough re-configures when they do change.

This is all kind of reversed when the purpose of CMake is to generate the build files for other systems. There is no bazel support now, but there could be.

jwnimmer-tri commented 8 years ago

When we discussed Bazel vs CMake a few months ago during the visit, the Kitware consensus in the room at that time was that Bazel is higher-level than CMake, and the one-feeds-the-other direction should go as I've done here.

In any case, what is the best solution in CMake for a list of library sources to be dynamically computed? I had done https://github.com/RobotLocomotion/drake/commit/6a1a04211cdb5b5c95dc011683efacacf079a43c originally -- which asks a python program to emit a list of sources -- but the approach in the reuse2 branch seemed like it would scale to generating even more of the listfile content directly from the BUILD.

jwnimmer-tri commented 8 years ago

The other option I've considered is to fully omit the listfiles for core Drake (common, math, systems, solvers, etc.) and just have CMake delegate to bazel to build the core. That is probably better long term, but I was hoping to get there incrementally.

jamiesnape commented 8 years ago

That would certainly be cleanest.

jamiesnape commented 8 years ago

Then CMake would provide an external interface for CMake consumers, build CMake externals, and other legacy code. We have integrated with external tools like Ant and Gradle for Java before, so this may not be that different.

jamiesnape commented 8 years ago

(Might be an idea to create a ticket for the CMake side and assign it to Kitware since we have done similar many times before.)

jwnimmer-tri commented 8 years ago

Yeah. I have to give more thought and discussion to the optimal path forward. For now perhaps I'll just keep the two spellings of the list-of-sources duplicated.

jamiesnape commented 8 years ago

Sure. My general opinion is that whatever the merits of Bazel or CMake, having two build systems duplicating each other or working in unusual ways to accommodate each other is going to cause a lot of extra maintenance at best, confusion or bugs at the worst. I also think there is some scope for upstreaming some useful Bazel support to CMake which might make things easier.

jwnimmer-tri commented 7 years ago

Starting from the wonderful list in #4045, here's a checklist of our C++ externals, to be ticked off once Bazel support for them is complete for Drake's use. I will keep this updated as we make progress.

jwnimmer-tri commented 7 years ago

Relatedly, here's a checklist of tooling needs, to be ticked off for Bazel support. I will keep this updated as we make progress:

The goal is only that they can be used without having to run CMake first. Things like, e.g. doxygen don't necessarily have to be a Bazel target -- a shell script called by CMake than can also be called manually would be fine.

david-german-tri commented 7 years ago

Here's a checklist of things that the Bazel build accesses non-hermetically, which should be made hermetic.

Libraries

Binaries

david-german-tri commented 7 years ago

Here is a migration plan from CMake to Bazel, extracted from f2f conversation with myself, @jwnimmer-tri, and @jamiesnape. It preserves the current CMake interface downstream.

  1. 100% of C++ Drake is supported in the Bazel build, parallel to the CMake build.
  2. We provide sufficient tools that all Drake developers can function without CMake.
  3. We define a lib_drake Bazel target that includes all of C++ Drake, and its headers, and the externals.
  4. We extend lib_drake to emit a CMake config file, so that find_package(drake) is possible, and wrap up the whole distribution in a zipfile.
  5. We add the Drake Bazel build to the Drake CMake superbuild as a separate external project, off by default and not used for anything. The superbuild sets debug/release/san options on the bazel command line as appropriate.
  6. In one PR, we switch from CMake to Bazel in Drake proper. a. We turn the Drake Bazel build on by default in the CMake superbuild. b. The Drake CMakeLists.txt are gutted of all references to library source code. Instead, the existing libraries, executables, and tests will link against the results of find_package(drake).
  7. We drop the unit test source code from the CMakeLists, and instead just have ctest run tests out of bazel-bin. We delete the CMakeLists rules for executables, except perhaps for an example showing how to intake Drake via CMake.
  8. We remove all the externals from the Drake CMake superbuild, except for Director.
  9. Once CDash or another dashboard tool is able to handle Bazel outputs, we stop running unit tests in ctest at all. (speculative)
liangfok commented 7 years ago

I'd like to add some notes to the above migration plan on how Drake+ROS will evolve as Drake transitions to being built by Bazel. For example, will we have a Catkin package whose CMakeLists.txt uses execute_process() to run Bazel to build Drake and produce a lib_drake and a CMake config file? Will lib_drake and its CMake config file then need to be moved to a known location so other ROS packages can find and use them? (I'm assuming lib_drake and its CMake config file will be located in Bazel's sandbox and thus not easily found by downstream Catkin / CMake-based packages.)

jamiesnape commented 7 years ago

Yes, use by Catkin is going to be tricky. Bazel and Catkin are polar opposites with how they use the environment of the system on which they are being built.

jamiesnape commented 7 years ago

We still also need to reconfigure CI to checkout to somewhere other than the root of the Jenkins workspace.

david-german-tri commented 7 years ago

It preserves the current CMake interface downstream.

@liangfok The exact same Drake CMake library targets that have always existed, will continue to exist. Does that suffice for the ROS/Catkin build? At a glance, it seems like it should.

liangfok commented 7 years ago

Oh, maybe I misunderstand the migration plan, specifically step 6b:

The Drake CMakeLists.txt are gutted of all references to library source code. Instead, the existing libraries, executables, and tests will link against the results of find_package(drake).

Does this mean that ROS / Cakin no longer build's Drake, but rather treats Drake as a preexisting library that's already installed in the system?

jamiesnape commented 7 years ago

Seems so, unfortunately. Not sure Catkin will be happy.

wjwwood commented 7 years ago

Catkin is basically just CMake, so if a simple, pure-CMake example can be provided that "utilizes" drake (regardless of how drake was built, with either cmake or bazel) then you can make it work with other catkin packages. If you're talking about building drake along with many other things at the same time with something like catkin_tools, then we'd have to look at how bazel is exposed to other non-bazel libraries, i.e. how is it distributed. I can help with this as needed (I originally wrote catkin_tools) because I'd like to understand better how we could integrate with bazel projects.

jamiesnape commented 7 years ago

Jenkins TODO

Mac:

Linux Xenial:

Possibles:

jamiesnape commented 7 years ago

Installation TODO

Also

jamiesnape commented 7 years ago

CMake Removal TODO

drake-superbuild:

externals:

drake:

jamiesnape commented 7 years ago

Three proposals regarding CI for PRs:

  1. Switch LSan builds from CMake to Bazel.
  2. Switch off standalone cpplint builds (or replace with a Bazel one, but lint runs on all Bazel builds currently, anyway).
  3. Switch on at least one "everything" Bazel build.
mwoehlke-kitware commented 7 years ago

I was thinking that a Bazel lint-only build might be useful if we could also switch off the lint tests on the other Bazel CI builds. This would make it easier to tell lint failures from "real" failures.

...Just a thought; feel free to hate it :smile:.

jamiesnape commented 7 years ago

At the moment switching off lint gets ugly because --test_tag_filters do not accumulate intelligently.

jwnimmer-tri commented 7 years ago

To me, its important that CI builds match user builds, so we shouldn't add or remove tests in CI. The build variants should merely cover the matrix of supported platforms and documented command-line flags (such as --compiler=.)

jamiesnape commented 7 years ago

Any opinion on the three proposals, and do you have a preference for the particular "everything" build? We can add others once CMake is turned down.

jwnimmer-tri commented 7 years ago

Switch LSan builds from CMake to Bazel.

Sure.

Switch off standalone cpplint builds

Sure.

Switch on at least one "everything" Bazel build.

Anything Xenial & Everything would be fine by me.

jwnimmer-tri commented 6 years ago

The checklists upthread probably need a refresh at some point.

jamiesnape commented 6 years ago

We could probably split off the remaining Jenkins TODOs into separate issues and close this, I think. Other than those, I just see hermetic Gurobi remaining, and that does not see worth keeping the issue open for now (or necessarily even bothering about).

jwnimmer-tri commented 6 years ago

I agree we don't need an issue tracking Gurobi stuff. If its a problem in some context, we can fix it if / when needed.

RussTedrake commented 6 years ago

@jamiesnape, @jwnimmer-tri -- i think it's safe to close this? ;-)