Closed jwnimmer-tri closed 6 years ago
I'm in.
Most of your problems are due to legacy code and the unusual PODs conventions therein. If you rewrote the build system from scratch in any language it would be an improvement.
Some critical features that I'm not seeing offhand in Bazel:
To fully take advantage of it, we'd probably also have to port all of our dependencies.
Multiple build systems are a bad idea.
This is a compelling argument against attempting a port. Supporting two systems in parallel during the transition will be somewhat painful. We can perhaps localize the pain to only the bazel-porters (i.e., have a separate CI that is non-authoritative, and nobody but the porters care about).
Having a good plan of action here would be an important part of this pursuit (and I don't have one yet).
CI system would need to be completely refactored.
I'm not too worried about this. It will be effort and cost, but not risk.
Adding features to CMake that you may need in future is easy, for Bazel?
As with SCons, Bazel lets you escape into (limited) Python if you really want to, and is open-source should you need to modify the actual core.
Most of your problems are due to legacy code and the unusual PODs conventions therein. If you rewrote the build system from scratch in any language it would be an improvement.
I agree that's a big part of the problem, and that nuking the current build system code with a rewrite is a big win. Still, if we stick with CMake, the lack of reproducibility and caching for builds and tests is a big hole CMake, that Bazel and SCons both fill.
[Lacks the] Ability to locate and use external libraries. ("Use" seems to be there, at least partly, but "locate" is missing.)
Bazel often tends towards "build your libraries from source" (as we do with drake externals), which actually helps reproducibility, but you can also get required libraries from the system without much trouble.
And for "locate" (searching), I actually think that's a misfeature. There should be one way to build Drake-the-entireity on a given platform, which means that the dependency will be in one well-known, hard-coded place. Rooting around in system paths, the user's homedir, and finding something that "smells like" the right version of a library or tool only leads to confusing bug reports and wasted time. Or if the developer really wants a different path to some library, they can update their Bazel file to explicitly reference it.
[Lacks the] Ability to install things.
Hmm? Are you saying we'd have to roll your own "zip up these files into a release" logic, but in the CMake case that's already built-in? That's fair, but doesn't seem like a huge difference, compared to the effort of ongoing build system upkeep and developer downtime.
[Lacks] Any Windows support at all. (It looks like Bazel may assume a GCC-like compiler.)
This is in progress (http://bazel.io/roadmap.html), scheduled for a couple months out. Probably slips a bit more, but still relatively near-term compared to the scope of this ticket.
To fully take advantage of it, we'd probably also have to port all of our dependencies.
Yes, this is worth a bit of effort-assessment. Dependencies that are just a pile of C++ code are easy to port (I've done it). Dependencies that have bespoke -DTHIS_AND_THAT
autoconf-like logic are harder, but there are ways to cope.
FYI, active development prototype is ongoing here https://github.com/jwnimmer-tri/drake/commits/bazelspike. It is able to build the automotive slice of the code.
The current "off the top of my head" plan is something like "allow BUILD
files to be PR'd in-tree without CI yet" as we bring this up. This lets other bazel
workspaces use Drake-as-a-library, without Drake-as-a-project needing to adopt bazel
for its demos, tests, etc. Probably the next step after that is to get some kind of CI for Drake-as-a-library using bazel
, but still rely on CMake
for tests, demos, matlab, etc. I plan to iterate with David's review on a real plan offline, and then post here for comments.
Next-up proposal... for C++ libraries that bazel
knows about, teach CMake to obtain the list of sources from the BUILD
file, instead of repeating the list in two places.
WIP at https://github.com/jwnimmer-tri/drake/tree/bazel-reuse2
Not sure that is going to work. I am trying to work out if you would get extra cmake re-configures when file lists have not changed or not enough re-configures when they do change.
This is all kind of reversed when the purpose of CMake is to generate the build files for other systems. There is no bazel support now, but there could be.
When we discussed Bazel vs CMake a few months ago during the visit, the Kitware consensus in the room at that time was that Bazel is higher-level than CMake, and the one-feeds-the-other direction should go as I've done here.
In any case, what is the best solution in CMake for a list of library sources to be dynamically computed? I had done https://github.com/RobotLocomotion/drake/commit/6a1a04211cdb5b5c95dc011683efacacf079a43c originally -- which asks a python program to emit a list of sources -- but the approach in the reuse2
branch seemed like it would scale to generating even more of the listfile content directly from the BUILD.
The other option I've considered is to fully omit the listfiles for core Drake (common, math, systems, solvers, etc.) and just have CMake delegate to bazel to build the core. That is probably better long term, but I was hoping to get there incrementally.
That would certainly be cleanest.
Then CMake would provide an external interface for CMake consumers, build CMake externals, and other legacy code. We have integrated with external tools like Ant and Gradle for Java before, so this may not be that different.
(Might be an idea to create a ticket for the CMake side and assign it to Kitware since we have done similar many times before.)
Yeah. I have to give more thought and discussion to the optimal path forward. For now perhaps I'll just keep the two spellings of the list-of-sources duplicated.
Sure. My general opinion is that whatever the merits of Bazel or CMake, having two build systems duplicating each other or working in unusual ways to accommodate each other is going to cause a lot of extra maintenance at best, confusion or bugs at the worst. I also think there is some scope for upstreaming some useful Bazel support to CMake which might make things easier.
Starting from the wonderful list in #4045, here's a checklist of our C++ externals, to be ticked off once Bazel support for them is complete for Drake's use. I will keep this updated as we make progress.
bot_core_lcmtypes
#5056bullet
#4394ccd
#4600director
~ (the current plan is to keep this out of Bazel entirely)drake-visualizer
#5621, PR #6259dreal
#5406eigen
#3819fcl
#4600gflags
google_styleguide
#4670googletest
gurobi
#4518 (linux) #5518 (osx)ipopt
#4905 #4983lcm
libbot
#5405mosek
#5383nlopt
#5407octomap
#4600pybind11
#5700robotlocomotion_lcmtypes
#5056snopt
#5020spdlog
#4039vtk
#6152, #6308yaml_cpp
#4494Relatedly, here's a checklist of tooling needs, to be ticked off for Bazel support. I will keep this updated as we make progress:
The goal is only that they can be used without having to run CMake first. Things like, e.g. doxygen don't necessarily have to be a Bazel target -- a shell script called by CMake than can also be called manually would be fine.
Here's a checklist of things that the Bazel build accesses non-hermetically, which should be made hermetic.
Libraries
Binaries
Here is a migration plan from CMake to Bazel, extracted from f2f conversation with myself, @jwnimmer-tri, and @jamiesnape. It preserves the current CMake interface downstream.
lib_drake
Bazel target that includes all of C++ Drake, and its headers, and the externals.lib_drake
to emit a CMake config file, so that find_package(drake)
is possible, and wrap up the whole distribution in a zipfile.bazel
command line as appropriate.find_package(drake)
.CMakeLists
, and instead just have ctest run tests out of bazel-bin
. We delete the CMakeLists
rules for executables, except perhaps for an example showing how to intake Drake via CMake.I'd like to add some notes to the above migration plan on how Drake+ROS will evolve as Drake transitions to being built by Bazel. For example, will we have a Catkin package whose CMakeLists.txt uses execute_process() to run Bazel to build Drake and produce a lib_drake
and a CMake config file? Will lib_drake
and its CMake config file then need to be moved to a known location so other ROS packages can find and use them? (I'm assuming lib_drake
and its CMake config file will be located in Bazel's sandbox and thus not easily found by downstream Catkin / CMake-based packages.)
Yes, use by Catkin is going to be tricky. Bazel and Catkin are polar opposites with how they use the environment of the system on which they are being built.
We still also need to reconfigure CI to checkout to somewhere other than the root of the Jenkins workspace.
It preserves the current CMake interface downstream.
@liangfok The exact same Drake CMake library targets that have always existed, will continue to exist. Does that suffice for the ROS/Catkin build? At a glance, it seems like it should.
Oh, maybe I misunderstand the migration plan, specifically step 6b:
The Drake CMakeLists.txt are gutted of all references to library source code. Instead, the existing libraries, executables, and tests will link against the results of find_package(drake).
Does this mean that ROS / Cakin no longer build's Drake, but rather treats Drake as a preexisting library that's already installed in the system?
Seems so, unfortunately. Not sure Catkin will be happy.
Catkin is basically just CMake, so if a simple, pure-CMake example can be provided that "utilizes" drake (regardless of how drake was built, with either cmake or bazel) then you can make it work with other catkin packages. If you're talking about building drake along with many other things at the same time with something like catkin_tools
, then we'd have to look at how bazel is exposed to other non-bazel libraries, i.e. how is it distributed. I can help with this as needed (I originally wrote catkin_tools
) because I'd like to understand better how we could integrate with bazel projects.
Jenkins TODO
Mac:
Linux Xenial:
Possibles:
Installation TODO
Also
CMake Removal TODO
drake-superbuild:
externals:
drake:
Three proposals regarding CI for PRs:
I was thinking that a Bazel lint-only build might be useful if we could also switch off the lint tests on the other Bazel CI builds. This would make it easier to tell lint failures from "real" failures.
...Just a thought; feel free to hate it :smile:.
At the moment switching off lint gets ugly because --test_tag_filters
do not accumulate intelligently.
To me, its important that CI builds match user builds, so we shouldn't add or remove tests in CI. The build variants should merely cover the matrix of supported platforms and documented command-line flags (such as --compiler=
.)
Any opinion on the three proposals, and do you have a preference for the particular "everything" build? We can add others once CMake is turned down.
Switch LSan builds from CMake to Bazel.
Sure.
Switch off standalone cpplint builds
Sure.
Switch on at least one "everything" Bazel build.
Anything Xenial & Everything would be fine by me.
The checklists upthread probably need a refresh at some point.
We could probably split off the remaining Jenkins TODOs into separate issues and close this, I think. Other than those, I just see hermetic Gurobi remaining, and that does not see worth keeping the issue open for now (or necessarily even bothering about).
I agree we don't need an issue tracking Gurobi stuff. If its a problem in some context, we can fix it if / when needed.
@jamiesnape, @jwnimmer-tri -- i think it's safe to close this? ;-)
The Bazel build system is a relatively new tool that provides correct, reproducible, fast builds. It is the open-source version of google's own build tool.
I believe that Bazel would solve many of TRI's challenges in using and supporting Drake, and thus we should experiment with adding Bazel support into Drake. For starters, it should be enough to merely add Bazel support for the newer C++ core of Drake (common, math, systems_framework).
This would start as workstation-only test, then graduate to a post-merge build only, and not in the officially supported set. If and when we demonstrate that it is reliable and useful for developers, we can consider making Bazel support official.
It would be nice to start on this soon, in order to help prepare and test the upcoming Bazel 0.5 official support for Windows in a few months, so that we can help drive it to meet our needs.