Thanks! This analysis is really helpful. It's a pain point for us too, but as far as I know no one is looking at it yet, so I'll provisionally assign myself.
@liangfok, you may also be interested.
What does "joint classes" mean in this context?
+1 for a redesign of the URDF / SDF parsers. I propose a RigidBodyTree class that contains no parsers, and separate DrakeURDFParser and DrakeSDFParser classes that inherit from an abstract ModelParser class.
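To make the proposal concrete, here is a minimal sketch of that class layout. Only the class names come from the proposal; the AddModel method, its signature, the MakeParserFor helper, and the stripped-down RigidBodyTree are assumptions made up for illustration, and no real parsing happens.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for the real RigidBodyTree: it only records which
// models were added and contains no parsing code at all.
struct RigidBodyTree {
  std::vector<std::string> model_names;
};

// Abstract base class named after the proposal. The method name and
// signature are assumptions made for this sketch.
class ModelParser {
 public:
  virtual ~ModelParser() = default;
  virtual void AddModel(const std::string& model_name,
                        RigidBodyTree* tree) const = 0;
};

// A real implementation would parse URDF here; this one only tags the name.
class DrakeURDFParser : public ModelParser {
 public:
  void AddModel(const std::string& model_name,
                RigidBodyTree* tree) const override {
    tree->model_names.push_back("urdf:" + model_name);
  }
};

// A real implementation would parse SDF here; this one only tags the name.
class DrakeSDFParser : public ModelParser {
 public:
  void AddModel(const std::string& model_name,
                RigidBodyTree* tree) const override {
    tree->model_names.push_back("sdf:" + model_name);
  }
};

// Hypothetical helper: picks a parser from the file extension and returns
// nullptr when the extension is not recognized.
std::unique_ptr<ModelParser> MakeParserFor(const std::string& filename) {
  const auto ends_with = [&filename](const std::string& suffix) {
    return filename.size() >= suffix.size() &&
           filename.compare(filename.size() - suffix.size(), suffix.size(),
                            suffix) == 0;
  };
  if (ends_with(".urdf")) return std::make_unique<DrakeURDFParser>();
  if (ends_with(".sdf")) return std::make_unique<DrakeSDFParser>();
  return nullptr;
}
```

With this shape, code that only needs the tree never has to see a parser header, and callers talk to a parser through the ModelParser interface.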
Thanks, @david-german-tri; it's good to know it's not just me. One of the reasons I wanted to open an issue is because I'm not sure what our minimum system requirements are for building drake, and I could imagine some "average" user with an underpowered machine getting bitten by this.
What does "joint classes" mean in this context?
Subclasses of DrakeJoint. In particular, see those classes defined via includes in drake/systems/plants/joints/DrakeJoints.h.
I have a (very stale) branch (REBASE-rbt-split-templates in my fork) that splits RigidBodyTree.cpp into several pieces, with the worst needing only about 20 sec, 1.5 GiB to compile (most are about 10 sec, 1.0 GiB). Although this takes non-trivially more CPU cycles in total to compile, it's nearly a wash on higher-end machines, and does actually help on mid-range machines. (As an added bonus, the template instantiations are also much more legible.) However, it is not possible to do anything similar for RigidBodyTree{SDF,URDF}.cpp. The problem there seems to be with instantiating the various joint classes, which is why I'm not aware of anything that can be done except to redesign those somehow so as to avoid the problem. (This is why I opened an issue rather than a PR; I don't have such a redesign available and wouldn't presume to be the best person to attempt such a change.) I expect that would also help RigidBodyTree.cpp considerably.
That said, I'd be happy to look at redoing the changes to split RigidBodyTree.cpp if you feel that would be valuable.
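For readers unfamiliar with the general technique, the sketch below shows how explicit template instantiations can be spread over several source files so that no single compile has to instantiate everything. It is not taken from the branch mentioned above; the file names and the SumOfSquares function are hypothetical, and the three "files" are shown in one listing (which still compiles as a single translation unit).

```cpp
#include <vector>

// ---- sum_of_squares.h (hypothetical header) ----
template <typename Scalar>
Scalar SumOfSquares(const std::vector<Scalar>& values) {
  Scalar total = Scalar(0);
  for (const Scalar& v : values) total += v * v;
  return total;
}

// Explicit instantiation declarations: translation units that include the
// header are told not to instantiate these specializations themselves.
extern template double SumOfSquares<double>(const std::vector<double>&);
extern template float SumOfSquares<float>(const std::vector<float>&);

// ---- sum_of_squares_double.cc (hypothetical) ----
// Explicit instantiation definition for one scalar type only.
template double SumOfSquares<double>(const std::vector<double>&);

// ---- sum_of_squares_float.cc (hypothetical) ----
// A second, independent piece that can compile in parallel with the first.
template float SumOfSquares<float>(const std::vector<float>&);
```

Each .cc file then carries the cost of only its own instantiations, which trades some extra total CPU time for a lower peak memory per compiler process.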
On Windows with VS 2015 RelWithDebInfo I'm seeing about 1.5GB in use when compiling those files. That's much bigger than a typical compile but still reasonable.
On a related note... I tried to do a build last Friday, and noticed 8 compile tasks at about 1.5 GiB each. Unfortunately, this broke my computer :cry: (read: caused it to become entirely non-responsive, and it did not recover after being left alone over the weekend), so I can't do a postmortem to determine what it was trying to build at the time.
The compiler grinding to a halt on these files is really starting to annoy me, too. If there's a patch nearby to break out some code into separate files, that'd be a good start. Once that's in, we could try to recruit someone at TRI to tackle the compiler-bogging problem with drake/systems/plants/joints code.
Perhaps we should also see if this compiler flag can be removed, once this issue is fixed:
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} /bigobj") # after receiving RigidBodyTree.cpp : fatal error C1128: number of sections exceeded object file format limit: compile with /bigobj
This has generated a bunch of complaints recently, and is a barrier to switching to Ninja, so I'm bumping it to priority: high and will start working on it instead of System2 for the next couple of days.
The @mwoehlke-kitware WIP to split RigidBodyTree into pieces is here: https://github.com/mwoehlke-kitware/drake/commit/e7717a062366b81d3b400dddc863c77ce56151f4
@david-german-tri Thanks for this! #2614 in particular made a huge difference. Even though more improvements are possible (and should continue), I wonder if we can call this ticket closed, or at least lower its priority? I haven't needed more than 8 GB during -j8 builds since these changes.
There are two related issues that I've been working under the aegis of this ticket:
1. RigidBodyTree compilation units named in the OP are huge.
2. RigidBody* headers pull a large amount of template code into hundreds of downstream compilation units that include them.
I agree with you that good progress has been made on item (2), but for item (1) we're still in the earlier stages. I think breaking the dependencies on drakeGeometryUtil will make a huge difference. So, I'd like to keep this ticket open until that's done, but I've dropped the priority to medium.
What about the plan to extract the URDF / SDF parsing code from RigidBodyTree? Is that going to be part of this issue or should that be a separate issue?
I'm not planning to factor out URDF/SDF parsing as part of this issue. It's a great idea for many reasons, but I'm not sure it will make a big difference to memory footprint (since the parsers are already separate compilation units).
So, I retract my proposal to close this, and my high watermark evidence of 8 GB upthread. It turns out that when I switched to Ninja, I stopped compiling in Release mode (CMAKE_BUILD_TYPE was undefined). Having switched back to release mode now, memory is topping out above 16GB even under -j4 again.
We've gone backwards here somewhere...
joints/RollPitchYawFloatingJoint.cpp 68 sec, 3.4 GiB
parser_urdf.cc 109 sec, 3.8 GiB
RigidBodyTreeSDF.cpp 108 sec, 4.5 GiB
RigidBodyTree.cpp 268 sec, 7.3 GiB
(This is from stats collected from the entire build¹, on Linux using GCC 4.9.2 in RelWithDebInfo mode. The good news is that the next worst offender is examples/Quadrotor/runLQR.cpp at a little under 2 GiB and 41 seconds, after which nothing exceeds 1.5 GiB or 31 seconds.)
(¹ Of drake itself. Stats for the externals were not collected.)
For grins, here's a plot (log scale) of build times to memory usage. Most of the build is in a reasonable region between 1-15 sec and about 150 KiB - 1 GiB.
$ cd drake-distro
$ rm -rf build
$ rm -rf externals
$ git reset --hard HEAD
$ mkdir build
$ cd build
$ cmake .. -DCMAKE_BUILD_TYPE:STRING=RelWithDebInfo -DDISABLE_MATLAB=TRUE -DWITH_DREAL=FALSE -DWITH_SNOPT=FALSE
$ time make -j4
Notice that I disable both MATLAB and dReal. I use -j4 since the CPU has four cores.
0442d21362dbd23b326238f8d190080f0aae248f
$ time make -j4
...
real 11m17.079s
user 34m42.356s
sys 2m25.955s
Since the CPU supports hyper-threading, I decided to try using -j8. Here are the results:
real 10m48.138s
user 52m32.926s
sys 3m19.787s
I believe the only metric that really matters is "real" (see this article). On this particular machine, using -j8 marginally reduces the "real" build time by about 30 seconds.
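As a rough way to relate the three numbers that `time` prints, (user + sys) / real gives the average number of cores kept busy during the build. The sketch below is only an illustration; the function names are made up here, and it assumes the "<minutes>m<seconds>s" format shown in the output above.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Converts a bash `time` field such as "11m17.079s" into seconds.
double ParseMinutesSeconds(const std::string& text) {
  const std::size_t m = text.find('m');
  if (m == std::string::npos) {
    throw std::invalid_argument("expected <minutes>m<seconds>s");
  }
  const std::size_t s = text.find('s', m);
  if (s == std::string::npos) {
    throw std::invalid_argument("expected <minutes>m<seconds>s");
  }
  return 60.0 * std::stod(text.substr(0, m)) +
         std::stod(text.substr(m + 1, s - m - 1));
}

// Average number of cores kept busy: total CPU time divided by wall time.
double AverageBusyCores(double real_s, double user_s, double sys_s) {
  return (user_s + sys_s) / real_s;
}
```

Applied to the make runs above, this gives roughly 3.3 busy cores for -j4 and roughly 5.2 for -j8, which fits the observation that -j8 burns much more CPU time for only a small gain in "real" time.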
$ time ninja -j4
...
real 10m37.357s
user 49m45.990s
sys 2m41.122s
$ time ninja -j8
...
real 10m41.235s
user 50m15.326s
sys 2m40.993s
@mwoehlke-kitware: Interesting. I think that shows we've gone backwards in two respects:
1. DrakeJoints.h (and in particular RollPitchYawFloatingJoint.h) pulls in a huge amount of template code. We've factored compilation units that include DrakeJoints.h into multiple pieces. However, as @liangfok's metrics point out, and indeed as you suggested upthread, that refactoring doesn't have much effect on user-perceived build times for multicore systems with 16GB or more of RAM. RigidBodyTree and the parsers are a dependency chokepoint for the entire build, so when we hit these compilation-mega-units, there are plenty of cores to spare.
2. RigidBodyTree.cpp somehow picked up another 3+ GB of RAM usage. That's insane and alarming; it should be bisected for root cause.
Right now, this issue is assigned to me, but I don't have bandwidth to work on it. Someone else is welcome to chip in.
I'll take on the investigation of why RigidBodyTree takes so much memory to compile since it is very likely my fault. My current suspicion is that it happened when I extracted the parser code into their own .h and .cc files.
Update Sept. 4, 2016: The high memory consumption problem was not caused by the extraction of the parsers. See update 3 below.
Using 0b1910cb7e0b8f0c0f144abe128061845e63e29d (September 1, 2016)
I wanted to replicate what @mwoehlke-kitware reported in the original description of this issue. To do this, I first built Drake using VERBOSE=true to get the actual compile commands. I then found the command for compiling RigidBodyTree.cpp.
To manually build RigidBodyTree.cpp, I executed the following commands:
$ cd /home/liang/dev/drake-distro-2/build/drake/systems/plants
$ rm CMakeFiles/drakeRBM.dir/RigidBody.cpp.o
$ /usr/bin/time --format=%U,%M /usr/bin/g++-4.9 -DHAVE_SPDLOG -DdrakeRBM_EXPORTS -I/home/liang/dev/drake-distro-2/drake/.. -I/home/liang/dev/drake-distro-2/build/drake/exports -I/home/liang/dev/drake-distro-2/build/install/include -I/home/liang/dev/drake-distro-2/build/drake -I/home/liang/dev/drake-distro-2/build/drake/lcmtypes -isystem /home/liang/dev/drake-distro-2/build/install/include/eigen3 -I/home/liang/dev/drake-distro-2/drake/thirdParty/bsd/spruce/include -Werror=all -Werror=ignored-qualifiers -DGTEST_DONT_DEFINE_FAIL=1 -DGTEST_DONT_DEFINE_SUCCEED=1 -DGTEST_DONT_DEFINE_TEST=1 -O2 -g -DNDEBUG -fPIC -fvisibility=hidden -fvisibility-inlines-hidden -std=gnu++14 -o CMakeFiles/drakeRBM.dir/RigidBodyTree.cpp.o -c /home/liang/dev/drake-distro-2/drake/systems/plants/RigidBodyTree.cpp
Here's what the last command above reported:
618.33,7894972
The above numbers indicate it took 618.33 seconds (~10 minutes) of user-mode CPU time and a peak of 7,894,972 kilobytes of memory. The build type was CMAKE_BUILD_TYPE:STRING=RelWithDebInfo. The maximum amount of memory used (7.89GB) is far higher than what was reported above.
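For anyone repeating these measurements, the "%U,%M" line is easy to post-process. The sketch below is illustrative only: the struct and function names are invented here, and the gigabyte conversion simply divides the reported kilobytes by one million, which is the convention the figures in this comment follow.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// One measurement as printed by /usr/bin/time --format=%U,%M.
struct CompileCost {
  double user_seconds;   // %U: user-mode CPU seconds
  long long max_rss_kb;  // %M: maximum resident set size, in kilobytes
};

// Parses a line such as "618.33,7894972".
CompileCost ParseTimeOutput(const std::string& line) {
  const std::size_t comma = line.find(',');
  if (comma == std::string::npos) {
    throw std::invalid_argument("expected <seconds>,<kilobytes>");
  }
  CompileCost cost;
  cost.user_seconds = std::stod(line.substr(0, comma));
  cost.max_rss_kb = std::stoll(line.substr(comma + 1));
  return cost;
}

// Kilobytes to gigabytes, using the decimal convention assumed above.
double ToGigabytes(long long kilobytes) {
  return static_cast<double>(kilobytes) / 1.0e6;
}
```

For example, ParseTimeOutput("618.33,7894972") yields 618.33 user seconds and a peak just under 7.9 GB.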
Using SHA a974831f0716fbcd7890b4fb6c0f2402bbb9acd0 (April 12, 2016):
$ cd /home/liang/dev/drake-distro-2/drake/pod-build/systems/plants
$ rm ./CMakeFiles/drakeRBM.dir/RigidBodyTree.cpp.o
$ /usr/bin/time --format=%U,%M /usr/bin/g++-4.9 -DdrakeRBM_EXPORTS -I/home/liang/dev/drake-distro-2/drake/pod-build/generated -I/home/liang/dev/drake-distro-2/drake/.. -I/home/liang/dev/drake-distro-2/drake/pod-build/exports -isystem /home/liang/dev/drake-distro-2/build/include -I/home/liang/dev/drake-distro-2/drake/pod-build/lcmgen -isystem /usr/include/glib-2.0 -isystem /usr/lib/x86_64-linux-gnu/glib-2.0/include -isystem /home/liang/dev/drake-distro-2/build/include/eigen3 -I/home/liang/dev/drake-distro-2/drake/thirdParty/spruce/include -I/home/liang/dev/drake-distro-2/drake/thirdParty/cimg -Wreturn-type -Wuninitialized -Wunused-variable -std=c++11 -O2 -g -DNDEBUG -fPIC -o CMakeFiles/drakeRBM.dir/RigidBodyTree.cpp.o -c /home/liang/dev/drake-distro-2/drake/systems/plants/RigidBodyTree.cpp
106.73,3907428
$ ls -lah ./CMakeFiles/drakeRBM.dir/RigidBodyTree.cpp.o
-rw-rw-r-- 1 liang liang 183M Sep 3 21:46 ./CMakeFiles/drakeRBM.dir/RigidBodyTree.cpp.o
So it's true. Compiling the April 12, 2016 version of RigidBodyTree.cpp consumed 3.9GB of memory. This is much less than the 7.9GB of memory needed on September 1, 2016. It also closely matches @mwoehlke-kitware's measurements posted in this issue's description.
On July 6, 2016, https://github.com/RobotLocomotion/drake/issues/2074#issuecomment-230825388 pointed out that RigidBodyTree.cpp regressed in terms of compiler memory footprint. The following tests a commit from July 7, 2016:
Using SHA 854a4589ebd5eb2a85f19fa4dd3bea854d2c9290 (July 7, 2016):
$ cd /home/liang/dev/drake-distro-2/drake/pod-build/systems/plants
$ rm ./CMakeFiles/drakeRBM.dir/RigidBodyTree.cpp.o
$ /usr/bin/time --format=%U,%M /usr/bin/g++-4.9 -DROSCONSOLE_BACKEND_LOG4CXX -DROS_PACKAGE_NAME=\"drake\" -DdrakeRBM_EXPORTS -I/home/liang/dev/drake-distro-2/drake/pod-build/generated -I/home/liang/dev/drake-distro-2/drake/.. -I/home/liang/dev/drake-distro-2/drake/pod-build/exports -I/home/liang/dev/drake-distro-2/build/include -I/opt/ros/indigo/include -I/home/liang/dev/drake-distro-2/drake/pod-build -I/home/liang/dev/drake-distro-2/drake/pod-build/lcmtypes -isystem /home/liang/dev/drake-distro-2/build/include/eigen3 -I/home/liang/dev/drake-distro-2/drake/thirdParty/spruce/include -I/home/liang/dev/drake-distro-2/drake/thirdParty/cimg -Werror=all -Werror=ignored-qualifiers -DGTEST_DONT_DEFINE_FAIL=1 -DGTEST_DONT_DEFINE_SUCCEED=1 -DGTEST_DONT_DEFINE_TEST=1 -Wno-sign-compare -O2 -g -DNDEBUG -fPIC -std=gnu++14 -o CMakeFiles/drakeRBM.dir/RigidBodyTree.cpp.o -c /home/liang/dev/drake-distro-2/drake/systems/plants/RigidBodyTree.cpp
268.07,7577520
$ ls -lah ./CMakeFiles/drakeRBM.dir/RigidBodyTree.cpp.o
-rw-rw-r-- 1 liang liang 385M Sep 3 23:52 ./CMakeFiles/drakeRBM.dir/RigidBodyTree.cpp.o
These results confirm that on July 7, 2016, RigidBodyTree.cpp required 7.6GB of RAM to compile. This is in contrast to April 12, 2016, when only 3.9GB was required.
Does the switch from C++11 to C++14 make a difference?
One change from April 12 to July 7 is the switch from C++11 to C++14. This occurred in dba30b30d09e7fd98441570fecc4fa7852a03e3b. The immediately preceding commit is b3028662c4c76d6a339bdff8b2dfde0fe4180203. Unfortunately, it does not compile due to lcm-lua failing to find lua.h. Thus, I test the commit that immediately precedes b3028662c4c76d6a339bdff8b2dfde0fe4180203, which is ea0fe6362cfa959a39a961055da7038eb3da8498.
$ cd /home/liang/dev/drake-distro-2/drake/pod-build/systems/plants
$ rm ./CMakeFiles/drakeRBM.dir/RigidBodyTree.cpp.o
$ /usr/bin/time --format=%U,%M /usr/bin/g++-4.9 -DROSCONSOLE_BACKEND_LOG4CXX -DROS_PACKAGE_NAME=\"drake\" -DdrakeRBM_EXPORTS -I/home/liang/dev/drake-distro-2/drake/pod-build/generated -I/home/liang/dev/drake-distro-2/drake/.. -I/home/liang/dev/drake-distro-2/drake/pod-build/exports -isystem /home/liang/dev/drake-distro-2/build/include -I/opt/ros/indigo/include -I/home/liang/dev/drake-distro-2/drake/pod-build/lcmgen -isystem /usr/include/glib-2.0 -isystem /usr/lib/x86_64-linux-gnu/glib-2.0/include -isystem /home/liang/dev/drake-distro-2/build/include/eigen3 -I/home/liang/dev/drake-distro-2/drake/thirdParty/spruce/include -I/home/liang/dev/drake-distro-2/drake/thirdParty/cimg -Werror=all -Werror=ignored-qualifiers -DGTEST_DONT_DEFINE_FAIL=1 -DGTEST_DONT_DEFINE_SUCCEED=1 -DGTEST_DONT_DEFINE_TEST=1 -Wno-sign-compare -O2 -g -DNDEBUG -fPIC -std=gnu++11 -o CMakeFiles/drakeRBM.dir/RigidBodyTree.cpp.o -c /home/liang/dev/drake-distro-2/drake/systems/plants/RigidBodyTree.cpp
250.36,7470296
$ ls -lah ./CMakeFiles/drakeRBM.dir/RigidBodyTree.cpp.o
-rw-rw-r-- 1 liang liang 385M Sep 4 13:55 ./CMakeFiles/drakeRBM.dir/RigidBodyTree.cpp.o
The results indicate that switching from C++11 to C++14 did not cause the increase in memory utilization. Version ea0fe6362cfa959a39a961055da7038eb3da8498 from June 28, 2016, requires 7.47GB of RAM to compile RigidBodyTree.cpp. Note that since this version is prior to the extraction of the parser code into parser_urdf.cc and parser_sdf.cc, we now know that the memory problem was not introduced by the extraction of the parsers.
Arbitrarily select a SHA on June 1, 2016: 920bfcfe5b30bd30d27e378bf4194734a8bb28e7. This is an attempt to isolate the memory problem by bisecting the range of dates over which the problem must have been introduced.
$ cd /home/liang/dev/drake-distro-2/drake/pod-build/systems/plants
$ rm ./CMakeFiles/drakeRBM.dir/RigidBodyTree.cpp.o
$ /usr/bin/time --format=%U,%M /usr/bin/g++-4.9 -DROSCONSOLE_BACKEND_LOG4CXX -DROS_PACKAGE_NAME=\"drake\" -DdrakeRBM_EXPORTS -I/home/liang/dev/drake-distro-2/drake/pod-build/generated -I/home/liang/dev/drake-distro-2/drake/.. -I/home/liang/dev/drake-distro-2/drake/pod-build/exports -isystem /home/liang/dev/drake-distro-2/build/include -I/home/liang/dev/drake-distro-2/drake/pod-build/lcmgen -isystem /usr/include/glib-2.0 -isystem /usr/lib/x86_64-linux-gnu/glib-2.0/include -isystem /home/liang/dev/drake-distro-2/build/include/eigen3 -I/opt/ros/indigo/include -I/home/liang/dev/drake-distro-2/drake/thirdParty/spruce/include -I/home/liang/dev/drake-distro-2/drake/thirdParty/cimg -Werror=all -Wno-sign-compare -DGTEST_DONT_DEFINE_FAIL=1 -DGTEST_DONT_DEFINE_SUCCEED=1 -DGTEST_DONT_DEFINE_TEST=1 -O2 -g -DNDEBUG -fPIC -std=gnu++11 -o CMakeFiles/drakeRBM.dir/RigidBodyTree.cpp.o -c /home/liang/dev/drake-distro-2/drake/systems/plants/RigidBodyTree.cpp
320.11,7371368
$ ls -lah ./CMakeFiles/drakeRBM.dir/RigidBodyTree.cpp.o
-rw-rw-r-- 1 liang liang 380M Sep 4 15:33 ./CMakeFiles/drakeRBM.dir/RigidBodyTree.cpp.o
The excessive memory utilization problem existed prior to June 1, 2016. The results above show that on June 1, 2016, compiling RigidBodyTree.cpp required 7.37GB of RAM. We now know that the problem arose sometime between April 12, 2016 and June 1, 2016.
Arbitrarily select a SHA on May 1, 2016: 729e64b2e4b03cb6fa9471b6aabf96415ef737a7. This is an attempt to isolate the memory problem by bisecting the range of dates over which the problem must have been introduced.
$ cd /home/liang/dev/drake-distro-2/drake/pod-build/systems/plants
$ rm ./CMakeFiles/drakeRBM.dir/RigidBodyTree.cpp.o
$ /usr/bin/time --format=%U,%M /usr/bin/g++-4.9 -DROSCONSOLE_BACKEND_LOG4CXX -DROS_PACKAGE_NAME=\"drake\" -DdrakeRBM_EXPORTS -I/home/liang/dev/drake-distro-2/drake/pod-build/generated -I/home/liang/dev/drake-distro-2/drake/.. -I/home/liang/dev/drake-distro-2/drake/pod-build/exports -isystem /home/liang/dev/drake-distro-2/build/include -I/home/liang/dev/drake-distro-2/drake/pod-build/lcmgen -isystem /usr/include/glib-2.0 -isystem /usr/lib/x86_64-linux-gnu/glib-2.0/include -isystem /home/liang/dev/drake-distro-2/build/include/eigen3 -I/opt/ros/indigo/include -I/home/liang/dev/drake-distro-2/drake/thirdParty/spruce/include -I/home/liang/dev/drake-distro-2/drake/thirdParty/cimg -Wreturn-type -Wuninitialized -Wunused-variable -std=c++11 -O2 -g -DNDEBUG -fPIC -o CMakeFiles/drakeRBM.dir/RigidBodyTree.cpp.o -c /home/liang/dev/drake-distro-2/drake/systems/plants/RigidBodyTree.cpp
255.19,7371684
$ ls -lah ./CMakeFiles/drakeRBM.dir/RigidBodyTree.cpp.o
-rw-rw-r-- 1 liang liang 380M Sep 4 18:18 ./CMakeFiles/drakeRBM.dir/RigidBodyTree.cpp.o
The excessive memory utilization problem existed prior to May 1, 2016. The results above show that on May 1, 2016, compiling RigidBodyTree.cpp required 7.37GB of RAM. We now know that the problem arose sometime between April 12, 2016 and May 1, 2016.
Arbitrarily select a SHA on April 21, 2016: d6beee40827c327ba637297bb9ae891344f48321. This is an attempt to isolate the memory problem by bisecting the range of dates over which the problem must have been introduced.
$ cd /home/liang/dev/drake-distro-2/drake/pod-build/systems/plants
$ rm ./CMakeFiles/drakeRBM.dir/RigidBodyTree.cpp.o
$ /usr/bin/time --format=%U,%M /usr/bin/g++-4.9 -DROSCONSOLE_BACKEND_LOG4CXX -DROS_PACKAGE_NAME=\"drake\" -DdrakeRBM_EXPORTS -I/home/liang/dev/drake-distro-2/drake/pod-build/generated -I/home/liang/dev/drake-distro-2/drake/.. -I/home/liang/dev/drake-distro-2/drake/pod-build/exports -isystem /home/liang/dev/drake-distro-2/build/include -I/home/liang/dev/drake-distro-2/drake/pod-build/lcmgen -isystem /usr/include/glib-2.0 -isystem /usr/lib/x86_64-linux-gnu/glib-2.0/include -isystem /home/liang/dev/drake-distro-2/build/include/eigen3 -I/opt/ros/indigo/include -I/home/liang/dev/drake-distro-2/drake/thirdParty/spruce/include -I/home/liang/dev/drake-distro-2/drake/thirdParty/cimg -Wreturn-type -Wuninitialized -Wunused-variable -std=c++11 -O2 -g -DNDEBUG -fPIC -o CMakeFiles/drakeRBM.dir/RigidBodyTree.cpp.o -c /home/liang/dev/drake-distro-2/drake/systems/plants/RigidBodyTree.cpp
289.13,7371724
$ ls -lah ./CMakeFiles/drakeRBM.dir/RigidBodyTree.cpp.o
-rw-rw-r-- 1 liang liang 380M Sep 4 20:19 ./CMakeFiles/drakeRBM.dir/RigidBodyTree.cpp.o
The excessive memory utilization problem existed prior to April 21, 2016. The results above show that on April 21, 2016, compiling RigidBodyTree.cpp required 7.37GB of RAM. We now know that the problem arose sometime between April 12, 2016 and April 21, 2016.
Arbitrarily select a SHA on April 18, 2016: c678bc7373bf69639503288191e1139d49c153a5. This is an attempt to isolate the memory problem by bisecting the range of dates over which the problem must have been introduced.
$ cd /home/liang/dev/drake-distro-2/drake/pod-build/systems/plants
$ rm ./CMakeFiles/drakeRBM.dir/RigidBodyTree.cpp.o
$ /usr/bin/time --format=%U,%M /usr/bin/g++-4.9 -DdrakeRBM_EXPORTS -I/home/liang/dev/drake-distro-2/drake/pod-build/generated -I/home/liang/dev/drake-distro-2/drake/.. -I/home/liang/dev/drake-distro-2/drake/pod-build/exports -isystem /home/liang/dev/drake-distro-2/build/include -I/home/liang/dev/drake-distro-2/drake/pod-build/lcmgen -isystem /usr/include/glib-2.0 -isystem /usr/lib/x86_64-linux-gnu/glib-2.0/include -isystem /home/liang/dev/drake-distro-2/build/include/eigen3 -I/home/liang/dev/drake-distro-2/drake/thirdParty/spruce/include -I/home/liang/dev/drake-distro-2/drake/thirdParty/cimg -Wreturn-type -Wuninitialized -Wunused-variable -std=c++11 -O2 -g -DNDEBUG -fPIC -o CMakeFiles/drakeRBM.dir/RigidBodyTree.cpp.o -c /home/liang/dev/drake-distro-2/drake/systems/plants/RigidBodyTree.cpp
101.84,3906988
$ ls -lah ./CMakeFiles/drakeRBM.dir/RigidBodyTree.cpp.o
-rw-rw-r-- 1 liang liang 183M Sep 4 23:27 ./CMakeFiles/drakeRBM.dir/RigidBodyTree.cpp.o
The excessive memory utilization problem did not yet exist as of April 18, 2016. The results above show that on April 18, 2016, compiling RigidBodyTree.cpp required 3.9GB of RAM. We now know that the problem arose sometime between April 18, 2016 and April 21, 2016.
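The narrowing-down above amounts to a binary search over date-ordered measurements. As a small sketch, the peak-memory figures reported in this comment can be put in a table and searched for the first entry above a threshold. The 5,000,000 KB threshold and all names below are assumptions chosen for illustration, and the search relies on the regression persisting once introduced.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <string>
#include <vector>

struct Measurement {
  std::string date;   // commit date, ISO format so entries sort by date
  long long peak_kb;  // %M value reported by /usr/bin/time
};

// Peak memory for compiling RigidBodyTree.cpp, as reported in this thread.
std::vector<Measurement> ThreadMeasurements() {
  return {{"2016-04-12", 3907428}, {"2016-04-18", 3906988},
          {"2016-04-21", 7371724}, {"2016-05-01", 7371684},
          {"2016-06-01", 7371368}, {"2016-06-28", 7470296},
          {"2016-07-07", 7577520}, {"2016-09-01", 7894972}};
}

// Index of the first measurement above the threshold, or size() if none.
// Assumes date order and that every entry after the regression is also high.
std::size_t FirstRegressedIndex(const std::vector<Measurement>& sorted,
                                long long threshold_kb) {
  const auto first_bad = std::partition_point(
      sorted.begin(), sorted.end(), [threshold_kb](const Measurement& m) {
        return m.peak_kb <= threshold_kb;
      });
  return static_cast<std::size_t>(std::distance(sorted.begin(), first_bad));
}
```

On this data the first regressed entry is April 21, 2016, and the entry just before it is April 18, 2016, matching the window identified above. In practice git bisect automates the same search over commits rather than dates.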
Comparing the non-problematic version on April 18, 2016 (c678bc7373bf69639503288191e1139d49c153a5) with the problematic version on April 21, 2016 (d6beee40827c327ba637297bb9ae891344f48321), I believe I found the problem. The screenshot below shows a diff of RigidBodyTree.h in the two versions. The non-problematic version is on the left while the problematic version is on the right. Notice that the problematic version includes DrakeJoints.h:
This header file was included due to the addition of RigidBodyTree::AddFloatingJoint().
I suspect the inclusion of DrakeJoints.h in RigidBodyTree.h results in the much higher memory footprint while compiling RigidBodyTree.cpp.
Note that @david-german-tri modified RigidBodyTree.h to include DrakeJoint.h instead of DrakeJoints.h on June 21, 2016 (https://github.com/RobotLocomotion/drake/commit/9fabbc1a1e93751d8512c023fc3c04a7c08bc437). Update 3 above tested a version of Drake after this optimization and it still took > 7GB of RAM to build RigidBodyTree.cpp.
I extracted RigidBodyTree::AddFloatingJoint() and the floating base types into their own .h and .cc files. See: https://github.com/liangfok/drake/tree/feature/extract_joint_types_and_add_floating_joints.
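The shape of such an extraction might look roughly like the sketch below. This is not the code in the linked branch: the file names, the enumerator names other than kFixed, and the stripped-down RigidBodyTree are all hypothetical, and the several "files" are shown in one listing. The point is only that the tree header no longer needs the joint headers, while a single separate .cc file takes on that cost.

```cpp
#include <string>
#include <vector>

// ---- floating_base_types.h (hypothetical): a light header, no joint code ----
enum class FloatingBaseType { kFixed, kRollPitchYaw, kQuaternion };

// ---- rigid_body_tree.h (heavily simplified stand-in) ----
// No joint headers are included here.
struct RigidBodyTree {
  std::vector<std::string> parentless_bodies;
  std::vector<std::string> joints;
};

// ---- add_floating_joint.h (hypothetical) ----
// Adds one floating joint per parentless body; returns how many were added.
int AddFloatingJoint(FloatingBaseType type, RigidBodyTree* tree);

// ---- add_floating_joint.cc (hypothetical) ----
// In a real build this would be the only file including DrakeJoints.h.
int AddFloatingJoint(FloatingBaseType type, RigidBodyTree* tree) {
  const char* kind = "fixed";
  if (type == FloatingBaseType::kRollPitchYaw) kind = "rpy";
  if (type == FloatingBaseType::kQuaternion) kind = "quaternion";
  int added = 0;
  for (const std::string& body : tree->parentless_bodies) {
    tree->joints.push_back(body + ":" + kind);  // stand-in for a real joint
    ++added;
  }
  tree->parentless_bodies.clear();
  return added;
}
```

Everything that includes only the tree header then avoids instantiating the joint templates, which is where the memory appeared to go.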
Lenovo T430 laptop with an Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz, and 16GB DDR3 1600 MHz RAM, running Ubuntu 14.04.
The overall benchmark results of https://github.com/liangfok/drake/commit/51e481dea44aecb13d2ba4547911a6a4b73fe53e show that building Drake takes 1:03:51 of wall clock time, 10,344 user mode CPU seconds (2.87 user mode CPU hours), and 5.1 GB of RAM.
$ cd drake-distro
$ rm -rf build
$ rm -rf externals
$ git reset --hard HEAD
$ mkdir build
$ cd build
$ cmake .. -DCMAKE_BUILD_TYPE:STRING=RelWithDebInfo -DDISABLE_MATLAB=TRUE -DWITH_DREAL=FALSE -DWITH_SNOPT=FALSE
$ /usr/bin/time --format=%E,%U,%M make -j4
1:03:51,10344.23,5124420
$ du -ch | grep total
6.1G total
Building 51e481dea44aecb13d2ba4547911a6a4b73fe53e again, the total RAM footprint is 5.1GB. This matches the previous test and shows some level of consistency. The wall clock time varies quite a lot.
$ /usr/bin/time --format=%E,%U,%M make -j4
47:47.54,8183.17,5114984
Using the current head of master (0b1910cb7e0b8f0c0f144abe128061845e63e29d), building Drake took 59:45.99 wall clock time, 9,080 user mode CPU seconds (2.52 hours), and 7.9GB of RAM:
$ cd drake-distro
$ rm -rf build
$ rm -rf externals
$ git reset --hard HEAD
$ mkdir build
$ cd build
$ cmake .. -DCMAKE_BUILD_TYPE:STRING=RelWithDebInfo -DDISABLE_MATLAB=TRUE -DWITH_DREAL=FALSE -DWITH_SNOPT=FALSE
$ /usr/bin/time --format=%E,%U,%M make -j4
59:45.99,9080.09,7900176
$ du -ch | grep total
6.1G total
Building the master (0b1910cb7e0b8f0c0f144abe128061845e63e29d) again, it took 1:07:58 of wall clock time, 10,221 user mode CPU seconds (2.8 hours), and 7.9GB of RAM.
$ /usr/bin/time --format=%E,%U,%M make -j4
...
1:07:58,10221.44,7900800
The total wall-clock time (about an hour) and the CPU time are nearly identical between the two. However, the memory utilization is far lower in https://github.com/liangfok/drake/commit/51e481dea44aecb13d2ba4547911a6a4b73fe53e (5.1 GB of RAM) than in the latest on master, which is 0b1910cb7e0b8f0c0f144abe128061845e63e29d (7.9GB of RAM).
Great analysis and results! I'd like to be a reviewer of this PR. As a preview, here are my top two comments/questions from skimming your branch.
1. Are we really sure that AddFloatingJoint belongs to drake::parsers? That approach (a) is semantically weird if we ever need to add a floating joint outside a parser and (b) creates a bunch of new dependencies on RigidBodyTree public data. Another option, which I also don't love, would be to leave AddFloatingJoint as a member of RigidBodyTree, and just move the implementation to a separate .cc file. Maybe there is a third option?
2. Factoring out the joint types into a separate header is a great idea. It would help readability to also format and namespace them properly in this PR, e.g. drake::systems::joints::kFixed.
I use my HP z460 workstation to compare the latest head of master with the extraction of DrakeJoint::FloatingBaseType and RigidBodyTree::AddFloatingJoint() into their own compilation units.
Since the workstation has 12 cores, I use -j12.
$ cd drake-distro
$ rm -rf build
$ rm -rf externals
$ git reset --hard HEAD
$ mkdir build
$ cd build
$ cmake .. -DCMAKE_BUILD_TYPE:STRING=RelWithDebInfo -DDISABLE_MATLAB=TRUE -DWITH_DREAL=FALSE -DWITH_SNOPT=FALSE
$ /usr/bin/time --format=%E,%U,%M make -j12
$ du -ch | grep total
Using the latest on master (8e6c585c73e573133a3cf76b02972e5c00ab433d):
$ /usr/bin/time --format=%E,%U,%M make -j12
16:29.95,4813.20,7903908
$ du -ch | grep total
6.1G total
Using the optimized branch (https://github.com/liangfok/drake/commit/1bff65ce81aa9fe608eb23a87df46380e392a5e3):
$ /usr/bin/time --format=%E,%U,%M make -j12
13:13.46,4788.78,5097876
$ du -ch | grep total
6.1G total
Extracting DrakeJoint::FloatingBaseType and RigidBodyTree::AddFloatingJoint() into their own compilation units decreased build times from ~16 minutes to ~13 minutes and, more importantly, decreased the maximum memory footprint from 7.9 GB to 5.1 GB.
Using the same workstation as mentioned above and 7f48705eaca4215d43cd90efa3e710725aecbaab:
$ cd drake-distro/build
$ cmake .. -G Ninja -DCMAKE_BUILD_TYPE:STRING=RelWithDebInfo -DWITH_DREAL=FALSE -DDISABLE_MATLAB=TRUE
$ /usr/bin/time --format=%E,%U,%M ninja -j24
12:11.92,3777.46,3205664
$ du -ch | grep total
6.5G total
Wow, ninja + clang uses significantly less memory than make + gcc (3.2GB vs. 5.1GB). ninja + clang is only about a minute faster (12 minutes vs. 13 minutes).
Are we really sure that AddFloatingJoint belongs to drake::parsers? That approach (a) is semantically weird if we ever need to add a floating joint outside a parser and (b) creates a bunch of new dependencies on RigidBodyTree public data. Another option, which I also don't love, would be to leave AddFloatingJoint as a member of RigidBodyTree, and just move the implementation to a separate .cc file. Maybe there is a third option?
Yes, I believe RigidBodyTree::AddFloatingJoint() should be part of parsers since we only anticipate it ever being called by the parsers. Recall that it was originally added as a way to connect newly added models to an existing RigidBodyTree. It does this by searching through all bodies in the tree for those that are parent-less, and adding floating joints to them. This can be thought of as a hack that was needed based on the limitations of the parsers. (At the time, parsers couldn't keep track of which models they were adding. Now, with the introduction of the model_instance_id, this is no longer the case.) Longer-term, I expect the parsers to be able to automatically add the floating joints as they parse the model and add a model instance, meaning this method will no longer be necessary.
Note that even if we remove this method from RigidBodyTree, we can continue to programmatically add floating joints using a combination of the following two methods:
Regarding the concern about introducing new dependencies, my longer term plan is to pull the parsers into their own library, which is then linked against the existing drakeRBM library. In other words, users will interact directly with the parsers rather than go through the RigidBodySystem and RigidBodyTree to add models to them.
Factoring out the joint types into a separate header is a great idea. It would help readability to also format and namespace them properly in this PR, e.g. drake::systems::joints::kFixed.
Yeah, I agree. In the spirit of incremental PRs, however, I will probably not initially namespace the floating base types. I do have another WIP branch that adds name spaces. I'll submit that PR after the initial PR that brings down the memory footprint.
BTW, for another approach to adding floating joints see Simbody's MultibodyGraphMaker class which is independent of the parser and independent of the multibody tree. It is structured as a utility that absorbs body and joint information (as obtained by a parser typically) and then spits out a spanning tree plus loop constraints design for building the multibody tree, including any needed floating joints. We could consider that more flexible approach at some point, although I agree that Liang's proposal is an improvement.
$ cmake .. -G Ninja -DCMAKE_BUILD_TYPE:STRING=RelWithDebInfo -DDISABLE_MATLAB=TRUE
$ /usr/bin/time --format=%E,%U,%M ninja
3af99f19790b2341cc4901fa871cacd6d14c7634
18:34.81,5875.01,3223556
Building Drake took about 18.5 wall clock minutes, 1.63 CPU hours, and 3.2GB of RAM.
So here's a good way to profile the build:
$ bazel build --profile profile.bin //...
$ bazel analyze-profile profile.bin --dump=raw > profile.csv
$ grep 'ACTION_EXECUTE.*Compiling' profile.csv |
sort -t\| -k5,5rn | head -n40 | cut -d\| -f5,8 |
perl -pe 's/^(\d+)\d{9}\|/\1s /;' |
awk '{if ((NR-1) % 2 ==0) print}'
For me with default Bazel config (GCC, release), it blames:
177s Compiling drake/multibody/rigid_body_tree.cc
116s Compiling drake/multibody/parsers/sdf_parser.cc
107s Compiling drake/multibody/parsers/urdf_parser.cc
95s Compiling drake/multibody/parsers/parser_common.cc
85s Compiling drake/multibody/test/rigid_body_tree/rigid_body_collision_clique_test.cc
64s Compiling drake/multibody/collision/test/collision_filter_group_test.cc
53s Compiling drake/multibody/joints/roll_pitch_yaw_floating_joint.cc
49s Compiling drake/multibody/joints/roll_pitch_yaw_floating_joint.cc
45s Compiling drake/common/test/symbolic_mixing_scalar_types_test.cc
36s Compiling drake/multibody/constraint/rigid_body_constraint.cc
35s Compiling drake/systems/analysis/test/runge_kutta3_integrator_test.cc
34s Compiling drake/multibody/rigid_body_plant/test/compute_contact_result_test.cc
31s Compiling drake/solvers/test/optimization_examples.cc
30s Compiling drake/solvers/test/optimization_examples.cc
28s Compiling drake/multibody/joints/quaternion_floating_joint.cc
26s Compiling drake/systems/framework/test/diagram_test.cc
26s Compiling drake/systems/analysis/test/simulator_test.cc
24s Compiling drake/systems/framework/test/diagram_builder_test.cc
24s Compiling drake/systems/controllers/test/pid_controlled_system_test.cc
23s Compiling drake/multibody/joints/quaternion_floating_joint.cc
Latest stats using Puget workstation:
$ bazel build --profile profile.bin //...
...........
INFO: Writing profile data to '/home/liang/dev/drake-distro-1/profile.bin'
WARNING: /home/liang/dev/drake-distro-1/drake/util/BUILD:63:1: target '//drake/util:app_util' is deprecated: Please use gflags instead of drakeAppUtil.h.
WARNING: /home/liang/.cache/bazel/_bazel_liang/ede03c0a430a52111efe35db021d2956/external/drake_visualizer/BUILD:2:48: soft_failure.bzl: @drake_visualizer//:drake-visualizer does not work because /home/liang/dev/drake-distro-1/build/install/bin/drake-visualizer was missing.
INFO: Found 2337 targets...
INFO: From Executing genrule //drake/automotive:speed_bump_genrule:
[2017-03-31 09:57:50.770] [console] [info] Loading road geometry.
[2017-03-31 09:57:50.772] [console] [info] Generating OBJ.
INFO: Elapsed time: 301.216s, Critical Path: 271.22s
Leader board:
$ grep 'ACTION_EXECUTE.*Compiling' profile.csv |
> sort -t\| -k5,5rn | head -n40 | cut -d\| -f5,8 |
> perl -pe 's/^(\d+)\d{9}\|/\1s /;' |
> awk '{if ((NR-1) % 2 ==0) print}'
162s Compiling drake/multibody/rigid_body_tree.cc
71s Compiling drake/multibody/parsers/sdf_parser.cc
68s Compiling drake/multibody/parsers/urdf_parser.cc
68s Compiling drake/multibody/parsers/urdf_parser.cc
67s Compiling drake/multibody/parsers/sdf_parser.cc
64s Compiling drake/multibody/parsers/parser_common.cc
62s Compiling drake/multibody/parsers/parser_common.cc
57s Compiling drake/examples/Quadrotor/quadrotor_plant.cc
52s Compiling drake/math/discrete_algebraic_riccati_equation.cc
47s Compiling drake/solvers/moby_lcp_solver.cc
47s Compiling drake/solvers/moby_lcp_solver.cc
44s Compiling drake/examples/Acrobot/acrobot_run_lqr_w_estimator.cc
42s Compiling drake/examples/Acrobot/acrobot_plant.cc
40s Compiling drake/examples/QPInverseDynamicsForHumanoids/system/manipulator_inverse_dynamics_controller.cc
39s Compiling drake/automotive/single_lane_ego_and_agent.cc
38s Compiling drake/examples/QPInverseDynamicsForHumanoids/system/test/humanoid_plan_eval_system_test.cc
37s Compiling drake/examples/QPInverseDynamicsForHumanoids/system/test/qp_controller_system_test.cc
37s Compiling drake/examples/Valkyrie/test/robot_state_encoder_decoder_test.cc
37s Compiling drake/systems/framework/test/diagram_builder_test.cc
36s Compiling drake/examples/kuka_iiwa_arm/iiwa_world/iiwa_wsg_diagram_factory.cc
FYI my WIP branch on fixing this is https://github.com/jwnimmer-tri/drake/tree/rbt-build-time. I haven't had a chance to correctly reprofile and tune up the results, but it's a solution framework.
Hopefully with #8442 and #8543 merged, this is now "good enough".
Some of drake's source files take an excessive amount of time and memory to compile.
RigidBodyTree{,SDF,URDF}.cpp are three particular offenders, requiring:
Since these files are closely related, and therefore have a tendency to get built at the same time by parallel builds, on systems with only 16 GiB of RAM (assuming some running background applications, such as an IDE and web browser), it's entirely possible for this trio to consume all available memory. In particular, my (current) main machine becomes effectively unusable for tens of minutes when this trio of files hits the compile queue due to thrashing.
The first source file can be split into pieces, which helps, but the latter two are much less amenable to this process. From some experimentation, the problem seems to be due to the various joint classes (commenting out all code except #include "joints/DrakeJoints.h" still takes 40 sec, 2.1 GiB).
The above numbers were produced using /usr/bin/time --format=%U,%M to measure hand-compiling the aforementioned files with -g -O2 (roughly equivalent to CMAKE_BUILD_TYPE=RelWithDebInfo). Even with no debug/optimization flags, time and memory use is about half the above numbers, which is still fairly high.