RobotLocomotion / drake

Model-based design and verification for robotics.
https://drake.mit.edu

OOM errors when building python bindings #16722

Closed: odellus closed this issue 2 years ago

odellus commented 2 years ago

I'm using an MSI laptop with an 8-core i7 and 16 GB of RAM.

Steps to reproduce:

git clone https://github.com/robotlocomotion/drake
# Add the following to line 16 of `tools/bazel.rc` to try to prevent crashing if you're using a laptop
# build --local_ram_resources=HOST_RAM*.67 --local_cpu_resources=HOST_CPUS*.5

cd drake
sudo ./setup/ubuntu/install_prereqs.sh
mkdir drake-build
cd drake-build
cmake ..
make

Error

ERROR: /home/thomas/src/drake/bindings/pydrake/multibody/BUILD.bazel:103:21: Compiling bindings/pydrake/multibody/plant_py.cc failed: (Exit 1): cc failed: error executing command /usr/bin/cc -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 '-D_FORTIFY_SOURCE=1' -DNDEBUG -ffunction-sections ... (remaining 372 arguments skipped)

Use --sandbox_debug to see verbose messages from the sandbox
cc: fatal error: Killed signal terminated program cc1plus
compilation terminated.
Target //:install failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 910.699s, Critical Path: 287.78s
INFO: 1841 processes: 28 internal, 1807 linux-sandbox, 6 worker.
FAILED: Build did NOT complete successfully
make[2]: *** [CMakeFiles/drake_cxx_python.dir/build.make:112: drake_cxx_python-prefix/src/drake_cxx_python-stamp/drake_cxx_python-build] Error 1
make[1]: *** [CMakeFiles/Makefile2:860: CMakeFiles/drake_cxx_python.dir/all] Error 2
make: *** [Makefile:163: all] Error 2

I added `build --verbose_failures --sandbox_debug` to `tools/bazel.rc`. Running the failing command that `--verbose_failures` prints causes the terminal window to exit.

Judging from dmesg, these look like OOM (out-of-memory) errors.
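
For anyone wanting to confirm the same thing, here is a rough sketch of how the kernel log can be filtered for OOM kills. The `oom_lines` helper name and the exact search patterns are my own assumptions, not something from Drake or Bazel; kernel message wording varies between versions, so treat the patterns as a starting point.

```shell
# Hypothetical helper: filter kernel log text on stdin for lines that
# commonly accompany an OOM kill. The patterns are assumptions and may
# need adjusting for a given kernel version.
oom_lines() {
  grep -iE 'out of memory|oom-kill|killed process'
}

# Typical use (reading the kernel log may require root on some systems):
#   sudo dmesg | oom_lines
```

If a line naming `cc1plus` shows up, that would be consistent with the `Killed signal terminated program cc1plus` message in the build output above.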

odellus commented 2 years ago

Raising the host RAM setting from 0.67 to 0.9 of total system memory, in addition to reducing the number of concurrent jobs from 8 to 4, finally allowed me to build Drake with Python bindings from source.
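
As a sketch of what that change might look like in `tools/bazel.rc`: the helper below just prints a candidate line. The function name `bazel_limits_line` is made up for illustration, and using `--jobs` to cap concurrency is an assumption (the same effect may be reachable through `--local_cpu_resources`, as in the original snippet). Flag names can also differ between Bazel versions, so check the documentation for the version Drake pins.

```shell
# Hypothetical helper: print a bazelrc line that sets the fraction of host
# RAM Bazel assumes is available and caps the number of concurrent jobs.
# $1 = RAM fraction (e.g. .9), $2 = job count (e.g. 4)
bazel_limits_line() {
  ram_fraction="$1"
  jobs="$2"
  printf 'build --local_ram_resources=HOST_RAM*%s --jobs=%s\n' "$ram_fraction" "$jobs"
}

# Values from this comment: 0.9 of host RAM, 4 jobs.
bazel_limits_line .9 4
# prints: build --local_ram_resources=HOST_RAM*.9 --jobs=4
```

The printed line can then be pasted into `tools/bazel.rc` in place of the earlier `HOST_RAM*.67` line before re-running `make`.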

I wanted to go ahead and open the issue because I struggled with this, and maybe putting it out there will help someone else. The problematic file causing the system to OOM was the multibody plant bindings file, `bindings/pydrake/multibody/plant_py.cc`, as shown in the error above.