Open roehling opened 8 years ago
Do you know what build stage it fails on?
As far as I can tell, once in the :make stage of a package, and the other times in the :install stage. At least, those are the last outputs in the log file before the abort.
We are experiencing similar issues, also within Jenkins jobs, on both Ubuntu Trusty and Xenial (each using catkin 0.4.4). It seems to mostly affect long-running jobs with a lot of output (several MB).
Every time it happens, the Python process (catkin) is blocked in a read system call on either FD 6 or FD 7, both of which are pipes. Its other threads are waiting for a lock (probably the Python GIL). At the same time, all child processes are blocked while writing to their FD 1 or 2.
Quite often (but not always) one of the children has already ended (zombie state).
Interestingly, the whole build can always be continued by manually writing to the pipes of the catkin process, e.g. with echo > /proc/<CATKIN_PID>/fd/6 (a single write to the FD currently being read from is enough). After that, everything continues as if nothing had happened.
Any idea how to solve / work around / debug further?
@roehling , did you solve it for you?
Unfortunately, no. The closest thing I found is a warning in the documentation of subprocess.Popen.wait that describes similar symptoms. AFAICT it does not apply here, but maybe I overlooked something.
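For anyone unfamiliar with that warning: it concerns a child that fills an OS pipe buffer while the parent sits in wait() without reading. A minimal sketch of the pattern the subprocess documentation recommends instead (communicate()) is below. The child command is a made-up stand-in, not something catkin runs, and the helper name is hypothetical.

```python
import subprocess
import sys

# Hypothetical child that prints far more than a pipe buffer usually holds.
CHILD = [sys.executable, "-c", "import sys; sys.stdout.write('x' * 200000)"]


def run_and_collect(cmd):
    """Run cmd and return (exit code, captured stdout bytes).

    communicate() keeps draining the pipe until EOF, so the child cannot
    block on a full pipe the way it can when the parent only calls wait().
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    out, _ = proc.communicate()
    return proc.returncode, out
```

Calling run_and_collect(CHILD) returns once the child has exited and all of its output has been read.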
@roehling , thanks for the update. We've found a working theory and a dirty workaround that works robustly for us:
The catkin-internal make job server runs out of job tokens. We don't know why or how, but FD 6 and 7 (see above) are relevant because --jobserver-fds=6,7 is passed on to the child make processes. Since catkin also uses these tokens internally, it blocks indefinitely while trying to get one when zero are available. The child processes then block on output because catkin also stops reading from the pipes to its children, and their buffers eventually fill up.
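To get a feel for how much output a child can produce before it blocks, here is a rough sketch that measures how many bytes an unread pipe accepts. It only illustrates the "buffer fills up" part of the theory on a POSIX system and has nothing catkin-specific in it; the function name is made up.

```python
import fcntl
import os


def pipe_capacity():
    """Return how many bytes an unread pipe accepts before a write would block."""
    read_fd, write_fd = os.pipe()
    try:
        # Make the write end non-blocking so a full pipe raises an error
        # instead of hanging this process like the stuck children.
        flags = fcntl.fcntl(write_fd, fcntl.F_GETFL)
        fcntl.fcntl(write_fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)
        total = 0
        chunk = b"x" * 1024
        while True:
            try:
                total += os.write(write_fd, chunk)
            except BlockingIOError:
                # Nobody reads from read_fd, so the pipe is now full.
                return total
    finally:
        os.close(read_fd)
        os.close(write_fd)
```

A blocking writer, such as a compiler printing warnings, would simply stop at that point until the reader drains the pipe.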
A script runs periodically that, for each catkin process ($catkin_pid) having the issue, performs: sudo -u jenkins bash -c "echo -n +++ > /proc/$catkin_pid/fd/7" to inject three fresh job tokens (catkin seems to use '+' only; injecting just one would typically lead to it getting stuck again later).
Given this theory, I doubt that this is actually a catkin bug. Interestingly, this only happens on our Jenkins slaves running in Docker containers (all of them, Trusty and Xenial) and only for very big jobs (running > 10 min, producing > 10 MB of verbose output).
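The injection step could also be written in Python roughly as follows. This is only a sketch of the echo command above: the function name and the proc_root parameter are made up (the latter exists so the helper can be tried against a scratch directory), and it does not detect hung processes. It has to run as a user allowed to open the target process's file descriptors.

```python
import os


def inject_job_tokens(pid, fd=7, count=3, proc_root="/proc"):
    """Write `count` '+' tokens into the pipe behind <proc_root>/<pid>/fd/<fd>.

    Mirrors `echo -n +++ > /proc/$catkin_pid/fd/7`; the defaults (FD 7,
    three tokens) are taken from that command.
    """
    path = os.path.join(proc_root, str(pid), "fd", str(fd))
    # Unbuffered binary write so the tokens reach the pipe immediately.
    with open(path, "wb", buffering=0) as pipe:
        pipe.write(b"+" * count)
    return path
```

For example, inject_job_tokens(12345) would target /proc/12345/fd/7, where 12345 is a placeholder PID.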
We also get stuck with catkin build. In our case it happens while running tests, so I don't know if it's entirely the same issue, but it gets stuck between tests.
Here's the command we use:
/usr/bin/python /usr/local/bin/catkin build --catkin-make-args run_tests -- -j1 --force-color --no-notify --no-status -cs -i
And here's the stacktrace of catkin:
Stack for thread 140421802596096
File "/usr/lib/python2.7/threading.py", line 774, in __bootstrap
self.__bootstrap_inner()
File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
self.run()
File "<string>", line 167, in run
File "/usr/lib/python2.7/code.py", line 243, in interact
more = self.push(line)
File "/usr/lib/python2.7/code.py", line 265, in push
more = self.runsource(source, self.filename)
File "/usr/lib/python2.7/code.py", line 87, in runsource
self.runcode(code)
File "/usr/lib/python2.7/code.py", line 103, in runcode
exec code in self.locals
File "<console>", line 3, in <module>
Stack for thread 140421785810688
File "/usr/lib/python2.7/threading.py", line 774, in __bootstrap
self.__bootstrap_inner()
File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
self.run()
File "/usr/local/lib/python2.7/dist-packages/catkin_tools/execution/controllers.py", line 539, in run
event = self.event_queue.get(True)
File "/usr/lib/python2.7/Queue.py", line 168, in get
self.not_empty.wait()
File "/usr/lib/python2.7/threading.py", line 340, in wait
waiter.acquire()
Stack for thread 140421860980480
File "/usr/lib/python2.7/threading.py", line 1098, in _exitfunc
t.join()
File "/usr/lib/python2.7/threading.py", line 940, in join
self.__block.wait()
File "/usr/lib/python2.7/threading.py", line 340, in wait
waiter.acquire()
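How the dump above was obtained isn't stated. One way to produce a similar per-thread report from inside a running Python process is sys._current_frames(); a minimal sketch follows, with a made-up function name.

```python
import sys
import traceback


def format_thread_stacks():
    """Return one text report with the current stack of every thread."""
    lines = []
    for thread_id, frame in sys._current_frames().items():
        lines.append("Stack for thread %d" % thread_id)
        # format_stack returns ready-made 'File "...", line N, in f' entries.
        lines.extend(s.rstrip("\n") for s in traceback.format_stack(frame))
    return "\n".join(lines)
```

Printing format_thread_stacks() from a debugger or an injected console lists every thread, including those parked in Queue.get or join.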
I've been trying to implement the workaround. However, trying echo > /proc/<CATKIN_PID>/fd/6 or echo -n +++ > /proc/<CATKIN_PID>/fd/6 doesn't seem to work. It tells me that the device doesn't exist. However ls /proc/
I added some warnings (making the output more verbose), which consistently triggers this behavior. Just wanted to confirm that it's probably related to the amount of output.
The workaround still doesn't get this unstuck, but this time it didn't tell me that the device doesn't exist. I also checked that what I write to catkin's stdout and stderr (/proc/$(pgrep -x catkin)/fd/1 and /proc/$(pgrep -x catkin)/fd/2) shows up in the Jenkins log, so it's really catkin that's stuck, not anything after it.
Debugging into catkin I find whenever you stop the debugger it has the same callstack:
MainThread:
_decode / io.py
on_stderr_received / io.py
run_until_complete / executor.py
build_isolated_workspace / build.py
main / cli.py
catkin_main / catkin.py
main / catkin.py
<module> / catkin
When it hangs there are about ~4000 chars in stderr always (never ending, but not constant).
When it doesn't hang there are up to about ~2500 chars arriving, normally ~200 chars.
I'm not sure if there's something happening where the stderr is read but not flushed properly.
I've hit a similar problem, but somehow mine is different. I find that I would get a package stuck every time with message generation in geneus. My solution was to just disable building the euslisp message code.
export ROS_LANG_DISABLE=geneus:genlisp
@abrandemuehl as a geneus maintainer, sorry for your trouble. Could you tell me how to reproduce your problem? How did you build the workspace, and which commands did you execute?
@abrandemuehl @k-okada May I suggest debugging catkin_tools and checking how much output is read from stderr?
Maybe it's the same problem if this message generation is very wordy.
I'm using Ubuntu 18.04 and ROS melodic, catkin tools 0.6.1.
I've made a clean workspace with just the catkin_simple and voxblox with the following directory layout.
adrian@ubuntu:~/thesis/test_ws$ tree -d -L 3
.
└── src
    ├── catkin_simple
    │   ├── cmake
    │   └── test
    └── voxblox
        ├── docs
        ├── voxblox
        ├── voxblox_msgs
        ├── voxblox_ros
        └── voxblox_rviz_plugin

10 directories
adrian@ubuntu:~/thesis/test_ws$ catkin build voxblox_msgs --verbose
----------------------------------------------------------------
Profile: default
Extending: [explicit] /opt/ros/melodic
Workspace: /home/adrian/thesis/test_ws
----------------------------------------------------------------
Build Space: [exists] /home/adrian/thesis/test_ws/build
Devel Space: [exists] /home/adrian/thesis/test_ws/devel
Install Space: [unused] /home/adrian/thesis/test_ws/install
Log Space: [missing] /home/adrian/thesis/test_ws/logs
Source Space: [exists] /home/adrian/thesis/test_ws/src
DESTDIR: [unused] None
----------------------------------------------------------------
Devel Space Layout: linked
Install Space Layout: None
----------------------------------------------------------------
Additional CMake Args: None
Additional Make Args: None
Additional catkin Make Args: None
Internal Make Job Server: True
Cache Job Environments: False
----------------------------------------------------------------
Whitelisted Packages: None
Blacklisted Packages: None
----------------------------------------------------------------
Workspace configuration appears valid.
NOTE: Forcing CMake to run for each package.
----------------------------------------------------------------
[build] Found '5' packages in 0.0 seconds.
[build] Updating package table.
Starting >>> catkin_tools_prebuild
Starting >> catkin_tools_prebuild:loadenv
Output << catkin_tools_prebuild:loadenv /home/adrian/thesis/test_ws/logs/catkin_tools_prebuild/build.loadenv.000.log
Loading environment from: /home/adrian/thesis/test_ws/devel/env.sh
Finished << catkin_tools_prebuild:loadenv
Starting >> catkin_tools_prebuild:mkdir
Starting >> catkin_tools_prebuild:mkdir
Starting >> catkin_tools_prebuild:cache-manifest
Starting >> catkin_tools_prebuild:ctr-nuke
Starting >> catkin_tools_prebuild:cmake
Subprocess > catkin_tools_prebuild:cmake `cd /home/adrian/thesis/test_ws/build/catkin_tools_prebuild; catkin build --get-env catkin_tools_prebuild | catkin env -si /usr/bin/cmake /home/adrian/thesis/test_ws/build/catkin_tools_prebuild --no-warn-unused-cli -DCATKIN_DEVEL_PREFIX=/home/adrian/thesis/test_ws/devel/.private/catkin_tools_prebuild -DCMAKE_INSTALL_PREFIX=/home/adrian/thesis/test_ws/install; cd -`
Output << catkin_tools_prebuild:cmake /home/adrian/thesis/test_ws/logs/catkin_tools_prebuild/build.cmake.000.log
Not searching for unused variables given on the command line.
Re-run cmake no build system arguments
-- The C compiler identification is GNU 7.5.0
-- The CXX compiler identification is GNU 7.5.0
-- Check for working C compiler: /usr/lib/ccache/cc
-- Check for working C compiler: /usr/lib/ccache/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/lib/ccache/c++
-- Check for working CXX compiler: /usr/lib/ccache/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Using CATKIN_DEVEL_PREFIX: /home/adrian/thesis/test_ws/devel/.private/catkin_tools_prebuild
-- Using CMAKE_PREFIX_PATH: /opt/ros/melodic
-- This workspace overlays: /opt/ros/melodic
-- Found PythonInterp: /usr/bin/python2 (found suitable version "2.7.17", minimum required is "2")
-- Using PYTHON_EXECUTABLE: /usr/bin/python2
-- Using Debian Python package layout
-- Using empy: /usr/bin/empy
-- Using CATKIN_ENABLE_TESTING: ON
-- Call enable_testing()
-- Using CATKIN_TEST_RESULTS_DIR: /home/adrian/thesis/test_ws/build/catkin_tools_prebuild/test_results
-- Found gtest sources under '/usr/src/googletest': gtests will be built
-- Found gmock sources under '/usr/src/googletest': gmock will be built
-- Found PythonInterp: /usr/bin/python2 (found version "2.7.17")
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - not found
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
-- Using Python nosetests: /usr/bin/nosetests-2.7
-- catkin 0.7.29
-- BUILD_SHARED_LIBS is on
-- Configuring done
-- Generating done
-- Build files have been written to: /home/adrian/thesis/test_ws/build/catkin_tools_prebuild
cd /home/adrian/thesis/test_ws/build/catkin_tools_prebuild; catkin build --get-env catkin_tools_prebuild | catkin env -si /usr/bin/cmake /home/adrian/thesis/test_ws/build/catkin_tools_prebuild --no-warn-unused-cli -DCATKIN_DEVEL_PREFIX=/home/adrian/thesis/test_ws/devel/.private/catkin_tools_prebuild -DCMAKE_INSTALL_PREFIX=/home/adrian/thesis/test_ws/install; cd -
Finished << catkin_tools_prebuild:cmake
Starting >> catkin_tools_prebuild:make
Subprocess > catkin_tools_prebuild:make `cd /home/adrian/thesis/test_ws/build/catkin_tools_prebuild; catkin build --get-env catkin_tools_prebuild | catkin env -si /usr/bin/make --jobserver-fds=6,7 -j; cd -`
Output << catkin_tools_prebuild:make /home/adrian/thesis/test_ws/logs/catkin_tools_prebuild/build.make.000.log
/usr/bin/cmake -H/home/adrian/thesis/test_ws/build/catkin_tools_prebuild -B/home/adrian/thesis/test_ws/build/catkin_tools_prebuild --check-build-system CMakeFiles/Makefile.cmake 0
/usr/bin/cmake -E cmake_progress_start /home/adrian/thesis/test_ws/build/catkin_tools_prebuild/CMakeFiles /home/adrian/thesis/test_ws/build/catkin_tools_prebuild/CMakeFiles/progress.marks
/usr/bin/make -f CMakeFiles/Makefile2 all
make[1]: Entering directory '/home/adrian/thesis/test_ws/build/catkin_tools_prebuild'
make[1]: Nothing to be done for 'all'.
make[1]: Leaving directory '/home/adrian/thesis/test_ws/build/catkin_tools_prebuild'
/usr/bin/cmake -E cmake_progress_start /home/adrian/thesis/test_ws/build/catkin_tools_prebuild/CMakeFiles 0
cd /home/adrian/thesis/test_ws/build/catkin_tools_prebuild; catkin build --get-env catkin_tools_prebuild | catkin env -si /usr/bin/make --jobserver-fds=6,7 -j; cd -
Finished << catkin_tools_prebuild:make
Starting >> catkin_tools_prebuild:symlink
Output << catkin_tools_prebuild:symlink /home/adrian/thesis/test_ws/logs/catkin_tools_prebuild/build.symlink.000.log
Symlinking /home/adrian/thesis/test_ws/devel/./cmake.lock
Symlinking /home/adrian/thesis/test_ws/devel/./local_setup.zsh
Symlinking /home/adrian/thesis/test_ws/devel/./local_setup.sh
Symlinking /home/adrian/thesis/test_ws/devel/./_setup_util.py
Symlinking /home/adrian/thesis/test_ws/devel/./setup.zsh
Symlinking /home/adrian/thesis/test_ws/devel/./setup.bash
Symlinking /home/adrian/thesis/test_ws/devel/./env.sh
Symlinking /home/adrian/thesis/test_ws/devel/./setup.sh
Symlinking /home/adrian/thesis/test_ws/devel/./local_setup.bash
Symlinking /home/adrian/thesis/test_ws/devel/share/catkin_tools_prebuild/cmake/catkin_tools_prebuildConfig.cmake
Symlinking /home/adrian/thesis/test_ws/devel/share/catkin_tools_prebuild/cmake/catkin_tools_prebuildConfig-version.cmake
Symlinking /home/adrian/thesis/test_ws/devel/lib/pkgconfig/catkin_tools_prebuild.pc
Finished << catkin_tools_prebuild:symlink
Finished <<< catkin_tools_prebuild [ 2.3 seconds ]
Starting >>> catkin_simple
Starting >> catkin_simple:loadenv
Output << catkin_simple:loadenv /home/adrian/thesis/test_ws/logs/catkin_simple/build.loadenv.000.log
Loading environment from: /home/adrian/thesis/test_ws/devel/env.sh
Finished << catkin_simple:loadenv
Starting >> catkin_simple:mkdir
Starting >> catkin_simple:mkdir
Starting >> catkin_simple:cache-manifest
Starting >> catkin_simple:ctr-nuke
Starting >> catkin_simple:cmake
Subprocess > catkin_simple:cmake `cd /home/adrian/thesis/test_ws/build/catkin_simple; catkin build --get-env catkin_simple | catkin env -si /usr/bin/cmake /home/adrian/thesis/test_ws/src/catkin_simple --no-warn-unused-cli -DCATKIN_DEVEL_PREFIX=/home/adrian/thesis/test_ws/devel/.private/catkin_simple -DCMAKE_INSTALL_PREFIX=/home/adrian/thesis/test_ws/install; cd -`
Output << catkin_simple:cmake /home/adrian/thesis/test_ws/logs/catkin_simple/build.cmake.000.log
Not searching for unused variables given on the command line.
Re-run cmake no build system arguments
-- The C compiler identification is GNU 7.5.0
-- The CXX compiler identification is GNU 7.5.0
-- Check for working C compiler: /usr/lib/ccache/cc
-- Check for working C compiler: /usr/lib/ccache/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/lib/ccache/c++
-- Check for working CXX compiler: /usr/lib/ccache/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Using CATKIN_DEVEL_PREFIX: /home/adrian/thesis/test_ws/devel/.private/catkin_simple
-- Using CMAKE_PREFIX_PATH: /home/adrian/thesis/test_ws/devel;/opt/ros/melodic
-- This workspace overlays: /home/adrian/thesis/test_ws/devel;/opt/ros/melodic
-- Found PythonInterp: /usr/bin/python2 (found suitable version "2.7.17", minimum required is "2")
-- Using PYTHON_EXECUTABLE: /usr/bin/python2
-- Using Debian Python package layout
-- Using empy: /usr/bin/empy
-- Using CATKIN_ENABLE_TESTING: ON
-- Call enable_testing()
-- Using CATKIN_TEST_RESULTS_DIR: /home/adrian/thesis/test_ws/build/catkin_simple/test_results
-- Found gtest sources under '/usr/src/googletest': gtests will be built
-- Found gmock sources under '/usr/src/googletest': gmock will be built
-- Found PythonInterp: /usr/bin/python2 (found version "2.7.17")
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - not found
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
-- Using Python nosetests: /usr/bin/nosetests-2.7
-- catkin 0.7.29
-- BUILD_SHARED_LIBS is on
-- Configuring done
-- Generating done
-- Build files have been written to: /home/adrian/thesis/test_ws/build/catkin_simple
cd /home/adrian/thesis/test_ws/build/catkin_simple; catkin build --get-env catkin_simple | catkin env -si /usr/bin/cmake /home/adrian/thesis/test_ws/src/catkin_simple --no-warn-unused-cli -DCATKIN_DEVEL_PREFIX=/home/adrian/thesis/test_ws/devel/.private/catkin_simple -DCMAKE_INSTALL_PREFIX=/home/adrian/thesis/test_ws/install; cd -
Finished << catkin_simple:cmake
Starting >> catkin_simple:make
Subprocess > catkin_simple:make `cd /home/adrian/thesis/test_ws/build/catkin_simple; catkin build --get-env catkin_simple | catkin env -si /usr/bin/make --jobserver-fds=6,7 -j; cd -`
Output << catkin_simple:make /home/adrian/thesis/test_ws/logs/catkin_simple/build.make.000.log
/usr/bin/cmake -H/home/adrian/thesis/test_ws/src/catkin_simple -B/home/adrian/thesis/test_ws/build/catkin_simple --check-build-system CMakeFiles/Makefile.cmake 0
/usr/bin/cmake -E cmake_progress_start /home/adrian/thesis/test_ws/build/catkin_simple/CMakeFiles /home/adrian/thesis/test_ws/build/catkin_simple/CMakeFiles/progress.marks
/usr/bin/make -f CMakeFiles/Makefile2 all
make[1]: Entering directory '/home/adrian/thesis/test_ws/build/catkin_simple'
make[1]: Nothing to be done for 'all'.
make[1]: Leaving directory '/home/adrian/thesis/test_ws/build/catkin_simple'
/usr/bin/cmake -E cmake_progress_start /home/adrian/thesis/test_ws/build/catkin_simple/CMakeFiles 0
cd /home/adrian/thesis/test_ws/build/catkin_simple; catkin build --get-env catkin_simple | catkin env -si /usr/bin/make --jobserver-fds=6,7 -j; cd -
Finished << catkin_simple:make
Starting >> catkin_simple:symlink
Output << catkin_simple:symlink /home/adrian/thesis/test_ws/logs/catkin_simple/build.symlink.000.log
Symlinking /home/adrian/thesis/test_ws/devel/share/catkin_simple/cmake/catkin_simpleConfig.cmake
Symlinking /home/adrian/thesis/test_ws/devel/share/catkin_simple/cmake/catkin_simpleConfig-version.cmake
Symlinking /home/adrian/thesis/test_ws/devel/share/catkin_simple/cmake/catkin_simple-extras.cmake
Symlinking /home/adrian/thesis/test_ws/devel/lib/pkgconfig/catkin_simple.pc
Finished << catkin_simple:symlink
Finished <<< catkin_simple [ 2.5 seconds ]
Starting >>> voxblox_msgs
Starting >> voxblox_msgs:loadenv
Output << voxblox_msgs:loadenv /home/adrian/thesis/test_ws/logs/voxblox_msgs/build.loadenv.000.log
Loading environment from: /home/adrian/thesis/test_ws/devel/env.sh
Finished << voxblox_msgs:loadenv
Starting >> voxblox_msgs:mkdir
Starting >> voxblox_msgs:mkdir
Starting >> voxblox_msgs:cache-manifest
Starting >> voxblox_msgs:ctr-nuke
Starting >> voxblox_msgs:cmake
Subprocess > voxblox_msgs:cmake `cd /home/adrian/thesis/test_ws/build/voxblox_msgs; catkin build --get-env voxblox_msgs | catkin env -si /usr/bin/cmake /home/adrian/thesis/test_ws/src/voxblox/voxblox_msgs --no-warn-unused-cli -DCATKIN_DEVEL_PREFIX=/home/adrian/thesis/test_ws/devel/.private/voxblox_msgs -DCMAKE_INSTALL_PREFIX=/home/adrian/thesis/test_ws/install; cd -`
Output << voxblox_msgs:cmake /home/adrian/thesis/test_ws/logs/voxblox_msgs/build.cmake.000.log
Not searching for unused variables given on the command line.
Re-run cmake no build system arguments
-- The C compiler identification is GNU 7.5.0
-- The CXX compiler identification is GNU 7.5.0
-- Check for working C compiler: /usr/lib/ccache/cc
-- Check for working C compiler: /usr/lib/ccache/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/lib/ccache/c++
-- Check for working CXX compiler: /usr/lib/ccache/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Using CATKIN_DEVEL_PREFIX: /home/adrian/thesis/test_ws/devel/.private/voxblox_msgs
-- Using CMAKE_PREFIX_PATH: /home/adrian/thesis/test_ws/devel;/opt/ros/melodic
-- This workspace overlays: /home/adrian/thesis/test_ws/devel;/opt/ros/melodic
-- Found PythonInterp: /usr/bin/python2 (found suitable version "2.7.17", minimum required is "2")
-- Using PYTHON_EXECUTABLE: /usr/bin/python2
-- Using Debian Python package layout
-- Using empy: /usr/bin/empy
-- Using CATKIN_ENABLE_TESTING: ON
-- Call enable_testing()
-- Using CATKIN_TEST_RESULTS_DIR: /home/adrian/thesis/test_ws/build/voxblox_msgs/test_results
-- Found gtest sources under '/usr/src/googletest': gtests will be built
-- Found gmock sources under '/usr/src/googletest': gmock will be built
-- Found PythonInterp: /usr/bin/python2 (found version "2.7.17")
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - not found
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
-- Using Python nosetests: /usr/bin/nosetests-2.7
-- catkin 0.7.29
-- BUILD_SHARED_LIBS is on
-- Using these message generators: gencpp;geneus;genlisp;gennodejs;genpy
-- voxblox_msgs: 5 messages, 1 services
-- Configuring done
-- Generating done
-- Build files have been written to: /home/adrian/thesis/test_ws/build/voxblox_msgs
cd /home/adrian/thesis/test_ws/build/voxblox_msgs; catkin build --get-env voxblox_msgs | catkin env -si /usr/bin/cmake /home/adrian/thesis/test_ws/src/voxblox/voxblox_msgs --no-warn-unused-cli -DCATKIN_DEVEL_PREFIX=/home/adrian/thesis/test_ws/devel/.private/voxblox_msgs -DCMAKE_INSTALL_PREFIX=/home/adrian/thesis/test_ws/install; cd -
Finished << voxblox_msgs:cmake
Starting >> voxblox_msgs:make
Subprocess > voxblox_msgs:make `cd /home/adrian/thesis/test_ws/build/voxblox_msgs; catkin build --get-env voxblox_msgs | catkin env -si /usr/bin/make --jobserver-fds=6,7 -j; cd -`
To reproduce it, I'm just running catkin build and it'll get stuck on the voxblox_msgs package. Usually, to see which step it's actually getting stuck on, I'll go into the workspace build/ directory and run make directly, which also reproduces it.
I'll try the debugging method now and see what happens.
Stepping with pdb through /opt/ros/melodic/share/geneus/cmake/../../../lib/geneus/gen_eus.py, I find that I get stuck on get_pkg_map(). Here's the pdb backtrace:
(Pdb) bt
/usr/lib/python2.7/pdb.py(1314)main()
-> pdb._runscript(mainpyfile)
/usr/lib/python2.7/pdb.py(1233)_runscript()
-> self.run(statement)
/usr/lib/python2.7/bdb.py(400)run()
-> exec cmd in globals, locals
<string>(1)<module>()
/opt/ros/melodic/lib/geneus/gen_eus.py(39)<module>()
-> import geneus
/opt/ros/melodic/lib/python2.7/dist-packages/geneus/geneus_main.py(137)genmain()
-> pkg_map = get_pkg_map()
/opt/ros/melodic/lib/python2.7/dist-packages/geneus/geneus_main.py(56)get_pkg_map()
-> pkgs = packages.find_packages(ws)
/usr/local/lib/python2.7/dist-packages/catkin_pkg/packages.py(89)find_packages()
-> packages = find_packages_allowing_duplicates(basepath, exclude_paths=exclude_paths, exclude_subspaces=exclude_subspaces, warnings=warnings)
/usr/local/lib/python2.7/dist-packages/catkin_pkg/packages.py(160)find_packages_allowing_duplicates()
-> pool.join()
/usr/lib/python2.7/multiprocessing/pool.py(479)join()
-> p.join()
/usr/lib/python2.7/multiprocessing/process.py(148)join()
-> res = self._popen.wait(timeout)
/usr/lib/python2.7/multiprocessing/forking.py(154)wait()
-> return self.poll(0)
> /usr/lib/python2.7/multiprocessing/forking.py(135)poll()
-> pid, sts = os.waitpid(self.pid, flag)
This shows that it has to do with the multiprocessing in find_packages_allowing_duplicates in catkin_pkg/packages.py. Specifically, there is some code that uses multiprocessing to go through a lot of packages.
I think what's specific to my system is that some gazebo packages fail in the _PackageParser in catkin_pkg/packages.py.
I've set parallel to False here: https://github.com/ros-infrastructure/catkin_pkg/blob/master/src/catkin_pkg/packages.py#L134 and got the following error. Honestly, it looks like my issue would be better placed in ros-infrastructure/catkin_pkg.
InvalidPackage: Error(s) in package '/opt/ros/melodic/share/ros_ign_bridge/package.xml':
Error(s):
- The generic dependency on 'ignition-msgs6' is redundant with: build_depend, build_export_depend, exec_depend
- The generic dependency on 'ignition-transport9' is redundant with: build_depend, build_export_depend, exec_depend
ERROR: Error(s) in package '/opt/ros/melodic/share/ros_ign_bridge/package.xml':
Error(s):
- The generic dependency on 'ignition-msgs6' is redundant with: build_depend, build_export_depend, exec_depend
- The generic dependency on 'ignition-transport9' is redundant with: build_depend, build_export_depend, exec_depend
I've also seen around the internet that there have been some problems in the past with raising user exceptions in multiprocessing pools. See https://stackoverflow.com/questions/2246384/multiprocessing-pool-hangs-when-there-is-a-exception-in-any-of-the-thread and https://bugs.python.org/issue13751
The error in the package comes from the conditional depend lines in the following files. Removing those lines locally on my system fixes the problem. I'm guessing that the version of catkin I'm using (0.6.1) somehow doesn't support them properly.
https://github.com/ignitionrobotics/ros_ign/blob/melodic/ros_ign_image/package.xml#L11 https://github.com/ignitionrobotics/ros_ign/blob/melodic/ros_ign_gazebo/package.xml#L13 https://github.com/ignitionrobotics/ros_ign/blob/melodic/ros_ign_bridge/package.xml#L22
EDIT:
From this solution I'm guessing this is actually a problem in catkin_pkg, which the geneus code inherits by importing and using it.
EDIT 2:
I think what's really happening is that the function used by geneus (parse_package_string) doesn't support condition tags in the dependencies.
My issue has been solved. Basically I had an old version of catkin-pkg installed via pip directly. Uninstalling that was the fix for my issues.
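A quick way to see which copy of a module Python would pick up is to ask the import system for its origin. The sketch below uses Python 3's importlib (the traces above are from Python 2.7, so treat it as an illustration); the helper name is made up.

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin
```

For example, module_origin("catkin_pkg") returning a path under /usr/local/lib/.../dist-packages, as in the backtrace above, would hint at a pip-installed copy rather than the distribution package.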
@SoftwareApe it looks like we are having the same problem: a couple of cmake processes stuck in write() trying to output compilation progress messages. Do you use multithreaded builds?
Perhaps this is the root issue https://savannah.gnu.org/bugs/index.php?51159
System Info
Build / Run Issue
I apologize in advance for this very imprecise bug report.
I noticed that, starting with the new version 0.4.x, catkin build will hang sometimes. It happens while processing a random package and very rarely (I'd say 3 out of 100 builds on our Jenkins). Also, it will run just fine if I cancel and reschedule the build.
I stumbled upon the --no-install-lock option, which I just added to our build script in the hope that it will resolve this issue. I won't be able to tell until sufficiently many builds have run, obviously.
In case anyone has an idea where to look for this problem, our build script runs the following commands:
The last command outputs: