Open: davidak opened this issue 1 year ago
I had the same issue again with https://github.com/NixOS/nixpkgs/pull/199813. There were 2 stuck processes, building a package each, for each of:
python310Packages.cvxpy python311Packages.cvxpy python310Packages.qutip python311Packages.qutip
When I killed one, the rest finished within minutes. Before that, it had been stuck for 16 hours!
I had the issue again with https://github.com/NixOS/nixpkgs/pull/216399. Two package builds were blocked, making nothing but sched_yield calls:
python310Packages.opensfm python311Packages.opensfm
The builds had not finished after 3.5 hours! Killing one build did not unblock the other.
The issue is also reproducible when building one package alone!
nix-build . -A python310Packages.opensfm
The pytest run gets stuck at 75%:
...
opensfm/test/test_dense.py .. [ 55%]
opensfm/test/test_geo.py ..... [ 57%]
opensfm/test/test_geometry.py .... [ 59%]
opensfm/test/test_io.py ......... [ 62%]
opensfm/test/test_matching.py ...... [ 65%]
opensfm/test/test_multiview.py ......... [ 68%]
opensfm/test/test_pairs_selection.py ......... [ 72%]
opensfm/test/test_reconstruction_alignment.py ........ [ 75%]
opensfm/test/test_reconstruction_incremental.py Terminated
When I look at it with strace, it has 5 threads. Maybe they cause the deadlock.
[root@gaming:~]# strace -f -c -p 2468163
strace: Process 2468163 attached with 5 threads
^Cstrace: Process 2468163 detached
strace: Process 2468290 detached
strace: Process 2468291 detached
strace: Process 2468292 detached
strace: Process 2475812 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00   20.227928           7   2786017           sched_yield
------ ----------- ----------- --------- --------- ----------------
100.00   20.227928           7   2786017           total
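The strace summary counts about 2.8 million sched_yield calls at roughly 7 µs each (20.227928 s / 2786017 ≈ 7.3 µs), and nothing else. That pattern is typical of a spin-wait: a thread polls a condition in a loop and yields the CPU between polls, so it stays runnable and burns CPU without making progress if the condition never becomes true. The sketch below is a hypothetical illustration of that pattern, not code taken from OpenSfM or its dependencies; the function name spin_wait is made up, and os.sched_yield is the real (Unix-only) standard-library wrapper for the syscall.

```python
import os
import threading
import time


def spin_wait(flag: threading.Event, deadline_s: float) -> tuple[bool, int]:
    """Busy-wait for `flag`, yielding the CPU on every iteration.

    Returns (flag_was_set, number_of_yields). A real spin-wait usually has
    no deadline at all; one is added here only so the example terminates.
    """
    yields = 0
    end = time.monotonic() + deadline_s
    while not flag.is_set():
        if time.monotonic() > end:
            return False, yields  # in a real deadlock, this point is never reached
        os.sched_yield()  # one sched_yield syscall per iteration, as strace shows
        yields += 1
    return True, yields


if __name__ == "__main__":
    never_set = threading.Event()  # nobody will ever set this: a "deadlock"
    print(spin_wait(never_set, 0.5))
```

If the thread that should set the flag is itself blocked (for example in a futex wait on a lock), the spinning thread loops forever, which would match builds that sit at 100% CPU for hours.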
When I started the build again, it succeeded.
I also learned that deadlocks are a common issue in Python, for example when fork-based multiprocessing is combined with threads. So maybe we have to report them to the affected packages:
https://pythonspeed.com/articles/python-multiprocessing/
https://rachelbythebay.com/w/2011/06/07/forked/
https://github.com/python/cpython/issues/50970
https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods
https://github.com/orgs/python/projects/12/views/1
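The linked articles describe a well-known way to deadlock Python code: calling fork() (which multiprocessing has historically used by default on Linux) while another thread holds a lock. The child process gets a copy of the lock in its locked state, but not the thread that would eventually release it. Whether this is what happens in the stuck OpenSfM, qutip or cvxpy tests is not established; the sketch below only demonstrates the mechanism, using a hypothetical module-level lock and a made-up function name (Linux/Unix only, since it calls os.fork).

```python
import os
import threading
import time

# Hypothetical module-level lock, standing in for any lock inside a library.
lock = threading.Lock()


def _hold_lock(acquired: threading.Event, seconds: float) -> None:
    with lock:
        acquired.set()  # signal that the lock is now held
        time.sleep(seconds)


def child_can_take_lock_after_fork(timeout: float = 1.0) -> bool:
    """Fork while another thread holds `lock`; report whether the child can acquire it."""
    acquired = threading.Event()
    holder = threading.Thread(target=_hold_lock, args=(acquired, 2.0))
    holder.start()
    acquired.wait()  # make sure the helper thread really holds the lock
    pid = os.fork()
    if pid == 0:
        # Child: only the forking thread exists here, but the lock was copied
        # in its locked state. Without a timeout this acquire() would block forever.
        got_it = lock.acquire(timeout=timeout)
        os._exit(0 if got_it else 1)
    _, status = os.waitpid(pid, 0)
    holder.join()
    return os.waitstatus_to_exitcode(status) == 0


if __name__ == "__main__":
    print("child acquired lock:", child_can_take_lock_after_fork())
```

The multiprocessing documentation linked above describes the "spawn" and "forkserver" start methods (selectable with multiprocessing.get_context() or multiprocessing.set_start_method()), which start workers without inheriting lock state from a multi-threaded parent in this way.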
A futex can only be used to communicate with child processes until they exec, and Nix always execs the builder, so Nix cannot be involved in this deadlock.
I'll move this issue to nixpkgs.
@roberth Nix could still handle such situations better, e.g. by detecting deadlocks and cancelling builds after a timeout.
It can already cancel a build once log lines stop appearing, using the max-silent-time setting.
I don't think detecting the deadlock itself is feasible without significant overhead.
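Nix's max-silent-time setting (also accepted as the --max-silent-time command-line option) limits how long a build may go without producing output. The Python sketch below illustrates the same idea for a single command; it is a hypothetical illustration, not how Nix implements the option, the function name is made up, and it measures silence per output line rather than per byte.

```python
import subprocess
import sys
import threading
import time


def run_with_silence_timeout(cmd: list[str], max_silent_s: float) -> tuple[int, bool]:
    """Run `cmd`, killing it if it produces no output line for `max_silent_s` seconds.

    Returns (exit_code, killed_for_silence). A negative exit code means the
    child was killed by that signal number.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    state = {"last_output": time.monotonic()}
    lock = threading.Lock()

    def pump() -> None:
        # Every line read from the child resets the silence clock.
        for _line in proc.stdout:
            with lock:
                state["last_output"] = time.monotonic()

    reader = threading.Thread(target=pump, daemon=True)
    reader.start()

    killed = False
    while proc.poll() is None:
        with lock:
            silent_for = time.monotonic() - state["last_output"]
        if silent_for > max_silent_s:
            proc.kill()  # SIGKILL on POSIX
            killed = True
            break
        time.sleep(0.05)

    exit_code = proc.wait()
    reader.join(timeout=1.0)
    return exit_code, killed


if __name__ == "__main__":
    # A child that goes silent (like a deadlocked test run) is killed after ~1 s.
    print(run_with_silence_timeout([sys.executable, "-c", "import time; time.sleep(60)"], 1.0))
```

A silence timeout cannot tell a deadlock from a legitimately quiet but slow step, which is one reason it has to be set generously.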
Describe the bug
I used nixpkgs-review to build a PR that affects many packages (https://github.com/NixOS/nixpkgs/pull/216403). After 15 hours, the CPU was still at 100% load, but the temperature was low, as if it was not doing anything.
There are 4 python package builds that seem to be stuck.
They are running pytest, but each build has been stuck at the same test for hours.
Two processes only make futex calls, and the other two only make sched_yield calls.
Maybe this is a pytest issue, but Nix should also do something to unblock such situations!
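To tell spinning threads from blocked ones without attaching strace, Linux exposes each thread's scheduler state in /proc/<pid>/task/<tid>/stat. A thread spinning on sched_yield normally shows up as R (runnable), while a thread blocked in a futex wait shows up as S (sleeping). A minimal sketch, assuming a Linux /proc filesystem; the function name thread_states is hypothetical.

```python
import os


def thread_states(pid: int) -> dict[int, str]:
    """Map each thread ID of `pid` to its one-letter scheduler state (Linux only).

    'R' = running/runnable (e.g. spinning on sched_yield),
    'S' = interruptible sleep (e.g. waiting on a futex),
    'D' = uninterruptible sleep, and so on (see proc(5)).
    """
    states: dict[int, str] = {}
    task_dir = f"/proc/{pid}/task"
    for tid in os.listdir(task_dir):
        try:
            with open(f"{task_dir}/{tid}/stat") as f:
                stat = f.read()
        except FileNotFoundError:
            continue  # the thread exited while we were looking
        # Field 2 is "(comm)" and may itself contain spaces or ')',
        # so split after the LAST ')' and take the next field: the state.
        after_comm = stat.rsplit(")", 1)[1].split()
        states[int(tid)] = after_comm[0]
    return states


if __name__ == "__main__":
    print(thread_states(os.getpid()))
```

Running this against a stuck test process (as root, or as the build user) would show which threads are spinning and which are sleeping.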
Workaround
To unblock the situation and let Nix build the rest of the packages, you have to kill the deadlocked python test processes.
Then "Terminated" appears in the log, along with the strange stdenv-linux/setup error. Interestingly, even after I killed 3 of the 4, the last one was not unblocked, so they do not seem to be blocking each other. But the issue also does not appear when building the package alone!
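The manual workaround can also be scripted: find the stuck test processes by their command line via /proc and send them SIGTERM, the default signal of kill, for which the shell reports "Terminated". This is a hypothetical sketch (Linux only; helper names are made up). On a real system, check the matched PIDs before signalling them, since a loose pattern such as "pytest" may match unrelated processes.

```python
import os
import signal
import subprocess
import sys


def find_pids_by_cmdline(needle: str) -> list[int]:
    """Return PIDs (other than our own) whose command line contains `needle`."""
    me = os.getpid()
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit() or int(entry) == me:
            continue
        try:
            with open(f"/proc/{entry}/cmdline", "rb") as f:
                # Arguments are NUL-separated; join them with spaces for matching.
                cmdline = f.read().replace(b"\0", b" ").decode(errors="replace")
        except OSError:
            continue  # the process exited or is not readable
        if needle in cmdline:
            pids.append(int(entry))
    return pids


def terminate(pids: list[int], sig: int = signal.SIGTERM) -> None:
    """Send `sig` to each PID, ignoring processes that already exited."""
    for pid in pids:
        try:
            os.kill(pid, sig)
        except ProcessLookupError:
            pass


if __name__ == "__main__":
    # Demo on a throwaway child instead of a real build.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
    terminate([child.pid])
    print(child.wait())  # negative signal number: killed by SIGTERM
```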
Steps To Reproduce
nixpkgs-review pr 215689
(Builds 138 packages, which need 17 GB!)
Not reproducible with e.g.:
Expected behavior
Nix should be able to build 100+ packages without errors.
nix-env --version output: nix-env (Nix) 2.11.1
Additional context
Packages that get stuck in this way:
python3.11-OpenSfM-unstable-2022-03-10 python3.10-OpenSfM-unstable-2022-03-10 python3.10-qutip-4.7.1 python3.11-cvxpy-1.2.3 python3.10-cvxpy-1.2.3