gittup / tup

Tup is a file-based build system.
http://gittup.org/tup/
GNU General Public License v2.0

Hangs on macOS Monterey with Apple M1 Max processor #466

Closed petemoore closed 1 year ago

petemoore commented 2 years ago

Running tup in spectrum4 project causes a hang, which I don't see on other platforms. I'm not sure how best to troubleshoot the issue.

The output in tup before the hang shows:

* 1024) src/spectrum4/tests: ./test_po_any.sh
./test_po_any.sh: line 411: echo: write error: Interrupted system call

This is this rather innocuous echo line.

If I then press Ctrl-C, I get the message "tup: signal caught - waiting for jobs to finish." but it never exits. Further Ctrl-C's do not help, so I usually resort to killing the terminal. After doing this, however, tup refuses to run and reports "Waiting for another tup process (or an autoupdate) to finish...".

After this I try to clean up as follows:

pmoore@Peters-MacBook-Pro:~/git/spectrum4 main $ git clean -fdx
warning: could not open directory '.tup/mnt/': Operation not permitted
warning: could not lstat .tup/mnt
: Operation not permitted

From this point on, if I try to run tup it outputs nothing and never returns; it just seems to hang. So at this point I usually reboot, and check out the repo in a new directory for good measure.
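One thing that may be worth checking before rebooting: tup uses a FUSE filesystem mounted at .tup/mnt, and the "Operation not permitted" warnings above would be consistent with that mount still being in place after the tup process was killed. This is an assumption, not something confirmed in this report. A minimal sketch (the helper name `tup_mounts` is made up for illustration) that filters the mount table for such an entry:

```shell
# Print the mount-table entries that mention a tup FUSE mount point.
# Reads `mount` output on stdin, so it also works on saved output.
# Assumption: a leftover mount at .tup/mnt is what blocks the cleanup.
tup_mounts() {
    grep -F '/.tup/mnt' || true
}

# Typical use (run manually; the path is the checkout from this report):
#   mount | tup_mounts
#   umount -f ~/git/spectrum4/.tup/mnt
```

If `mount | tup_mounts` prints a line, a forced unmount may release the directory so that `git clean -fdx` can proceed. If it prints nothing, the mount is not the culprit and this sketch does not apply.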

My tup version is:

pmoore@Peters-MacBook-Pro:~/git/spectrum4 main $ tup --version
tup v0.7.11-86-g94b47c5f

I built from source on localhost.

I can provide any further diagnostic information required.

The issue may of course be related to other toolchains, rather than tup itself.

I suspect I probably haven't provided enough information to diagnose the issue, but am mostly keen to understand how I can troubleshoot to understand what has hung, and why the interrupted system calls are occurring.

Note, the point of failure seems to vary, e.g. if I make a new checkout and run again now, I get

* 1101) src/spectrum4/tests: ./test_po_t_udg.sh
./test_po_t_udg.sh: line 205: echo: write error: Interrupted system call
 *** tup messages ***
 *** Command ID=350 failed with return value 1

So same root cause (interrupted system call from an echo) - but a totally different echo in a different file in a different tup build node.

Thanks for any guidance anyone might be able to provide! :-)

petemoore commented 2 years ago

This is how it looks when it hangs:

[Screenshot 2022-07-25 at 21 40 57]

petemoore commented 2 years ago

Perhaps this is this bash bug. Some context here.

Looks like macOS Monterey ships with a rather old version of bash:

$ bash --version
GNU bash, version 3.2.57(1)-release (arm64-apple-darwin21)
Copyright (C) 2007 Free Software Foundation, Inc.

I never saw this on my previous x86_64 Macs, which were presumably running a similar or even earlier version of bash, but maybe some subtle difference in my current environment triggers it. I'll see if I can either upgrade bash or switch to e.g. zsh, and see if that fixes it. It does seem surprising that this didn't surface before, so it seems to be specific either to macOS Monterey or to running under aarch64.
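Since the suspicion here is the age of the system bash, a test script could warn when it runs under an old shell. The following is only a sketch: the helper name `version_at_least` is hypothetical, and the 5.1 threshold is an assumption taken from the one newer version tried in this thread, not a known fix boundary.

```shell
# Succeeds when version string $1 (e.g. "3.2.57(1)-release") is at least $2.$3.
version_at_least() {
    major=${1%%.*}      # text before the first dot
    rest=${1#*.}        # text after the first dot
    minor=${rest%%.*}   # text up to the next dot
    [ "$major" -gt "$2" ] || { [ "$major" -eq "$2" ] && [ "$minor" -ge "$3" ]; }
}

# Warn (without failing) when the running shell is older than bash 5.1.
# BASH_VERSION is unset in non-bash shells, hence the 0.0 fallback.
if ! version_at_least "${BASH_VERSION:-0.0}" 5 1; then
    echo "warning: shell older than bash 5.1; echo may fail with EINTR" >&2
fi
```

On the machine in this report, `/bin/bash` would trigger the warning, while a newer bash installed separately would not.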

I'll leave this open, because I'm not sure the current behaviour is desirable: tup hangs, can't be terminated with Ctrl-C, and, if forcefully killed, leaves the directory .tup/mnt/ unopenable (as per the opening comment in this issue). Or perhaps these symptoms are due to OS restrictions that tup has no influence over? Note, I even needed to reboot in order to be able to blow away the files under the .tup directory, but perhaps I could have done this by killing some other processes, and a full reboot wasn't necessary. To make things worse, the reboot actually hangs, so I have to force a reboot by holding the power button!

petemoore commented 2 years ago

Note, working fine under bash 5.1.16.

I'll leave this issue open, because I'm not sure the current behaviour is desirable: tup hangs, can't be terminated with Ctrl-C, and, if forcefully killed, leaves the directory .tup/mnt/ unopenable (as per the opening comment in this issue). Or perhaps these symptoms are due to OS restrictions that tup has no influence over? Feel free to close if you think this is OK and tup shouldn't forcefully terminate hung processes.

petemoore commented 2 years ago

Note, even with bash fixed, if I Ctrl-C while tup is running, on macOS Monterey with an M1 Max processor, it will hang forever, and I need to reboot my system. Happy to provide more diagnostics if that is helpful. Curious if anyone else has hit this issue with M1 Max? I didn't have it on my amd64 Macs on earlier versions of macOS. Previously, tup would wait for the processes to complete, and then exit, whereas now it says it is waiting for them to terminate, and either they never do, or they do but tup hangs.

Reproduction steps: build spectrum4 using tup on a Mac with macOS Monterey, and an M1 Max processor.

petemoore commented 1 year ago

Note, my problem has long since gone away, and I don't remember what the remedy was. Since this ticket was created I upgraded to Ventura, and probably updated various other toolchains and dependencies along the way. I will close here, as it looks like nobody else has experienced similar problems.

petemoore commented 8 months ago

It may be that this issue occurred when I was calling tup under docker. I suspect I needed to pass --init to docker run and wasn't doing so, and this probably affected how the Ctrl-C was intercepted. But it is too far in the past to know now.

For anyone stumbling on this, I can successfully call tup under docker using

docker run --init -t --cap-add SYS_ADMIN --device /dev/fuse --security-opt apparmor:unconfined ........
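For readers who want the flags spelled out, here is the same invocation wrapped in a small function. The function name `run_tup_in_docker` is made up for illustration, and the image and command elided above are deliberately left to the caller.

```shell
# Wrap the docker invocation above; "$@" is the image name followed by
# the command to run, both supplied by the caller.
#   --init                     run a small init process as PID 1 that
#                              forwards signals and reaps child processes
#   -t                         allocate a pseudo-TTY
#   --cap-add SYS_ADMIN        capability needed to mount filesystems
#   --device /dev/fuse         expose the FUSE device to the container
#   --security-opt apparmor:unconfined
#                              run without the default AppArmor profile
run_tup_in_docker() {
    docker run \
        --init \
        -t \
        --cap-add SYS_ADMIN \
        --device /dev/fuse \
        --security-opt apparmor:unconfined \
        "$@"
}
```

Without `--init`, the container's main command runs as PID 1, and signals such as the SIGINT from Ctrl-C are not given their default handling there, which would fit the suspicion above.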