bcpierce00 / unison

Unison file synchronizer
GNU General Public License v3.0
4.01k stars 227 forks source link

2.53.2 on macOS Ventura 13.3 (22E252) hangs while looking for changes #888

Closed schwa closed 9 months ago

schwa commented 1 year ago

I'm not sure if this was caused by the macOS 13.3 update or by Unison 2.53.2 but Unison GUI and text both hang while looking for changes.

I've been using Unison daily for years now and things have (mostly) worked fine but after this pair of updates I'm unable to sync two Mac systems.

Both processes are unquarantined on Mac, I'm able to SSH to the destination box just fine, and both apps have the full disk permissions.

Running -ui text with -debug all the process performs a ton of work and then stops at [server: update] Setting archive for //ungoliant//Users/schwa (ungoliant is the host I'm syncing against).

schwa commented 1 year ago

Deleting all ar & fp archive files from .unison directory on both sides doesn't solve the issue.

What's interesting is that when doing a sync the archive on the remove end is written (at 16.9MB) but on the local end the process hangs an there's only an archive with 34 bytes in it.

tleedjarv commented 1 year ago

Would you be able to attach a debugger to the hanging process and get a backtrace? If you have gdb, for example, then you can get the backtrace by gdb --batch -ex bt -p ##. Replace ## with process id of the hanging process.

schwa commented 1 year ago

Unfortunately not. Running x86 unison on an M1 Mac is making lldb unhappy

lldb unison -- ungoliant.prf -ui text
(lldb) target create "unison"
Current executable set to 'unison' (x86_64).
(lldb) settings set -- target.run-args  "ungoliant.prf" "-ui" "text"
(lldb) process launch
Process 68555 launched: '/opt/homebrew/bin/unison' (x86_64)
warning: libobjc.A.dylib is being read from process memory. This indicates that LLDB could not read from the host's in-memory shared cache. This will likely reduce debugging performance.

Process 68555 exited with status = 5 (0x00000005) Terminated due to signal 5
tleedjarv commented 1 year ago

Unfortunately not. Running x86 unison on an M1 Mac is making lldb unhappy

Hm, the log looks like it could work. Can you start unison normally, wait for it to hang and only then attach lldb to get the backtrace?

schwa commented 1 year ago

Yup!


lldb -p 71199
(lldb) bt all
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x00007ff81517b0ee libsystem_kernel.dylib`__psynch_cvwait + 10
    frame #1: 0x00007ff8151b7758 libsystem_pthread.dylib`_pthread_cond_wait + 1242
    frame #2: 0x0000000100247b19 Unison`_passCall + 101
    frame #3: 0x0000000100247c2e Unison`ocamlCall + 133
    frame #4: 0x000000010023ec5c Unison`main + 304
    frame #5: 0x000000010023eb24 Unison`start + 52
  thread #2
    frame #0: 0x00007ff81517b0ee libsystem_kernel.dylib`__psynch_cvwait + 10
    frame #1: 0x00007ff8151b7758 libsystem_pthread.dylib`_pthread_cond_wait + 1242
    frame #2: 0x00000001003535a2 Unison`caml_thread_join + 130
    frame #3: 0x000000010024eed4 Unison`camlUimacbridge__callbackThreadCreate_506 + 68
    frame #4: 0x000000010037dc85 Unison`caml_start_program + 73
    frame #5: 0x0000000100371de5 Unison`caml_callback_exn + 37
    frame #6: 0x00000001002471e7 Unison`-[Bridge _ocamlStartup:] + 129
    frame #7: 0x00007ff8160747b3 Foundation`__NSThread__start__ + 1009
    frame #8: 0x00007ff8151b71d3 libsystem_pthread.dylib`_pthread_start + 125
    frame #9: 0x00007ff8151b2bd3 libsystem_pthread.dylib`thread_start + 15
  thread #3
    frame #0: 0x00000001003708cd Unison`caml_MD5Transform + 669
    frame #1: 0x0000000100370262 Unison`caml_MD5Update + 226
    frame #2: 0x00000001003704be Unison`caml_md5_channel + 270
    frame #3: 0x00000001003705aa Unison`caml_md5_chan + 90
    frame #4: 0x00000001002e06a4 Unison`camlSystem_generic__fingerprint_777 + 68
    frame #5: 0x00000001002da61e Unison`camlUtil__convertUnixErrorsToExn_1385 + 46
    frame #6: 0x00000001002b3876 Unison`camlOs__retryLoop_1416 + 166
    frame #7: 0x0000000100295644 Unison`camlFpcache__fingerprint_inner_1629 + 756
    frame #8: 0x000000010028c2af Unison`camlUpdate__buildUpdateRec_2465 + 623
    frame #9: 0x000000010028bbb4 Unison`camlUpdate__handleChild_2502 + 516
    frame #10: 0x000000010028bfe8 Unison`camlUpdate__fun_5485 + 24
    frame #11: 0x0000000100306680 Unison`camlStdlib__List__iter_507 + 48
    frame #12: 0x000000010028b641 Unison`camlUpdate__buildUpdateChildren_2464 + 1025
    frame #13: 0x000000010028c851 Unison`camlUpdate__buildUpdateRec_2465 + 2065
    frame #14: 0x000000010028bbb4 Unison`camlUpdate__handleChild_2502 + 516
    frame #15: 0x000000010028bfe8 Unison`camlUpdate__fun_5485 + 24
    frame #16: 0x0000000100306680 Unison`camlStdlib__List__iter_507 + 48
    frame #17: 0x000000010028b641 Unison`camlUpdate__buildUpdateChildren_2464 + 1025
    frame #18: 0x000000010028c851 Unison`camlUpdate__buildUpdateRec_2465 + 2065
    frame #19: 0x000000010028bbb4 Unison`camlUpdate__handleChild_2502 + 516
    frame #20: 0x000000010028bfe8 Unison`camlUpdate__fun_5485 + 24
    frame #21: 0x0000000100306680 Unison`camlStdlib__List__iter_507 + 48
    frame #22: 0x000000010028b641 Unison`camlUpdate__buildUpdateChildren_2464 + 1025
    frame #23: 0x000000010028c851 Unison`camlUpdate__buildUpdateRec_2465 + 2065
    frame #24: 0x000000010028cdd0 Unison`camlUpdate__buildUpdatePathTree_2572 + 672
    frame #25: 0x000000010028d41a Unison`camlUpdate__buildUpdate_2630 + 90
    frame #26: 0x000000010028ee8f Unison`camlUpdate__fun_5814 + 111
    frame #27: 0x00000001002dd86f Unison`camlSafelist__fold_left_416 + 95
    frame #28: 0x000000010028e9fd Unison`camlUpdate__findLocal_2709 + 1421
    frame #29: 0x000000010028f15a Unison`camlUpdate__fun_5845 + 26
    frame #30: 0x0000000100297b86 Unison`camlGlobals__fun_1327 + 70
    frame #31: 0x00000001002cfa93 Unison`camlLwt__apply_356 + 35
    frame #32: 0x00000001002cfdac Unison`camlLwt__fun_547 + 92
    frame #33: 0x0000000100306680 Unison`camlStdlib__List__iter_507 + 48
    frame #34: 0x00000001002cf86a Unison`camlLwt__restart_281 + 138
    frame #35: 0x00000001002cd063 Unison`camlLwt_unix_impl__restart_threads_621 + 307
    frame #36: 0x00000001002cd64c Unison`camlLwt_unix_impl__run_715 + 844
    frame #37: 0x000000010026005c Unison`camlUitext__synchronizeOnce_2443 + 188
    frame #38: 0x0000000100261206 Unison`camlUitext__loop_2880 + 54
    frame #39: 0x0000000100261326 Unison`camlUitext__synchronizeUntilDone_2885 + 134
    frame #40: 0x000000010026268d Unison`camlUitext__start_3008 + 2493
    frame #41: 0x00000001002530dd Unison`camlMain__nonGuiStartup_912 + 141
    frame #42: 0x000000010037dc85 Unison`caml_start_program + 73
    frame #43: 0x0000000100371de5 Unison`caml_callback_exn + 37
    frame #44: 0x00000001002478ac Unison`bridgeThreadWait + 1277
    frame #45: 0x000000010024eef5 Unison`camlUimacbridge__tCode_508 + 21
    frame #46: 0x00000001002f1305 Unison`camlThread__fun_850 + 37
    frame #47: 0x000000010037dc85 Unison`caml_start_program + 73
    frame #48: 0x0000000100371de5 Unison`caml_callback_exn + 37
    frame #49: 0x0000000100352cf2 Unison`caml_thread_start + 114
    frame #50: 0x00007ff8151b71d3 libsystem_pthread.dylib`_pthread_start + 125
    frame #51: 0x00007ff8151b2bd3 libsystem_pthread.dylib`thread_start + 15
  thread #4
    frame #0: 0x00007ff815181282 libsystem_kernel.dylib`__select + 10
    frame #1: 0x0000000100352d5f Unison`caml_thread_tick + 79
    frame #2: 0x00007ff8151b71d3 libsystem_pthread.dylib`_pthread_start + 125
    frame #3: 0x00007ff8151b2bd3 libsystem_pthread.dylib`thread_start + 15
  thread #5
    frame #0: 0x00007ff81517b0ee libsystem_kernel.dylib`__psynch_cvwait + 10
    frame #1: 0x00007ff8151b7758 libsystem_pthread.dylib`_pthread_cond_wait + 1242
    frame #2: 0x00000001002474be Unison`bridgeThreadWait + 271
    frame #3: 0x000000010024eef5 Unison`camlUimacbridge__tCode_508 + 21
    frame #4: 0x00000001002f1305 Unison`camlThread__fun_850 + 37
    frame #5: 0x000000010037dc85 Unison`caml_start_program + 73
    frame #6: 0x0000000100371de5 Unison`caml_callback_exn + 37
    frame #7: 0x0000000100352cf2 Unison`caml_thread_start + 114
    frame #8: 0x00007ff8151b71d3 libsystem_pthread.dylib`_pthread_start + 125
    frame #9: 0x00007ff8151b2bd3 libsystem_pthread.dylib`thread_start + 15
  thread #6
    frame #0: 0x00007ff81517b0ee libsystem_kernel.dylib`__psynch_cvwait + 10
    frame #1: 0x00007ff8151b7758 libsystem_pthread.dylib`_pthread_cond_wait + 1242
    frame #2: 0x00000001002474be Unison`bridgeThreadWait + 271
    frame #3: 0x000000010024eef5 Unison`camlUimacbridge__tCode_508 + 21
    frame #4: 0x00000001002f1305 Unison`camlThread__fun_850 + 37
    frame #5: 0x000000010037dc85 Unison`caml_start_program + 73
    frame #6: 0x0000000100371de5 Unison`caml_callback_exn + 37
    frame #7: 0x0000000100352cf2 Unison`caml_thread_start + 114
    frame #8: 0x00007ff8151b71d3 libsystem_pthread.dylib`_pthread_start + 125
    frame #9: 0x00007ff8151b2bd3 libsystem_pthread.dylib`thread_start + 15
(lldb)```
tleedjarv commented 1 year ago

Is this from when the process is hanging? It looks like the updates scanning is still going on and the hash of some file is being calculated.

schwa commented 1 year ago

That's from process I launched unison on and am seeing [server: update] Setting archive for //ungoliant//Users/schwa logged... I can get a dump from remote process too.


Last login: Thu Mar 30 12:19:04 2023 from fe80::cec:da4e:501f:c793%en10
schwa@ungoliant ~ > ps aux | grep Unison
schwa            40846   0.0  0.0 408636112   1488 s002  S+   12:52PM   0:00.01 grep --color=auto Unison
schwa            40815   0.0  0.4 35233420 291936   ??  S    12:40PM   0:10.61 /Users/schwa/Applications/Unison.app/Contents/MacOS/Unison -server __new-rpc-mode
schwa@ungoliant ~ > lldb -p 40815
(lldb) process attach --pid 40815
warning: libobjc.A.dylib is being read from process memory. This indicates that LLDB could not read from the host's in-memory shared cache. This will likely reduce debugging performance.

Process 40815 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
    frame #0: 0x00007ff8062eb0ee libsystem_kernel.dylib`__psynch_cvwait + 10
libsystem_kernel.dylib`:
->  0x7ff8062eb0ee <+10>: jae    0x7ff8062eb0f8            ; <+20>
    0x7ff8062eb0f0 <+12>: movq   %rax, %rdi
    0x7ff8062eb0f3 <+15>: jmp    0x7ff8062e8cdb            ; cerror_nocancel
    0x7ff8062eb0f8 <+20>: retq
Target 0: (Unison) stopped.
Executable module set to "/Users/schwa/Applications/Unison.app/Contents/MacOS/Unison".
Architecture set to: x86_64-apple-macosx-.
(lldb) bt all
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x00007ff8062eb0ee libsystem_kernel.dylib`__psynch_cvwait + 10
    frame #1: 0x00007ff806327758 libsystem_pthread.dylib`_pthread_cond_wait + 1242
    frame #2: 0x000000010040ab19 Unison`_passCall + 101
    frame #3: 0x000000010040ac2e Unison`ocamlCall + 133
    frame #4: 0x0000000100401c5c Unison`main + 304
    frame #5: 0x0000000100401b24 Unison`start + 52
  thread #2
    frame #0: 0x00007ff8062e9c9e libsystem_kernel.dylib`__workq_kernreturn + 10
    frame #1: 0x00007ff806323cb9 libsystem_pthread.dylib`_pthread_wqthread + 427
    frame #2: 0x00007ff806322bbf libsystem_pthread.dylib`start_wqthread + 15
  thread #3
    frame #0: 0x00007ff8062eb0ee libsystem_kernel.dylib`__psynch_cvwait + 10
    frame #1: 0x00007ff806327758 libsystem_pthread.dylib`_pthread_cond_wait + 1242
    frame #2: 0x00000001005165a2 Unison`caml_thread_join + 130
    frame #3: 0x0000000100411ed4 Unison`camlUimacbridge__callbackThreadCreate_506 + 68
    frame #4: 0x0000000100540c85 Unison`caml_start_program + 73
    frame #5: 0x0000000100534de5 Unison`caml_callback_exn + 37
    frame #6: 0x000000010040a1e7 Unison`-[Bridge _ocamlStartup:] + 129
    frame #7: 0x00007ff8071e47b3 Foundation`__NSThread__start__ + 1009
    frame #8: 0x00007ff8063271d3 libsystem_pthread.dylib`_pthread_start + 125
    frame #9: 0x00007ff806322bd3 libsystem_pthread.dylib`thread_start + 15
  thread #4
    frame #0: 0x00007ff8062f1282 libsystem_kernel.dylib`__select + 10
    frame #1: 0x0000000100515d5f Unison`caml_thread_tick + 79
    frame #2: 0x00007ff8063271d3 libsystem_pthread.dylib`_pthread_start + 125
    frame #3: 0x00007ff806322bd3 libsystem_pthread.dylib`thread_start + 15
  thread #5
    frame #0: 0x00007ff8062eb0ee libsystem_kernel.dylib`__psynch_cvwait + 10
    frame #1: 0x00007ff806327758 libsystem_pthread.dylib`_pthread_cond_wait + 1242
    frame #2: 0x000000010040a4be Unison`bridgeThreadWait + 271
    frame #3: 0x0000000100411ef5 Unison`camlUimacbridge__tCode_508 + 21
    frame #4: 0x00000001004b4305 Unison`camlThread__fun_850 + 37
    frame #5: 0x0000000100540c85 Unison`caml_start_program + 73
    frame #6: 0x0000000100534de5 Unison`caml_callback_exn + 37
    frame #7: 0x0000000100515cf2 Unison`caml_thread_start + 114
    frame #8: 0x00007ff8063271d3 libsystem_pthread.dylib`_pthread_start + 125
    frame #9: 0x00007ff806322bd3 libsystem_pthread.dylib`thread_start + 15
  thread #6
    frame #0: 0x00007ff8062f1282 libsystem_kernel.dylib`__select + 10
    frame #1: 0x0000000100511aac Unison`unix_select + 812
    frame #2: 0x00000001004904f3 Unison`camlLwt_unix_impl__run_715 + 499
    frame #3: 0x0000000100415f38 Unison`camlMain__fun_1057 + 24
    frame #4: 0x0000000100415773 Unison`camlMain__catch_all_441 + 51
    frame #5: 0x0000000100415d52 Unison`camlMain__init_599 + 1218
    frame #6: 0x000000010041605e Unison`camlMain__nonGuiStartup_912 + 14
    frame #7: 0x0000000100540c85 Unison`caml_start_program + 73
    frame #8: 0x0000000100534de5 Unison`caml_callback_exn + 37
    frame #9: 0x000000010040a8ac Unison`bridgeThreadWait + 1277
    frame #10: 0x0000000100411ef5 Unison`camlUimacbridge__tCode_508 + 21
    frame #11: 0x00000001004b4305 Unison`camlThread__fun_850 + 37
    frame #12: 0x0000000100540c85 Unison`caml_start_program + 73
    frame #13: 0x0000000100534de5 Unison`caml_callback_exn + 37
    frame #14: 0x0000000100515cf2 Unison`caml_thread_start + 114
    frame #15: 0x00007ff8063271d3 libsystem_pthread.dylib`_pthread_start + 125
    frame #16: 0x00007ff806322bd3 libsystem_pthread.dylib`thread_start + 15
  thread #7
    frame #0: 0x00007ff8062eb0ee libsystem_kernel.dylib`__psynch_cvwait + 10
    frame #1: 0x00007ff806327758 libsystem_pthread.dylib`_pthread_cond_wait + 1242
    frame #2: 0x000000010040a4be Unison`bridgeThreadWait + 271
    frame #3: 0x0000000100411ef5 Unison`camlUimacbridge__tCode_508 + 21
    frame #4: 0x00000001004b4305 Unison`camlThread__fun_850 + 37
    frame #5: 0x0000000100540c85 Unison`caml_start_program + 73
    frame #6: 0x0000000100534de5 Unison`caml_callback_exn + 37
    frame #7: 0x0000000100515cf2 Unison`caml_thread_start + 114
    frame #8: 0x00007ff8063271d3 libsystem_pthread.dylib`_pthread_start + 125
    frame #9: 0x00007ff806322bd3 libsystem_pthread.dylib`thread_start + 15
(lldb)```
tleedjarv commented 1 year ago

I can get a dump from remote process too.

Please do.

Edit: Silly me, I guess that's exactly what you did :)

schwa commented 1 year ago

:-)

The local end seems to be getting stuck on the first file. It was a .git-credentials files - (temporarily) erasing that it then gets stuck on the next first file.

It's only this end though ('archimedes') the other end ('ungoliant') successfully builds its own index.

Both machines are M1 Macs running the latest Unison and macOS. A bit weird :-)

tleedjarv commented 1 year ago

The local end seems to be getting stuck on the first file. It was a .git-credentials files - (temporarily) erasing that it then gets stuck on the next first file.

Ok, so that matches what we're seeing in the backtrace.

Could you do a small test and run a sync only locally between two throwaway dirs? You could try syncing those same files or a different files but make sure these are fresh dirs (not synced before).

schwa commented 1 year ago

Very trivial tests seem to work just fine. Seems like it may be something unique to my profiles. I'll see if I can isolate what's going on.

tleedjarv commented 1 year ago

Very trivial tests seem to work just fine.

Also trivial tests over ssh?

schwa commented 1 year ago

So far, yes SSH and local are fine. But not finding anything yet in settings. Will report back if/when I find something.

gdt commented 1 year ago

Also, to take emulation out of the picture, build unison with an M1-native ocaml and see if that works.

schwa commented 1 year ago

Still an issue. Have had to modify my day to day work flow to not rely on Unison alas. No closer to discovering the common pattern causing this.

tleedjarv commented 1 year ago

It's most likely an issue with the macOS native GUI. If it's possible for you, could you please try with the GTK GUI or the command line version? You can download these here on the releases page (no installation needed, but the GTK GUI needs GTK installed on your machine). It would be very helpful indeed if we could isolate the issue to being specific to the native GUI.

schwa commented 1 year ago

I am using the command-line version. I've tried and failed with both the GUI and CLI version installed via home-brew (Unison 2.53.2 (ocaml 4.14.1)) and the CLI version of unison I installed from git-hub (also Unison 2.53.2 (ocaml 4.14.1) - I assume home-brew is just vending this). No luck.

tleedjarv commented 1 year ago

We need to be sure which version you're trying with. The native .app also provides a CLI, it is not exactly the same thing as the pure CLI.

If you could extract a backtrace from the CLI version. It is easy to see if it's the real CLI because it will be single-threaded and there would be no mention of Unison.app anywhere. If you see

Unison`somethingsomething

or camlUimacbridge anywhere in the trace, then it's not the pure CLI.

llucps commented 1 year ago

I'm also using Unison 2.53.2 (omcal 4.12.1) on MacOS 13.3.1 synched to a FreeBSD with Unison 2.52.0 (ocaml 4.05.0) and I haven't had any issues so far.

I use a own compiled CLI version so it's ARM native, and also for fun although I don't use it I compiled a GUI version with make UISTYLE=mac

gdt commented 1 year ago

FWIW, I just did a sync with the RC built under pkgsrc on a macOS 10.13 testing box to NetBSD 9 amd64 running 2.53.2 and that worked fine.

I don't know what's going on here, but it's probably going to take debug logging on the failing system. So far it doesn't feel like a 2.53.3 regression so I don't intend to hold the release.

I would recommend installing native caml and building the RC yourself, so you get a native binary. I am 99% sure the CI binaries are x86_64.

llucps commented 1 year ago

A a side note I use MacPorts to install oCaml and then download Unison source code and compile it. It's very simple and you'll get a native ARM binary for your M1.

gdt commented 11 months ago

Where are we on this? Does this still reproduce with a native unison build from the tip of master? Have you been able to bisect with unison or with macOS? Is there some ktrace/ktruss/strace/dtrace facility to use to watch file operations?

(So far, I can't see that the logs clearly point to a unison bug or a mac bug.)

schwa commented 11 months ago

I gave up to be honest. I couldn't get it working between those two machines and had to stop using Unison.

I've since reinstalled macOS on one of the two machines and Unison seems to be working again. Whatever the issue was - it was very specific to some configuration. However I'm using different work flows now (folders in Resilio Sync) and my usage of Unison has dropped unfortunately. Feel free to close this.

gdt commented 11 months ago

Thanks for the update. Very interesting that a reinstall fixed it. I'm going to close, because a report that isn't clearly a bug and can't be reproduced is not going to lead to progress. Please feel free to open a new issue if you can reproduce bad behavior with up-to-date unison, or to ask for help on the list.

gdt commented 9 months ago

I don't remember why i reopened this. It seems that the OP no longer has the problem, and we have no other reports. So it could really be anything: unison bug, macOS bug, ocaml bug, bitrot. I'm going to close and if someone else has a hang they can open a new issue, as any two hangs are not necessarily related anyway.