bcpierce00 / unison

Unison file synchronizer
GNU General Public License v3.0
4k stars 227 forks source link

unison fails to propagate deletion if stat fails on (longer) temp file due to excessively long filename #293

Open mitchellnemitz opened 4 years ago

mitchellnemitz commented 4 years ago

When synchronizing directory deletions between systems (in this case macOS 10.14 and CentOS 7), occasionally unison will attempt to call stat on a particularly long filename and then fail to propagate the deletion after that stat call throws an uncaught exception.

For example:

Failed: Error in deleting:
File name too long [stat(/rather/deep/directory/structure/with/some/files/in/cache/like/5a5b4a617a7a5c4261636b67726f756e64436865636b536572766963655c42756e646c655c4261636b67726f756e64436865636b42756e646c655c436f6e74726f6c6c65725c496e746567726174696f6e436f6e74726f6c6c65722364656c657465496e746567726174696f6e416374696f6e5d5b315d5b4a617a7a5c42)]

The filename is 252 bytes, which is pretty close to the limit of 255, but it is still legal. The directory structure, while rather deep, is also under the 4096 byte limit.

I'm unfamiliar with OCaml and the general architecture of Unison, so it's difficult for me to track down exactly where this call is happening, but if there's a way I can obtain more verbose logging and/or a stack trace to help narrow down where the failure is occurring, please let me know.

Any ideas why this is happening?

gdt commented 3 years ago

I would suggest using ktrace, strace, ktruss, dtrace or something to look at the syscalls and see if there really is a syscall with too long a name, or if the OS is buggy. If you can provide evidence of unison misbehavior we can keep this open. If you need help please use the list for support.

glondu commented 3 years ago

Note that Unison uses temporary files with "unison.tmp" appended in their name (or such). This behaviour may trigger "File name too long" errors even on filesystems respecting file name size limits but approaching them closely.

mitchellnemitz commented 3 years ago

I wouldn't rule out either OS having a sketchy stat implementation, but it does seem that stat functions just fine on both systems outside of the Unison sync context for a file with a max length name. The error seems to lie in Unison's creation of temporary files with additional characters in the name and a subsequent of use stat on those files, not the originals. I had enabled -terse mode when reported the issue initially, here's a quick test case without that flag.

Creating and stat-ing a max-length file name on macOS:

$ touch 5a5b4a617a7a5c4261636b67726f756e64436865636b536572766963655c42756e646c655c4261636b67726f756e64436865636b42756e646c655c436f6e74726f6c6c65725c496e746567726174696f6e436f6e74726f6c6c65722364656c657465496e746567726174696f6e416374696f6e5d5b315d5b4a617a7a5c4.abc
$ stat 5a5b4a617a7a5c4261636b67726f756e64436865636b536572766963655c42756e646c655c4261636b67726f756e64436865636b42756e646c655c436f6e74726f6c6c65725c496e746567726174696f6e436f6e74726f6c6c65722364656c657465496e746567726174696f6e416374696f6e5d5b315d5b4a617a7a5c4.abc
16777220 61638729 -rw-r--r-- 1 mitchell admin 0 0 "Sep 15 11:42:10 2020" "Sep 15 11:42:10 2020" "Sep 15 11:42:10 2020" "Sep 15 11:42:10 2020" 4096 0 0 5a5b4a617a7a5c4261636b67726f756e64436865636b536572766963655c42756e646c655c4261636b67726f756e64436865636b42756e646c655c436f6e74726f6c6c65725c496e746567726174696f6e436f6e74726f6c6c65722364656c657465496e746567726174696f6e416374696f6e5d5b315d5b4a617a7a5c4.abc

Creating and stat-ing a max-length file name on CentOS 7:

$ touch 5a5b4a617a7a5c4261636b67726f756e64436865636b536572766963655c42756e646c655c4261636b67726f756e64436865636b42756e646c655c436f6e74726f6c6c65725c496e746567726174696f6e436f6e74726f6c6c65722364656c657465496e746567726174696f6e416374696f6e5d5b315d5b4a617a7a5c4.abc
$ stat 5a5b4a617a7a5c4261636b67726f756e64436865636b536572766963655c42756e646c655c4261636b67726f756e64436865636b42756e646c655c436f6e74726f6c6c65725c496e746567726174696f6e436f6e74726f6c6c65722364656c657465496e746567726174696f6e416374696f6e5d5b315d5b4a617a7a5c4.abc
  File: ‘5a5b4a617a7a5c4261636b67726f756e64436865636b536572766963655c42756e646c655c4261636b67726f756e64436865636b42756e646c655c436f6e74726f6c6c65725c496e746567726174696f6e436f6e74726f6c6c65722364656c657465496e746567726174696f6e416374696f6e5d5b315d5b4a617a7a5c4.abc’
  Size: 0           Blocks: 0          IO Block: 4096   regular empty file
Device: fd01h/64769d    Inode: 17294006    Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2020-09-15 11:43:22.157129508 -0400
Modify: 2020-09-15 11:43:22.157129508 -0400
Change: 2020-09-15 11:43:22.157129508 -0400
 Birth: -

But when attempting to delete that same file from either system while running Unison in -repeat watch mode:

$ unison \
    "$HOME/stat-test" \
    "ssh://vagrant@127.0.0.1//mnt/stat-test" \
    -auto \
    -times \
    -rsync \
    -repeat watch \
    -servercmd "sudo /usr/local/bin/unison" \
    -ignore "Name {.Spotlight,.DS_Store,.CFUserTextEncoding,.FBCIndex,.FBCLockFolder,.VolumeIcon.icns,.HSicon}" \
    -sshargs "-p 2222 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o IdentitiesOnly=yes -i $(pwd)/.vagrant/machines/default/virtualbox/private_key"

Unison 2.51.2 (ocaml 4.08.1): Contacting server...
Warning: Permanently added '[127.0.0.1]:2222' (ECDSA) to the list of known hosts.
Connected [//astraeus.local//Users/mitchell/stat-test -> //docker-box//mnt/stat-test]
Looking for changes
  Waiting for changes from server
Reconciling changes
Nothing to do: replicas have not changed since last sync.
Looking for changes
  Waiting for changes from server
Reconciling changes
         <---- deleted    5a5b4a617a7a5c4261636b67726f756e64436865636b536572766963655c42756e646c655c4261636b67726f756e64436865636b42756e646c655c436f6e74726f6c6c65725c496e746567726174696f6e436f6e74726f6c6c65722364656c657465496e746567726174696f6e416374696f6e5d5b315d5b4a617a7a5c4.abc
Propagating updates
UNISON 2.51.2 (OCAML 4.08.1) started propagating changes at 11:51:39.42 on 15 Sep 2020
[BGN] Deleting 5a5b4a617a7a5c4261636b67726f756e64436865636b536572766963655c42756e646c655c4261636b67726f756e64436865636b42756e646c655c436f6e74726f6c6c65725c496e746567726174696f6e436f6e74726f6c6c65722364656c657465496e746567726174696f6e416374696f6e5d5b315d5b4a617a7a5c4.abc from /Users/mitchell/stat-test
Failed: Error in deleting:
File name too long [stat(/Users/mitchell/stat-test/._5a5b4a617a7a5c4261636b67726f756e64436865636b536572766963655c42756e646c655c4261636b67726f756e64436865636b42756e646c655c436f6e74726f6c6c65725c496e746567726174696f6e436f6e74726f6c6c65722364656c657465496e746567726174696f6e416374696f6e5d5b315d5b4a617a7a5c4.abc)]
Failed [5a5b4a617a7a5c4261636b67726f756e64436865636b536572766963655c42756e646c655c4261636b67726f756e64436865636b42756e646c655c436f6e74726f6c6c65725c496e746567726174696f6e436f6e74726f6c6c65722364656c657465496e746567726174696f6e416374696f6e5d5b315d5b4a617a7a5c4.abc]: Error in deleting:
File name too long [stat(/Users/mitchell/stat-test/._5a5b4a617a7a5c4261636b67726f756e64436865636b536572766963655c42756e646c655c4261636b67726f756e64436865636b42756e646c655c436f6e74726f6c6c65725c496e746567726174696f6e436f6e74726f6c6c65722364656c657465496e746567726174696f6e416374696f6e5d5b315d5b4a617a7a5c4.abc)]
UNISON 2.51.2 (OCAML 4.08.1) finished propagating changes at 11:51:39.42 on 15 Sep 2020
Saving synchronizer state
Synchronization incomplete at 11:51:39  (0 items transferred, 0 skipped, 1 failed)
  failed: 5a5b4a617a7a5c4261636b67726f756e64436865636b536572766963655c42756e646c655c4261636b67726f756e64436865636b42756e646c655c436f6e74726f6c6c65725c496e746567726174696f6e436f6e74726f6c6c65722364656c657465496e746567726174696f6e416374696f6e5d5b315d5b4a617a7a5c4.abc
Looking for changes
  Waiting for changes from server
Reconciling changes
Nothing to do: replicas have been changed only in identical ways since last sync.

As you can see from the more explicit logging here, a temp file of some sort is getting created with a name that is too long for either OS to handle, and thus any attempt to stat that file fails which causes the propagation of the delete to fail.

If there's anything else I can provide to better explain the issue, please let me know.

Thank you!

gdt commented 3 years ago

You've now explained what happens, thanks. The next question is what ought to be done in the code instead.

There is arguably a bug that the error isn't caught with some more graceful handling, and arguably a bigger bug that unison should find some temp filename that is within the rules. (Unison's approach to all of this seems like a heuristic anyway.)

I've adjusted labels, so I think this now just waits until someone writes code.

tleedjarv commented 2 years ago

If you do not want to sync resource forks and FinderInfo then you can add -rsrc=false command line argument. This will solve the specific issue you are seeing in https://github.com/bcpierce00/unison/issues/293#issuecomment-692812640 but it will of course not do anything about the general issue mentioned in https://github.com/bcpierce00/unison/issues/293#issuecomment-692527772

Edit: There is also a simple patch that would eliminate the specific error you're seeing even if you want to continue syncing resource forks. This patch would only silence a very specific corner case but doesn't fix the issue of too long temp names.

If anyone's interested, this is the patch (completely untested!):

diff --git a/src/os.ml b/src/os.ml
index 48fbb110..857bd539 100644
--- a/src/os.ml
+++ b/src/os.ml
@@ -182,7 +182,10 @@ and delete fspath path =
           Fs.unlink absolutePath;
           if Prefs.read Osx.rsrc then begin
             let pathDouble = Fspath.appleDouble absolutePath in
-            if Fs.file_exists pathDouble then
+            let doubleExists =
+              try Fs.file_exists pathDouble with
+              | Unix.Unix_error (Unix.ENAMETOOLONG, _, _) -> false in
+            if doubleExists then
               Fs.unlink pathDouble
           end
       | `SYMLINK ->