bcpierce00 / unison

Unison file synchronizer
GNU General Public License v3.0

Implement lookahead in path specifications #80

Open mlen opened 7 years ago

mlen commented 7 years ago

I use unison to sync the home directories of my machines. I'd like to be able to specify a pattern that ignores every directory containing a .git directory. Currently that doesn't seem to be possible. The easiest way to implement it would be to add PerlRegex and pull in pcre-ocaml as a dependency (optionally?).

I am willing to implement this myself, but this may take some time.

bcpierce00 commented 6 years ago

There was quite a bit of discussion about this issue years ago. IIRC, the conclusion was that, since people might have many different ideas about what criteria should be applied to decide whether a directory is ignored, it would really be nice to use an external program to decide. E.g., when scanning each filesystem, unison could just execute some given command in each directory and ignore it if the command returned nonzero exit status. This should be quite straightforward to implement, but I'm not sure about how much it will slow down update detection on filesystems with many directories.
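A rough sketch of that idea as a standalone script, to make the cost visible: walk the tree, run a check in each directory, and treat a nonzero exit status as "ignore". The names `check_dir` and `scan` are invented for illustration and are not part of unison. The check used here (reject directories that contain `.git`) is only one possible criterion. The glob skips hidden directories, which a real implementation would have to handle.

```shell
#!/bin/sh
# Hypothetical sketch, not unison code: per-directory check via exit status.

# check_dir DIR: exit nonzero if DIR should be ignored.
# Assumption for this example: a directory is ignored when it contains .git.
check_dir() {
    # Subshell so the cd does not affect the caller.
    ( cd "$1" && [ ! -d .git ] )
}

# scan ROOT: print each ignored directory below ROOT, one per line,
# without descending into directories that were ignored.
scan() {
    for entry in "$1"/*; do
        [ -d "$entry" ] || continue      # also skips an unmatched glob
        [ -L "$entry" ] && continue      # do not follow symlinks
        if check_dir "$entry"; then
            scan "$entry"                # kept: look further down
        else
            printf '%s\n' "$entry"       # rejected: report and prune
        fi
    done
}
```

Running `scan "$HOME"` forks one subshell per directory visited, which is the overhead the comment above is worried about.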

schallee commented 6 years ago

Perusing stuff today and thought I'd add a couple of comments here.

The straight regex method would be useful, but the external program method would be far more powerful, since the decision could be far more complex than a regex can express. For example, it could be very useful to ignore build artifacts that are also ignored by a VCS, by parsing .gitignore or the like. Adding such support for multiple VCSs to unison itself would be a compatibility and maintainability nightmare, but it is fairly trivial to do with an external program or script.

The concern about slowing down updates isn't much of an issue on unixen, where fork/exec is fairly cheap, but it would likely be a major slowdown on Windows. For small syncs this overhead would not be a concern, but for very large ones it would be.

What I would recommend is that, instead of exec'ing for each path, the external program be started once and run concurrently with unison. A simple text protocol between unison and the external program could be defined, allowing unison to ask the external program whether a given path should be ignored. The squid HTTP proxy does URL filtering and the like in a similar manner. The protocol could be as simple as writing a line with the path to the external program's STDIN and reading a one or a zero from its STDOUT. Alternatively, the output could be the path itself on one line if it should not be ignored, and an empty line (or no line at all) for ignored files. The latter would allow unix pipeline-style filters to be used, making mlen's PCRE request just an application of a PCRE grep.

For smaller syncs a helper script/executable could be provided that does call an external program for each path.

There are some gotchas to be aware of here: line endings, and line-ending characters inside file names (which are fairly frequent on OS X).
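To make the one-or-zero variant of the protocol concrete, here is a minimal sketch of what the helper side might look like. The function name `ignore_filter` and the ".git inside" criterion are assumptions for illustration. Nothing here is an existing unison interface. It reads one path per line and answers one line per path, so it has exactly the weakness mentioned above: a file name containing a line-ending character would desynchronize the exchange.

```shell
#!/bin/sh
# Hypothetical helper for the line-based protocol sketched above.
# Input:  one path per line on stdin.
# Output: "1" if the path should be ignored, "0" otherwise, one answer per line.
ignore_filter() {
    # IFS= and -r keep leading blanks and backslashes in the path intact.
    while IFS= read -r path; do
        if [ -d "$path/.git" ]; then
            printf '1\n'    # contains a .git directory: ignore
        else
            printf '0\n'    # keep
        fi
    done
}
```

For example, `printf '%s\n' /home/me/project /home/me/docs | ignore_filter` prints one answer per input line. The process is started once and stays alive for the whole run, so the per-path cost is a pipe write and read rather than a fork/exec.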

g-raud commented 6 years ago

This is possible with a wrapper around unison that will scan local and remote roots for directories to ignore, then generate an ignore profile to be included by the main profile.

To make sure that the file is ignored, negative patterns (#155) can be used to unset ignorenot. To avoid having to escape globs or regexes, fixed string patterns (#165) can help, as can parsing that supports a map separator in the patterns (#181), for more robustness. To avoid having to modify the original profile, use command-line include options (#171).

Example wrapper script (not tested):

#!/bin/sh
# unison-ignore-git.sh
# Synopsis:
#   unison-ignore-git.sh <root> <remote> <mode> <unison_args>

set -euf
if (set -o pipefail 2>/dev/null)
then set -o pipefail
fi

list () {
    find "$root" -name .git
}

root=$1; shift
remote=$1; shift
mode=$1; shift
case $mode in list) list; exit ;; esac

ignores=`(list; ssh "$remote" unison-ignore-git.sh "$root" null list) |sort -u`
  # "$root" should be escaped if it contains special chars

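# Split $ignores on newlines only, so paths containing spaces stay intact.
OLDIFS=$IFS
IFS='
'
# Save stdout on fd 3, then send stdout to the generated profile.
exec 3>&1 >"$HOME"/.unison/ignore-git.prf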
for f in $ignores
do
    f=${f#"$root"}; f=${f#/}
    f=${f%/.git}
    printf "ignore = String '%s' ->\\n" "$f"
    printf "ignorenot = del String '%s' ->\\n" "$f"
done
# Restore stdout and the original field separator.
exec 1>&3 3>&-
IFS=$OLDIFS

# -source here relies on the command-line include option from #171.
unison "$@" -source "$HOME"/.unison/ignore-git.prf