bcpierce00 / unison

Unison file synchronizer
GNU General Public License v3.0

Add pre/suffix option for temporary files so that other processes/systems will ignore them #367

Open ChristianRiesen opened 4 years ago

ChristianRiesen commented 4 years ago

Unison works great for my uses, but I keep running into an annoying situation in my setup. The temporary files Unison creates sometimes get picked up by an external process over which I have no control. However, that process ignores files whose names start with .~ (among a couple of other rules). It would therefore be great to have options for both a suffix and a prefix that would be used only for the temporary files. I hope that would not be too complex a change, and it would make Unison more flexible.

gdt commented 4 years ago

It might also be nice to just change the default to something that is ignored by most things that try to ignore some files. Perhaps ending in ~ is enough?

ChristianRiesen commented 4 years ago

From https://help.dropbox.com/installs-integrations/sync-uploads/files-not-syncing

Temporary files

When some applications (such as Microsoft Word, Excel, or PowerPoint) open a file, they will often save a temporary file in the same directory and name it in one of the following ways:

- Name begins with ~$ (a tilde and dollar sign) or .~ (a period and tilde)
- Name begins with a tilde and ends in .tmp, such as ~myfile.tmp

Dropbox doesn’t sync these temporary files on any operating system.

As you can see, there are different options for this specific use case, like ~file.tmp, .~file, or ~$file. So having both a prefix and a suffix option would be ideal, though a prefix alone would already work, so that Dropbox doesn't greedily grab the temporary Unison files that can be in the folder.
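The naming rules quoted from the Dropbox help page can be written down as a small predicate, which makes it easy to check a candidate temporary-file name against them. This is only a sketch in Python: the function name `is_ignored_temp_name` is made up for illustration, and it assumes the match is case-sensitive and applies to the file name only (not the full path), which the quoted text does not specify.

```python
def is_ignored_temp_name(name: str) -> bool:
    """Return True if `name` matches the temporary-file rules quoted above.

    Assumption: matching is case-sensitive and looks only at the file name.
    """
    # Rule 1: name begins with "~$" or ".~".
    if name.startswith("~$") or name.startswith(".~"):
        return True
    # Rule 2: name begins with a tilde and ends in ".tmp".
    return name.startswith("~") and name.endswith(".tmp")
```

Under these assumptions, a name such as `.~file` or `~file.tmp` matches, while a name that merely starts with a dot does not.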

gdt commented 4 years ago

I see. This is not following the usual gitignore pattern (which follows longstanding traditions), so it does seem that having this be configurable would be useful for various people to accommodate various other software. (I don't think aligning unison to some particular proprietary software is a good approach, personally.)

ChristianRiesen commented 4 years ago

I completely agree with you. The flexibility to select the prefix and suffix for temporary files would already make it fit for pretty much any use case.

ChristianRiesen commented 4 years ago

I have run into this issue about a dozen times since posting this. I resorted to a script that I trigger by hand, which searches for Unison temporary files and removes them. Less than ideal though. Anything I can do to get this done? :)
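A manual cleanup script of the kind described here might look roughly like the following. It is a sketch, not the script actually used: it assumes that Unison's temporary files have names that start with `.unison.` and end with `.unison.tmp`, it only considers files (not temporary directories), and the function names are invented for illustration. It defaults to a dry run so that nothing is deleted by accident.

```python
import os
from pathlib import Path

# Assumption: Unison temporary names start with PREFIX and end with SUFFIX.
PREFIX = ".unison."
SUFFIX = ".unison.tmp"


def find_unison_temps(root):
    """Yield files under `root` whose names look like Unison temporaries."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.startswith(PREFIX) and name.endswith(SUFFIX):
                yield Path(dirpath) / name


def remove_unison_temps(root, dry_run=True):
    """Return the matching files; delete them only when dry_run is False."""
    found = sorted(find_unison_temps(root))
    for path in found:
        if not dry_run:
            path.unlink()
    return found
```

Calling `remove_unison_temps(some_dir)` only lists candidates; passing `dry_run=False` deletes them. It is probably wise to run this only while no sync is in progress, since a file that is still being transferred would match the same pattern.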

gdt commented 4 years ago

Yes, you can make a code change and test it, and then submit a pull request with that code change and the corresponding docs change, all while being careful to do it in a way that will seem acceptable to most. Probably not the answer you wanted, but that's how it is.

ChristianRiesen commented 4 years ago

I can code, but not this language unfortunately :) Might give it a go anyways and see how far I get.

tleedjarv commented 4 years ago

If you don't need to make it configurable then have a look here https://github.com/bcpierce00/unison/blob/master/src/os.ml#L42-L43

ChristianRiesen commented 4 years ago

Thank you very much @tleedjarv. I might do that temporarily. I do want to give back a little for such an awesome tool though, so maybe I can make this happen somehow :)

tleedjarv commented 4 years ago

I can create a PR if you wish but then it needs testing to see if it meets your expectations and doesn't break any previous expectations.

The easiest change is to add tilde on line 42. Slightly more complex, I can add tilde only on non-Windows platforms. What is correct here?

ChristianRiesen commented 4 years ago

I run this on a Linux box. Really I could just add that to the lines and compile it, that wouldn't be that hard for me to do.

It's either tilde and dollar sign ~$, or dot and tilde .~ as a prefix which would be ignored. Considering how *nix likes its files, the dot tilde variant seems a better choice. The other one requires a prefix and suffix in combination.

As gdt commented above though, it might not be such a good idea to change it globally for everyone. If you choose to do so though I will certainly not tell you to stop, as you are fixing my problem :)

tleedjarv commented 4 years ago

You can try it out and see how it works for you.

If this helps more people, creating a PR is no issue. After all, it can only be an improvement over current state. There must be common enough patterns that are not specific to any particular proprietary software.

It seems that prefixing with tilde is a common Windows pattern (can anyone confirm?).

I'm not sure about other operating systems. If someone can confirm a common pattern then making a PR with different prefix-suffix for Windows and other platforms is no problem at all.

For reference, the current code on all platforms is the same: prefix .unison. and suffix .unison.tmp

Making it configurable by arguments seems too much. Checking for an env variable could be conceivable, if there were a commonly used one, and it would also allow for a different configuration on server and client (disclaimer: not sure the code works this way).
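To make the env-variable idea concrete, here is a rough sketch of the lookup logic, written in Python rather than Unison's OCaml purely for illustration. The variable names `UNISON_TMP_PREFIX` and `UNISON_TMP_SUFFIX` and the function `temp_name` are hypothetical and do not exist in Unison; the defaults are the prefix and suffix mentioned above. The name construction is also simplified to prefix + file name + suffix, which may not be exactly how Unison builds its temporary names.

```python
import os

DEFAULT_PREFIX = ".unison."     # current prefix, as stated in this thread
DEFAULT_SUFFIX = ".unison.tmp"  # current suffix, as stated in this thread


def temp_name(filename, env=os.environ):
    """Build a temporary name, letting the environment override the defaults.

    UNISON_TMP_PREFIX / UNISON_TMP_SUFFIX are hypothetical variable names.
    """
    prefix = env.get("UNISON_TMP_PREFIX", DEFAULT_PREFIX)
    suffix = env.get("UNISON_TMP_SUFFIX", DEFAULT_SUFFIX)
    return f"{prefix}{filename}{suffix}"
```

Because the environment is read separately by each process, client and server could in principle end up with different settings, which matches the point made above about allowing different configurations on each side.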

ChristianRiesen commented 4 years ago

The patterns mentioned in my second comment are commonly used on Windows, which is why Dropbox ignores them.

gdt commented 4 years ago

I think the basic difficulty is that there are no real standards, and this is about accommodating various other programs that people are using, that have different rules.

On UNIX, . as a prefix doesn't mean "exclude from backup", but rather "don't show in ls without -a". So .unison. is good for that. I'd say that a simple trailing ~ is the normal "this is a backup file", but really there is no norm for "this is a file that is temporary and should not be backed up". It seems the real issue is other programs that watch the fs and sync.

I think it's far more reasonable to hard-code for other open-source sync/backup programs, and more challenging to align to proprietary ones, esp. absent e.g. an XDG spec for this.

Has anyone thought about syncthing?

ChristianRiesen commented 4 years ago

So with .~ it should not make that much of a difference then. And the option to override it would still make it the most flexible for other cases.

ChristianRiesen commented 4 years ago

So, I went ahead and tried it out locally. On https://github.com/bcpierce00/unison/blob/master/src/os.ml#L42 I changed ".unison." to ".~unison." and compiled it. I then spent a while testing it. The issue mostly happens with lots of files that are more than a few KB in size. I tried it a few times, and the sync job completely ignores the files starting with .~unison. If just that change found its way into the codebase, it would solve my issue and should be safe for others. Otherwise, making the prefix on that line configurable via the command line would still be an option.