bcpierce00 / unison

Unison file synchronizer
GNU General Public License v3.0
4.01k stars 227 forks source link

Avoid problems with quoting arguments to copyprog #866

Closed rrthomas closed 1 year ago

rrthomas commented 1 year ago

Use Unix.open_processargs{full,in} instead of Unix.openprocess{full,in}, passing command-line arguments as an array, so they are not parsed by the shell and do not need to be quoted.

Add the function runExternalProgramArgs to src/external.ml to implement this.

This is motivated by recent changes to rsync: https://download.samba.org/pub/rsync/NEWS#3.2.4

See also https://github.com/bcpierce00/unison/issues/865

This obviates the need for the copyquoterem preference, which is ignored for backwards compatibility, but its documentation is removed.

gdt commented 1 year ago

Have you tested with old rsync also?

tleedjarv commented 1 year ago

A few quick comments before further review:

gdt commented 1 year ago

This seems conceptually ok to me and I will wait a bit to allow anyone inclined to do a more detailed review.

Don't be alarmed about CI failures as CI is flaky.

Generally we have a commit with real changes and then a "regen strings" commit separately. If that's reasonably easy for you to split out please do.

tleedjarv commented 1 year ago

This is from OCaml's manual re Unix.open_process_args_{full,in}:

The executable file prog is searched in the path. This behaviour changed in 4.12; previously prog was looked up only in the current directory.

If I read this correctly then this is serious and basically makes this PR not work when built with OCaml < 4.12.

rrthomas commented 1 year ago

Thanks very much for the comments so far; some replies:

* I'm not sure about removing `copyquoteterm` entirely. Copyprog could be something other than `rsync` and even though the need to quote may be unlikely, the quoting code is already there so why not keep it? (The default=true for rsync should be removed, though.)

I think that to retain copyquoterem there should be a common use case, at least. It's a hack that works around an implementation problem (which has now been fixed), and it was in rsync, the default copyprog. I don't think it should be Unison's problem to work around similar flaws in other programs (which, as in rsync, are probably in any case security bugs). Fixing Unison not to use the shell

* Instead of doing a full copy for `runExternalProgramArgs`, it's better to make the existing `runExternalProgram` into a universal auxiliary function and then add new `runExternalProgram` and `runExternalProgramArgs` to call the auxiliary func with proper params for args/non-args. (in the end, the plain non-args version could be removed completely?)

Absolutely. I have not done so in the first pass purely because I'm quite new to OCaml, and because I wanted to ensure that the initial implementation was correct.

Generally we have a commit with real changes and then a "regen strings" commit separately. If that's reasonably easy for you to split out please do.

I'll do that!

If I read this correctly then this is serious and basically makes this PR not work when built with OCaml < 4.12.

Good catch, I guess that means this change would have to wait to be merged until the minimum requirement is lifted from the current 4.08 to 4.12?

gdt commented 1 year ago

I think it's going to be a very long time before we required >= 4.12. The change to 4.08 was pretty recent and even that had some gnashing of teeth.

So I guess that means for now probably changing the default, and more importantly enhancing the documentation to make it easier to understand. Right now I think someone coming to this is not going to follow what's going on.

rrthomas commented 1 year ago

Have you tested with old rsync also?

Tested with 3.2.3.

rrthomas commented 1 year ago

I think it's going to be a very long time before we required >= 4.12. The change to 4.08 was pretty recent and even that had some gnashing of teeth.

Sure, I can put a note in my diary for however long that is. OCaml 4.08 was released 3½ years before that change; a conservative 4 years after 4.12 is around 2 years from now.

So I guess that means for now probably changing the default, and more importantly enhancing the documentation to make it easier to understand. Right now I think someone coming to this is not going to follow what's going on.

If the change is made at a time when most users will have a suitable version of rsync, then the only people who will need an explanation are those stuck with old rsync. A big "heads up, if you're using unpatched rsync < 3.2.4, you will need copyquoterem = true"?

gdt commented 1 year ago

It's not really strictly time; it's about arriving at a place where the benefits of dropping the old versions outweigh the pain, and that ends up being about Laggard Term Stable distributions where people expect to build new unison, which we gradually decide not to support. But I have no idea where we'll be in two years.

I didn't mean an explanation of the change. I meant that the existing explanation of quoting rules was not adequately clear. I've been using sh for a really long time and I didn't follow it the first time. Really this is exposing an implementation detail that shouldn't be exposed in the first place. I'll continue in the ticket, since I think this PR is not going to get merged any time soon.

rrthomas commented 1 year ago

It's not really strictly time; it's about arriving at a place where the benefits of dropping the old versions outweigh the pain, and that ends up being about Laggard Term Stable distributions where people expect to build new unison, which we gradually decide not to support. But I have no idea where we'll be in two years.

I don't really mind how long it is. 2 years, 5 years, 10.

I didn't mean an explanation of the change. I meant that the existing explanation of quoting rules was not adequately clear. I've been using sh for a really long time and I didn't follow it the first time. Really this is exposing an implementation detail that shouldn't be exposed in the first place. I'll continue in the ticket, since I think this PR is not going to get merged any time soon.

Sure, I guess it can be tagged "OCaml >= 4.12".

rrthomas commented 1 year ago

Thanks very much for the review, which I'll now work through.

The path problem with OCaml < 4.12 could be solved by doing own path search. This way the code could be merged even before the minimum becomes 4.12. Not sure if it's worth the effort, though.

I agree: it's not worth the effort. This is a nice improvement to have when it's available, but it doesn't seem to be a huge problem in practice.

rrthomas commented 1 year ago
* for OCaml 4.12 the path search code (like in fswatch.ml) must be added (it should be clearly commented as being for compat only and must be removed once 4.12 becomes the minimum)

I've added the code from fswatch.ml as highlighted above, also checking the case that the command name contains /, in which case we don't check PATH (exactly as for execvpe).

(the path search should be done only once, not for every synced file)

Hmm, I'm not sure how to do this safely. I have added calls to search_in_path in openProcessArgs{In,Full}, which will indeed be called once per file. But since this will happen anyway with open_process_args_in/full in OCaml >= 4.12, it seems OK to do it once per file, and in the mean time any other callers of openProcessArgs* will work.

* the patch must be tested in Windows (even though I doubt there are many rsync users on Windows, but we can't allow a regression here)

I looked at doing this, but I see Cygwin is needed for building, so gave up (I do build and test other packages regularly on Mingw-64 using MSYS2).

Since we no longer require OCaml 4.12, I've removed that tag from the issue title.

tleedjarv commented 1 year ago

(the path search should be done only once, not for every synced file)

Hmm, I'm not sure how to do this safely. I have added calls to search_in_path in openProcessArgs{In,Full}, which will indeed be called once per file. But since this will happen anyway with open_process_args_in/full in OCaml >= 4.12, it seems OK to do it once per file, and in the mean time any other callers of openProcessArgs* will work.

I'm not sure about the performance penalty. You're right that this will happen anyway (even with current openProcess which goes through the shell) so maybe it's not worth thinking about this.

* the patch must be tested in Windows (even though I doubt there are many rsync users on Windows, but we can't allow a regression here)

I looked at doing this, but I see Cygwin is needed for building, so gave up (I do build and test other packages regularly on Mingw-64 using MSYS2).

Ah, no worries. Once the CI runs then you can just download the built binaries; no need to build yourself.

gdt commented 1 year ago

I'll look at this after the release happens. We were already close and this is too scary to drop in at the last minute, but I won't feel that way early after a point release.

rrthomas commented 1 year ago

Ah, no worries. Once the CI runs then you can just download the built binaries; no need to build yourself.

I did this, but I couldn't work out how to test the patch, because I couldn't find out how to use rsync with Unison on Windows. I tried using rsync (from msys, for Mingw), but it assumes that files containing a colon are remote, and even if I specify the local path as \foo\bar, unison translates that to C:/foo/bar.

Searching online did not enlighten me; rather, most tutorials for using rsync on Windows use WSL.

tleedjarv commented 1 year ago

I did this, but I couldn't work out how to test the patch, because I couldn't find out how to use rsync with Unison on Windows. I tried using rsync (from msys, for Mingw), but it assumes that files containing a colon are remote, and even if I specify the local path as \foo\bar, unison translates that to C:/foo/bar.

Searching online did not enlighten me; rather, most tutorials for using rsync on Windows use WSL.

Do we conclude from this that this functionality never really worked with Unison on Windows? (or rather that rsync itself doesn't work outside some POSIX-y container like WSL or Cygwin)

Then we have no regression either. That's good enough for me.

gdt commented 1 year ago

I'm ok with merging this, given my understanding:

(merges are on hold for a bit in case of post-release disasters)

rrthomas commented 1 year ago

I'm ok with merging this, given my understanding:

* the code will work on both old and new rsync on unix-y platforms

I have tested this, and confirm that it does.

* with the old code, external rsync doesn't seem to work on Windows, so it's not a regression

I confirm that I couldn't get it to work on Mingw. It should work on Cygwin, but I've not tested that. Cygwin should however be "unixy".