Byron / gitoxide

An idiomatic, lean, fast & safe pure Rust implementation of Git
Apache License 2.0
8.84k stars 301 forks source link

An old value of `MSYS` with unpaired surrogates is silently replaced for fixtures #1574

Closed EliahKagan closed 3 weeks ago

EliahKagan commented 3 weeks ago

Current behavior 😯

In gix-testtools, when test fixture scripts are run, the configure_command function customizes their environment. Since 0899c2e (#1444), this includes setting the MSYS variable to make it so that commands in shell scripts such as ln -s will really attempt to create symlinks rather than copies.

If MSYS was already set, the options held in its old value are kept, with the new option appended at the end so it will only suppress an option that conflicts with it, otherwise leaving the effects of preceding options in place.

But this does not happen if the old value of MSYS contained an unpaired surrogate code point. This is because env::var rather than env::var_os is used, so the unwrap_or_default method returns an empty string both when MSYS was absent and when it was present with a value that could not be represented as well-formed UTF-8.

https://github.com/Byron/gitoxide/blob/9bddeeb948397df3d1e391ec48597cc34d10acd0/tests/tools/src/lib.rs#L596-L597

https://github.com/Byron/gitoxide/blob/9bddeeb948397df3d1e391ec48597cc34d10acd0/tests/tools/src/lib.rs#L605

There are at least three scenarios where MSYS may contain an unpaired surrogate:

The impact of this bug is therefore small. However, fixing it would help clarify the intended semantics of how gix-testtools treats the MSYS environment variable when running test fixture scripts. (This may potentially be relevant to some changes made for #1578.)

Expected behavior 🤔

Due to the nature of gix-testtools as a testing facility, both as part of gitoxide's test suite and when the published crate is used, I think any of these behaviors are fine and should be considered to fix the bug:

  1. Always panic when the previous value of MSYS contained an unpaired surrogate, whether or not this is documented.

  2. Always accept and append to the previous value of MSYS when it contains an unpaired surrogate, such as by using env::var_os and calling push (rather than env::var and calling push_str), whether or not this is documented.

    If this is done, then unwrap_or_default could still be used, because it would only be falling back to the default empty value when the environment variable was absent.

    -    let mut msys_for_git_bash_on_windows = std::env::var("MSYS").unwrap_or_default();
    -    msys_for_git_bash_on_windows.push_str(" winsymlinks:nativestrict");
    +    let mut msys_for_git_bash_on_windows = std::env::var_os("MSYS").unwrap_or_default();
    +    msys_for_git_bash_on_windows.push(" winsymlinks:nativestrict");
  3. Most other behaviors, including the current behavior, if it is documented.

While any of those would, in my opinion, satisfy users' expectations and eliminate the bug as such, I think the best behavior would be to do (1) or (2). Specifically:

I plan to investigate this and open a PR that makes one of those changes. (This investigation has been slightly slowed by the problem described in https://github.com/rust-lang/libz-sys/issues/215, but I should be able to complete it.)

Update: I have opened #1580, which changes it to use OsString instead of String and to append with OsString::push. As noted there, I verified that a value of MSYS produced this way does still have the desired effect on symlink creation.

Git behavior

Not directly applicable.

Steps to reproduce 🕹

The nice way to reproduce this would be to make a fixture that displays the value, or to add a tracing statement, or to use a debugger.

Instead, I just added a panic! to print out whatever MSYS value will be used:

diff --git a/tests/tools/src/lib.rs b/tests/tools/src/lib.rs
index b3c0102ea..127e7cab1 100644
--- a/tests/tools/src/lib.rs
+++ b/tests/tools/src/lib.rs
@@ -595,6 +595,7 @@ fn configure_command<'a>(
 ) -> &'a mut std::process::Command {
     let mut msys_for_git_bash_on_windows = std::env::var("MSYS").unwrap_or_default();
     msys_for_git_bash_on_windows.push_str(" winsymlinks:nativestrict");
+    panic!("DEBUG: {msys_for_git_bash_on_windows:?}"); // FIXME: Remove after debugging.
     cmd.args(args)
         .stdout(std::process::Stdio::piped())
         .stderr(std::process::Stdio::piped())

Then I ran a test that, while conceptually unrelated to this issue, exercises that code, ensuring the panic! is hit. The following output was warnings induced by making that change, about unused variables and unreachable code, removed. The full output can be seen in this gist.

First, a control, with well-formed Unicode:

ek@Glub MINGW64 ~/source/repos/gitoxide/tests/tools (main)
$ MSYS='error_start:nonexistent' cargo nextest run configure_command_clears_external_config

    Finished `test` profile [unoptimized + debuginfo] target(s) in 0.39s
------------
 Nextest run ID e36b5636-bba9-47d5-b9d5-0ae20e5c0f30 with nextest profile: default
    Starting 1 test across 3 binaries (6 tests skipped)
        FAIL [   0.015s] gix-testtools tests::configure_command_clears_external_config

--- STDOUT:              gix-testtools tests::configure_command_clears_external_config ---

running 1 test
test tests::configure_command_clears_external_config ... FAILED

failures:

failures:
    tests::configure_command_clears_external_config

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 2 filtered out; finished in 0.01s

--- STDERR:              gix-testtools tests::configure_command_clears_external_config ---
thread 'tests::configure_command_clears_external_config' panicked at tests\tools\src\lib.rs:598:5:
DEBUG: "error_start:nonexistent winsymlinks:nativestrict"
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

   Canceling due to test failure
------------
     Summary [   0.017s] 1 test run: 0 passed, 1 failed, 6 skipped
        FAIL [   0.015s] gix-testtools tests::configure_command_clears_external_config
error: test run failed

A now, when the existing value contains an unpaired high surrogate:

ek@Glub MINGW64 ~/source/repos/gitoxide/tests/tools (main)
$ MSYS=$'error_start:\uD800z' cargo nextest run configure_command_clears_external_config

    Finished `test` profile [unoptimized + debuginfo] target(s) in 0.39s
------------
 Nextest run ID bc896e5b-fabd-45b5-8823-4889319cf522 with nextest profile: default
    Starting 1 test across 3 binaries (6 tests skipped)
        FAIL [   0.024s] gix-testtools tests::configure_command_clears_external_config

--- STDOUT:              gix-testtools tests::configure_command_clears_external_config ---

running 1 test
test tests::configure_command_clears_external_config ... FAILED

failures:

failures:
    tests::configure_command_clears_external_config

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 2 filtered out; finished in 0.02s

--- STDERR:              gix-testtools tests::configure_command_clears_external_config ---
thread 'tests::configure_command_clears_external_config' panicked at tests\tools\src\lib.rs:598:5:
DEBUG: " winsymlinks:nativestrict"
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

   Canceling due to test failure
------------
     Summary [   0.026s] 1 test run: 0 passed, 1 failed, 6 skipped
        FAIL [   0.024s] gix-testtools tests::configure_command_clears_external_config
error: test run failed

Therefore, normally it is able to keep content from the old value:

DEBUG: "error_start:nonexistent winsymlinks:nativestrict"

But when it cannot be converted to UTF-8, its silently discards it:

DEBUG: " winsymlinks:nativestrict"

(The reason I added the z after the unpaired surrogate was to show a situation where it is not at the very end, which is the more common situation. The problem is not limited to when it is at the very end; rather, whether appending ought to be done, rather than panicking, should, I think, be determined by whether program built against the MSYS DLL are able to heed the appended winsymlinks option even when it the space that delimits it follows an unpaired high surrogate.)