GitoxideLabs / gitoxide

An idiomatic, lean, fast & safe pure Rust implementation of Git
Apache License 2.0
8.91k stars 303 forks source link

Don't assume program files folder locations #1456

Closed EliahKagan closed 2 months ago

EliahKagan commented 2 months ago

This pull request fixed CVE-2024-40644 (RUSTSEC-2024-0355, https://github.com/Byron/gitoxide/security/advisories/GHSA-mgvv-9p9g-3jv4).

I've updated this description with a bunch of details, but the most important information is in the advisories. The sections below summarize the bug from a security perspective, give the original PR description that addressed its non-security aspects, and present a lightly edited version of most of the supplementary material that had accompanied the advisory.

Summary of the bug this PR fixes

As presented in more detail in the advisories, gix-path sought Git for Windows in some hard-coded locations when not found in a PATH search. The hard-coded locations were intended as the 64-bit and 32-bit program files directories. But this gave rise to a CWE-427 vulnerability, because the exact paths of those directories are not the same on all systems, and because limited (non-administrator) users can create new directories in the root of the system drive, it was possible to create them.

In practice this mostly affected 32-bit Windows systems where the 32-bit program files directory is usually C:\Program Files and a directory C:\Program Files (x86) would not exist. This allowed a limited user to create it and place arbitrary code under it masquerading as a Git for Windows installation; then other users who ran gix-path would run that executable, causing code execution and privilege escalation. This is comparable to CVE-2022-24765.

Original PR description

The original PR description did not cover security aspects of the bug. It is as follows.

In #1419, gix-path facilities got the ability to check Program Files directories as alternative locations for when Git for Windows is installed but the git binary isn't in the PATH. However, it hard-codes the paths, which are not correct for all Windows systems.

The most common scenario in which the currently hard-coded paths are not correct is a 32-bit Windows system, for which the 32-bit Program Files directory is the only Program Files directory and is (it least in all ordinary setups) just called Program Files. It is also technically possible, though in practice very uncommon, for the system drive not to be C: and, in extremely weird setups, maybe even for Program Files directories to have different names.

This obtains them through three specially named environment variables, which is one of the reasonably reliable and documented means that works for all combinations of process architecture (i.e. build target) and architecture of the system, as well as if they are somehow in unusual places. Because locating these directories carries potentially surprising subtleties, I've included extensive tests, including an end-to-end style test that uses a combination of other techniques to obtain them that is actually more reliable, but that would require new production dependencies, or nontrivial unsafe code in production, to do outside the tests.

I don't think CI covers the cases most likely to go wrong here, but I have tested this on a 32-bit x86 Windows 10 system, on a 64-bit x64 Windows 10 system with both 64-bit and 32-bit builds, and an ARM64 (AArch64) Windows 11 system with 64-bit ARM, 64-bit x64, and 32-bit x86 builds, both manually and by running all tests for gix-path, including the new tests I added.

Supplementary material

This is most of the supplementary material I included with the advisory. In a few areas it is lightly edited. It is of much less interest than the advisory. Portions that would be of no interest at all, such as those relating to coordination or copyediting of the advisory text, are omitted. Portions consisting of code that I was proposing are likewise omitted other than my descriptions and commentary on that code, since the code is in the pull request.

Some diagnostically useful payloads

To confirm that any executable can be run rather than only those with digital signatures or otherwise marked as trusted, and even more so to make sure I understood what git is being called to do, I made two simple diagnostic utilities that I used as alternative payloads to calc.exe:

Either of these can be renamed to git.exe and placed as described in the advisory.

Possible fixes for the vulnerability

This vulnerability was introduced a few weeks ago in #1419, due to the hard-coded paths in the crate-internal gix_path::env::git::ALTERNATIVE_LOCATIONS static item. Ways it could be fixed that I think are reasonable include:

  1. Removing that entirely and no longer using any alternative locations.
  2. Finding the program files folder paths by more accurate programmatic means, and continuing to find Git for Windows in essentially the same way.
  3. Finding Git for Windows by querying the Windows registry for information about where it is installed.

I favor way 2, which I think achieves the intent of the current hard-coded paths. This way could also be extended--though it may be better to add this later rather than in the bugfix--to include the user's local program files directory. Git for Windows may exist there either due to being placed there by the installer, or by the user manually extracting the portable version.

Although way 1 has the advantage of simplicity and it can be superseded later, it has the disadvantage that some users may judge that they are not vulnerable and keep using the unpatched version for the alternative locations feature, and some of those users may be mistaken in that judgment.

I have not extensively researched way 3 but I worry it may lead to strange false matches due to installations that the registry still records as installed due to an error during uninstallation, or old Git for Windows installations that users put in weird places and then forget about because they were in weird places. I think the alternative locations should be specific, small in number, and as predictable as possible without sacrificing correctness, which is also in the spirit of the original.

Way 3 should not be confused with getting the program files paths themselves from the registry, which is one possible approach to implementing way 2.

Observing the bug and fix

This bug prevents git.exe from being found in an alternative location on a 32-bit Windows system even in the absence of any exploit, because the path it uses under C:\Program Files is 64-bit specific.

I have run unit tests as much as I can--I cannot get the full test suite to reliably build and run on the 32-bit Windows 10 system I am using for testing, due to build errors in dependencies reporting insufficient memory resources. But all gix-path tests pass on all platforms tested, including that 32-bit Windows 10 system, where I also manually verified that the vulnerability is present without the patch and absent with it. I have checked that the new tests I added fail in the absence of the functionality they are testing for.

On my main x86-64 development system, I ran the full test suite, with both x86_64-pc-windows-msvc and i686-pc-windows-msvc targets. All tests passed on the 64-bit build and there were no new failures on the 32-bit build. I also ran the gix-path tests but not the full test suite on an ARM64 (AArch64) system, with aarch64-pc-windows-msvc as well as x86_64-pc-windows-msvc and i686-pc-windows-msvc (64-bit ARM builds of Windows support those other architectures by emulation). All the gix-path tests passed there too, except that a performance-related test failed due to taking too long on the two targets requiring emulation.

Scope of the fix - current exclusions

There are two changes I considered but decided not to propose as part of the fix, though I think it makes sense to explore them later.

1. Simpler paths could be used

It seems to me that the paths being used are more complicated than necessary.

Git for Windows has target-specific paths to git.exe, which gix-path currently (as well as with my patch) uses:

mingw64\bin\git.exe
mingw32\bin\git.exe

But it also has these paths that do not encode any architecture or toolchain, and this is even in the directory that Git for Windows places in the PATH when installed with default choices:

cmd\git.exe

And there is also this, alongside the cmd and mingw* directories and not to be confused with mingw*\bin:

bin\git.exe

All of these are relative to the top-level installation directory, which is usually named Git. I think at least that cmd directory is always present, and maybe both cmd and bin are.

Currently, the target-specific directory is always called either mingw64 or mingw32, but in the broader MSYS2 world, those are not the only possible target environments. My guess is that when Git for Windows starts offering ARM64 (AArch64) builds, the directory that is currently always called mingw64 or mingw32 will be called clangarm64 in those builds.

The reason I have not implemented this simplification in my patch is that it produces different observable results for the path to git.exe even in ordinary situations, including when that path is obtained through public functions in gix_path::env. Maybe there is some reason to prefer these paths.

However, please note that it should never make a difference for the output of git --exec-path. That gives libexec/git-core under an architecture and target specific path, and outputs the same path whichever of the above mentioned git.exe files is run.

2. Local paths could be added

As briefly mentioned above, there can also be a "local" per-user program files directory. It is most often located at AppData\Local\Programs relative to the user's user profile (i.e. home) directory. It is not created with the user profile directory, but it is is created the first time an application is installed there. For some applications, including Git for Windows, beginning the installation process will create it even before any step that copies files.

Limited users cannot install programs in the system-wide program files directories. So applications that are installed per-user--which administrators may also choose to do for themselves--are usually installed in the user's local program files directory.

It seems to me that, after checking PATH, this directory should ideally be checked first. If a user running an application that uses gix-path has installed Git for Windows in their local program files directory, they presumably prefer that installation to be used over a system-wide installation they may not even control.

Each user has at most one local program files directory, i.e., at most one location officially recognized as such. It may contain a mix of 32-bit and 64-bit applications. Therefore it may make sense to figure out whether to implement this feature at the same time as whether it makes sense to drop the use of architecture-specific paths, though it could be done either way by checking the user's local directory with both architecture-specific subdirectories.

(We can also check both subdirectories everywhere, which is reasonable, but I lean slightly away from it. Additional such directories that are not from Git for Windows--for example if users try copying directories from a separate MSYS2 installation to see if Git for Windows will use it, which is something I have personally tried in the past--may be unsuitable.)

So how do we find the program files directories?

Although the advisory focuses on the common case of default setups, in rare cases the paths may differ from what is typical. Microsoft recommends checking for them rather than hard-coding them, and I think we should.

There is more than one way to check for them. All are secure, as far as I can tell, but they are not equivalent, and some are more reliable than others. Rather than attempting to present the complete details here, I've written two programs that I used experimentally to research how they behave, as well as to figure out and document how to access them from Rust. These programs are heavily commented:

With that said, here's a summary of the four ways:

In my opinion, for the system-wide program files directories (which currently are the only ones we use), we should either use environment variables for all or none. This is because, in situations where a parent process customizes the environment, the effect of consulting environment variables for some of the paths but not others would be extremely strange and difficult to reason about.

Further notes on environment variables

A summary of the rules for how they are set

On a 64-bit system, the ProgramFiles environment variable is inherited from an architecture-specific environment variable, rather than from the same-named environment variable. This way, a child process gets a ProgramFiles variable suitable to its own architecture, rather than its parent's, at least so long as the parent process doesn't remove those architecture-specific variables. If the relevant architecture-specific variable is not passed down, then ProgramFiles is inherited from ProgramFiles itself. On a 32-bit Windows system, it is always inherited from the same-named variable and the other variables are not significant.

The architecture-specific environment variables are ProgramW6432, ProgramFiles(x86), and ProgramFiles(ARM). The latter is less relevant because it is for 32-bit ARM programs and Rust has no 32-bit ARM targets for Windows (at least via rustup), and it does not need to be examined in gix-path. It is not quite totally irrelevant, though, since the parent process of a process built with gix-path can be a 32-bit ARM process. As for 64-bit ARM, that has the same program files directory as x86-64.

Experimental confirmation and eludication

The Microsoft documentation is somewhat vague about how this works, so I made cross-platform utilities useenv and prenv similar to the env and printenv Unix commands, respectively. useenv, like env, recognizes -i to unset all non-specified variables and -u to unset the single variable whose name is its operand. On an x86-64 system, with useenv and prenv built for the target architectures indicated by their modified names:

C:\Users\ek> useenv64 -i ProgramFiles=A 'ProgramFiles(x86)=B' ProgramW6432=C prenv64
ProgramFiles=C
ProgramFiles(x86)=B
ProgramW6432=C
C:\Users\ek> useenv64 -i ProgramFiles=A 'ProgramFiles(x86)=B' ProgramW6432=C prenv32
ProgramFiles=B
ProgramFiles(x86)=B
ProgramW6432=C
C:\Users\ek> useenv32 -i ProgramFiles=A 'ProgramFiles(x86)=B' ProgramW6432=C prenv64
ProgramFiles=C
ProgramFiles(x86)=B
ProgramW6432=C
C:\Users\ek> useenv32 -i ProgramFiles=A 'ProgramFiles(x86)=B' ProgramW6432=C prenv32
ProgramFiles=B
ProgramFiles(x86)=B
ProgramW6432=C

If a parent process passes none of those through, we get none. If it passes ProgramFiles but not the others, which is an easy mistake to make if attempting to produce a sanitized environment, we get that for ProgramFiles and learn of at most one program files directory, which may not even be the one that corresponds to our process architecture.

Subtle failure modes found in the wild

Some stranger situations are also plausible: suppose a parent process is 64-bit and passes only ProgramFiles and ProgramFiles(x86) through, and we are a 32-bit process:

C:\Users\ek> useenv64 -u ProgramW6432 prenv32 ProgramFiles 'ProgramFiles(x86)' ProgramW6432
ProgramFiles=C:\Program Files (x86)
ProgramFiles(x86)=C:\Program Files (x86)

So environment variables are of limited reliability...

But the case for environment variables remains strong

...Yet they also have significant advantages:

My ARMv7 attempts

Because the parent could be a 32-bit ARM process but there's no Rust target for that on Windows, I also wrote a wenv tool in C, related to prenv, and was going to write another one related to useenv.

I didn't get thar far. The only interesting thing about wenv is that I cross-compiled it for a 32-bit ARM target. The build succeeded, but I was unable to run it, due to missing DLLs. It seems I don't really know how to make a working 32-bit ARM program on Windows, even though my ARM64 VM does have 32-bit ARM alternates for some programs such as WordPad.

Hopefully I'll be able to figure that out eventually, but I don't think it's a blocker here.

To detect or not detect the system architecture

If we use environment variables, there is the question of whether to always look at all of them, or to look at all of them when the system is 64-bit but only at ProgramFiles when the system is 32-bit. Both approaches fix the bug: on a 32-bit system, the others would presumably be absent, and if present it is presumably intentional.

I am not sure what is best, but I favor having the same behavior across systems, and thus not programmatically checking the system architecture. This is also simpler. Also, to the best of my knowledge, the only ways to check are with unsafe code or by adding another library dependency.

However, the tests still have to check, at least for an end-to-end style test that compares the situation on the running system with the behavior of the code under test. The traditional approach still works. I was concerned this technique might report an x86-64 process running by emulation on an ARM64 Windows system as 32-bit, and some old documentation I was reading seemed to suggest this. I wrote the very simple program wowreally to check, and confirmed that this does not happen; when both are 64-bit, it is not considered to be "Windows on Windows" emulation.

My view

In my opinion, detecting the program files directories' locations, even though it intuitively feels like it should be simple, is very easy to get wrong, and if done should be robustly tested. Yet I think it is worthwhile to do, because:

Regarding that last point, I think this approach is beneficial even if it is to be rejected, because it can offer insight into which approach ought ultimately to be used. If the implementation and a test helper both retrieve the information from the system but in independent ways, that may give a sense of what it would be like to do it the other way.

The comparison will not be perfect, because in the test setup expect should be used to assert that the test has the information it needs, while in the code under test any errors finding program files paths should simply lead to fewer alternative locations. But I think it is nonetheless illustrative.

What I did

Because environment variables do not require additional unsafe code or new dependencies in production, and support not just integration-style tests but also unit-style tests mocking var_os, I implemented ALTERNATIVE_LOCATIONS that way.

To form test system specific expected values in the integration-style test, I used known folders except for the 64-bit path, for which I used the registry.

  1. Because the alternative locations are now to be obtained from the system, the element type of ALTERNATIVE_LOCATIONS should be some OsString-based type. Since these are paths, I used PathBuf, which also accords with how ALTERNATIVE_LOCATIONS is used everywhere, and slightly simplifies some uses of it.

  2. The meat of the implementation is in the locations_under_program_files function that uses dependency injection so that unit-style tests can pass stub implementations.

  3. I put all the tests related to alternative locations, and their helpers, in a nested tests::loc module [note: since renamed to tests::locations].

  4. The unit-style tests have helper macros used to form os_env-like stubs that are really maps in function form, and to decrease duplication across assertions.

  5. The unit-style tests are divided into locations_under_program_files_ordinary and locations_under_program_files_strange.

  6. The integration-style test delegates most of its work to its helpers, which are more involved. This is partly because this includes obtaining the information about program files directory locations in an independent way.

    The design I chose was to make assertions about those paths, and then make sure that those paths are the ones that appear as prefixes in the alternative locations produced by the code under test.

  7. First there are helpers, mainly PlatformArchitecture that will be used in assertions that vary by system architecture, which unlike the process architecture is not known at compile-time.

  8. The ProgramFilesPaths struct is tasked with obtaining and representing the paths that we get independently without the use of environment variables so we can use them as expected values, and checking that they are themselves reasonable.

  9. The RelativeGitBinPaths struct contains the core assertion logic for the integration-style test, making direct claims about what the code under test produces. Its name may benefit from refinement, or maybe the awkwardness of the name is a sign that it should just be two helper functions.

    Its assert_from constructor checks that paths obtained from the code under test, i.e. by reading ALTERNATIVE_LOCATIONS, have the same program files directories as their path prefixes and strips them to obtain the suffixes. Then the assert_architectures method on the resulting object asserts that the suffixes are correct.

  10. Then the test function for the integration-style test, named alternative_locations, puts it together, constructing a ProgramFilesPaths via obtain_envlessly() and calling validate() to check it, then using it to test ALTERNATIVE_LOCATIONS by calling RelativeGitPaths::assert_from() to check the matching program files paths and extract the suffices, and calling assert_architectures() to check the extracted suffixes.

  11. The foregoing is all Windows-specific. The #[cfg(not(windows))] test code has no helpers and is much shorter, consisting only of the non-Windows alternative_locations test asserting that ALTERNATIVE_LOCATIONS is empty.

EliahKagan commented 2 months ago

It's unfortunate that I made the mistake not creating a new commit after the module changes, so my edits aren't easily visible. However, I made comments with highlights.

Thanks! It is a big improvement splitting the module into its implementation and tests and putting the files side-by-side in this way. I guess the correct organization for this code was obvious, but for whatever reason I didn't see it.

Regarding the change being in just one commit, another commit for that could be rebased in if desired, but I think it should not be necessary.