GitoxideLabs / gitoxide

An idiomatic, lean, fast & safe pure Rust implementation of Git
Apache License 2.0
8.91k stars 303 forks source link

The kstring integration in gix-attributes is unsound #1460

Closed ssbr closed 2 months ago

ssbr commented 2 months ago

Current behavior 😯

gix-attributes (in state::ValueRef) unsafely creates a &str from a &[u8] containing non-UTF8 data, with the justification that so long as nothing reads the &str and relies on it being UTF-8 in the &str, there is no UB:

// SAFETY: our API makes accessing that value as `str` impossible, so illformed UTF8 is never exposed as such.

The problem is that the non-UTF8 str is exposed to outside code: first to the kstring crate itself, which requires UTF-8 in its documentation and may have UB as a consequence of this, but also to serde, where it propagates to e.g. serde_json, serde_yaml, etc., where the same problems occur.

As far as I know, this is not sound, and either is or can cause UB down the line in these places that can view the &str.

Expected behavior 🤔

I think gix-attributes should probably use a Vec<u8> or smallvec or similar, at least until kstring can support bytes.

Absolute worst case, one could at least add an unsafe feature, so that code that opts out of unsafe can get the slower Vec<u8>-based implementation which is guaranteed to be sound.

Git behavior

N/A

Steps to reproduce 🕹

N/A

Byron commented 2 months ago

Thanks for bringing this up!

It was quite a hack to begin with and I don't recall why it had to be kstring here. And even though small-string optimisation should be possible, it definitely shouldn't be traded for safety.

Edit: I looked into this more and the idea was that it parses as zero-copy, but then typically converts to owned instances at some point. This is definitely where non-UTF8 could be exposed to places where UTF8 is expected.

Byron commented 2 months ago

It's worth noting that without kstring, there is a 5% performance loss for an attribute-matching heavy workload. For now I replaced it with BString, but hope to get to try a SmallVec implementation as well.

❯ hyperfine  "/Users/byron/dev/github.com/Byron/gitoxide/target/release/gix index entries ':(attr:export-ignore)' -i" "gix index entries ':(attr:export-ignore)' -i"
Benchmark 1: /Users/byron/dev/github.com/Byron/gitoxide/target/release/gix index entries ':(attr:export-ignore)' -i
  Time (mean ± σ):     820.1 ms ±  71.7 ms    [User: 759.9 ms, System: 85.9 ms]
  Range (min … max):   778.8 ms … 984.3 ms    10 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark 2: gix index entries ':(attr:export-ignore)' -i
  Time (mean ± σ):     782.1 ms ±   1.8 ms    [User: 760.1 ms, System: 43.2 ms]
  Range (min … max):   780.0 ms … 786.0 ms    10 runs

Summary
  gix index entries ':(attr:export-ignore)' -i ran
    1.05 ± 0.09 times faster than /Users/byron/dev/github.com/Byron/gitoxide/target/release/gix index entries ':(attr:export-ignore)' -i
EliahKagan commented 2 months ago

I recommend that an informational RUSTSEC advisory be created to document that gix-attributes was unsound--as described in this issue--prior to version 0.22.3.

This will help to notify interested users that they may wish to upgrade, and is part of what the RUSTSEC database tracks:

RustSec also tracks soundness issues as informational advisories, independent of whether they are vulnerabilities or not. A soundness issue arises when using a crate from safe code can cause Undefined Behavior.

This can be done by opening a pull request on the advisory-db repository in accordance with these instructions (which apply to informational advisories as well as those documenting vulnerabilities).

@ssbr I would be happy to contribute this. But I wanted to check with you first in case you want to do it. If not, I'll go ahead and open a PR and, unless you think it should be done differently, I would use text you wrote in this issue description and credit you in a Co-authored-by: trailer in the commit message.

Byron commented 2 months ago

Thanks for bringing this up, I think it's the right call, too.

I suggest that you go ahead with making the advisory contribution, and @ssbr can collaborate there as/if desired.

EliahKagan commented 2 months ago

I suggest that you go ahead with making the advisory contribution, and @ssbr can collaborate there as/if desired.

I have opened https://github.com/rustsec/advisory-db/pull/2027 for this.