RustCrypto / hashes

Collection of cryptographic hash functions written in pure Rust

sha1: implement collision detection and mitigation #566

Closed dignifiedquire closed 3 months ago

dignifiedquire commented 4 months ago

This implements SHA-1 collision detection, including rehashing for mitigation. As the code is a 1:1 port of the version that Git uses, the mitigation hashes should match as well.

Closes #204

Open TODOs

Limitations

Can only be used with the pure software implementation, as far as I understand. The reason for this is that the algorithm needs access to all intermediate states, so processing 4 rounds at once through the intrinsics would break detection. For that reason I have made it its own implementation, instead of adapting the existing compress implementations.

It might be possible to add support for the "simpler" assembly implementations that do round-by-round processing, but I think this could be a follow-up in the future, if this is too slow on those platforms.

Prior art

dignifiedquire commented 4 months ago

Benchmarks on my macbook (arm M1), using the defaults, so both detection and mitigation enabled.

test sha1_10              ... bench:          12 ns/iter (+/- 0) = 833 MB/s
test sha1_100             ... bench:         107 ns/iter (+/- 3) = 934 MB/s
test sha1_1000            ... bench:       1,055 ns/iter (+/- 48) = 947 MB/s
test sha1_10000           ... bench:      10,521 ns/iter (+/- 659) = 950 MB/s
test sha1_collision_10    ... bench:          19 ns/iter (+/- 1) = 526 MB/s
test sha1_collision_100   ... bench:         180 ns/iter (+/- 6) = 555 MB/s
test sha1_collision_1000  ... bench:       1,660 ns/iter (+/- 55) = 602 MB/s
test sha1_collision_10000 ... bench:      16,414 ns/iter (+/- 509) = 609 MB/s
Byron commented 4 months ago

Here are my measurements, for informative purposes only, to demonstrate pack resolution performance, an operation that happens on every clone or fetch.

Baseline

This is comparing the assembly version with the one without assembly, with the default sha1 0.10.6 crate from crates.io.

❯ hyperfine -w1 -N "gix free pack verify $PWD/tests/fixtures/repos/linux.git/objects/pack/pack-b73c0dd365b8ab881a163a1d342dac7416f52591.idx" "./target/release/gix free pack verify $PWD/tests/fixtures/repos/linux.git/objects/pack/pack-b73c0dd365b8ab881a163a1d342dac7416f52591.idx" -r2
Benchmark 1: gix free pack verify /Users/byron/dev/github.com/Byron/gitoxide/tests/fixtures/repos/linux.git/objects/pack/pack-b73c0dd365b8ab881a163a1d342dac7416f52591.idx
  Time (mean ± σ):      9.454 s ±  0.061 s    [User: 68.773 s, System: 2.390 s]
  Range (min … max):    9.411 s …  9.497 s    2 runs

Benchmark 2: ./target/release/gix free pack verify /Users/byron/dev/github.com/Byron/gitoxide/tests/fixtures/repos/linux.git/objects/pack/pack-b73c0dd365b8ab881a163a1d342dac7416f52591.idx
  Time (mean ± σ):     16.110 s ±  0.136 s    [User: 132.938 s, System: 2.582 s]
  Range (min … max):   16.013 s … 16.206 s    2 runs

Summary
  gix free pack verify /Users/byron/dev/github.com/Byron/gitoxide/tests/fixtures/repos/linux.git/objects/pack/pack-b73c0dd365b8ab881a163a1d342dac7416f52591.idx ran
    1.70 ± 0.02 times faster than ./target/release/gix free pack verify /Users/byron/dev/github.com/Byron/gitoxide/tests/fixtures/repos/linux.git/objects/pack/pack-b73c0dd365b8ab881a163a1d342dac7416f52591.idx

With default features, ASM didn't really kick in apparently.

Collision Detection

With the collision feature enabled.

❯ hyperfine -w1 -N "gix free pack verify $PWD/tests/fixtures/repos/linux.git/objects/pack/pack-b73c0dd365b8ab881a163a1d342dac7416f52591.idx" "./target/release/gix free pack verify $PWD/tests/fixtures/repos/linux.git/objects/pack/pack-b73c0dd365b8ab881a163a1d342dac7416f52591.idx" -r2
Benchmark 1: gix free pack verify /Users/byron/dev/github.com/Byron/gitoxide/tests/fixtures/repos/linux.git/objects/pack/pack-b73c0dd365b8ab881a163a1d342dac7416f52591.idx
  Time (mean ± σ):      9.487 s ±  0.017 s    [User: 68.135 s, System: 2.583 s]
  Range (min … max):    9.475 s …  9.499 s    2 runs

Benchmark 2: ./target/release/gix free pack verify /Users/byron/dev/github.com/Byron/gitoxide/tests/fixtures/repos/linux.git/objects/pack/pack-b73c0dd365b8ab881a163a1d342dac7416f52591.idx
  Time (mean ± σ):     26.471 s ±  0.263 s    [User: 225.025 s, System: 3.169 s]
  Range (min … max):   26.285 s … 26.656 s    2 runs

Summary
  gix free pack verify /Users/byron/dev/github.com/Byron/gitoxide/tests/fixtures/repos/linux.git/objects/pack/pack-b73c0dd365b8ab881a163a1d342dac7416f52591.idx ran
    2.79 ± 0.03 times faster than ./target/release/gix free pack verify /Users/byron/dev/github.com/Byron/gitoxide/tests/fixtures/repos/linux.git/objects/pack/pack-b73c0dd365b8ab881a163a1d342dac7416f52591.idx

Comparison by hand

gitoxide ( sha1-collisions) +1 -1 [$!?] took 26s
❯ ./target/release/gix free pack verify $PWD/tests/fixtures/repos/linux.git/objects/pack/pack-b73c0dd365b8ab881a163a1d342dac7416f52591.idx
 20:54:39 Hash of index 'pack-b73c0dd365b8ab881a163a1d342dac7416f52591.idx' done 209.7MB in 0.44s (477.7MB/s)
 20:54:39                                           collecting sorted index done 7.5M entries in 0.99s (7.5M entries/s)
 20:54:41 Hash of pack 'pack-b73c0dd365b8ab881a163a1d342dac7416f52591.pack' done 1.3GB in 3.13s (423.2MB/s)
 20:54:41                                                          indexing done 7.5M objects in 2.15s (3.5M objects/s)
 20:55:05                                                         Resolving done 7.5M objects in 23.75s (315.4k objects/s)
 20:55:05                                                          Decoding done 94.3GB in 23.75s (4.0GB/s)

gitoxide ( sha1-collisions) +1 -1 [$!?] took 27s
❯ gix free pack verify $PWD/tests/fixtures/repos/linux.git/objects/pack/pack-b73c0dd365b8ab881a163a1d342dac7416f52591.idx
 20:55:22 Hash of index 'pack-b73c0dd365b8ab881a163a1d342dac7416f52591.idx' done 209.7MB in 0.14s (1.5GB/s)
 20:55:23 Hash of pack 'pack-b73c0dd365b8ab881a163a1d342dac7416f52591.pack' done 1.3GB in 0.78s (1.7GB/s)
 20:55:23                                           collecting sorted index done 7.5M entries in 0.83s (9.1M entries/s)
 20:55:24                                                          indexing done 7.5M objects in 1.36s (5.5M objects/s)
 20:55:32                                                         Resolving done 7.5M objects in 7.37s (1.0M objects/s)
 20:55:32                                                          Decoding done 94.3GB in 7.37s (12.8GB/s)

We are seeing a little less than 1/3rd of the ASM performance.

Summary

Here I ran out of time, as for some reason, patching in the modified version of the crate alone (which required a clone and a version modification to match the one available locally) triggered a lot of compile errors across the codebase that had nothing to do with it, usually along the lines of ".as_ref() can't be inferred, needs type annotation". I have no explanation for this.

However, if the benchmarks above are an indication, this particular operation might drop to roughly 609/950, i.e. about 64% of its previous throughput.

I truly wonder how Git can be this fast, as its hash performance doesn't seem to need ASM and seemingly supports collision detection as well.

Usability in gitoxide

It can probably be used with a runtime switch, but I haven't yet tested whether the increased footprint (due to the check-state) has a negative effect on performance even when detection is disabled. It's probably hard to say much about this if ASM no longer kicks in automatically in version 0.11. Technically it will be usable for sure.
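For illustration, a minimal sketch of what such a runtime switch could look like. All names here are invented, and the two "hashers" are stand-ins that only count bytes; the point is the shape: the check-state (a dummy buffer below) is only allocated when detection is requested, so the fast path carries no extra footprint.

```rust
// Hypothetical runtime toggle between a plain and a collision-checked
// hasher. Real gitoxide / sha1-checked types will differ.
struct PlainSha1 {
    bytes_hashed: u64,
}

struct CheckedSha1 {
    bytes_hashed: u64,
    // Stand-in for the per-round detection state the real code carries.
    check_state: [u32; 80],
}

enum Hasher {
    Plain(PlainSha1),
    Checked(CheckedSha1),
}

impl Hasher {
    fn new(detect_collisions: bool) -> Self {
        if detect_collisions {
            Hasher::Checked(CheckedSha1 { bytes_hashed: 0, check_state: [0; 80] })
        } else {
            Hasher::Plain(PlainSha1 { bytes_hashed: 0 })
        }
    }

    fn update(&mut self, data: &[u8]) {
        match self {
            Hasher::Plain(h) => h.bytes_hashed += data.len() as u64,
            Hasher::Checked(h) => {
                // The real code would additionally run per-round checks here.
                h.check_state[0] = h.check_state[0].wrapping_add(1);
                h.bytes_hashed += data.len() as u64;
            }
        }
    }

    fn bytes_hashed(&self) -> u64 {
        match self {
            Hasher::Plain(h) => h.bytes_hashed,
            Hasher::Checked(h) => h.bytes_hashed,
        }
    }
}

fn main() {
    let mut fast = Hasher::new(false);
    let mut checked = Hasher::new(true);
    for h in [&mut fast, &mut checked] {
        h.update(b"some pack data");
    }
    assert_eq!(fast.bytes_hashed(), checked.bytes_hashed());
    println!("both hashed {} bytes", fast.bytes_hashed());
}
```

The open question from the comment above remains: whether carrying the larger enum variant's footprint around hurts the disabled path in practice is something only benchmarks can answer.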

dignifiedquire commented 4 months ago

and gitoxide is in the same ballpark when the ASM version of the implementation is used

Do you have a link to that ASM code? I'd love to check out how they do it. After simplifying the code a bit, I think there is probably more that can be done, but adding an optimised version can certainly come later.

dignifiedquire commented 4 months ago

I truly wonder how Git can be this fast, as its hash performance doesn't seem to need ASM and seemingly supports collision detection as well.

This is quite surprising indeed, as the implementation here is based on the exact same code

dignifiedquire commented 4 months ago

When running on my linux machine with an AMD that has Sha intrinsics

test sha1_10              ... bench:           7 ns/iter (+/- 0) = 1428 MB/s
test sha1_100             ... bench:          58 ns/iter (+/- 0) = 1724 MB/s
test sha1_1000            ... bench:         516 ns/iter (+/- 1) = 1937 MB/s
test sha1_10000           ... bench:       5,091 ns/iter (+/- 24) = 1964 MB/s
test sha1_collision_10    ... bench:          27 ns/iter (+/- 0) = 370 MB/s
test sha1_collision_100   ... bench:         250 ns/iter (+/- 2) = 400 MB/s
test sha1_collision_1000  ... bench:       2,322 ns/iter (+/- 12) = 430 MB/s
test sha1_collision_10000 ... bench:      22,969 ns/iter (+/- 138) = 435 MB/s
Byron commented 4 months ago

and gitoxide is in the same ballpark when the ASM version of the implementation is used

Do you have a link to that ASM code, would love to check out how they do it, after simplifying the code a bit, I think there is probably more that can be done, but adding an optimised version certainly can come later.

These can be activated by adding the asm feature to the sha1 crate, at least in version 0.10. I had a feeling these don't exist like this anymore in 0.11. And I definitely agree, optimization can come later. At this performance loss, gitoxide will probably work hard to get the SHA256 transition going 😅.

dignifiedquire commented 4 months ago

These can be activated by adding the asm feature to the sha1 crate, at least in version 0.10.

Ah yeah, I know more about these than I want to actually 🤣 , on 0.11 it's all auto detected by default. I thought there might be another hash implementation somewhere 😅

The regular ASM implementation could be ported to do detection, I believe, as it does rounds manually (just checked). But the overall work is still around 1.5x at least, soo... The current code could probably be tweaked a bit more to make the Rust compiler happier, which should help some.

Byron commented 4 months ago

Ah yeah, I know more about these than I want to actually 🤣 , on 0.11 it's all auto detected by default. I thought there might be another hash implementation somewhere 😅

There can only be one ;)! But in all seriousness, it's great to hear, as it would remove one reason for incompatibility that was plaguing gitoxide and required a workaround. I hope it will still be as fast, and I will definitely be checking it :).

From my tests, which I have filled into the original comment, performance (with collision checks) looks close to 1/3rd of what it is with ASM, while the latest version, 0.11 with ASM feature detection, doesn't seem to kick in ASM on my machine (M1) with the --release build settings I was using. So either way, performance was greatly reduced.

But there is no way around biting that bullet, even though I will definitely put in an escape hatch for those who want to live like it's 2016😅.

In any case, and in case it's not clear over my ramblings about performance: Absolutely stunning work, I am loving it! (it's just terrible to realize that a lot of the advertised performance came from not comparing apples to apples)


In case it matters, I also had to make these unrelated changes because the Rust compiler didn't like it anymore when using the code in this PR. I have no explanation.

commit 798b11a57977e353ecb53cd7b85f0525e74c15d0
Author: Sebastian Thiel <sebastian.thiel@icloud.com>
Date:   Fri Feb 23 19:00:41 2024 +0100

    TBD

diff --git a/gix-object/src/commit/message/body.rs b/gix-object/src/commit/message/body.rs
index 0da6d1d14..6e84f3e2e 100644
--- a/gix-object/src/commit/message/body.rs
+++ b/gix-object/src/commit/message/body.rs
@@ -33,7 +33,7 @@ pub struct TrailerRef<'a> {

 fn parse_single_line_trailer<'a, E: ParserError<&'a [u8]>>(i: &mut &'a [u8]) -> PResult<(&'a BStr, &'a BStr), E> {
     *i = i.trim_end();
-    let (token, value) = separated_pair(take_until(1.., b":".as_ref()), b": ", rest).parse_next(i)?;
+    let (token, value) = separated_pair(take_until(1.., &b":"[..]), b": ", rest).parse_next(i)?;

     if token.trim_end().len() != token.len() || value.trim_start().len() != value.len() {
         Err(winnow::error::ErrMode::from_error_kind(i, ErrorKind::Fail).cut())
diff --git a/gix-pack/src/data/input/bytes_to_entries.rs b/gix-pack/src/data/input/bytes_to_entries.rs
index 7450e9134..e089250eb 100644
--- a/gix-pack/src/data/input/bytes_to_entries.rs
+++ b/gix-pack/src/data/input/bytes_to_entries.rs
@@ -138,7 +138,7 @@ where

         let crc32 = if self.compressed.crc32() {
             let mut header_buf = [0u8; 12 + gix_hash::Kind::longest().len_in_bytes()];
-            let header_len = entry.header.write_to(bytes_copied, &mut header_buf.as_mut())?;
+            let header_len = entry.header.write_to(bytes_copied, &mut header_buf.as_mut_slice())?;
             let state = gix_features::hash::crc32_update(0, &header_buf[..header_len]);
             Some(gix_features::hash::crc32_update(state, &compressed))
         } else {
diff --git a/gix-pack/src/data/input/entry.rs b/gix-pack/src/data/input/entry.rs
index 7d3d9b3cb..cb5c2af80 100644
--- a/gix-pack/src/data/input/entry.rs
+++ b/gix-pack/src/data/input/entry.rs
@@ -33,7 +33,7 @@ impl input::Entry {
         let mut header_buf = [0u8; 12 + gix_hash::Kind::longest().len_in_bytes()];
         let header_len = self
             .header
-            .write_to(self.decompressed_size, &mut header_buf.as_mut())
+            .write_to(self.decompressed_size, &mut header_buf.as_mut_slice())
             .expect("write to memory will not fail");
         let state = gix_features::hash::crc32_update(0, &header_buf[..header_len]);
         gix_features::hash::crc32_update(state, self.compressed.as_ref().expect("we always set it"))
diff --git a/gix-worktree/src/stack/state/mod.rs b/gix-worktree/src/stack/state/mod.rs
index 30e3c609f..07bffbfbc 100644
--- a/gix-worktree/src/stack/state/mod.rs
+++ b/gix-worktree/src/stack/state/mod.rs
@@ -96,7 +96,7 @@ impl State {
         let a1_backing;
         #[cfg(feature = "attributes")]
         let a2_backing;
-        let names = match self {
+        let names: &[(&bstr::BStr, Option<_>)] = match self {
             State::IgnoreStack(ignore) => {
                 a1_backing = [(
                     ignore.exclude_file_name_for_directories.as_bytes().as_bstr(),
dignifiedquire commented 4 months ago

Two questions: (1) do you happen to have performance numbers for the version Git is using? (2) do you know which configuration of the collision detection they are running?

Byron commented 4 months ago

Regarding 1): No, and it's not truly comparable, as Git has a lot of extra overhead and doesn't scale with cores at all. Maybe in single-core mode it can be somewhat comparable though, especially with a profiler to see where it spends time.

git verify-pack $PWD/tests/fixtures/repos/linux.git/objects/pack/pack-b73c0dd365b8ab881a163a1d342dac7416f52591.idx takes 51s (but only uses 3 cores), maybe it does more as well, I don't know.

If that is normalized to a single core with git -c pack.threads=1 verify-pack $PWD/tests/fixtures/repos/linux.git/objects/pack/pack-b73c0dd365b8ab881a163a1d342dac7416f52591.idx it takes 1m25s, pretty good still, whereas gix (with collisions enabled) takes 3m13s (./target/release/gix -t1 free pack verify $PWD/tests/fixtures/repos/linux.git/objects/pack/pack-b73c0dd365b8ab881a163a1d342dac7416f52591.idx). Interestingly, the current highest performing build can do it (with a single core) in 1m1s.

So I think the 1.5x (or rather 1.4x) number for the ASM version seems to be what Git has, more or less, assuming all else is equal. It's a respectable result that Git achieves on a single core.

Regarding 2): All I know is the source in sha1.c, which I linked earlier and which I think you have seen already. There can always be compile-time tricks though that I don't understand, and the local version I am using is git version 2.39.3 (Apple Git-145), with whatever magic sauce that comes with.

Just to be sure it wasn't laced with too much fairy dust, I built e02ecfcc534e2021aae29077a958dd11c3897e4c of git myself and ran /Users/byron/dev/github.com/git/git/git -c pack.threads=1 verify-pack $PWD/tests/fixtures/repos/linux.git/objects/pack/pack-b73c0dd365b8ab881a163a1d342dac7416f52591.idx in 1m24s, so it achieves the same result. Maybe it's just C code compiling to something very, very fast?

dignifiedquire commented 4 months ago

have you tried building with RUSTFLAGS="-C target-cpu=native"? this makes quite a big difference in these scenarios

Byron commented 4 months ago

That's a valid point! It doesn't make a difference for this branch; it's virtually unchanged. After updating to the most recent version of this branch I did notice a 5% speedup though. When retrying the latest master in this repository, its 820MB/s SHA-1 performance still couldn't get close to the 1.7GB/s I get with the current implementation.

I think what I will end up with is to use v0.10 for the performance-toggle, and the checked version here for everything else.

Also, I couldn't resist making measurements of Git's hashing alone, using a 200MB file as input.

❯ hyperfine -N -w1 'git hash-object -t blob tests/fixtures/repos/linux.git/objects/pack/pack-b73c0dd365b8ab881a163a1d342dac7416f52591.idx'
Benchmark 1: git hash-object -t blob tests/fixtures/repos/linux.git/objects/pack/pack-b73c0dd365b8ab881a163a1d342dac7416f52591.idx
  Time (mean ± σ):     113.6 ms ±   2.3 ms    [User: 92.8 ms, System: 14.5 ms]
  Range (min … max):   112.4 ms … 124.2 ms    26 runs

If that is extrapolated, we get to a whopping 1.5GB/s - 1.7GB/s.

With a bigger file of 1326MB, I get:

❯ hyperfine -N -w1  'git hash-object --literally tests/fixtures/repos/linux.git/objects/pack/pack-b73c0dd365b8ab881a163a1d342dac7416f52591.pack'
Benchmark 1: git hash-object --literally tests/fixtures/repos/linux.git/objects/pack/pack-b73c0dd365b8ab881a163a1d342dac7416f52591.pack
  Time (mean ± σ):     806.9 ms ±   8.6 ms    [User: 554.0 ms, System: 180.9 ms]
  Range (min … max):   795.6 ms … 820.0 ms    10 runs

Which puts it at around 1.64GB/s .

And because I couldn't believe that a bunch of C in a file with a casual build can be this fast, I profiled the run with the 1.3GB file, and here is the answer to all of our questions.

[Screenshot: profiler trace of the 1.3GB hashing run]

The entrypoint in Git seems to be here, which clearly talks about platform specific implementations. All the configuration for these seems to come from here.

Maybe they deal with SHAttered, maybe they don't? I don't know how to validate that, as unfortunately git hash-object adds the Git object header to any input it's provided. But since the software fallback for SHA-1 contains the mitigation, I presume they wouldn't use a platform-specific acceleration if it didn't have the same qualities.

That was an interesting excursion; I learned something. Maybe in the end, in order to be competitive, I'd need a gix-sha1 crate that uses sha1 with mitigation as a fallback, like Git, but links to the platform variants when possible.

dignifiedquire commented 4 months ago

Okay, that hash is not using collision detection; it is using the regular accelerated SHA-1 routines, which is why it can be so fast. I suspect Git only uses the collision-checked version at very specific points, and not internally, to avoid the slowdown.

dignifiedquire commented 4 months ago

The regression comes from here: https://github.com/RustCrypto/hashes/pull/542. The asm versions were actually fully removed :(

I guess it really is time to port the assembly to asm!..

Byron commented 4 months ago

Okay, that hash is not using collision detection, that is using the regular accelerated sha1 routines, which is why it can be so fast. I suspect what git is doing is that it only uses the hash collision version on very specific points, and not internally, to avoid the slow down.

I looked for 15 minutes and could only conclude that this seems to be a compile-time-only affair. Git doesn't always get compiled with its own implementation, but if enabled, it would detect collisions and die when one is found. But I might well be wrong about that. It feels a bit like the mitigation was quickly made available, but over time was supplanted by faster implementations again.

Git also doesn't seem to offer a choice for this beyond compile time (or I can't find it), which makes me think that if, for some reason, the runtime switching doesn't work or reduces performance, I could also go back to a compile-time switch and probably reduce complexity that way. I am saying this in case it would simplify things here as well to go compile-time only.

dkg commented 4 months ago

This is really great work. Thank you both for the thoughtful back-and-forth, for documenting your sleuthing here, and for valuing correctness over speed.

dignifiedquire commented 4 months ago

Maybe it's better to move this implementation to a separate crate

true, should I just add it as a new crate in here?

newpavlov commented 4 months ago

should I just add it as a new crate in here?

Yes. Could you also add a recommendation to the sha1 crate-level docs to use sha1-checked if possible?

dignifiedquire commented 4 months ago

I think someone has to approve the new workflows, not sure, might also only get activated when merged

newpavlov commented 4 months ago

It just takes a significant time for them to run with Cargo.lock changes. It triggers jobs for all crates, so we have to wait. You also could squash commits into one and force push to cancel some pending jobs.

dignifiedquire commented 4 months ago

You also could squash commits into one and force push to cancel some pending jobs.

squashed a bunch, hopefully this helps clean up the queue

dignifiedquire commented 4 months ago

how far out is a release of digest@0.11? I am wondering if it makes sense to backport the current implementation and publish a 0.10 version to allow for quick integration, as some projects are already waiting on this

cr-marcstevens commented 4 months ago

Before release and deployment, may I ask what kind of testing has been done to verify that this is a faithful translation? For one, I would suggest comparing ubc_check output against the original version for random inputs: this is a carefully crafted filter that determines which full checks can be dismissed a priori, hence an error in translation here could dismiss attacks erroneously before the full check. I would also suggest testing SHA-1 on random inputs and known collisions.

dignifiedquire commented 4 months ago

In terms of testing, the following has been done:

The ubc_check code specifically has not been hand-ported; it was translated to Rust using c2rust and then only carefully cleaned up, with no logic or constant changes.

dignifiedquire commented 4 months ago

@tarcieri @newpavlov anything I can help with to move this forward?

newpavlov commented 4 months ago

I will try to take a deeper look at it this weekend. I will merge it right away if there are no issues.

dignifiedquire commented 4 months ago

Thanks, appreciated

newpavlov commented 4 months ago

Re: digest v0.11. We do not have an exact date, so it may indeed be worth releasing sha1-checked based on digest v0.10 first. Luckily, the required changes should be relatively small.

dignifiedquire commented 4 months ago

Sounds good, I'll change this PR to be based on digest@0.10 then for now, and then once that is published we can update it back to 0.11

wiktor-k commented 3 months ago

I have only minor nits. After those are fixed and digest is downgraded to v0.10, I think we can merge it.

Hi! Is there anything left to do to get this merged? Sadly some of us are stuck with SHA-1, and this PR would make it a little bit less... explosive. Thanks for your time!

newpavlov commented 3 months ago

Sorry, I forgot to return to this PR. I will merge it and release the crate shortly.