Lokathor / tinyvec

Just, really the littlest Vec you could need. So smol.
https://docs.rs/tinyvec
Apache License 2.0
648 stars 49 forks source link

Should bound-checking when converting to slices do something different than panic? #194

Open Fuuzetsu opened 4 months ago

Fuuzetsu commented 4 months ago

There's a lot of code similar to this, on ArrayVec:

fn deref(&self) -> &Self::Target {
    &self.data.as_slice()[..self.len as usize]
  }

We invoke something like this every single time we try to get a slice, directly or indirectly (such as via .iter()).

In world of unsafe implementations, one would normally maintain the invariant that len <= self.data.len() and unsafely create the slice. This is against what tinyvec tries to achieve. Sadly, the trade-off is that this will basically always insert panic into the generated code.

In practice, this panic can only happen if there's a bug in ArrayVec (forgetting to update len properly, missed assumption) or user does something sketchy like unsafely modify the field.

There is a different option however: change the function to look more like

fn deref(&self) -> &Self::Target {
  let len = self.len as usize;
  let s = &self.data.as_slice();
  if len < s.len() {
    &s[..len]
  } else {
    s
  }
}

This (presumably) compiles to code that doesn't have a panic and it behaves the same as usual in non-weird code. The only change in behaviour is that instead of crashing on ArrayVec bug or due to some unsafe user code (when self.len grows past capacity), it'll now give the full capacity. For all currently-working code it stays working, but probably faster (not slower, for sure...). Any currently-broken code is presumably already crashing, if such code even exists.

We can recover slight bit of previous behaviour with debug asserts I suppose...

Shnatsel commented 4 months ago

A panic is usually preferable to a plain branch because all branches leading to a panic are marked as unlikely, and the panic handler is outlined so it's not on the hot path at all, not taking up space in icache etc.

I think this could be considered if benchmarks show that it is beneficial, but I very much doubt it will be possible to do better than a panic here for the aforementioned reasons. assert! can create non-outlined code, but indexing panics should be fully outlined and have the equivalent of #[cold] on them.

Fuuzetsu commented 4 months ago

I think this could be considered if benchmarks show that it is beneficial, but I very much doubt it will be possible to do better than a panic here for the aforementioned reasons.

Can't we just do this then?

fn deref(&self) -> &Self::Target {
  let len = self.len as usize;
  let s = self.data.as_slice();

  #[inline]
  #[cold]
  fn cold() {}

  if len > s.len() {
    cold();
    s
  } else {
    &s[..len]
  }
}

I guess I can make a change and check on some code I have that it makes any difference at all. But I expect that it does, as per the other ticket I made.

Fuuzetsu commented 4 months ago

The code under benchmark is:

pub fn get(&self, dir: Dir, p: Price) -> Qty {
        self.pqs[dir]
            .iter()
            .find_map(|PqPair { price, qty }| if p == *price { Some(*qty) } else { None })
            .unwrap_or(0)
    }

For details on this, see https://github.com/Lokathor/tinyvec/issues/193#issuecomment-2136493274 ; but if preferred/necessary, I can cook something similar that could maybe go into tinyvec repository.

I attach llvm-mca outputs before and after the change. It's clear that in a loop, it performs much better.

after.txt before.txt

The only change between the two is this, in tinyvec

diff --git a/src/arrayvec.rs b/src/arrayvec.rs
index 52aeeb1..ab370e3 100644
--- a/src/arrayvec.rs
+++ b/src/arrayvec.rs
@@ -49,6 +49,11 @@ macro_rules! array_vec {
   };
 }

+#[inline]
+#[cold]
+/// Used to mark some braches as unlikely.
+fn cold() {}
+
 /// An array-backed, vector-like data structure.
 ///
 /// * `ArrayVec` has a fixed capacity, equal to the minimum of the array size
@@ -157,7 +162,14 @@ impl<A: Array> Deref for ArrayVec<A> {
   #[inline(always)]
   #[must_use]
   fn deref(&self) -> &Self::Target {
-    &self.data.as_slice()[..self.len as usize]
+    let len = self.len as usize;
+    let s = self.data.as_slice();
+    if len > s.len() {
+      cold();
+      s
+    } else {
+      &s[..len]
+    }
   }
 }

@@ -165,7 +177,14 @@ impl<A: Array> DerefMut for ArrayVec<A> {
   #[inline(always)]
   #[must_use]
   fn deref_mut(&mut self) -> &mut Self::Target {
-    &mut self.data.as_slice_mut()[..self.len as usize]
+    let len = self.len as usize;
+    let s = self.data.as_slice_mut();
+    if len > s.len() {
+      cold();
+      s
+    } else {
+      &mut s[..len]
+    }
   }
 }

For completeness, I attach benchmark results for same suite as I had in #193.

bench.txt

I'm not sure why it's mostly better except for the few ones. I guess I should probably put something together that others can replicate first.

Either way, it certainly doesn't look like nothing is happening, right?