ParkMyCar / compact_str

A memory efficient string type that can store up to 24* bytes on the stack
MIT License
557 stars 41 forks source link

Improve efficiency of `is_empty` #354

Open overlookmotel opened 5 months ago

overlookmotel commented 5 months ago

is_empty checks both inline length and heap length:

https://github.com/ParkMyCar/compact_str/blob/302c595d41373943dc432be2381b8f6a74f0ddf6/compact_str/src/repr/mod.rs#L445-L453

I believe an empty string is always stored inline. Therefore it could be reduced to:

pub fn is_empty(&self) -> bool {
  self.last_byte() == LastUtf8Char::L0 as u8
}

If my assumption is correct, let me know and I'll make a PR.

NobodyXu commented 5 months ago

It's almost correct, except when API CompactString::from_string_buffer is called, it can be used to construct an empty string allocated on heap.

IMO you can fix this by changing from_string_buffer to special case empty string, but I'm not sure if @ParkMyCar or @Kijewski agrees on this since this does make it less useful for its intended use case.

Kijewski commented 5 months ago

It would fail for CompactString::with_capacity(), too.

NobodyXu commented 5 months ago

It would fail for CompactString::with_capacity(), too.

In that case, maybe we should check for capacity and length, instead of modifying from_string_buffer?

overlookmotel commented 5 months ago

Oh dear. Yes, I completely failed to consider that capacity and length are independent, so zero-length strings can be on the heap.

In that case, I don't think there's anything to be done here.

If static string didn't have it's own discriminant, then this would work:

pub fn is_empty(&self) -> bool {
    let heap_len = self.1;
    let last_byte = self.last_byte(); 
    // Empty inline string also contains 0 in the bytes holding heap length
    heap_len == 0 && (last_byte == 192 || last_byte == 224)
}

That produces branch-free code which might be a bit faster. But adding || last_byte == 225 does result in a branch, so it's no good.

Unless anyone else sees any mileage in this, I'll close the issue.

Kijewski commented 5 months ago

Yeah, I guess this particular method is as good as it gets for the current data representation.

But nevertheless, thank you for looking into possible optimizations! It is very easy to convince yourself that whatever you implemented is the optimum, so it's always welcome to get a new perspective to challenge your view and maybe teach you something new!

NobodyXu commented 5 months ago

If static string didn't have it's own discriminant, then this would work:

I think you code might still work, since static string is guaranteed to be non-empty now (inlined if empty) and its layout is stable:

https://github.com/ParkMyCar/compact_str/blob/302c595d41373943dc432be2381b8f6a74f0ddf6/compact_str/src/repr/static_str.rs#L20

overlookmotel commented 5 months ago

I think you code might still work, since static string is guaranteed to be non-empty now (inlined if empty) and its layout is stable

Unfortunately I was out of date - the value of HEAP_MASK is now 216 not 224. Changing my 2nd version to last_byte == 192 || last_byte == 216 produces a branch, because testing for "192 or 216" can't be done with an and operation the way "192 or 224" can.

I think I'll give up at this point! But thanks both for entertaining my attempts.

overlookmotel commented 5 months ago

Oh wait! Maybe I've got something:

const LENGTH_MASK: u8 = 192;
const HEAP_MASK: u8 = 216;
const HEAP_MASK_AFTER_SUB: usize = (HEAP_MASK as usize)
  .wrapping_sub(LENGTH_MASK as usize);

pub fn is_empty_new(&self) -> bool {
  let len_heap = ensure_read(self.1);
  let last_byte = self.last_byte() as usize;
  let mut len = last_byte.wrapping_sub(LENGTH_MASK as usize);
  // NB `==` not `>=`
  if len == HEAP_MASK_AFTER_SUB {
    len = len_heap;
  }
  len == 0
}

If I'm not mistaken (but very possible I am), this uses 1 less register, as last_byte is only used once.

https://godbolt.org/z/x3Whd115j

This relies on static strings never being 0 length (if that is the case).

I thought this might also have application for len() and as_slice(), but that didn't take into account 24-byte inline strings.

overlookmotel commented 5 months ago

Or is this better?

const LENGTH_MASK: u8 = 192;
const HEAP_MASK: u8 = 216;
const HEAP_MASK_AFTER_SUB: usize = (HEAP_MASK as usize)
  .wrapping_sub(LENGTH_MASK as usize);

pub fn is_empty_new2(&self) -> bool {
  let len_heap = self.1;
  let last_byte = self.last_byte() as usize;
  let len = last_byte.wrapping_sub(LENGTH_MASK as usize);
  let is_empty_inline = len == 0;
  // NB `==` not `>=`
  let is_empty_heap = len == HEAP_MASK_AFTER_SUB && len_heap == 0;
  is_empty_inline || is_empty_heap
}

Same instruction count, but ensure_read(self.1) is gone, and there's no branches.

https://godbolt.org/z/T1xPT3WzG

Kijewski commented 5 months ago

Without measuring it's hard to tell which version is the best.

If I remember correctly, instructions like sete (set register to 0/1 depending on the zero flag), adc (add with carry), etc. can slow things down a lot, because the instruction has to wait for an updated register file, so the instruction pipeline gets stalled. But my knowledge is > 10 years old. It might even only reflect 32 bit processors, and processors got a lot smarter in the last decade.

A call to ensure_read() should be omitted whenever possible. The method is a blackbox, and only used to force the compiler to read a variable into a register, before continuing. This way the compiler "remembers" that is has already cached the value, and it won't generate a branch to load the data, but use a cmovcc instruction instead.