Open overlookmotel opened 5 months ago
It's almost correct, except when API CompactString::from_string_buffer
is called, it can be used to construct an empty string allocated on heap.
IMO you can fix this by changing from_string_buffer
to special case empty string, but I'm not sure if @ParkMyCar or @Kijewski agrees on this since this does make it less useful for its intended use case.
It would fail for CompactString::with_capacity()
, too.
It would fail for
CompactString::with_capacity()
, too.
In that case, maybe we should check for capacity and length, instead of modifying from_string_buffer
?
Oh dear. Yes, I completely failed to consider that capacity and length are independent, so zero-length strings can be on the heap.
In that case, I don't think there's anything to be done here.
If static string didn't have it's own discriminant, then this would work:
pub fn is_empty(&self) -> bool {
let heap_len = self.1;
let last_byte = self.last_byte();
// Empty inline string also contains 0 in the bytes holding heap length
heap_len == 0 && (last_byte == 192 || last_byte == 224)
}
That produces branch-free code which might be a bit faster. But adding || last_byte == 225
does result in a branch, so it's no good.
Unless anyone else sees any mileage in this, I'll close the issue.
Yeah, I guess this particular method is as good as it gets for the current data representation.
But nevertheless, thank you for looking into possible optimizations! It is very easy to convince yourself that whatever you implemented is the optimum, so it's always welcome to get a new perspective to challenge your view and maybe teach you something new!
If static string didn't have it's own discriminant, then this would work:
I think you code might still work, since static string is guaranteed to be non-empty now (inlined if empty) and its layout is stable:
I think you code might still work, since static string is guaranteed to be non-empty now (inlined if empty) and its layout is stable
Unfortunately I was out of date - the value of HEAP_MASK
is now 216 not 224. Changing my 2nd version to last_byte == 192 || last_byte == 216
produces a branch, because testing for "192 or 216" can't be done with an and
operation the way "192 or 224" can.
I think I'll give up at this point! But thanks both for entertaining my attempts.
Oh wait! Maybe I've got something:
const LENGTH_MASK: u8 = 192;
const HEAP_MASK: u8 = 216;
const HEAP_MASK_AFTER_SUB: usize = (HEAP_MASK as usize)
.wrapping_sub(LENGTH_MASK as usize);
pub fn is_empty_new(&self) -> bool {
let len_heap = ensure_read(self.1);
let last_byte = self.last_byte() as usize;
let mut len = last_byte.wrapping_sub(LENGTH_MASK as usize);
// NB `==` not `>=`
if len == HEAP_MASK_AFTER_SUB {
len = len_heap;
}
len == 0
}
If I'm not mistaken (but very possible I am), this uses 1 less register, as last_byte
is only used once.
https://godbolt.org/z/x3Whd115j
This relies on static strings never being 0 length (if that is the case).
I thought this might also have application for len()
and as_slice()
, but that didn't take into account 24-byte inline strings.
Or is this better?
const LENGTH_MASK: u8 = 192;
const HEAP_MASK: u8 = 216;
const HEAP_MASK_AFTER_SUB: usize = (HEAP_MASK as usize)
.wrapping_sub(LENGTH_MASK as usize);
pub fn is_empty_new2(&self) -> bool {
let len_heap = self.1;
let last_byte = self.last_byte() as usize;
let len = last_byte.wrapping_sub(LENGTH_MASK as usize);
let is_empty_inline = len == 0;
// NB `==` not `>=`
let is_empty_heap = len == HEAP_MASK_AFTER_SUB && len_heap == 0;
is_empty_inline || is_empty_heap
}
Same instruction count, but ensure_read(self.1)
is gone, and there's no branches.
Without measuring it's hard to tell which version is the best.
If I remember correctly, instructions like sete
(set register to 0/1 depending on the zero flag), adc
(add with carry), etc. can slow things down a lot, because the instruction has to wait for an updated register file, so the instruction pipeline gets stalled. But my knowledge is > 10 years old. It might even only reflect 32 bit processors, and processors got a lot smarter in the last decade.
A call to ensure_read()
should be omitted whenever possible. The method is a blackbox, and only used to force the compiler to read a variable into a register, before continuing. This way the compiler "remembers" that is has already cached the value, and it won't generate a branch to load the data, but use a cmovcc
instruction instead.
is_empty
checks both inline length and heap length:https://github.com/ParkMyCar/compact_str/blob/302c595d41373943dc432be2381b8f6a74f0ddf6/compact_str/src/repr/mod.rs#L445-L453
I believe an empty string is always stored inline. Therefore it could be reduced to:
If my assumption is correct, let me know and I'll make a PR.