Open overlookmotel opened 10 months ago
It would probably affect as_str, as_bytes, reserve and as_mut_bytes, which is where the static str repr is special cased
Thanks for coming back so swiftly. Do you think this is worth further investigation or, does it sound like it's not going to be viable for that reason?
I think it worths giving it a shot
OK cool. I'll take a closer look when I get some time.
This is great! In your variant the ensure_read()
optimization barrier can also be removed, probably because the compiler understands your intentions better than it does with the current code:
pub fn len_new_new(c: &C) -> usize {
let last_byte = c.5 as usize;
let mut len = last_byte.wrapping_sub(LENGTH_MASK as usize);
let is_heap = len == HEAP_MASK_AFTER_SUB;
len = len.min(MAX_SIZE);
if is_heap {
len = c.1;
}
len
}
example::len_new:
movq 8(%rdi), %rcx
movzbl 23(%rdi), %edx
addq $-192, %rdx
cmpq $24, %rdx
movl $24, %eax
cmovbq %rdx, %rax
cmoveq %rcx, %rax
retq
example::len_new_new:
movzbl 23(%rdi), %ecx
addq $-192, %rcx
cmpq $24, %rcx
movl $24, %eax
cmovbq %rcx, %rax
cmoveq 8(%rdi), %rax
retq
Without the ensure_read()
blackbox the compiler might optimize better down the line, e.g. omitting repeated calls to this method, because it can tell that the code is pure.
Bingo! That's 2 less instructions, and 2 less registers.
I think I'd misunderstood the purpose of ensure_read()
. I had thought the point of it was to kick off loading len_heap
from memory as soon as possible, so that by the time you get to the cmoveq
it may be ready in a register and so no stall while waiting for it to load.
Is that not the point of it at all?
And, if I may, another basic question: Aside from reducing instructions, is minimising the number of registers a function uses a useful thing to do?
(NB I'm new to all this reading the runes of ASM business, so really feeling my way here. Any pointers much appreciated.)
And, if I may, another basic question: Aside from reducing instructions, is minimising the number of registers a function uses a useful thing to do?
It depends though, if it uses callee-saved register then I think it could be useful.
Looking at example::len_new_new
posted by @Kijewski it already looks quite optimal to me.
Also, if any call to the function is inlined, those saved registers can potentially have a positive impact on the caller
@overlookmotel this is great, thank you!
Static string could instead be represented as a heap string with capacity 0, which I assume isn't otherwise a possible state.
Yes! This is an idea I've had bouncing around for a bit, I think there are some optimizations we can make if we modify the discriminant a bit.
I'll whip up a PR this weekend, thanks again for the idea!
Thanks for taking it up. Glad you like it. I've learned a lot from the bit-fiddling tricks this library uses, so glad to contribute an idea (even if not the implementation!)
I believe it's possible to shave an instruction off
len()
(and probablyas_slice()
too) if static string had same discriminant as heap.Static string could instead be represented as a heap string with capacity 0, which I assume isn't otherwise a possible state. Then:
On 64-bit machines,
MAX_SIZE
andHEAP_MASK_AFTER_SUB
are the same (24). So this shaves off acmp
instruction aslen == HEAP_MASK_AFTER_SUB
and.min(MAX_SIZE)
both work off the result of a singlecmp
, rather than requiring 2. And (if I'm reading the ASM right) it also uses one less register.https://godbolt.org/z/e7Kc4PMTE
Quite possibly handling the special case of zero-capacity heap strings (which are actually static strings) would introduce costs elsewhere, but I'm not familiar enough with the code to say. Can anyone advise if that's likely to negate the gain?