Simplify `Repr::len() `

Kijewski commented 1 year ago

This PR depends on #276.

Using the fact that LastUtf8Char::Heap comes exactly after the last inline length value, we can simplify the calculation a bit.

The resulting code on amd64:

mov    0x8(%rdi),%rax
movzbl 0x17(%rdi),%ecx
add    $0xffffffffffffff40,%rcx
cmp    $0x18,%rcx
cmovae %rax,%rcx
mov    $0x18,%eax
cmovbe %rcx,%rax
ret

ParkMyCar commented 1 year ago

This PR is great, thank you @Kijewski! Do you mind rebasing on main since we merged #276?

ParkMyCar commented 1 year ago

Also this makes me wonder if storing the length as the last item of the heap variant would improve performance, i.e. [ ptr | cap | len ]. Then when reading the length the only load we'd need to do is the last word.

Separately I've been thinking about dropping support for 24 bytes and instead using just 23. Naively I think that would allow for a more performant implementation, at the cost of losing 1 byte, which might be worth it.

Kijewski commented 1 year ago

Also this makes me wonder if storing the length as the last item of the heap variant would improve performance, i.e. [ ptr | cap | len ].

The problem is that the last byte cannot be used to store a value there, and we need to access the length more often than the capacity (I guess). If I'm not mistaken, the processor always reads two adjacent words at once since back in Pentium days, so I guess swapping the entries wouldn't be faster anyway. But I could easily be mistaken.

Separately I've been thinking about dropping support for 24 bytes and instead using just 23.

I think 23 bytes is plenty for most applications. Or at least 24 is not much better than 23 bytes. I assume reading would be same as fast, but writing to an inlined string could be quite a bit faster.

ParkMyCar commented 1 year ago

Looks like we'll need to bump the MSRV to 1.59 support std::arch::asm!. Given that version is > 1 year old this seems okay to me. I'll handle this in a follow up PR

ParkMyCar / compact_str

Simplify `Repr::len() ` #283