Closed Kijewski closed 1 year ago
This PR is great, thank you @Kijewski! Do you mind rebasing on main
since we merged #276?
Also this makes me wonder if storing the length as the last item of the heap variant would improve performance, i.e. [ ptr | cap | len ]
. Then when reading the length the only load we'd need to do is the last word.
Separately I've been thinking about dropping support for 24 bytes and instead using just 23. Naively I think that would allow for a more performant implementation, at the cost of losing 1 byte, which might be worth it.
Also this makes me wonder if storing the length as the last item of the heap variant would improve performance, i.e.
[ ptr | cap | len ]
.
The problem is that the last byte cannot be used to store a value there, and we need to access the length more often than the capacity (I guess). If I'm not mistaken, the processor always reads two adjacent words at once since back in Pentium days, so I guess swapping the entries wouldn't be faster anyway. But I could easily be mistaken.
Separately I've been thinking about dropping support for 24 bytes and instead using just 23.
I think 23 bytes is plenty for most applications. Or at least 24 is not much better than 23 bytes. I assume reading would be same as fast, but writing to an inlined string could be quite a bit faster.
Looks like we'll need to bump the MSRV to 1.59
support std::arch::asm!
. Given that version is > 1 year old this seems okay to me. I'll handle this in a follow up PR
This PR depends on #276.
Using the fact that
LastUtf8Char::Heap
comes exactly after the last inline length value, we can simplify the calculation a bit.The resulting code on amd64: