Open richardstartin opened 3 years ago
Stumbled on this via the mentioned Kafka ticket; adding a note:
I benchmarked both a lookup table (LUT) and a bit-shift-based division against the prior implementation found here. The bit-shift and LUT approaches performed very close to each other, and both were much faster than the divide-based implementation (which is what this repo currently uses).
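For anyone unsure what a "bit-shift-based division" means here, this is a minimal sketch of the general idea: the division by 7 in the length calculation is replaced by a multiply and a shift. The class name `ShiftDivSketch` and the constants 37 and 8 are my own choices for illustration, not taken from the benchmark or from the Kafka code; the approximation `(x * 37) >>> 8 == x / 7` only holds for small non-negative `x`, which is enough because `x` never exceeds 70 here.

```java
// Hypothetical sketch: varint length without an integer division.
final class ShiftDivSketch {

    // Divide-based reference: ceil(significantBits / 7), with a minimum of 1 byte.
    static int sizeWithDivide(long value) {
        int bits = 64 - Long.numberOfLeadingZeros(value | 1); // | 1 makes zero count as 1 bit
        return (bits + 6) / 7;
    }

    // Same result using multiply-and-shift in place of the division.
    // (x * 37) >>> 8 equals x / 7 for 0 <= x <= 85; here x is at most 70.
    static int sizeWithShift(long value) {
        int bits = 64 - Long.numberOfLeadingZeros(value | 1);
        return ((bits + 6) * 37) >>> 8;
    }
}
```

Whether this beats a lookup table will depend on the JIT and the hardware, so it is worth measuring rather than assuming.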
Hi, I read your blog post. It was interesting, so thanks for writing it. The function called "startin" in your blog is a bit different from what's in my post, because my implementation precomputes the lengths and puts them in a lookup table addressable by the number of leading zeros, so it doesn't do the division during encoding:
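As a rough illustration of the lookup-table idea described above (a sketch under my own assumptions, not necessarily the exact code from the post), the lengths can be computed once in a static initialiser and then indexed by `Long.numberOfLeadingZeros`. The names `VarIntSketch`, `LENGTHS`, `encodedLength` and `encode` are hypothetical.

```java
// Hypothetical sketch: unsigned LEB128-style varint encoding with precomputed lengths.
final class VarIntSketch {

    // LENGTHS[lz] = encoded size in bytes for a value with lz leading zeros (0..64).
    private static final byte[] LENGTHS = new byte[65];

    static {
        for (int lz = 0; lz <= 64; lz++) {
            int bits = 64 - lz;
            // The division happens here, once, and never on the encoding path.
            LENGTHS[lz] = (byte) Math.max(1, (bits + 6) / 7);
        }
    }

    static int encodedLength(long value) {
        return LENGTHS[Long.numberOfLeadingZeros(value)];
    }

    // Writes value into out starting at offset and returns the number of bytes written.
    static int encode(long value, byte[] out, int offset) {
        int length = encodedLength(value);
        // Every byte except the last carries 7 payload bits plus a continuation bit.
        for (int i = 0; i < length - 1; i++) {
            out[offset + i] = (byte) ((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out[offset + length - 1] = (byte) value;
        return length;
    }
}
```

Because the length is known before the first byte is written, the loop has a fixed trip count rather than testing the remaining value after every byte.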
I think you will find different results if you consider `long` inputs too, where, as you note, manual unrolling gets ugly quickly, and you will start to fall foul of inlining heuristics as code size increases. I found that the approach outlined in my blog post (with the lookup table) starts beating the most obvious implementation at 3 nonzero bytes in the input, and at 60 bytes it is very friendly to C2's inlining policies.

Obviously, varint encoding makes less sense as the inputs get larger, so optimising for longer inputs is questionable, but there are a lot of cases where the encoder doesn't have a choice and must handle unpredictable inputs.
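For reference, by "the most obvious implementation" I mean something along the lines of the loop below, which checks after every byte whether anything is left to write. This is a sketch with a hypothetical class name (`NaiveVarInt`), not code taken from either benchmark.

```java
// Hypothetical sketch of the straightforward loop-based varint encoder for long inputs.
final class NaiveVarInt {

    // Writes value into out starting at offset and returns the number of bytes written.
    static int encode(long value, byte[] out, int offset) {
        int i = offset;
        // Keep emitting 7-bit groups while bits above the low 7 remain.
        while ((value & ~0x7FL) != 0) {
            out[i++] = (byte) ((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out[i++] = (byte) value; // final byte, continuation bit clear
        return i - offset;
    }
}
```

The loop exit depends on the data, so with unpredictable inputs the branch is hard to predict, and unrolling it by hand for up to 10 bytes is where the code size starts to grow.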