WojciechMula / toys

Storage for my snippets, toy programs, etc.
BSD 2-Clause "Simplified" License
316 stars 38 forks source link

swar-utf8-length: slightly shorter version #17

Open falk-hueffner opened 2 years ago

falk-hueffner commented 2 years ago

Just a comment that I thought might be of interest to you: swar-utf8-length can be done with one instruction less (if we don't count creation of constants). This doesn't make a difference on x86 though since it has an "andnot" instruction and already the current version needs only three instructions.

                                                    // continuation     ASCII        ASCII    leading bytes
const uint64_t t0 = *qword;                         //  10xx.xxxx   00xx.xxxx    01xx.xxxx    11xx.xxxx
const uint64_t t1 = t0 & 0xc0c0c0c0c0c0c0c0llu;     //  1000.0000   0000.0000    0100.0000    1100.0000
const uint64_t t2 = t1 + 0x4040404040404040llu;     //  1100.0000   0100.0000    1000.0000   10000.0000
const uint64_t t3 = t1 & t2;                        //  1000.0000   0000.0000    0000.0000    0000.0000

There's also a perhaps interesting different approach here which does without popcount by generating 0/1 in the low bit of each byte and accumulating that.

WojciechMula commented 2 years ago

@falk-hueffner Hi, thanks for the comment. The instruction "andnot" is an addition that comes from the BMI extension, thus not available everywhere. I've just checked and gcc -march=skylake ... emits this instruction for the SWAR code.

Yeah, popcount can be replaced with multiplication. It was the approach proposed by @zwegner on twitter.