expr-fi / fastlwc

SIMD-enhanced word counter
MIT License
242 stars 10 forks source link

How does Shuffling help when counting words? #3

Closed Akaame closed 5 years ago

Akaame commented 5 years ago

https://github.com/expr-fi/fastlwc/blob/2cf0468bc8fc41969daa6a316a764624c490791c/simd.h#L113

I know this is no place to ask a programming question. But, I could not find any other means to contact you. I am going through your code, but I can not get a grasp of this and following lines. Could you please offer some explanation on what is going on with count_words function?

Thank you in advance.

expr-fi commented 5 years ago

Sure.

Despite what the name might suggest, the shuffle instruction is not like shuffling a deck of cards, i.e. a random shuffle. Rather, we are generating a new vector from the contents of the first argument (I'll refer to this as a), based on the contents of the second argument (b). It's not strictly a reorder, because elements from a might appear multiple times, or not at all.

Intel's pseudocode for the 128-bit shuffle instruction is as follows (link):

FOR j := 0 to 15
    i := j*8
    IF b[i+7] == 1
        dst[i+7:i] := 0
    ELSE
        index[3:0] := b[i+3:i]
        dst[i+7:i] := a[index*8+7:index*8]
    FI
ENDFOR

It might be explained as follows: to generate the new vector dst, we go through bytes in b one by one; if the byte has the MSB set, the corresponding byte in dst gets set to zero*; otherwise we read the lower half of the byte (values 0-15) as index, and corresponding byte in dst is set to the value of indexth byte of a.

In this case we have selected initial a so that when all that is said and done, the generated dst has only the white space bytes where they were in b, so we can compare it against b to find out which indices of b had white spaces, and proceed with the counting algorithm (basically a rising-edge counter).

* This is not relevant for us in this case

(edited out unnecessary technicality)

Akaame commented 5 years ago

Thank you for your kind response. AFAIK, The fact that there is no overlapping in the latest bits of ASCII representation of space characters allows us to utilize this kind of instruction which is very fortunate and this is a very clever exploit on your part.

Again, thank you for this amazing repo.

aqrit commented 5 years ago

Further Reading: http://0x80.pl/articles/simd-byte-lookup.html

expr-fi commented 5 years ago

this is a very clever exploit on your part.

It's a clever use all right, but not my invention ;-). @aqrit tipped me to it earlier.