Open mayeut opened 8 years ago
SSSE3 with a smaller LUT: https://gist.github.com/aqrit/6e73ca6ff52f72a2b121d584745f89f3
Is there any interest in adopting this PR?
We are looking into adopting this library and ran into an issue: some of the keys we expect to decode contain \n characters.
@izoroth Short answer: not currently, no.
Long answer: Well, it's not really a PR, it's a series of sketches of what a despacing algorithm might look like. There's no definite patchset to merge, and if there were, the patches would probably not apply on top of the current master branch. There would also be API changes to consider. And there is a discussion to be had about whether to optimize for sparse whitespace or dense whitespace.
I think that whitespace filtering could make sense in this library as an opt-in feature, but it's not fleshed out enough yet.
This is a new issue to discuss whitespace character filtering, as mentioned in #15 and #27.
The implementation will probably depend on the density of whitespace characters.
- Many occurrences: a preprocessing step might be better suited. This is still quite slow (the first implementation was based on the code snippet from #15; I then replaced that with a LUT for the shuffle mask): SSE4.2 implementation
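To make the idea of a preprocessing step concrete, here is a minimal scalar sketch. It is not the SSE4.2 implementation referred to above and it is not part of the library; `is_b64_space` and `despace` are hypothetical names, and the set of characters treated as whitespace (space, tab, CR, LF) is an assumption.

```c
#include <stddef.h>

/* Hypothetical helper: nonzero for the characters treated as whitespace
 * in this sketch (space, tab, LF, CR). The exact set is an assumption. */
static int is_b64_space(unsigned char c)
{
    return c == ' ' || c == '\t' || c == '\n' || c == '\r';
}

/* Hypothetical preprocessing step: copy `len` bytes from `src` to `dst`,
 * dropping whitespace. Returns the number of bytes kept.
 * `dst` must have room for `len` bytes and may be the same buffer as `src`,
 * because the write index never runs ahead of the read index. */
static size_t despace(char *dst, const char *src, size_t len)
{
    size_t out = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)src[i];
        dst[out] = (char)c;          /* always store the byte...           */
        out += !is_b64_space(c);     /* ...but only advance if it is kept  */
    }
    return out;
}
```

The despaced buffer and the returned length could then be passed to the regular decoder. The store-then-conditionally-advance loop avoids a data-dependent branch, which is the scalar analogue of what a shuffle-based SIMD despacer does per block.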
Here are the timings. All throughputs are given relative to the output buffer, which is the same for all variants.

- `decode`: valid base64 input
- `decode-s`: one whitespace character every 80 characters (the choice of 80 has a visible impact on the AVX2 decoder for Method 1, because 80 is a multiple of 16 but not of 32)
- `decode-s8`: 8 whitespace characters every 80 characters
- `decode-d`: 1 whitespace character before each valid character

Method 1 seems to be the best for sparse whitespace characters, except with the `AVX2` decoder (which could/should be fixed to handle 8/16 bytes of valid input). For sparse groups of whitespace characters or very dense whitespace, Method 3 is better. An analysis of real-world data would help decide which case we want to optimize for.

For Method 1:
For Method 3:
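For anyone who wants to reproduce the `decode-s`, `decode-s8` and `decode-d` inputs described earlier, the following is a rough sketch of how such buffers could be generated. `add_whitespace` is a hypothetical helper, not code from the benchmark. It assumes that "every 80 characters" counts valid characters only, that the whitespace run precedes each group, and that the whitespace character is `\n`; the original benchmark may differ on all three points.

```c
#include <stddef.h>

/* Hypothetical generator: copy `len` bytes of valid base64 from `src` to
 * `dst`, emitting `run` newline characters before every group of `period`
 * valid characters. Returns the number of bytes written.
 * `dst` needs room for len + ((len + period - 1) / period) * run bytes.
 * `period` must be nonzero. */
static size_t add_whitespace(char *dst, const char *src, size_t len,
                             size_t period, size_t run)
{
    size_t out = 0;
    for (size_t i = 0; i < len; i++) {
        if (i % period == 0) {
            /* start of a new group: emit the whitespace run first */
            for (size_t k = 0; k < run; k++)
                dst[out++] = '\n';
        }
        dst[out++] = src[i];
    }
    return out;
}
```

Under these assumptions, `decode-s` would correspond to `period = 80, run = 1`, `decode-s8` to `period = 80, run = 8`, and `decode-d` to `period = 1, run = 1`.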