While working on #3, I noticed that vectorization wasn't kicking in for the ASCII-only case. My hunch is that the lookup tables not being static is what hurts performance there. I tried making the lookup tables static final, and performance improved markedly: roughly a 10-12x speedup on the ASCII path.
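For reviewers who want the shape of the change without opening the diff, here is a minimal sketch of the pattern, not the actual code from this PR. The class name `AsciiLookup`, the table `LOWER`, and the lower-casing operation are all hypothetical stand-ins. The only point is that the table lives in a `static final` field, so the JIT can treat the array reference (and its length) as a constant instead of reloading an instance field on every iteration. Whether that alone explains the speedup is still my hunch, not something I have confirmed.

```java
public final class AsciiLookup {
    // Hypothetical table: maps each ASCII code to its lower-case form.
    // static final, so the array reference is a constant for the JIT.
    private static final byte[] LOWER = buildTable();

    private static byte[] buildTable() {
        byte[] table = new byte[128];
        for (int i = 0; i < 128; i++) {
            table[i] = (byte) ((i >= 'A' && i <= 'Z') ? i + 32 : i);
        }
        return table;
    }

    private AsciiLookup() {
        // Utility class, not meant to be instantiated.
    }

    /** Lower-cases ASCII bytes in place; bytes outside 0..127 are left untouched. */
    public static void toLowerAscii(byte[] data) {
        for (int i = 0; i < data.length; i++) {
            byte b = data[i];
            if (b >= 0) {            // non-negative byte means ASCII (0..127)
                data[i] = LOWER[b];  // index always falls inside the 128-entry table
            }
        }
    }
}
```

The version before the change would look the same except that the table is a non-static (instance) field populated in the constructor.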
Results: https://gist.github.com/amCap1712/948cd2c94d45a2ce183e2f2c7a7f2a0a