emmansun / base64

Base64 with SIMD acceleration
https://godoc.org/github.com/emmansun/base64
BSD 3-Clause "New" or "Revised" License

arm64: decode minor optimization #21

Closed: emmansun closed this issue 2 weeks ago

emmansun commented 3 weeks ago

2. Originally, we used the TBL/TBX and ORR instructions to implement the table lookup, but I think the ORR is unnecessary.

Taking STD mode as an example, the second LUT can be changed from

```c
// The input consists of five valid character sets in the Base64 alphabet,
// which we need to map back to the 6-bit values they represent.
// There are three ranges, two singles, and then there's the rest.
//
//   #  From       To        LUT  Characters
//   1  [0..42]    [255]      #1  invalid input
//   2  [43]       [62]       #1  +
//   3  [44..46]   [255]      #1  invalid input
//   4  [47]       [63]       #1  /
//   5  [48..57]   [52..61]   #1  0..9
//   6  [58..63]   [255]      #1  invalid input
//   7  [64]       [255]      #2  invalid input
//   8  [65..90]   [0..25]    #2  A..Z
//   9  [91..96]   [255]      #2  invalid input
//  10  [97..122]  [26..51]   #2  a..z
//  11  [123..126] [255]      #2  invalid input
// (12) Everything else => invalid input

// The second LUT will use the VTBX instruction (out of range indices will be
// unchanged in destination). Input [64..126] will be mapped to index [1..63]
// in this LUT. Index 0 means that value comes from LUT #1.
static const uint8_t dec_lut2[] = {
      0U, 255U,   0U,   1U,   2U,   3U,   4U,   5U,   6U,   7U,   8U,   9U,  10U,  11U,  12U,  13U,
     14U,  15U,  16U,  17U,  18U,  19U,  20U,  21U,  22U,  23U,  24U,  25U, 255U, 255U, 255U, 255U,
    255U, 255U,  26U,  27U,  28U,  29U,  30U,  31U,  32U,  33U,  34U,  35U,  36U,  37U,  38U,  39U,
     40U,  41U,  42U,  43U,  44U,  45U,  46U,  47U,  48U,  49U,  50U,  51U, 255U, 255U, 255U, 255U,
};
```
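For comparison, the original per-lane flow can be modeled in scalar C. This is a sketch, assuming the UQSUB → TBL/TBX → ORR → CMHI sequence that the instruction names in this issue suggest; the real arm64 assembly may order or combine the steps differently. `DEC_LUT1`, `DEC_LUT2_OLD`, `tbl64`, `tbx64` and `decode_lane_old` are hypothetical names, and LUT #1 is filled in from rows 1 to 6 of the range table above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of one NEON lane of the ORIGINAL decode step.
 * All names here are illustrative, not taken from the repository. */

/* LUT #1, derived from the range table: '+' -> 62, '/' -> 63,
 * '0'..'9' -> 52..61, every other index in [0..63] -> 255 (invalid). */
static const uint8_t DEC_LUT1[] = {
    255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
    255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
    255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,  62, 255, 255, 255,  63,
     52,  53,  54,  55,  56,  57,  58,  59,  60,  61, 255, 255, 255, 255, 255, 255,
};
static_assert(sizeof DEC_LUT1 == 64, "LUT #1 must span 64 bytes");

/* Original LUT #2: input [64..126] -> index [1..63]. Index 0 yields 0,
 * which leaves the LUT #1 result intact when the two are ORed together. */
static const uint8_t DEC_LUT2_OLD[] = {
      0, 255,   0,   1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12,  13,
     14,  15,  16,  17,  18,  19,  20,  21,  22,  23,  24,  25, 255, 255, 255, 255,
    255, 255,  26,  27,  28,  29,  30,  31,  32,  33,  34,  35,  36,  37,  38,  39,
     40,  41,  42,  43,  44,  45,  46,  47,  48,  49,  50,  51, 255, 255, 255, 255,
};
static_assert(sizeof DEC_LUT2_OLD == 64, "LUT #2 must span 64 bytes");

/* TBL lane (as in vqtbl4q_u8): an index >= 64 produces 0. */
static uint8_t tbl64(const uint8_t t[64], uint8_t i) { return i < 64 ? t[i] : 0; }

/* TBX lane (as in vqtbx4q_u8): an index >= 64 keeps the destination byte. */
static uint8_t tbx64(uint8_t d, const uint8_t t[64], uint8_t i) { return i < 64 ? t[i] : d; }

/* Returns the 6-bit value, or -1 if the lane would be flagged invalid. */
static int decode_lane_old(uint8_t c)
{
    uint8_t idx2 = c > 63 ? (uint8_t)(c - 63) : 0;  /* UQSUB: saturating c - 63 */
    uint8_t v1 = tbl64(DEC_LUT1, c);                /* TBL: 0 whenever c >= 64 */
    uint8_t v2 = tbx64(idx2, DEC_LUT2_OLD, idx2);   /* TBX: idx2 survives if >= 64 */
    uint8_t r = v1 | v2;                            /* ORR: merge both lookups */
    return r > 63 ? -1 : r;                         /* CMHI: r > 63 means invalid */
}
```

In this model the ORR is what stitches the two halves together: TBL zeroes every lane with input of 64 or more, and LUT #2 returns 0 at index 0 so that lanes with input below 64 keep their LUT #1 value.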

to

```c
// The second LUT will use the VTBX instruction (out of range indices will be
// unchanged in destination). Input [64..127] will be mapped to index [0..63]
// in this LUT.
static const uint8_t dec_lut2[] = {
    255U,   0U,   1U,   2U,   3U,   4U,   5U,   6U,   7U,   8U,   9U,  10U,  11U,  12U,  13U,  14U,
     15U,  16U,  17U,  18U,  19U,  20U,  21U,  22U,  23U,  24U,  25U, 255U, 255U, 255U, 255U, 255U,
    255U,  26U,  27U,  28U,  29U,  30U,  31U,  32U,  33U,  34U,  35U,  36U,  37U,  38U,  39U,  40U,
     41U,  42U,  43U,  44U,  45U,  46U,  47U,  48U,  49U,  50U,  51U, 255U, 255U, 255U, 255U, 255U,
};
```

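In addition, UQSUB (vqsubq_u8) and CMHI (vcgtq_u8) are replaced with SUB (vsubq_u8) and CMHS (vcgeq_u8), respectively. Because the new LUT is indexed by input - 64, a plain wrapping subtraction suffices: inputs below 64 wrap around to indices 192..255, which lie outside the 64-byte table, so TBX leaves those lanes untouched.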
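The issue does not spell out the new instruction sequence, so the following is only one possible arrangement that needs no ORR: a wrapping SUB produces the LUT #2 index, a TBX writes LUT #2 values into that same register, a second TBX indexed by the raw input overlays LUT #1 values, and a CMHS against 64 flags invalid lanes. It is a scalar sketch under that assumption; `DEC_LUT1`, `DEC_LUT2_NEW`, `tbx64` and `decode_lane_new` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of one NEON lane of the PROPOSED decode step.
 * The instruction order is an assumption, not the repository's code. */

/* LUT #1, derived from the range table: '+' -> 62, '/' -> 63,
 * '0'..'9' -> 52..61, every other index in [0..63] -> 255 (invalid). */
static const uint8_t DEC_LUT1[] = {
    255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
    255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
    255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,  62, 255, 255, 255,  63,
     52,  53,  54,  55,  56,  57,  58,  59,  60,  61, 255, 255, 255, 255, 255, 255,
};
static_assert(sizeof DEC_LUT1 == 64, "LUT #1 must span 64 bytes");

/* New LUT #2: input [64..127] -> index [0..63]; no "take LUT #1" slot. */
static const uint8_t DEC_LUT2_NEW[] = {
    255,   0,   1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12,  13,  14,
     15,  16,  17,  18,  19,  20,  21,  22,  23,  24,  25, 255, 255, 255, 255, 255,
    255,  26,  27,  28,  29,  30,  31,  32,  33,  34,  35,  36,  37,  38,  39,  40,
     41,  42,  43,  44,  45,  46,  47,  48,  49,  50,  51, 255, 255, 255, 255, 255,
};
static_assert(sizeof DEC_LUT2_NEW == 64, "LUT #2 must span 64 bytes");

/* TBX lane (as in vqtbx4q_u8): an index >= 64 keeps the destination byte. */
static uint8_t tbx64(uint8_t d, const uint8_t t[64], uint8_t i) { return i < 64 ? t[i] : d; }

/* Returns the 6-bit value, or -1 if the lane would be flagged invalid. */
static int decode_lane_new(uint8_t c)
{
    uint8_t idx2 = (uint8_t)(c - 64);              /* SUB: wraps, c < 64 -> 192..255 */
    uint8_t r = tbx64(idx2, DEC_LUT2_NEW, idx2);   /* c in 64..127: LUT #2 value */
    r = tbx64(r, DEC_LUT1, c);                     /* c < 64: LUT #1 value */
    return r >= 64 ? -1 : r;                       /* CMHS: r >= 64 means invalid */
}
```

The detail that matters in this sketch is the destination of the first TBX: it is the SUB result itself, so inputs of 128 and above keep a value in 64..191 and fail the CMHS check. If a TBL of LUT #1 were used as that destination instead, those lanes would keep the TBL's 0 and be accepted as valid.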

emmansun commented 2 weeks ago

v0.6.1