Checksum performance is slow on Arm64

The checksum implementation in folly is not optimized for Arm64 (Neon), so checksum performance on that architecture is quite slow.

./folly/hash/detail/ChecksumDetail.h

CacheLib relies heavily on folly to compute checksums.

According to perf top, when CacheLib runs with a hybrid cache configuration, the checksum consumes a large share of CPU time and has become a bottleneck.

Intel(R) Xeon(R) Gold 6138 CPU @ 2.00GHz
$ checksum_benchmark --bm_min_usec=10000
============================================================================
folly/hash/test/ChecksumBenchmark.cpp           relative  time/iter  iters/s
============================================================================
crc32_512                                                   55.73ns   17.94M
crc32_1024                                                  85.15ns   11.74M
crc32_2048                                                 116.29ns    8.60M
crc32_4096                                                 191.03ns    5.23M
crc32_8192                                                 341.44ns    2.93M
crc32_16384                                                627.76ns    1.59M
crc32_32768                                                  1.21us  827.16K
============================================================================
Comparison on Arm64:

============================================================================
[...]folly/hash/test/ChecksumBenchmark.cpp     relative  time/iter   iters/s
============================================================================
crc32_512                                                   1.80us   554.82K
crc32_1024                                                  3.58us   279.35K
crc32_2048                                                  7.14us   140.13K
crc32_4096                                                 14.25us    70.18K
crc32_8192                                                 28.47us    35.12K
crc32_16384                                                56.93us    17.57K
crc32_32768                                               113.83us     8.79K
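
For illustration only, here is a minimal sketch of how a CRC-32 routine could use the Armv8 CRC32 instructions when the compiler exposes them. It is not folly's code: the function name crc32_ieee_sketch is hypothetical, the hardware path assumes a little-endian AArch64 target built with the CRC extension enabled (for example -march=armv8-a+crc), and the checksum follows the zlib convention (initial value ~0, final inversion), which may differ from the convention folly uses internally. When the extension is not available, the sketch falls back to a plain bitwise loop.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

#if defined(__ARM_FEATURE_CRC32)
#include <arm_acle.h>  // ACLE intrinsics: __crc32b, __crc32d
#endif

// Hypothetical helper (not part of folly): CRC-32 with the IEEE 802.3
// polynomial (reflected form 0xEDB88320), zlib-style init and final inversion.
inline std::uint32_t crc32_ieee_sketch(const std::uint8_t* data, std::size_t n) {
  std::uint32_t crc = ~std::uint32_t{0};
#if defined(__ARM_FEATURE_CRC32)
  // Hardware path: consume 8 bytes per instruction.
  // Assumption: little-endian AArch64, so memcpy yields the bytes in order.
  while (n >= 8) {
    std::uint64_t chunk;
    std::memcpy(&chunk, data, sizeof(chunk));
    crc = __crc32d(crc, chunk);
    data += 8;
    n -= 8;
  }
  // Remaining tail, one byte at a time.
  while (n > 0) {
    crc = __crc32b(crc, *data++);
    --n;
  }
#else
  // Portable fallback: bit-by-bit update, slow but easy to verify.
  for (std::size_t i = 0; i < n; ++i) {
    crc ^= data[i];
    for (int bit = 0; bit < 8; ++bit) {
      crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
  }
#endif
  return ~crc;
}
```

Calling crc32_ieee_sketch on the nine ASCII bytes "123456789" should return 0xCBF43926, the standard check value for this CRC variant, on either path. A production version would likely also interleave several independent CRC streams or use carry-less multiplication (PMULL) to hide instruction latency; that is beyond this sketch.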

facebook/folly issue #2027