ThomasWaldmann closed this issue 1 year ago.
Can you try forcing the ARM cpu_features.h file to be enabled for your M1 environment? I wonder if that would change anything.
I don't have access to any macOS M1 environment, so I can't test anything like this myself.
That wouldn't help; the code there is Linux-specific. I guess it's best to continue in libdeflate's issue tracker and point them to the zlib-ng code I found.
OK. I can't say I'm surprised: for a long time, Linux was the only serious ARM target.
I was curious how borgbackup's currently bundled crc32 code performs on macOS 12 with an M1 CPU (again on my local machine):
Name (time in us)       Mean                 StdDev            Median               OPS
---------------------------------------------------------------------------------------------------------------
test_zlib_crc32         560.6912 (1.0)       14.5614 (1.0)     563.9375 (1.0)       1,783.5129 (1.0)
test_borg_crc32_slice8  7,326.7399 (13.07)   117.9650 (8.10)   7,324.9590 (12.99)   136.4864 (0.08)
have_clmul is False, so borg_crc32_clmul is not available (in the code currently bundled with borg, it is only implemented for x64).
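For context on the "slice8" in borg_crc32_slice8: slice-by-8 is a table-driven CRC-32 technique that precomputes eight 256-entry lookup tables and folds 8 input bytes per loop iteration instead of 1. The following is a minimal pure-Python sketch of that technique, not borg's bundled C code; the names crc32_slice8 and TABLES are hypothetical, and it is far too slow for real use. It only illustrates the algorithm and checks itself against zlib.crc32.

```python
import zlib

POLY = 0xEDB88320  # reflected CRC-32 polynomial, the same one zlib uses


def _make_tables():
    """Build the 8 lookup tables used by slice-by-8 (hypothetical helper)."""
    t0 = []
    for i in range(256):
        c = i
        for _ in range(8):
            c = ((c >> 1) ^ POLY) if c & 1 else (c >> 1)
        t0.append(c)
    tables = [t0]
    # Table k gives the CRC contribution of a byte followed by k zero bytes.
    for k in range(1, 8):
        prev = tables[k - 1]
        tables.append([(prev[i] >> 8) ^ t0[prev[i] & 0xFF] for i in range(256)])
    return tables


TABLES = _make_tables()


def crc32_slice8(data: bytes, crc: int = 0) -> int:
    """CRC-32 with the same semantics as zlib.crc32(data, crc)."""
    t0, t1, t2, t3, t4, t5, t6, t7 = TABLES
    crc ^= 0xFFFFFFFF
    i, n = 0, len(data)
    # Main loop: consume 8 bytes per iteration (two little-endian 32-bit words).
    while n - i >= 8:
        one = int.from_bytes(data[i:i + 4], "little") ^ crc
        two = int.from_bytes(data[i + 4:i + 8], "little")
        crc = (t7[one & 0xFF] ^ t6[(one >> 8) & 0xFF]
               ^ t5[(one >> 16) & 0xFF] ^ t4[one >> 24]
               ^ t3[two & 0xFF] ^ t2[(two >> 8) & 0xFF]
               ^ t1[(two >> 16) & 0xFF] ^ t0[two >> 24])
        i += 8
    # Tail: fall back to the classic one-byte-at-a-time table lookup.
    for b in data[i:]:
        crc = (crc >> 8) ^ t0[(crc ^ b) & 0xFF]
    return crc ^ 0xFFFFFFFF


print(hex(crc32_slice8(b"123456789")))  # 0xcbf43926, the standard CRC-32 check value
```

The hardware path (borg_crc32_clmul) replaces these table lookups with carry-less multiplication instructions, which is why it wins by such a margin on x64 when have_clmul is True.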
Benchmarks on GitHub CI (Linux, x64):
Name (time in us)       Mean                 StdDev            Median               OPS
---------------------------------------------------------------------------------------------------------------
test_borg_crc32_clmul   515.9855 (1.0)       19.2178 (1.0)     520.4060 (1.0)       1,938.0391 (1.0)
test_borg_crc32_slice8  3,958.2522 (7.67)    84.5450 (4.40)    3,973.1480 (7.63)    252.6368 (0.13)
test_zlib_crc32         7,500.5678 (14.54)   116.3165 (6.05)   7,520.1550 (14.45)   133.3232 (0.07)
Benchmarks on GitHub CI (macOS, x64):
Name (time in ms)       Mean           StdDev        Median         OPS
---------------------------------------------------------------------------------------------------
test_zlib_crc32         3.1880 (1.0)   0.39 (5.85)   3.0442 (1.0)   313.6777 (1.0)
test_borg_crc32_slice8  4.6606 (1.46)  0.0656 (1.0)  4.6442 (1.53)  214.5655 (0.68)
Code: https://github.com/borgbackup/borg/pull/6387 (it will also benchmark deflate.crc32 as soon as that is in a PyPI release).
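For a quick local comparison outside the test suite, a plain timeit loop is enough. This is a sketch, not the pytest-benchmark code from the PR. It assumes deflate.crc32(data) accepts a bytes object and returns the same unsigned 32-bit value as zlib.crc32, and it silently skips deflate if the package or the function is missing. The helper names and the 4 MiB buffer size are arbitrary, hypothetical choices.

```python
import timeit
import zlib

try:
    import deflate  # PyPI "deflate" bindings for libdeflate (assumed installed)
except ImportError:
    deflate = None

DATA = bytes(range(256)) * 4096 * 4  # 4 MiB of test data (arbitrary size)


def candidates():
    """Map a label to each crc32 implementation that is available here."""
    funcs = {"zlib.crc32": zlib.crc32}
    # Assumption: deflate.crc32(data) exists only in newer releases of deflate.
    if deflate is not None and hasattr(deflate, "crc32"):
        funcs["deflate.crc32"] = deflate.crc32
    return funcs


def bench(data=DATA, number=20):
    """Return mean seconds per call for each available implementation."""
    results = {}
    for name, fn in candidates().items():
        total = timeit.timeit(lambda: fn(data), number=number)
        results[name] = total / number
    return results


if __name__ == "__main__":
    for name, secs in sorted(bench().items(), key=lambda kv: kv[1]):
        print(f"{name:15s} {secs * 1e3:8.3f} ms/call")
```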
It makes me wonder how libdeflate would fare against zlib-ng. That might also explain why Python on macOS behaves so differently: the zlib that Python build actually uses may be zlib-ng instead of regular zlib. If so, should we just import the zlib-ng code, since it may perform better than libdeflate?
Yeah, zlib-ng is definitely also worth testing (though maybe a bit off-topic here).
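To check which zlib a given Python build actually uses, the zlib module's version constants are a starting point: ZLIB_VERSION is the version Python was compiled against, ZLIB_RUNTIME_VERSION the one loaded at runtime. The assumption in this sketch is that a zlib-ng build in zlib-compat mode includes "zlib-ng" in its version string; zlib_flavor is a hypothetical helper.

```python
import zlib


def zlib_flavor() -> str:
    """Guess whether the loaded zlib is zlib-ng (hypothetical heuristic)."""
    runtime = zlib.ZLIB_RUNTIME_VERSION
    # Assumption: zlib-ng in compat mode tags its version, e.g. "x.y.z.zlib-ng".
    return "zlib-ng" if "zlib-ng" in runtime else "zlib"


print("compiled against:", zlib.ZLIB_VERSION)
print("loaded at runtime:", zlib.ZLIB_RUNTIME_VERSION)
print("flavor:", zlib_flavor())
```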
Updated performance results using libdeflate 1.12 on macOS M1:
(borg-env) tw@mba2020 borg % borg benchmark cpu
Non-cryptographic checksums / hashes ===========================
crc32 (zlib, used) 1GB 0.055s
crc32 (libdeflate) 1GB 0.027s
xxh64 1GB 0.122s
Great update: it used to be slower, but libdeflate 1.12 is now twice as fast as zlib's crc32 on macOS M1!
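The "twice as fast" figure follows directly from the borg benchmark cpu timings above; here is the arithmetic, with throughput taken as the 1GB input divided by the reported time (using the same GB unit as borg's output):

```python
# Timings reported by `borg benchmark cpu` above (seconds per 1GB of input)
zlib_secs = 0.055
libdeflate_secs = 0.027

speedup = zlib_secs / libdeflate_secs   # ratio of the two timings
zlib_gbps = 1 / zlib_secs               # GB/s for zlib crc32
libdeflate_gbps = 1 / libdeflate_secs   # GB/s for libdeflate crc32

print(f"speedup {speedup:.2f}x, zlib {zlib_gbps:.1f} GB/s, libdeflate {libdeflate_gbps:.1f} GB/s")
# prints: speedup 2.04x, zlib 18.2 GB/s, libdeflate 37.0 GB/s
```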
I guess this is solved by the new libdeflate.
Linux seems good, macOS (x64, Intel) mediocre, and macOS (M1, Apple Silicon) the worst.
See #21.
TODO: move the insights from there into issues (I guess the best place is not here, but libdeflate's issue tracker).