It seems clang on Windows optimizes away the `fio_memcpy` cache in the reversed `fio_memcpy` logic. This in turn allows the loop to be vectorized in a way that breaks the reversed copy... I'll try to fix it, although the `fio_memcpy` fallback is rarely in use (for some reason it's faster than Alpine's memcpy in Docker, but other than that the system's memcpy is usually best).
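
For anyone unfamiliar with the pattern, here is a minimal sketch of a back-to-front copy that stages each block through a small local buffer. It is not the actual `fio_memcpy` code: `sketch_copy_reversed`, `SKETCH_BLOCK`, and `sketch_selfcheck` are hypothetical names, and the 64-byte block size is an assumption made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_BLOCK 64 /* hypothetical staging-buffer size */

/* Copies len bytes from src to dest, walking from the end toward the start.
 * Intended for overlapping regions where dest sits above src. */
static void *sketch_copy_reversed(void *dest, const void *src, size_t len) {
  assert(dest != NULL && src != NULL);
  unsigned char *d = (unsigned char *)dest + len;
  const unsigned char *s = (const unsigned char *)src + len;
  unsigned char stage[SKETCH_BLOCK]; /* the local "cache" */
  while (len >= SKETCH_BLOCK) {
    d -= SKETCH_BLOCK;
    s -= SKETCH_BLOCK;
    memcpy(stage, s, SKETCH_BLOCK); /* read the whole block first... */
    memcpy(d, stage, SKETCH_BLOCK); /* ...and only then write it */
    len -= SKETCH_BLOCK;
  }
  /* Remaining bytes at the head, still copied back to front. */
  while (len--) {
    --d;
    --s;
    *d = *s;
  }
  return dest;
}

/* Shifts 200 bytes up by 3 inside one buffer and compares with memmove.
 * Returns 1 when both results match. */
static int sketch_selfcheck(void) {
  unsigned char buf[256], expect[256];
  for (size_t i = 0; i < sizeof(buf); ++i)
    buf[i] = expect[i] = (unsigned char)i;
  memmove(expect + 3, expect, 200);
  sketch_copy_reversed(buf + 3, buf, 200);
  return memcmp(buf, expect, sizeof(buf)) == 0;
}
```

The ordering is what matters here: each block has to be fully read into `stage` before any byte of it is written. If the staging buffer is dropped and the copy turns into direct wide loads and stores, that guarantee can be lost for overlapping regions, which would be consistent with the symptom described above.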