Improve the unsafe version of duplicate_overlapping. The compiler generates unfavourable assembly for the simple version.
We now copy 4 bytes per iteration instead of one.
Without that change, the compiler unrolls/auto-vectorizes the copy itself and emits a lot of branches.
That is not what we want, as large overlapping copies are not that common.
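
For illustration, a minimal sketch of the idea, not the actual code from this change. The name `duplicate_overlapping_sketch`, its slice-based signature, and the choice of four ordered single-byte copies per iteration are assumptions made for the example. The copies stay byte-wise and in order because the source and destination ranges may overlap (offset smaller than the match length), in which case bytes written earlier in the copy are read again later.

```rust
/// Hypothetical helper: appends `match_length` bytes at `buf[pos..]`, reading
/// from `offset` bytes behind the write position. Source and destination may
/// overlap, so the bytes are copied strictly in order.
pub fn duplicate_overlapping_sketch(
    buf: &mut [u8],
    pos: usize,
    offset: usize,
    match_length: usize,
) {
    // Preconditions that keep every read and write inside `buf`.
    assert!(offset >= 1 && offset <= pos);
    assert!(pos <= buf.len() && match_length <= buf.len() - pos);

    let base = buf.as_mut_ptr();
    // SAFETY: the asserts above guarantee that `src` and `dst` stay in bounds
    // for `match_length` bytes. Both pointers derive from the same raw pointer.
    unsafe {
        let mut src = base.add(pos - offset) as *const u8;
        let mut dst = base.add(pos);
        let mut remaining = match_length;

        // Main loop: 4 bytes per iteration, written out by hand as four
        // ordered single-byte copies (assumption of this sketch).
        while remaining >= 4 {
            *dst = *src;
            *dst.add(1) = *src.add(1);
            *dst.add(2) = *src.add(2);
            *dst.add(3) = *src.add(3);
            src = src.add(4);
            dst = dst.add(4);
            remaining -= 4;
        }

        // Tail: at most 3 bytes left, copied one at a time.
        while remaining > 0 {
            *dst = *src;
            src = src.add(1);
            dst = dst.add(1);
            remaining -= 1;
        }
    }
}
```

With a buffer that starts with `ab` followed by seven zero bytes, calling the sketch with `pos = 2`, `offset = 2`, `match_length = 7` repeats the two-byte pattern and yields `ababababa`.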