The original implementation assumes that the memory size is at least max(src, dest) + ROUND_UP(len, 32). This does not always hold, and when it fails the copy reverts unexpectedly.
The new implementation copies in reverse so that no memory outside the copied range is touched. It also simplifies the mask so that memory outside the copied range is left unchanged.
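
To illustrate the difference, here is a minimal Python model of a word-based copy. It is a sketch of one possible reading of the description above, not the actual code of this change. The names `load_word`, `store_word`, `forward_copy` and `reverse_copy` are hypothetical, memory is modelled as a fixed-size `bytearray` whose word accesses fail when out of bounds, words are big-endian, and `src` and `dest` are assumed not to overlap. The sketch also assumes that at least 32 bytes of memory exist below both regions, because the leading partial word is read and written as the word that ends at the partial chunk.

```python
WORD = 32  # word size in bytes


def load_word(mem, addr):
    """Read one 32-byte word; fail if it is not fully inside memory."""
    if addr < 0 or addr + WORD > len(mem):
        raise IndexError(f"word read at {addr} is outside memory of size {len(mem)}")
    return int.from_bytes(mem[addr:addr + WORD], "big")


def store_word(mem, addr, value):
    """Write one 32-byte word; fail if it is not fully inside memory."""
    if addr < 0 or addr + WORD > len(mem):
        raise IndexError(f"word write at {addr} is outside memory of size {len(mem)}")
    mem[addr:addr + WORD] = value.to_bytes(WORD, "big")


def forward_copy(mem, dest, src, length):
    """Simplified forward copy: whole words from the start of the range.

    Only the access pattern matters here: the last word can reach up to
    ROUND_UP(length, 32) bytes past src and dest.
    """
    for off in range(0, length, WORD):
        store_word(mem, dest + off, load_word(mem, src + off))


def reverse_copy(mem, dest, src, length):
    """Copy whole words starting from the end of the range and walking back."""
    off = length
    # Full words, last one first; each lies entirely inside the copied range.
    while off >= WORD:
        off -= WORD
        store_word(mem, dest + off, load_word(mem, src + off))
    if off > 0:
        # `off` leading bytes remain. Use the word that ends at offset `off`;
        # it starts WORD - off bytes before the range (assumed to exist).
        low_mask = (1 << (8 * off)) - 1  # selects the last `off` bytes of a word
        s = load_word(mem, src + off - WORD)
        d = load_word(mem, dest + off - WORD)
        # Keep the bytes before dest as they are, take the rest from src.
        store_word(mem, dest + off - WORD, (d & ~low_mask) | (s & low_mask))
```

In this model, copying 40 bytes to a destination that ends exactly at the end of memory fails with `forward_copy`, because the second word would extend past the end, while `reverse_copy` completes and leaves every byte outside the destination range as it was.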