cheatfate / nimcrypto

Nim cryptographic library
MIT License

align MDigest #68

Closed by arnetheduck 1 year ago

arnetheduck commented 1 year ago

Introducing alignment in MDigest lets the compiler choose aligned instructions for copying, zeroing and processing digests. This gives better codegen on platforms that have such instructions, and a performance increase on platforms where unaligned access is heavily penalised.

Here's an MDigest[256] copy without alignment:

 movdqu xmm0,XMMWORD PTR [rdi]
 movups XMMWORD PTR [rsi],xmm0
 movdqu xmm1,XMMWORD PTR [rdi+0x10]
 movups XMMWORD PTR [rsi+0x10],xmm1

Same, but with alignment:

 movdqa xmm0,XMMWORD PTR [rdi]
 movaps XMMWORD PTR [rsi],xmm0
 movdqa xmm1,XMMWORD PTR [rdi+0x10]
 movaps XMMWORD PTR [rsi+0x10],xmm1

Both the loads and the stores now use the aligned variants (movdqa/movaps instead of movdqu/movups), here with gcc targeting a generic x86_64 CPU. Of course, ideal alignment would go up to 64 bytes, but that breaks dynamic allocation, which typically will not let itself be aligned further than 16 bytes.
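To make the comparison above easier to reproduce, here is a minimal C sketch of the two layouts. It is not nimcrypto's code: `PlainDigest256` and `AlignedDigest256` are hypothetical stand-ins for MDigest[256], and C is used only because the assembly shown comes from gcc. With 16-byte alignment declared on the data field, the compiler is allowed (though not required) to pick aligned moves for the struct copy.

```c
#include <assert.h>    /* static_assert */
#include <stdalign.h>  /* alignas, alignof */
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for MDigest[256]: 32 bytes of digest data. */
typedef struct { uint8_t data[32]; } PlainDigest256;
typedef struct { alignas(16) uint8_t data[32]; } AlignedDigest256;

/* Aligning the field raises the alignment of the whole struct.
 * 32 is already a multiple of 16, so no padding is added. */
static_assert(alignof(AlignedDigest256) == 16, "struct inherits field alignment");
static_assert(sizeof(AlignedDigest256) == 32, "no padding for a 32-byte digest");

/* Plain struct assignment: the compiler must assume any address. */
void copy_plain(PlainDigest256 *dst, const PlainDigest256 *src) {
    *dst = *src;
}

/* Same copy, but both pointers are known to be 16-byte aligned. */
void copy_aligned(AlignedDigest256 *dst, const AlignedDigest256 *src) {
    *dst = *src;
}
```

Compiling this with optimisation enabled and disassembling the two copy functions should show a difference along the lines of the listings above, although the exact instructions depend on the compiler version and flags.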

mratsim commented 1 year ago

> of course, ideal alignment would be done up to 64 bytes but this breaks dynamic allocation which will not let itself be aligned further than 16 (typically).

An aligned allocator (posix_memalign) can be exposed for ptr UncheckedArray. Within Nim objects, the compiler will do the right thing. For seq objects, we can file a feature request with Nim upstream. It would be very helpful in other domains as well, such as scientific computing, machine learning and image processing.
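For reference, a minimal C sketch of the kind of aligned allocation being suggested. `alloc_aligned` is a hypothetical helper name, not something nimcrypto or Nim exposes; only `posix_memalign` and `free` are real POSIX/C calls. `posix_memalign` requires the alignment to be a power of two and a multiple of `sizeof(void *)`, and returns a nonzero error code otherwise.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L  /* expose posix_memalign under strict -std=c11 */
#endif
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: returns `size` bytes aligned to `alignment`,
 * or NULL if the alignment is invalid or memory is exhausted.
 * The result must be released with free(). */
void *alloc_aligned(size_t alignment, size_t size) {
    void *p = NULL;
    if (posix_memalign(&p, alignment, size) != 0) {
        return NULL;
    }
    /* Post-condition: the address honours the requested alignment. */
    assert((uintptr_t)p % alignment == 0);
    return p;
}
```

A caller wanting 64-byte alignment for a 32-byte digest buffer would write `uint8_t *buf = alloc_aligned(64, 32);` and later `free(buf);`.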

arnetheduck commented 1 year ago

> An aligned allocator (posix_memalign) can be exposed for ptr UncheckedArray.

MDigest is too general-purpose a type for this to make sense, i.e. for such advanced use cases, the instance itself can be made "more" aligned (rather than the data field inside).
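One way to read "making the instance more aligned" is sketched below in C, under the assumption that the digest type itself keeps its modest 16-byte field alignment. `Digest256` and `CacheLineDigest` are hypothetical names for illustration; the point is only that the stricter requirement is attached by the caller, not baked into the general-purpose type.

```c
#include <assert.h>    /* static_assert */
#include <stdalign.h>  /* alignas, alignof */
#include <stdint.h>

/* Hypothetical general-purpose digest: 16-byte aligned data field. */
typedef struct { alignas(16) uint8_t data[32]; } Digest256;

/* A caller with stricter needs over-aligns the instance by wrapping it.
 * The wrapper is padded from 32 up to 64 bytes. */
typedef struct { alignas(64) Digest256 digest; } CacheLineDigest;

static_assert(alignof(Digest256) == 16, "general type stays at 16");
static_assert(alignof(CacheLineDigest) == 64, "wrapper is 64-byte aligned");
static_assert(sizeof(CacheLineDigest) == 64, "padding fills out the 64 bytes");

/* Alternatively, a single variable can be over-aligned where it is declared. */
static alignas(64) Digest256 g_scratch_digest;

/* Returns the address of the over-aligned scratch instance. */
Digest256 *scratch_digest(void) {
    return &g_scratch_digest;
}
```

The trade-off is visible in the size assertion: the wrapper costs 32 bytes of padding per digest, which is part of why a 64-byte default would be a poor fit for a type that is embedded in many other structures.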