BLAKE2 / libb2

C library providing BLAKE2b, BLAKE2s, BLAKE2bp, BLAKE2sp
Creative Commons Zero v1.0 Universal
132 stars 47 forks source link

Simplify loading and storing integer functions. #20

Closed benjaminp closed 6 years ago

benjaminp commented 6 years ago

For load32 and load64, we can simply delete the "optimized" case. For little-endian targets, both GCC and Clang recognize the load of a native 32-bit or 64-bit integer and turn it into the smallest possible move instruction.

Unfortunately, the same doesn't hold for stores, and we still need to detect the native endianess. There's no standard way to do this, but using a constant union seems to do the trick and is completely constant-folded away during compilation.

Remove the ALIGNED_ACCESS_REQUIRED configure check. This isn't a very good thing to check through configure because it requires running an executable, which doesn't work for cross compiles. Compilers generally know whether a target supports unaligned accesses and if so, whether it's profitable to do in a particular place. Even if the target doesn't support unaligned loads, the load & store functions may be inlined somehwere will where higher alignment is proven. My approach is to memcpy the integer through a union. Smart compilers can see through the memcpy and turn it into a single move instruction if possible.

sneves commented 6 years ago

This is a good idea, but I think there's some kinks to be ironed out here. Getting rid of the alignment hints is a good idea, but the direct loads are still necessary---not every compiler is a recent GCC or Clang. In particular Clang only optimizes the generic load into a single mov starting with 6.0.0, and GCC starting with 5. MSVC and other compilers do not do it at all.

That being said, the right way to do this is not with a pointer cast or unions, but with memcpy, as I had already done here, but unfortunately forgot to transplant to this repository.

I'm not sure how I feel about the host_little_endian function. I would prefer something that was genuinely a constant expression, instead of relying on the compiler to optimized things out.

benjaminp commented 6 years ago

Thank you for taking a look.

I'm not thrilled about host_little_endian either. What we really want is htole64 and letoh64 to be standard C functions. Checking __BYTE_ORDER__ would almost do it except that MSVC defines no such macro to my knowledge. Maybe that's okay, since the code will still be correct if the other path is taken.

I think what I'll do is close this and send another PR for the memcpy change.