Closed benjaminp closed 6 years ago
This is a good idea, but I think there's some kinks to be ironed out here. Getting rid of the alignment hints is a good idea, but the direct loads are still necessary---not every compiler is a recent GCC or Clang. In particular Clang only optimizes the generic load into a single mov
starting with 6.0.0, and GCC starting with 5. MSVC and other compilers do not do it at all.
That being said, the right way to do this is not with a pointer cast or unions, but with memcpy
, as I had already done here, but unfortunately forgot to transplant to this repository.
I'm not sure how I feel about the host_little_endian
function. I would prefer something that was genuinely a constant expression, instead of relying on the compiler to optimized things out.
Thank you for taking a look.
I'm not thrilled about host_little_endian
either. What we really want is htole64
and letoh64
to be standard C functions. Checking __BYTE_ORDER__
would almost do it except that MSVC defines no such macro to my knowledge. Maybe that's okay, since the code will still be correct if the other path is taken.
I think what I'll do is close this and send another PR for the memcpy
change.
For load32 and load64, we can simply delete the "optimized" case. For little-endian targets, both GCC and Clang recognize the load of a native 32-bit or 64-bit integer and turn it into the smallest possible move instruction.
Unfortunately, the same doesn't hold for stores, and we still need to detect the native endianess. There's no standard way to do this, but using a constant union seems to do the trick and is completely constant-folded away during compilation.
Remove the ALIGNED_ACCESS_REQUIRED configure check. This isn't a very good thing to check through configure because it requires running an executable, which doesn't work for cross compiles. Compilers generally know whether a target supports unaligned accesses and if so, whether it's profitable to do in a particular place. Even if the target doesn't support unaligned loads, the load & store functions may be inlined somehwere will where higher alignment is proven. My approach is to memcpy the integer through a union. Smart compilers can see through the memcpy and turn it into a single move instruction if possible.