The transmute from [u8; 128] in MeowHash::from_bytes is undefined behavior: Simd128 requires 16-byte alignment, but a u8 array is only guaranteed 1-byte alignment. Beyond what Rust formally considers undefined behavior, the backend is also free to emit movaps for the copy, which will crash if the array is misaligned. It should be possible to use ptr::copy_nonoverlapping the same way you'd use memcpy in C. That can be lowered to movups rather than movaps, and on modern x86 cores movups is as fast as movaps whenever the source array happens to be 16-byte aligned.
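A minimal sketch of the suggested fix, under stated assumptions: MeowHash is assumed to be a #[repr(C)] struct of eight 16-byte lanes (128 bytes, no padding), and Simd128 here is a hypothetical portable stand-in (a 16-byte-aligned [u8; 16]), not the crate's actual type. The only requirement placed on the source is 1-byte alignment, so the compiler cannot assume an aligned load from it.

```rust
use std::mem::{align_of, size_of, MaybeUninit};
use std::ptr;

/// Hypothetical stand-in for the crate's 128-bit SIMD lane: 16 bytes, 16-byte aligned.
#[repr(C, align(16))]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Simd128([u8; 16]);

/// Hypothetical layout: eight 128-bit lanes, 128 bytes total, 16-byte aligned.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct MeowHash {
    lanes: [Simd128; 8],
}

// Compile-time check: the byte copy below is only sound if the sizes match exactly.
const _: () = assert!(size_of::<MeowHash>() == 128);
const _: () = assert!(align_of::<MeowHash>() == 16);

impl MeowHash {
    pub fn from_bytes(bytes: &[u8; 128]) -> MeowHash {
        let mut out = MaybeUninit::<MeowHash>::uninit();
        unsafe {
            // memcpy-style copy: the source only needs 1-byte alignment, so the
            // compiler must not emit an aligned load (movaps) from `bytes`.
            // The destination is a properly aligned MeowHash.
            ptr::copy_nonoverlapping(bytes.as_ptr(), out.as_mut_ptr().cast::<u8>(), 128);
            // Every bit pattern is a valid [u8; 16] and all 128 bytes were written.
            out.assume_init()
        }
    }

    /// Safe byte view, used here only to check the round trip.
    pub fn to_bytes(&self) -> [u8; 128] {
        let mut out = [0u8; 128];
        for (i, lane) in self.lanes.iter().enumerate() {
            out[i * 16..(i + 1) * 16].copy_from_slice(&lane.0);
        }
        out
    }
}
```

std::ptr::read_unaligned(bytes.as_ptr() as *const MeowHash) is an equivalent one-liner that states the intent just as directly; either way the source pointer is never dereferenced as an aligned Simd128.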