Closed Alexhuszagh closed 3 years ago
Thanks, that's great! (and read/write functions are now quite neat and elegant)
(These were the only places left from the original codebase with endian-dependent code)
Technically, 1 more thing: in decimal.rs
, the following does not need to read to the native byteorder:
let v = s.read_u64();
if !is_8digits(v) {
break;
}
d.digits[d.num_digits..].write_u64(v - 0x3030_3030_3030_3030);
d.num_digits += 8;
s = s.advance(8);
This is because the data isn't parsed, only read, converted from characters to digits, and then written. If this is done on big-endian systems without a byte-swap, we would get the following for b"12345678"
:
let v = s.read_u64_ne(); // BE: 0x3132333435363738, LE: 0x3837363534333231
if !is_8digits(v) { // Always false, since we've just reversed the order of the characters in `v`.
break;
}
// Always writes [1, 2, 3, 4, 5, 6, 7, 8];
d.digits[d.num_digits..].write_u64_ne(v - 0x3030_3030_3030_3030);
d.num_digits += 8;
s = s.advance(8);
In short, a fairly minor optimization could be read_u64_ne
, read_u64_le
, and write_u64_ne
, where only read_u64_le
would have a byte-swap. However, in practice, this at most omits a single, very fast instruction, on a code-path that's very rare and slow.
So we're basically wasting two bswaps per 8 chars on BE on a slow path? Yea, can be probably ignored.
All in all, looks good, let's merge this then.
Fixes #26.
Rationale
This should produce the same byte-code, and remove all endian-dependent codepaths, given that the following are true:
u64::from_le
andu64::to_le
are no-ops on little-endian architectures.u64::from_le
andu64::to_le
are very cheap on big-endian architectures.ptr::read_unaligned
andptr::write_unaligned
are identical toptr::copy_nonoverlapping(src, dst, mem::size_of::<T>())
The first 2 are trivial to show that they're true:
For 3, we can see that read_unaligned is effectively identical to
ptr::copy_nonoverlapping(src, dst, mem::size_of::<T>())
, as long asMaybeUninit
compiles down to no instructions.Using the following source, we can see they're identical (on little-endian systems):
Compiled with
-C opt-level=3
, we can see the x86_64 assembly is identical.This also includes tests to ensure that both big-endian and little-endian systems read the bytes the same way.
Correctness Concerns
Should be non-existent, since as long as the value is read and written to the same native integer, then all the integer operations will produce the same result no matter the byte-order of the architecture. Tests using
b"01234567"
are included for bothread_u64
andwrite_u64
, which should confirm it produces the integer0x3736353433323130
. If we did not useto_le
andfrom_le
, we'd expect the opposite byte-order, or0x3031323334353637
(which would correspond to bytes ofb"76543210"
in little-endian). In short, we've confirmed we've gotten the proper result, and we've provided a significant optimization for big-endian architectures, and simplified a few functions.Alternatives
We could change all the masks and operations to check if the digits are correct to big-endian, however, this might require some additional effort to check correctness, and might require changes in many more locations. Since swapping the byte-order of an integer is effectively free in the grand scheme of things, this should be satisfactory.
Benchmarks
The benchmarks on big-endian are emulated via Qemu, and therefore should be taken with a grain of salt. However, the performance for little-endian systems is identical, and the (emulated) performance improves for big-endian systems.
Little-Endian (Native),
read_u64
Little-Endian (Native),
master
Big-Endian (powerpc-unknown-linux-gnu),
read_u64
Big-Endian (powerpc-unknown-linux-gnu),
master