Open jamesjer opened 7 years ago
The new code is little-endian specific, whereas the old code was not.
The old code copies 64 bits via memcpy from a pointer to 32-bit values. This means that it copies 2 of the 32-bit values into a composite 64-bit value. The first of the two 32-bit values is copied to the first 32-bits of the 64-bit value; i.e., the lower-order 32 bits on a little endian machine and the higher-order 32 bits on a big endian machine. The second 32-bit value is copied to the higher-order 32 bits of the 64-bit value on a little endian machine, and the lower-order 32 bits on a big endian machine. This leads directly to an infinite loop in appendRow(); see https://github.com/Macaulay2/mathicgb/issues/3. (At least, the loop would be infinite if it didn't contain a memory leak which eventually exhausts all memory and makes the test die.)
The new code produces the same 64-bit value from the two 32-bit values, regardless of endianness. I have done test builds on both a PPC64 machine (big endian) and an x86_64 machine (little endian). With the new code, all tests pass on both machines.
Oh, that's your point. But then your change is hiding the real bug. Your change does make the code behave the same way on both architectures, but probably what we really want is for the code to behave the same whether you use
A = *ptr(a, i*2) | (((uint64)*ptr(a, i*2 + 1)) << 32);
or
A = *ptr(a, i*2+1) | (((uint64)*ptr(a, i*2)) << 32);
After all, the data consists of packed exponent vectors. Probably an overflow during the addition from one bit field into the next failed to be detected.
@jamesjer, @DanGrayson What is the status of this pull request?
These routines are only checking whether one vector of degrees is the sum of two other vectors of degrees, so endianness isn't relevant, even though you are packing vectors into int64's with memcpy, under the assumption that there is no overflow when you compute A+B
from one subfield into the next.
It's hard for me to read the code, so let's consider a baby case to see whether the code is correct:
int8 a[2], b[2], ab[2];
int16 A, B, AB;
Using memcpy to move a to A results either in a[0] | (a[1]<<8)
or (a[0]<<8) | a[1]
, depending on endianness. Let's assume it's little endian, so it's the former. Move a to A, b to B, ab to AB, and check whether AB==A+B. We want to prove that's the case if and only if ab[i]=a[i]+b[i] for all i.
If there is a proof, it's valid for either endianness, since the order of entries in a, b, and ab doesn't affect the truth of the right hand statement.
So, if a bug was encountered, the problem is that we can't prove the statement above, and this pull request is not the way to fix it.
Giving --enable-debug to the top-level M2 configure scripts removes -DNDEBUG instead of adding -DDEBUG, so the MATHICGB_ASSERT statements are never activated and the bug is never detected.
I think that's what I was getting at before.
Anyway, here is a counterexample to the statement:
a[0] = 128
a[1] = 0
A = 128
b[0] = 128
b[1] = 0
B = 128
ab[0] = 0
ab[1] = 1
AB = 256
@d-torrance @jamesjer Is this still relevant?
@d-torrance @jamesjer Is this still relevant?
Yes. I'm using the original version of this PR (https://github.com/Macaulay2/mathicgb/commit/d8ae074c7f7655c3b85c2089cd7a05a98a70a46a) in the Debian mathicgb package. If I remember, the current version with le64toh
wasn't working for me when I tried it out on an s390x porterbox.
Another possible fix is to just skip over this particular optimization on big-endian architectures, which is what I proposed https://github.com/Macaulay2/M2/pull/2172, back when mathicgb was in the e
directory.
Copying 2 32-bit quantities into a 64-bit quantity with memcpy() has endian-specific results. This pull changes such code into code that produces the same result regardless of endianness.