PDLPorters / pdl-linearalgebra

https://p3rl.org/PDL::LinearAlgebra
1 stars 7 forks source link

0.27 fails to build on i386 due to test failures #12

Closed sebastic closed 2 years ago

sebastic commented 2 years ago

The Debian package build for 0.27 failed on i386 due to test failures:

https://buildd.debian.org/status/package.php?p=libpdl-linearalgebra-perl

From the buildlog

PERL_DL_NONLAZY=1 "/usr/bin/perl" "-MExtUtils::Command::MM" "-MTest::Harness" "-e" "undef *Test::Harness::Switches; test_harness(1, 'blib/lib', 'blib/arch')" t/*.t

#   Failed test 'native complex mschur'
#   at ./t/common.pl line 10.
# got(PDL): 
# [
#  [0.366373539549749              0.72]
#  [                0 0.783626460450251]
# ]

#   Failed test 'native complex mschur'
#   at ./t/common.pl line 11.
# got(PDL): 
# [
#  [0.366373539549749              0.72]
#  [                0 0.783626460450251]
# ]

#   Failed test 'native complex mschur'
#   at ./t/common.pl line 12.
# got(PDL): 
# [
#  [0.366373539549749              0.72]
#  [                0 0.783626460450251]
# ]

#   Failed test 'native complex mschur'
#   at ./t/common.pl line 13.
# got(PDL): 
# [
#  [0.366373539549749              0.72]
#  [                0 0.783626460450251]
# ]

#   Failed test 'native complex mschur'
#   at ./t/common.pl line 14.
# got(PDL): 
# [
#  [0.366373539549749              0.72]
#  [                0 0.783626460450251]
# ]

#   Failed test 'native complex mschur'
#   at ./t/common.pl line 15.
# got(PDL): 
# [
#  [0.366373539549749              0.72]
#  [                0 0.783626460450251]
# ]

#   Failed test 'native complex mschur'
#   at ./t/common.pl line 16.
# got(PDL): 
# [
#  [0.366373539549749              0.72]
#  [                0 0.783626460450251]
# ]

#   Failed test 'native complex mschur'
#   at ./t/common.pl line 17.
# got(PDL): 
# [
#  [0.366373539549749              0.72]
#  [                0 0.783626460450251]
# ]

#   Failed test 'native complex mschur'
#   at ./t/common.pl line 18.
# got(PDL): 
# [
#  [0.366373539549749              0.72]
#  [                0 0.783626460450251]
# ]
# Looks like you failed 9 tests of 237.
t/1.t ....... 
[...]
ok 6 - mschur
not ok 7 - native complex mschur
ok 8 - mschur
not ok 9 - native complex mschur
ok 10 - mschur
not ok 11 - native complex mschur
ok 12 - mschur
not ok 13 - native complex mschur
ok 14 - mschur
not ok 15 - native complex mschur
ok 16 - mschur
not ok 17 - native complex mschur
ok 18 - mschur
not ok 19 - native complex mschur
ok 20 - mschur
not ok 21 - native complex mschur
ok 22 - mschur
not ok 23 - native complex mschur
[...]
Dubious, test returned 9 (wstat 2304, 0x900)
Failed 9/237 subtests 
[...]

#   Failed test 'PDL::Complex mschur'
#   at ./t/common.pl line 10.
# got: 
# [
#  [0.366373539549749              0.72]
#  [                0 0.783626460450251]
# ]
# 
# expected: 
# [
#  [0.36637354      -0.72]
#  [         0 0.78362646]
# ]

#   Failed test 'PDL::Complex mschur'
#   at ./t/common.pl line 11.
# got: 
# [
#  [0.366373539549749              0.72]
#  [                0 0.783626460450251]
# ]
# 
# expected: 
# [
#  [0.36637354      -0.72]
#  [         0 0.78362646]
# ]

#   Failed test 'PDL::Complex mschur'
#   at ./t/common.pl line 12.
# got: 
# [
#  [0.366373539549749              0.72]
#  [                0 0.783626460450251]
# ]
# 
# expected: 
# [
#  [0.36637354      -0.72]
#  [         0 0.78362646]
# ]

#   Failed test 'PDL::Complex mschur'
#   at ./t/common.pl line 13.
# got: 
# [
#  [0.366373539549749              0.72]
#  [                0 0.783626460450251]
# ]
# 
# expected: 
# [
#  [0.36637354      -0.72]
#  [         0 0.78362646]
# ]

#   Failed test 'PDL::Complex mschur'
#   at ./t/common.pl line 14.
# got: 
# [
#  [0.366373539549749              0.72]
#  [                0 0.783626460450251]
# ]
# 
# expected: 
# [
#  [0.36637354      -0.72]
#  [         0 0.78362646]
# ]

#   Failed test 'PDL::Complex mschur'
#   at ./t/common.pl line 15.
# got: 
# [
#  [0.366373539549749              0.72]
#  [                0 0.783626460450251]
# ]
# 
# expected: 
# [
#  [0.36637354      -0.72]
#  [         0 0.78362646]
# ]

#   Failed test 'PDL::Complex mschur'
#   at ./t/common.pl line 16.
# got: 
# [
#  [0.366373539549749              0.72]
#  [                0 0.783626460450251]
# ]
# 
# expected: 
# [
#  [0.36637354      -0.72]
#  [         0 0.78362646]
# ]

#   Failed test 'PDL::Complex mschur'
#   at ./t/common.pl line 17.
# got: 
# [
#  [0.366373539549749              0.72]
#  [                0 0.783626460450251]
# ]
# 
# expected: 
# [
#  [0.36637354      -0.72]
#  [         0 0.78362646]
# ]

#   Failed test 'PDL::Complex mschur'
#   at ./t/common.pl line 18.
# got: 
# [
#  [0.366373539549749              0.72]
#  [                0 0.783626460450251]
# ]
# 
# expected: 
# [
#  [0.36637354      -0.72]
#  [         0 0.78362646]
# ]
# Looks like you failed 9 tests of 117.
t/legacy.t .. 
[...]
not ok 7 - PDL::Complex mschur
not ok 8 - PDL::Complex mschur
not ok 9 - PDL::Complex mschur
not ok 10 - PDL::Complex mschur
not ok 11 - PDL::Complex mschur
not ok 12 - PDL::Complex mschur
not ok 13 - PDL::Complex mschur
not ok 14 - PDL::Complex mschur
not ok 15 - PDL::Complex mschur
[...]
Dubious, test returned 9 (wstat 2304, 0x900)
Failed 9/117 subtests 

Test Summary Report
-------------------
t/1.t     (Wstat: 2304 Tests: 237 Failed: 9)
  Failed tests:  7, 9, 11, 13, 15, 17, 19, 21, 23
  Non-zero exit status: 9
t/legacy.t (Wstat: 2304 Tests: 117 Failed: 9)
  Failed tests:  7-15
  Non-zero exit status: 9
Files=4, Tests=404,  3 wallclock secs ( 0.06 usr  0.02 sys +  2.60 cusr  0.14 csys =  2.82 CPU)
Result: FAIL
Failed 2/4 test programs. 18/404 subtests failed.

Full buildlog

mohawk2 commented 2 years ago

Thanks for the report! This captures the heart of it:

#   Failed test 'PDL::Complex mschur'
#   at ./t/common.pl line 11.
# got: 
# [
#  [0.366373539549749              0.72]
#  [                0 0.783626460450251]
# ]
# 
# expected: 
# [
#  [0.36637354      -0.72]
#  [         0 0.78362646]
# ]

The Schur decomposition of matrix A (I've just had to finally actually learn what that is) finds Q and U so that:

A = Q x U x Q^-1

mschur uses (for its complex form) LAPACK's zgees. As we can see above, U (an upper-triangular matrix) has similar values on i386 to what it has on x64, except the sign is switched. The test suite currently doesn't capture e.g. both Q and U to actually try reconstituting A to see if it's correct, though possibly it should.

A quick check shows Debian uses ATLAS as the LAPACK it links this library with (for my ref: https://salsa.debian.org/perl-team/modules/packages/libpdl-linearalgebra-perl/-/blob/master/debian/control). Would it be easy for you to, on i386, run this and tell the results here? (results from my x64 box with reference LAPACK shown)

$ perldl
pdl> use PDL::LinearAlgebra
pdl> p $A = pdl([0.43,0.03],[0.75,0.72])->r2C

[
 [0.43 0.03]
 [0.75 0.72]
]

pdl> p +($U, undef, $Z) = $A->mschur(1)

[
 [0.366373539549749             -0.72]
 [                0 0.783626460450251]
]
 [0.366373539549749 0.783626460450251] # just the eigenvalues, not important as same as diagonal of $U
[
 [ 0.426473531380713  0.904500042582455]
 [-0.904500042582455  0.426473531380713]
]

With that I can see if the $Z is genuinely different but still mathematically valid, or if something truly weird is going on (quite unlikely but best to be sure).

sebastic commented 2 years ago

In an i386 chroot with pdl (1:2.077-1+b1) & libpdl-linearalgebra-perl (0.26-2+b1):

pdl> use PDL::LinearAlgebra

pdl> p $A = pdl([0.43,0.03],[0.75,0.72])->r2C

[
 [0.43 0.03]
 [0.75 0.72]
]

pdl> p +($U, undef, $Z) = $A->mschur(1)
Can't locate object method "initialize" via package "PDL::Complex" at /usr/lib/i386-linux-gnu/perl5/5.34/PDL/Core.pm line 2288, <STDIN> line 3.

And with pdl (1:2.077-1+b1) & libpdl-linearalgebra-perl (0.27-1) built without running tests:

pdl> use PDL::LinearAlgebra

pdl> p $A = pdl([0.43,0.03],[0.75,0.72])->r2C

[
 [0.43 0.03]
 [0.75 0.72]
]

pdl> p +($U, undef, $Z) = $A->mschur(1)

[
 [0.366373539549749              0.72]
 [                0 0.783626460450251]
]
 [0.366373539549749 0.783626460450251] 
[
 [-0.426473531380713  0.904500042582456]
 [ 0.904500042582456  0.426473531380713]
]
mohawk2 commented 2 years ago

Thank you! So the Z I was seeing on x64 is:

[
 [ 0.426473531380713  0.904500042582455]
 [-0.904500042582455  0.426473531380713]
]

while on i386:

[
 [-0.426473531380713  0.904500042582456]
 [ 0.904500042582456  0.426473531380713]
]

Not very surprisingly, Z x U x Z' correctly multiplies out to the original A. Evidently, negating the top-right value of U means one has to negate the left-hand column of Z. That's probably very obvious to someone with basic matrix-maths competence, unlike me. I'll make sure the test can accommodate the alternative answer.

mohawk2 commented 2 years ago

For my own education (this is simplified by z2 and z3 in m3 being identical rather than conjugated, since the imaginary values in this scenario are 0):

A = Z U Z'

            "m1"    "m2"   "m3"
[a1 a2    [z1 z2  [u1 u2  [z1 z3
 a3 a4] =  z3 z4]   0 u4]  z2 z4]

m1 m2 = [z1u1 z1u2+z2u4
         z3u1 z3u2+z4u4]

m1m2 m3 = [(z1u1)z1+(z1u2+z2u4)z2 (z1u1)z3+(z1u2+z2u4)z4
           (z3u1)z1+(z3u2+z4u4)z2 (z3u1)z3+(z3u2+z4u4)z4]

The 0 in U is because it's upper-triangular, and it really clarifies the last bit: when you negate u2, that self-evidently means you need to also negate both z1 and z3 in order to retain the same result, as z1, z3 and u2 only appear as multiplicative pairs.