laraPPr opened this issue 1 year ago (status: Open)
As discussed with @laraPPr, next steps before looking into reporting this upstream are:

- re-run the failing tests with `SciPy-bundle/2021.10-foss-2021b`, which is now in place in EESSI pilot 2023.06 (see also https://numpy.org/doc/stable/reference/testing.html#running-tests-from-inside-python);
- try `SciPy-bundle` versions with more recent toolchains, and see if the failing tests persist;
- `SciPy-bundle/2021.05-foss-2021a`;
While trying to re-run the tests (`numpy.fft.test(verbose=3)` and `numpy.polynomial.test(verbose=3)`) on the neoverse_v1 AWS instances (c7g.2xlarge and c7g.4xlarge), I found that EESSI was using the neoverse_n1 installations. So it will be difficult to re-run the failing tests of the `numpy` that is now in place for neoverse_v1 architectures.
```
@fair-mastodon-c7g-2xlarge-0002 ~]$ source /cvmfs/pilot.eessi-hpc.org/versions/2023.06/init/bash
Found EESSI pilot repo @ /cvmfs/pilot.eessi-hpc.org/versions/2023.06!
archspec says aarch64/neoverse_n1
Using aarch64/neoverse_n1 as software subdirectory.
Using /cvmfs/pilot.eessi-hpc.org/versions/2023.06/software/linux/aarch64/neoverse_n1/modules/all as the directory to be added to MODULEPATH.
Found Lmod configuration file at /cvmfs/pilot.eessi-hpc.org/versions/2023.06/software/linux/aarch64/neoverse_n1/.lmod/lmodrc.lua
Initializing Lmod...
Prepending /cvmfs/pilot.eessi-hpc.org/versions/2023.06/software/linux/aarch64/neoverse_n1/modules/all to $MODULEPATH...
Environment set up to use EESSI pilot software stack, have fun!
```
Hmm, it's worth figuring out why neoverse_n1 is being selected on the Graviton 3 instances; that's sub-optimal, but it's orthogonal to this issue (so let's open a dedicated separate issue for that).
You can bypass what archspec thinks is the most suitable software subdirectory by setting `$EESSI_SOFTWARE_SUBDIR_OVERRIDE` before running the EESSI init script:

```
export EESSI_SOFTWARE_SUBDIR_OVERRIDE="aarch64/neoverse_v1"
```
Where should I open that issue?
Issues about problems with CPU detection should go here, in the `software-layer` repo.
When extending the list of known issues with more info (see PR #340), I noticed that there are also 2 failing tests in the `scipy` test suite on `aarch64/neoverse_v1` for `SciPy-bundle/2021.05-foss-2021a`:

```
optimize/tests/test_linprog.py::TestLinprogIPSparse::test_bug_6139 FAILED [ 44%]
optimize/tests/test_linprog.py::TestLinprogIPSparsePresolve::test_bug_6139 FAILED [ 45%]
```

and 55 failing tests in `SciPy-bundle/2021.10-foss-2021b`:
With `SciPy-bundle/2022.05-foss-2022a` (scipy 1.8.1), we're seeing 18 failing tests on `aarch64/neoverse_v1` (via PR #346), but for some reason we still get a zero exit code...
There are also failing tests for `scipy` 1.8.1, but the test command still exits with a zero exit code...

edit: ah, that's because the `scipy` easyblock auto-enables `ignore_test_result` for scipy < 1.9, because only from 1.9 onwards did we start being a bit more strict about the scipy test suite...
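The version gate described above can be sketched as a small helper. This is a hypothetical illustration of the behaviour, not the actual easyblock code:

```python
# Hypothetical sketch of the behaviour described above: the scipy easyblock
# ignores the test-suite result for scipy versions older than 1.9.
def ignore_test_result(scipy_version: str) -> bool:
    """Return True when scipy test failures should not fail the build."""
    major, minor = (int(p) for p in scipy_version.split(".")[:2])
    return (major, minor) < (1, 9)

print(ignore_test_result("1.8.1"))   # True: failures are ignored
print(ignore_test_result("1.10.1"))  # False: failures make the build fail
```

Note that comparing `(major, minor)` tuples handles versions like 1.10 correctly, where a plain string comparison ("1.10" < "1.9") would not.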
With `SciPy-bundle/2023.02-gfbf-2022b` (scipy 1.10.1), there are suddenly a lot more failing tests on `aarch64/neoverse_v1` (via PR #3477): 928 (out of 49043)...

edit: this should be put in context though, since the total number of tests also went up, from 35441 (scipy 1.8.1) to 49043 (scipy 1.10.1).
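To make that context concrete, it helps to compare failure rates rather than raw counts (the numbers below are taken from the comments above):

```python
# Compare failure *rates* rather than raw counts, using the totals
# reported above for the two SciPy-bundle versions.
cases = {
    "scipy 1.8.1": (18, 35441),
    "scipy 1.10.1": (928, 49043),
}
for label, (failed, total) in cases.items():
    print(f"{label}: {failed}/{total} = {100 * failed / total:.2f}% failing")
```

So the failure rate did genuinely go up by more than an order of magnitude, not just the raw count.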
Some more updates here:
For `SciPy-bundle/2023.07-gfbf-2023a` + `SciPy-bundle/2023.11-gfbf-2023b`, we see 2 failing tests on `aarch64/neoverse_v1` in `software.eessi.io/versions/2023.06`:

```
FAILED scipy/spatial/tests/test_distance.py::TestPdist::test_pdist_correlation_iris
FAILED scipy/spatial/tests/test_distance.py::TestPdist::test_pdist_correlation_iris_float32
= 2 failed, 54876 passed, 3021 skipped, 223 xfailed, 13 xpassed in 878.32s (0:14:38) =
```
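For tracking these numbers across runs, the counts can be pulled out of a pytest summary line like the one above with a small regex (a sketch, not an official pytest API):

```python
import re

# Parse the counts out of a pytest summary line (example copied from above).
summary = "= 2 failed, 54876 passed, 3021 skipped, 223 xfailed, 13 xpassed in 878.32s (0:14:38) ="
counts = {name: int(n)
          for n, name in re.findall(r"(\d+) (failed|passed|skipped|xfailed|xpassed)", summary)}
print(counts)
# {'failed': 2, 'passed': 54876, 'skipped': 3021, 'xfailed': 223, 'xpassed': 13}
```

This makes it easy to diff summaries between architectures or SciPy-bundle versions instead of eyeballing the logs.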
For `SciPy-bundle/2023.02-gfbf-2022b` in `software.eessi.io/versions/2023.06`, we built `numpy` with `-march=armv8.4-a` instead of `-mcpu=native`, to avoid a significant increase in failing tests; see https://github.com/EESSI/software-layer/pull/448 + https://github.com/EESSI/software-layer/pull/419#issuecomment-1878857561.

With that change, we see the same 2 failing tests as we do for `SciPy-bundle/2023.07-gfbf-2023a` + `SciPy-bundle/2023.11-gfbf-2023b`.
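When checking which build configuration an installed `numpy` actually ended up with, `numpy.show_config()` is a quick sanity check; note that only recent numpy versions include compiler details in its output, older ones mainly print BLAS/LAPACK information:

```python
# Inspect how the installed numpy was built; on recent numpy versions this
# includes compiler information, on older ones mainly BLAS/LAPACK details.
import numpy

print(numpy.__version__)
numpy.show_config()
```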
For `aarch64/a64fx`, we're seeing:

`SciPy-bundle-2023.11-gfbf-2023b.eb`:

```
FAILED scipy/optimize/tests/test_minimize_constrained.py::TestTrustRegionConstr::test_list_of_problems
FAILED scipy/spatial/tests/test_distance.py::TestPdist::test_pdist_correlation_iris
FAILED scipy/spatial/tests/test_distance.py::TestPdist::test_pdist_correlation_iris_float32
= 3 failed, 54875 passed, 3021 skipped, 223 xfailed, 13 xpassed in 5753.99s (1:35:53) =
```

`SciPy-bundle-2023.07-gfbf-2023a.eb`:

```
FAILED scipy/optimize/tests/test_linprog.py::TestLinprogIPSparse::test_bug_6139
FAILED scipy/optimize/tests/test_linprog.py::TestLinprogIPSparsePresolve::test_bug_6139
FAILED scipy/spatial/tests/test_distance.py::TestPdist::test_pdist_correlation_iris
FAILED scipy/spatial/tests/test_distance.py::TestPdist::test_pdist_correlation_iris_float32
= 4 failed, 54407 passed, 3016 skipped, 223 xfailed, 13 xpassed, 10917 warnings in 6068.43s (1:41:08) =
```
That's in line with what we've seen for `neoverse_v1`.
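Comparing the failing-test sets across the two CPU targets makes the overlap explicit (test names copied from the logs above, for `SciPy-bundle/2023.07-gfbf-2023a`):

```python
# Failing tests on each CPU target, copied from the logs above.
neoverse_v1 = {
    "scipy/spatial/tests/test_distance.py::TestPdist::test_pdist_correlation_iris",
    "scipy/spatial/tests/test_distance.py::TestPdist::test_pdist_correlation_iris_float32",
}
a64fx = {
    "scipy/optimize/tests/test_linprog.py::TestLinprogIPSparse::test_bug_6139",
    "scipy/optimize/tests/test_linprog.py::TestLinprogIPSparsePresolve::test_bug_6139",
    "scipy/spatial/tests/test_distance.py::TestPdist::test_pdist_correlation_iris",
    "scipy/spatial/tests/test_distance.py::TestPdist::test_pdist_correlation_iris_float32",
}
print("failing on both:", len(neoverse_v1 & a64fx))   # the two pdist tests
print("a64fx only:", len(a64fx - neoverse_v1))        # the two linprog tests
```

Every neoverse_v1 failure also shows up on a64fx, which is what "in line with" means here; a64fx just adds the two `test_bug_6139` linprog failures on top.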
We're seeing a number of failing tests in the numpy test suite on ARM neoverse_v1 in:

For now, we've ignored the failures of the following tests, but we'll need to investigate this issue in more detail. Will the test failures of `SciPy-bundle` pop up in more installations on ARM neoverse_v1, where we also increased the number of accepted test failures for OpenBLAS? Or will this result in more problems when installing other software dependent on `numpy`?

See also:

It did not seem to be a problem, however, in: