NathanDunfield closed this issue 4 years ago.
Is it possible to know which instruction or which .so file this is?
Can you send the output of lscpu?
I get the same error on both of the following machines:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 24
On-line CPU(s) list: 0-23
Thread(s) per core: 2
Core(s) per socket: 6
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 44
Model name: Intel(R) Xeon(R) CPU X5675 @ 3.07GHz
Stepping: 2
CPU MHz: 1596.000
BogoMIPS: 6117.75
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 12288K
NUMA node0 CPU(s): 0,2,4,6,8,10,12,14,16,18,20,22
NUMA node1 CPU(s): 1,3,5,7,9,11,13,15,17,19,21,23
and
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 26
Model name: Intel(R) Xeon(R) CPU E5520 @ 2.27GHz
Stepping: 5
CPU MHz: 1596.000
BogoMIPS: 4521.28
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 8192K
NUMA node0 CPU(s): 0,2,4,6
NUMA node1 CPU(s): 1,3,5,7
Is this running on top of some virtualization software?
I believe it is on bare metal, as this is a 120-node cluster maintained by my college. On a different class of node on the same cluster, I do not see these errors. Below is the lscpu output for the one that works, which has a Sandy Bridge-based processor (Core gen 2) rather than a Nehalem-based one (Core gen 1).
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 12
On-line CPU(s) list: 0-11
Thread(s) per core: 1
Core(s) per socket: 6
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 45
Model name: Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz
Stepping: 7
CPU MHz: 1200.000
BogoMIPS: 4999.28
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 15360K
NUMA node0 CPU(s): 0,2,4,6,8,10
NUMA node1 CPU(s): 1,3,5,7,9,11
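The lscpu outputs above omit the `Flags:` line, which is where the relevant difference between these machines shows up: Nehalem and Westmere parts (E5520, X5675) do not support AVX, Sandy Bridge (E5-2640) added AVX, and AVX2 only arrived later with Haswell. A minimal sketch of checking this from the flags list, assuming the Linux `/proc/cpuinfo` `flags` line (or lscpu's `Flags:` line) as input; the helper names `cpu_flags`, `simd_support`, and `SIMD_FLAGS` are hypothetical.

```python
"""Report which SIMD extensions a CPU advertises, from /proc/cpuinfo-style text.

Assumption: Linux lists CPU feature flags on a "flags" line in /proc/cpuinfo,
and lscpu shows the same list on a "Flags:" line.
"""
from typing import Dict, Set

# Extensions worth checking for the AVX question in this thread.
SIMD_FLAGS = ("sse4_2", "avx", "avx2", "avx512f")


def cpu_flags(cpuinfo_text: str) -> Set[str]:
    """Return the feature flags listed on the first 'flags'/'Flags' line."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() in ("flags", "Flags"):
            return set(value.split())
    return set()


def simd_support(cpuinfo_text: str) -> Dict[str, bool]:
    """Map each extension in SIMD_FLAGS to whether the CPU advertises it."""
    flags = cpu_flags(cpuinfo_text)
    return {name: name in flags for name in SIMD_FLAGS}


if __name__ == "__main__":
    # On Linux, read the real file; elsewhere this prints an all-"no" report.
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        text = ""
    for name, present in simd_support(text).items():
        print(f"{name:8} {'yes' if present else 'no'}")
```

On the Nehalem and Westmere nodes one would expect `avx` to report "no", which is consistent with the guess that AVX instructions are the problem.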
Is it possible to know which instruction or which .so file this is?
Hmm, I don't know how to get cysignals out of the way to get a crash with a full C stacktrace. Given that it works on Sandy Bridge but not Nehalem, I'm guessing that some AVX instructions are creeping in. I resorted to https://stackoverflow.com/a/51466229 but found no AVX instructions in liblinbox* or sage.matrix.matrix*.so. However, liblinbox depends on liblapack, libgivaro, and libgfortran, all of which contain AVX instructions, though I don't know whether any of those instructions are actually executed.
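One rough way to do this kind of check (not necessarily the method in the linked answer) is to disassemble each shared object with GNU `objdump -d` and look for VEX/EVEX-encoded mnemonics, which in AT&T syntax all start with "v". This sketch assumes objdump's default tab-separated `address:<TAB>bytes<TAB>instruction` layout, treats every "v" mnemonic as AVX-family except the VMX/SVM and segment-verify instructions listed, and does not distinguish AVX from AVX2. The names `avx_mnemonics`, `scan`, and `NON_AVX_V` are hypothetical.

```python
"""Heuristic scan of `objdump -d` output for AVX-family (VEX/EVEX) instructions."""
import shutil
import subprocess
import sys
from collections import Counter

# Non-AVX x86 mnemonics that also begin with "v" (VMX, SVM, segment verify).
NON_AVX_V = {
    "verr", "verw", "vmcall", "vmclear", "vmfunc", "vmlaunch", "vmload",
    "vmmcall", "vmptrld", "vmptrst", "vmread", "vmresume", "vmrun",
    "vmsave", "vmwrite", "vmxoff", "vmxon",
}


def avx_mnemonics(disassembly: str) -> Counter:
    """Count AVX-looking mnemonics in `objdump -d` text."""
    counts = Counter()
    for line in disassembly.splitlines():
        fields = line.split("\t")
        # Skip section headers, symbol labels, and continuation lines that
        # carry only the leftover bytes of a long instruction encoding.
        if len(fields) < 3 or not fields[0].strip().endswith(":"):
            continue
        parts = fields[2].split()
        if not parts:
            continue
        mnemonic = parts[0]
        if mnemonic.startswith("v") and mnemonic not in NON_AVX_V:
            counts[mnemonic] += 1
    return counts


def scan(path: str) -> Counter:
    """Disassemble one shared object with objdump and count AVX mnemonics."""
    result = subprocess.run(
        ["objdump", "-d", path], capture_output=True, text=True, check=True
    )
    return avx_mnemonics(result.stdout)


if __name__ == "__main__":
    # Usage: python scan_avx.py /path/to/libfoo.so [more .so files ...]
    if shutil.which("objdump") is None:
        print("objdump not found; it ships with GNU binutils", file=sys.stderr)
    else:
        for so in sys.argv[1:]:
            hits = scan(so)
            print(so, sum(hits.values()), dict(hits.most_common(5)))
```

As noted above, a nonzero count only shows that such instructions exist in the file, not that they execute on a given machine.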
It's givaro:
$ elfx86exts libgivaro.so
MODE64 (call)
AVX (vzeroupper)
CMOV (cmovne)
NOVLX (vxorpd)
AVX2 (vpand)
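To connect a report like the one above with the CPUs in this thread, one could compare the extension groups it lists against a machine's feature flags. A minimal sketch, assuming elfx86exts prints one `EXTENSION (example_mnemonic)` line per group as shown; the mapping `EXT_TO_FLAG` to `/proc/cpuinfo` flag names is hypothetical and deliberately partial, and groups such as MODE64 or NOVLX are ignored because they are not CPU feature flags. The function names are also hypothetical.

```python
"""Check an elfx86exts-style report against a CPU's feature flags."""
from typing import Iterable, List, Set

# Hypothetical, partial mapping from elfx86exts group names to Linux
# /proc/cpuinfo flag names. Unmapped groups are not checked.
EXT_TO_FLAG = {
    "CMOV": "cmov",
    "SSE2": "sse2",
    "SSE3": "pni",
    "SSSE3": "ssse3",
    "SSE41": "sse4_1",
    "SSE42": "sse4_2",
    "AVX": "avx",
    "AVX2": "avx2",
    "FMA": "fma",
    "BMI": "bmi1",
    "BMI2": "bmi2",
}


def required_extensions(report: str) -> Set[str]:
    """Collect the group names (text before the parenthesis) from a report."""
    exts = set()
    for line in report.splitlines():
        name = line.split("(")[0].strip()
        if name:
            exts.add(name)
    return exts


def missing_on_cpu(report: str, cpu_flags: Iterable[str]) -> List[str]:
    """Return the checkable extensions the binary uses but the CPU lacks."""
    flags = set(cpu_flags)
    return sorted(
        ext
        for ext in required_extensions(report)
        if ext in EXT_TO_FLAG and EXT_TO_FLAG[ext] not in flags
    )
```

Fed the report above and a Westmere-like flag list, this flags both AVX and AVX2. It cannot tell whether the library only reaches those instructions on some code paths, so it is a hint about where to look rather than proof of a crash.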
I forgot to pass --disable-avx2 at https://github.com/conda-forge/givaro-feedstock/blob/master/recipe/build.sh#L22. Can you send a PR to fix it?
The PR has been merged, and I have verified that after updating this conda package I no longer have this issue. Many thanks!
Issue: On a machine with Westmere processors (specifically Xeon X5675), I get the following errors in what looks like Linbox:
Environment (conda list): clean install with mamba --name=sage_full sage python=3.7
Details about conda and system (conda info):