nkbhan closed this issue 4 years ago.
How did you compile and build pyccl?
My guess is that you built on one set of CPUs but are running on the others. You might try reversing the roles, so that you always build on the older CPU.
Also, upgrade to v2! You can install it via conda and never have this issue.
I had a similar problem and hunted the cause down to angpow. Removing a -march=native compiler flag there solved the issue. I never got around to making that PR, though that was a while ago. What's the current status of angpow in CCL?
I used pip to install pyccl
CC=gcc pip install pyccl
which handled the building and compiling.
I did this on the login node of the cluster I am working on (an Intel one). Later today I'll try what you suggested, installing it on an AMD compute node instead, and see if that fixes it.
@beckermr I did a fresh install of pyccl in a new conda environment on one of the AMD compute nodes I mentioned, and it looks like I can use this install on any node (AMD or Intel) without getting the 'illegal instruction' error. Thanks for the tip!
It might be good to keep this open since it's a common problem. For example, we have a heterogeneous cluster here with over a dozen different architectures. The scheduler can put your job on any of them, and it's non-trivial to find which architecture is the "oldest".
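One way to avoid guessing which architecture is "oldest" is to collect the CPU flags of each node type and compare the sets. The sketch below assumes you have already gathered each node's flags (for example from the flags line of /proc/cpuinfo) into a dict; the node names and flag sets are hypothetical. A node whose flags are a subset of every other node's is a reasonable place to build, although gcc's own CPU detection does not have to agree exactly with /proc/cpuinfo. If no such node exists, the intersection of all the flag sets is the baseline a portable build would have to target.

```python
from functools import reduce


def common_flags(node_flags):
    """Flags reported by every node: the baseline a portable build can rely on."""
    return reduce(set.intersection, (set(flags) for flags in node_flags.values()))


def oldest_nodes(node_flags):
    """Nodes whose flag set is contained in every other node's flag set.

    Returns an empty list when no single node is a subset of all the others,
    i.e. when there is no unambiguous "oldest" architecture.
    """
    sets = {name: set(flags) for name, flags in node_flags.items()}
    return sorted(
        name for name, own in sets.items()
        if all(own <= other for other in sets.values())
    )


# Hypothetical flag sets for three node types.
nodes = {
    "node-a": {"sse", "sse2", "pni", "avx"},
    "node-b": {"sse", "sse2", "pni"},
    "node-c": {"sse", "sse2", "pni", "sse4a"},
}
print(sorted(common_flags(nodes)))  # ['pni', 'sse', 'sse2']
print(oldest_nodes(nodes))          # ['node-b']
```

Note that node-a and node-c are not comparable (each has a flag the other lacks), which is exactly the situation where "build on the oldest node" stops being well defined.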
Use the conda package. This is built to handle this situation.
I tested the conda package to see if it could handle this situation, but installing it on an Intel node of the cluster I was using still led to an illegal instruction error on the AMD nodes.
EDIT: I was using version 2.0.1
Can you post more info? All conda packages are compiled with flags that enforce them to use very old instruction sets.
Here is an example from the build logs:
$BUILD_PREFIX/bin/x86_64-conda_cos6-linux-gnu-cc -I$BUILD_PREFIX/include -I$SRC_DIR/include -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -I$PREFIX/include -fdebug-prefix-map=$SRC_DIR=/usr/local/src/conda/pyccl-2.0.1 -fdebug-prefix-map=$PREFIX=/usr/local/src/conda-prefix -O3 -fomit-frame-pointer -fno-common -fPIC -std=gnu99 -DHAVE_ANGPOW -fopenmp -o CMakeFiles/objlib.dir/src/ccl_background.c.o -c $SRC_DIR/src/ccl_background.c
Notice the -march=nocona -mtune=haswell.
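Here -march=nocona sets which instruction sets the compiler may use, while -mtune=haswell only affects scheduling and does not add instructions. A quick sanity check is to compare a node's /proc/cpuinfo flags with the extensions GCC lists for nocona (MMX, SSE, SSE2, SSE3). This is a sketch under stated assumptions, not a full diagnosis: Linux reports SSE3 under the name "pni", and a crash can still come from some other instruction or from a dependency built with different flags. The helper names are hypothetical.

```python
# ISA extensions GCC documents for -march=nocona: MMX, SSE, SSE2, SSE3.
# /proc/cpuinfo reports SSE3 as "pni" (Prescott New Instructions).
NOCONA_FLAGS = {"mmx", "sse", "sse2", "pni"}


def cpu_flags(cpuinfo_text):
    """Parse the first 'flags' line of /proc/cpuinfo text into a set."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_for_nocona(cpuinfo_text):
    """Nocona-baseline extensions that the CPU does not report."""
    return sorted(NOCONA_FLAGS - cpu_flags(cpuinfo_text))
```

On a node, missing_for_nocona(open("/proc/cpuinfo").read()) should return an empty list if all four extensions are reported.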
As a general point, you should always post actual details of the errors when this happens.
Can you double check you are using the right version of CCL and not the old one you installed?
Does this also apply to the angpow build?
Its CMake file still has -march=native there:
https://github.com/LSSTDESC/Angpow4CCL/blob/131b280ef7a551baa128f01e4257c83b1d775ae1/CMakeLists.txt#L19
We don't build angpow right now
Are these AMD CPUs exceptionally old?
Regarding the AMD CPUs, /proc/cpuinfo reports them as "AMD Opteron(tm) Processor 6136", which were launched in 2010 if I am not mistaken.
Here are the steps I took earlier:
On the login node of my cluster, I made a fresh conda environment:
$ grep -i "model name" /proc/cpuinfo
model name : Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz
$ conda create --name ccl
$ conda activate ccl
Install CCL from conda-forge:
$ conda install -c conda-forge pyccl
Check the version number:
$ conda list | grep -i pyccl
pyccl 2.0.1 py37h174e469_0 conda-forge
Test pyccl:
$ python
Python 3.7.3 | packaged by conda-forge | (default, Jul 1 2019, 21:52:21)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyccl
>>> quit()
$
Switch to the compute node:
$ salloc --ntasks=1 --time=00:30:00
salloc: Granted job allocation 321023
$ srun --jobid=321023 --pty /bin/bash
Check the CPU type:
$ grep -i "model name" /proc/cpuinfo
model name : AMD Opteron(tm) Processor 6136
Test pyccl:
$ conda activate ccl
$ python
Python 3.7.3 | packaged by conda-forge | (default, Jul 1 2019, 21:52:21)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyccl
Illegal instruction
$
Python crashes and I'm back at the shell prompt.
I'm not sure where to find the build logs, but if there is anything else that might be helpful, do let me know.
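When testing many node types, it can help to run the import in a child interpreter so that an illegal instruction kills only the child and the result can be logged. The helper below is hypothetical (not part of pyccl) and relies on the POSIX convention that subprocess reports a child killed by signal N as return code -N.

```python
import signal
import subprocess
import sys


def died_from_sigill(code):
    """Run `code` in a fresh Python interpreter; True if it was killed by SIGILL.

    The calling interpreter survives even if the child executes an
    instruction the CPU does not support.
    """
    proc = subprocess.run([sys.executable, "-c", code], capture_output=True)
    # On POSIX, a negative return code -N means the child died from signal N.
    return proc.returncode == -signal.SIGILL
```

For example, died_from_sigill("import pyccl") would be expected to return True on a node that shows the crash above, and False where the import works.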
Can you send me the output of lscpu on the AMD nodes?
Sure
$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-15
Thread(s) per core: 1
Core(s) per socket: 8
CPU socket(s): 2
NUMA node(s): 4
Vendor ID: AuthenticAMD
CPU family: 16
Model: 9
Stepping: 1
CPU MHz: 2400.052
BogoMIPS: 4800.46
Virtualization: AMD-V
L1d cache: 64K
L1i cache: 64K
L2 cache: 512K
L3 cache: 5118K
NUMA node0 CPU(s): 0,2,4,6
NUMA node1 CPU(s): 8,10,12,14
NUMA node2 CPU(s): 9,11,13,15
NUMA node3 CPU(s): 1,3,5,7
More questions I am getting from the conda-forge devs. Can you post the output of this command?
gcc -march=native -Q --help=target
On the AMD node, or on the node where I installed pyccl?
An AMD node.
For the AMD node:
$ gcc -march=native -Q --help=target
The following options are target specific:
-m128bit-long-double [disabled]
-m32 [disabled]
-m3dnow [disabled]
-m3dnowa [disabled]
-m64 [enabled]
-m80387 [enabled]
-m96bit-long-double [enabled]
-mabm [enabled]
-maccumulate-outgoing-args [disabled]
-maes [disabled]
-malign-double [disabled]
-malign-functions=
-malign-jumps=
-malign-loops=
-malign-stringops [enabled]
-march= amdfam10
-masm=
-mavx [disabled]
-mbmi [disabled]
-mbranch-cost=
-mcld [disabled]
-mcmodel=
-mcrc32 [disabled]
-mcx16 [enabled]
-mf16c [disabled]
-mfancy-math-387 [enabled]
-mfma [disabled]
-mfma4 [disabled]
-mforce-drap [disabled]
-mfp-ret-in-387 [enabled]
-mfpmath=
-mfsgsbase [disabled]
-mfused-madd [enabled]
-mglibc [enabled]
-mhard-float [enabled]
-mieee-fp [enabled]
-mincoming-stack-boundary=
-minline-all-stringops [disabled]
-minline-stringops-dynamically [disabled]
-mintel-syntax [disabled]
-mlarge-data-threshold=
-mlwp [disabled]
-mmmx [disabled]
-mmovbe [disabled]
-mms-bitfields [disabled]
-mno-align-stringops [disabled]
-mno-fancy-math-387 [disabled]
-mno-push-args [disabled]
-mno-red-zone [disabled]
-mno-sse4 [enabled]
-momit-leaf-frame-pointer [disabled]
-mpc
-mpclmul [disabled]
-mpopcnt [enabled]
-mpreferred-stack-boundary=
-mpush-args [enabled]
-mrdrnd [disabled]
-mrecip [disabled]
-mred-zone [enabled]
-mregparm=
-mrtd [disabled]
-msahf [enabled]
-msoft-float [disabled]
-msse [disabled]
-msse2 [disabled]
-msse2avx [disabled]
-msse3 [disabled]
-msse4 [disabled]
-msse4.1 [disabled]
-msse4.2 [disabled]
-msse4a [disabled]
-msseregparm [disabled]
-mssse3 [disabled]
-mstack-arg-probe [disabled]
-mstackrealign [enabled]
-mstringop-strategy=
-mtbm [disabled]
-mtls-dialect=
-mtls-direct-seg-refs [enabled]
-mtune= amdfam10
-muclibc [disabled]
-mveclibabi=
-mxop [disabled]
So I just learned a bunch of stuff from the conda-forge dev who was helping me. Here we go!
So if you run gcc -march=nocona -Q --help=target, you can see which instruction sets the code was compiled for. These include SSE instructions, which are apparently disabled on your AMD CPUs. Thus code from conda-forge won't ever work on these CPUs.
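To see exactly which options differ between the conda-forge target and the local CPU, the two listings can be diffed. This sketch assumes the output format shown in the listing above (one option per line, followed by [enabled] or [disabled]); the helper names are hypothetical, and the result is only as trustworthy as the local gcc's detection of the CPU.

```python
import subprocess


def parse_target_help(text):
    """Map '-m...' options to 'enabled'/'disabled' from gcc -Q --help=target output."""
    state = {}
    for line in text.splitlines():
        parts = line.split()
        # Keep only boolean options such as "-msse   [disabled]";
        # value options like "-march=  amdfam10" are skipped.
        if (len(parts) == 2 and parts[0].startswith("-m")
                and parts[1] in ("[enabled]", "[disabled]")):
            state[parts[0]] = parts[1].strip("[]")
    return state


def enabled_only_in(a, b):
    """Options enabled in listing `a` but not enabled in listing `b`."""
    return sorted(opt for opt, s in a.items()
                  if s == "enabled" and b.get(opt) != "enabled")


def target_options(march, gcc="gcc"):
    """Run gcc for a given -march value and parse its target options."""
    out = subprocess.run([gcc, f"-march={march}", "-Q", "--help=target"],
                         capture_output=True, text=True, check=True).stdout
    return parse_target_help(out)
```

On an AMD node, enabled_only_in(target_options("nocona"), target_options("native")) would list the options that the nocona target turns on but gcc's native detection does not.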
This situation is rather rare and I have not seen it before. Also, googling your CPU model indicates it should have these instructions, so I am confused by that. However, I think this is what is going on. You might ask your local IT people what is going on there, or ask them for the actual docs on your CPUs. I might have found the wrong ones.
This also explains in detail what happened before with the versions compiled by hand. Compiling pyccl from source on the AMD CPUs worked because the compiler there doesn't emit SSE instructions, so of course the Intel CPUs can execute the code. Going the other way won't work, however, because builds on the Intel CPUs include SSE instructions and the AMD CPUs choke on them.
Interesting, thanks for letting me know! I'll reach out to my local IT folks to ask about the SSE instructions on the AMD nodes. In the meantime, installing pyccl on the AMD nodes seems to be the way to go to ensure that I can run it on any compute node of the cluster I'm on.
I was trying to use version 1.0.0 of CCL on a CentOS cluster. I'm using an Anaconda environment with Python 3. I installed cmake and swig using conda and then installed pyccl with pip, all on the login node of this cluster. I can import pyccl in Python without a problem on this node:
However, on certain compute nodes, namely the ones with AMD CPUs, I get an illegal instruction error when trying to import pyccl, and Python crashes.
On the compute nodes with Intel CPUs, I do not get this error: