Closed floswald closed 6 years ago
So apparently x64 CPUs without cmpxchg16b exist....
Yeah, this apparently is the processor Julia won't support with the generic binaries.
Is this something we can address in how we build our generic binaries or will folks with this cpu have to build from source?
It's currently hard-coded, so not even building from unpatched source can fix this.
was about to ask that. I am having the exact same issue when I build this from source.
What's needed is to comment out https://github.com/JuliaLang/julia/blob/ec60445ef6f702f31a387bbca5370dee7b313816/src/codegen.cpp#L5925. Hopefully, now that we support libatomic, the test can still pass (you'll need llvm 3.8/3.9 though). And glibc should be using ifunc to do the operation without a lock.
ok this may be a tall order for me. i could certainly comment out that line, but no idea what glibc and ifunc are/do. In the meantime, can you tell me what characteristic of a machine to avoid? what exactly is it that makes this fail here and how could I test for it? I'm happy to run tests if that's any use.
how old is this system?
I wouldn't think too old, but what do I know. here is the list of machines:
Machine Group Total/Online Cores Processor (GHz) Mem (GB) Disk (GB) Other
Arbuckle 1 32 E7-8837 (2.67) 1006.4 105
Ball 1 16 Opteron 6212 (1.40) 31.4 874
Cannon 1 32 E7-8837 (2.67) 252 874
Cheech*^ 20 12 E5-2620 (2.00) 62.9 197
Chong* 12 12 E5-2620 (2.00) 62.9 883
Cooper 1 32 Opteron 8356 (2.29) 63 104
Costello 30/28 4 Xeon 5160 (3.00) 7.8/15.7 110 15 nodes 7.8GB / 14 nodes 15.7 GB
Groucho 40/29 8 Xeon E5462 (2.80) 15.7 195
Gummo 5 8 Xeon L5520 (2.27) 11.7 417
Hale 20 12 Xeon X5650 (2.67) 23.5 432
Hardy (offline) 24/23 8 15.7
Larry^ 56 4 E3-1240 V2 (3.40) 14.9 89
Laurel 3 8 Xeon E5450 (3.00) 62.2/31.4 98 1 node 62.2GB / 2 nodes 31.4GB
Lemmon 20 8 Xeon E5520 (2.27) 23.5 113
Lum*^ 16 24 E5-2620 0 (2.00) 31.4/62.9/125.9 432 4 nodes 31.4GB / 4 nodes 62.9GB / 8 nodes 125.9GB
Matthau 10 8 Xeon X5570 (2.93) 47.1 113
Moe*^ 16 4 E3-1270 V2 (3.50) 30.7 89
Normand 1 24 Xeon E7540 (2.00) 62.9 105
Pace 12 12 Xeon X5660 (2.80) 47.2/62.9/94.4 244 4 nodes 47.2GB / 4 nodes 62.9GB / 4 nodes 94.9GB
Shemp (qrsh) 4 4 Phenom II X4 910e (0.80) 15.7 432
Spark 1 8 Core i7 CPU 920 (2.67) 11.7 650 2 x Tesla K20 GPUs 2496 cores 5GB RAM
Spark2 1 8 Core i7 CPU 950 (3.07) 11.7 45 2 x Tesla 2050 GPUs 448 cores 3GB RAM; 1 GTX 580 GPU 512 cores 1.5GB RAM
Spark3 1 8 Core i7 CPU 920 (2.67) 23.6 424 2 x Tesla K40 GPUs 2880 cores 12GB RAM
Stat 52/27 8 Opteron 2384 (2.70) 7.9/15.7 125 25 nodes 7.9GB / 2 nodes 15.7GB
Zeppo 20 12 Xeon X5650 (2.67) 23.5/94.4 421 18 nodes 23.5GB / 2 nodes 94.4GB
fry^ 240 4 E3-1240 V3 (3.40) 16 96 CentOS 6 (available shortly)
Abner*^ 8 16 E5-2650 V2 (2.60) 128 215 CentOS 6 (available shortly)
*10GB interface
^SSD local disk
Login Nodes
Bchuckle 6 Xeon X5650 (2.67) 4 2.4
Comic1 6 Xeon X5650 (2.67) 8 13
Corbert (offline) 6 4
Elwood 8 Opteron 875 (1.8) 19.4 100 Matlab GUI interactive submission node
Jake 8 Opteron 875 (1.8) 15.5 98 Matlab GUI interactive submission node
Jones 6 Xeon X5650 (2.67) 4 13
Morecambe1 6 Xeon X5650 (2.67) 4 13
Pchuckle 8 Xeon E5450 (3.00) 15.7 105
Smith 6 Xeon X5650 (2.67) 4 13
Splash 8 Xeon L5609 (1.87) 7.8 206
Wilder 8 Opteron 2384 (2.69) 7.9 117
Wise 6 Xeon E5620 (2.40) 4 2.4
Grand Total (Total/Online) Nodes: 628/589; Cores: 4370/4074
ok this may be a tall order for me. i could certainly comment out that line, but no idea what glibc and ifunc are/do.
You don't need to worry about ifunc (it's only for performance, not for correctness, and should be automatic anyway), but you do need to compile llvm 3.9 to get the test to pass after commenting out that line.
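To illustrate why ifunc affects only performance and not correctness, here is a rough Python sketch of the idea: an implementation is chosen once from the CPU feature list, and the fallback protects the compare-and-swap with a lock. The names `LockedCell` and `resolve_strategy` are hypothetical and are not part of libatomic, glibc, or Julia. Python cannot express the native instruction, so the fast path appears only as a label.

```python
import threading


class LockedCell:
    """Fallback path: every compare-and-swap is guarded by a lock.

    This models the behaviour only. Real libatomic operates on raw
    16-byte memory, not on Python objects.
    """

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def compare_exchange(self, expected, desired):
        """Store `desired` if the current value equals `expected`.

        Returns True when the swap happened, False otherwise.
        """
        with self._lock:
            if self._value == expected:
                self._value = desired
                return True
            return False

    def load(self):
        with self._lock:
            return self._value


def resolve_strategy(flags):
    """Hypothetical resolver: decide once which implementation to use."""
    return "native-cmpxchg16b" if "cx16" in flags else "lock-based"


cell = LockedCell((1, 2))           # a pair stands in for a 16-byte value
swapped = cell.compare_exchange((1, 2), (3, 4))
print(resolve_strategy({"sse2"}), swapped, cell.load())
```

Either strategy gives the same result for a given sequence of operations. The lock-based one simply costs more under contention.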
In the meantime, can you tell me what characteristic of a machine to avoid? what exactly is it that makes this fail here and how could I test for it?
Make sure cx16 is in the feature list. According to the wiki page Jameson linked, any Intel CPU should do.
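One way to test a node for this is to look for the `cx16` flag in `/proc/cpuinfo`. The following is a minimal sketch, assuming a Linux x86 machine where the feature list appears on a line whose key is `flags`. The helper names `cpu_flags` and `has_cx16` are made up for illustration.

```python
def cpu_flags(cpuinfo_text):
    """Return the feature flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def has_cx16(cpuinfo_text):
    """True if the CPU reports the cx16 (cmpxchg16b) feature."""
    return "cx16" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        # Not Linux, or /proc is unavailable: nothing to inspect.
        text = ""
    print("cx16 present" if has_cx16(text) else "cx16 not reported")
```

Running this on each machine group in the list above would show which nodes to avoid with the generic binaries.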
Julia isn't an Intel-processors-only piece of software, how can we work around this for 0.5.1 binaries - preferably without changing llvm version? We have a cpuid check somewhere, don't we? Can we fall back to something?
Doing it with a CPUID check basically means that we abort early (the julia process won't even start). Working around it is possible but non-trivial.
We also have this problem on Power; see #14818 and https://github.com/JuliaLang/julia/pull/16066#issuecomment-247848542 for further discussion.
Closing in favor of #18706, as 0.5.x is unmaintained. Please reopen if this can be reproduced on a more recent version though.
i found test errors on this machine with the pre-compiled generic 64-bit unix binary from the website. cpuinfo at the bottom.
here is the cpuinfo