Most of the POWER8-specific changes since 0.2.18 appear to have happened in a narrow timespan between April 19 and May 22. While I cannot comment on the POWER8 assembly itself, I understand that wernsaar also adjusted some thresholds for thread creation in the GEMM functions, which may simply have made thread contention more likely, and 8310d4d3 also dropped the ALLOC_SHM define from Makefile.power without explanation (though it may have been spurious all along).
Unless somebody else comes up with a better idea, I wonder if you could try a snapshot from somewhere in the middle of what appears to have been the crucial period (say 0551e57 from April 26), and/or see whether limiting the number of threads created on each node via OMP_NUM_THREADS has any influence?
Also do the Ubuntu and RHEL machines you mention use the exact same binary, or was OpenBLAS built separately on each (implying different compiler versions and/or options in use) ?
Did you use any flags when compiling OpenBLAS? Also, the full gcc and gfortran versions are more essential than the kernel version (normally one assumes the kernel shipped with the distribution, fully patched, or anything in between).
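For reference, the exact toolchain and kernel versions can be captured with standard commands, e.g.:

```
gcc --version
gfortran --version
uname -r
```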
Attach to the frozen process with a debugger and dump all backtraces:
$ script
$ gdb
> attach <pid>
> thread apply all backtrace
..... here is the interesting output
> detach
> quit
$ quit
And attach typescript file here.
I compiled OpenBLAS using just the make command (default options) on both Ubuntu and RHEL, and used export LD_LIBRARY_PATH=
I have used GCC and Gfortran version 5.3.1 on both Ubuntu and RHEL.
Also tried to collect the traces but I see the following error - [ No Source Available ]
Am I missing something? Does it require attaching any debugger, given that it is a guest machine on KVM? Are there any other ways of collecting the traces?
You may need to build a debuggable version of OpenBLAS and HPL first to get any meaningful backtraces with gdb.
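As a rough sketch (assuming the OpenBLAS DEBUG=1 make option is available in your tree; adjust to your setup), a debuggable rebuild could look like:

```
cd /home/hpl-2.2/openblas-2.19/OpenBLAS-0.2.19
make clean
make DEBUG=1   # keeps -g debug symbols so gdb backtraces can show source lines
```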
Just a quick peek into the problem - can you compare CPUID on compile machine and run machine?
Does it work single-threaded and/or without MPI binding options? OpenBLAS by default spins up pthreads for all available CPUs, or the compile-time detected number of CPUs, whichever is smaller (set OPENBLAS_NUM_THREADS to a lower value if needed). Maybe MPI binding/affinity somehow hurts the default build, which does not try to bind processors. You can start gdb in the build root directory, where all source files are in place (sort of).
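For example, a quick single-threaded sanity check might look like this (a sketch; the mpirun options are borrowed from the command used elsewhere in this thread, minus the binding):

```
export OPENBLAS_NUM_THREADS=1   # limit OpenBLAS to one thread per MPI rank
mpirun -np 2 --allow-run-as-root ./xhpl
```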
@rakshithprakash Can you provide me your make.inc from the HPL benchmark such that I can recompile it easily ?
@brada4 Attaching the traces :
@brada4 Both the compile machine and the run machine are the same in my case. I also removed the binding in the mpi command and did a run, but I'm seeing the same issue again. Please find the command that I used: mpirun --allow-run-as-root --mca btl sm,self,tcp xhpl
@grisuthedragon Please find the attached Makefile
I tested it with the current development version of OpenBLAS on an IBM POWER8+ running CentOS 7.3, and everything works fine with the HPL benchmark.
I compiled OpenBLAS using
make NUM_THREADS=1 USE_OPENMP=0
because for the HPL benchmark it is quite common to use only the parallelization coming from the MPI processes.
Probably related to #660, running with only one thread is bound to avoid any deadlocks from multithreading.
Even with multithreading enabled in OpenBLAS, the HPL code works fine without running into a deadlock on my machine.
I was looking for `t a a bt` (i.e. a backtrace from all threads).
The current backtrace looks like the OpenMP-enabled system OpenBLAS (symlinked to /usr/lib/libblas.so.3 via update-alternatives). Are you sure HPL is linked against the freshly built OpenBLAS? (Check with ldd.)
@brada4 Please find the attached backtrace for all threads :
I also looked at the ldd output and could see that libopenblas is linked to the 0.2.19 version:
libopenblas.so.0 => /home/hpl-2.2/openblas-2.19/OpenBLAS-0.2.19/libopenblas.so.0 (0x00003fffb4790000)
I have doubts about the linked imports of xhpl (the program that the backtrace comes from). You can try LD_PRELOAD=...../openblas.so.0 xhpl (with the full path, in the hope of overriding the system library). A cleaner way would be to plant an alternative for libblas.so.3 to work around the HPL build system mistakes.
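A sketch of the LD_PRELOAD variant (the library path is the build directory mentioned elsewhere in this thread; substitute your own location of libopenblas.so.0):

```
export LD_PRELOAD=/home/hpl-2.2/openblas-2.19/OpenBLAS-0.2.19/libopenblas.so.0
mpirun --allow-run-as-root --mca btl sm,self,tcp xhpl
```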
#2 exec_blas._omp_fn.0 () at blas_server_omp.c:312
#3 0x00003fff867be8a4 in GOMP_parallel () from /usr/lib/powerpc64le-linux-gnu/libgomp.so.1
#4 0x00003fff86ecc7e4 in exec_blas (num=<optimized out>, queue=<optimized out>) at blas_server_omp.c:305
#5 0x00003fff86dfbee0 in gemm_driver (args=<optimized out>, range_m=<optimized out>, range_n=<optimized out>, sa=<optimized out>, sb=<optimized out>, mypos=0) at level3_thread.c:672
#6 0x00003fff86dfc1f4 in dgemm_thread_nt (args=<optimized out>, range_m=<optimized out>, range_n=<optimized out>, sa=<optimized out>, sb=<optimized out>, mypos=<optimized out>)
at level3_thread.c:733
#7 0x00003fff87ba7cd0 in dgemm_ () from /usr/lib/libblas.so.3
#8 0x00000000100121f0 in HPL_dgemm ()
I did an export LD_PRELOAD=/home/hpl-2.2/openblas-2.19/OpenBLAS-0.2.19 and I'm seeing the following error:
ERROR: ld.so: object '/home/hpl-2.2/openblas-2.19/OpenBLAS-0.2.19' from LD_PRELOAD cannot be preloaded (cannot read file data): ignored.
And this is the ldd from export LD_LIBRARY_PATH=/home/hpl-2.2/openblas-2.19/OpenBLAS-0.2.19 :
ldd ./xhpl
    linux-vdso64.so.1 => (0x00003fffa0ef0000)
    libblas.so.3 => /usr/lib/libblas.so.3 (0x00003fffa0e50000)
    libmpi.so.12 => /usr/lib/libmpi.so.12 (0x00003fffa0d20000)
    libc.so.6 => /lib/powerpc64le-linux-gnu/libc.so.6 (0x00003fffa0b40000)
    libopenblas.so.0 => /home/hpl-2.2/openblas-2.19/OpenBLAS-0.2.19/libopenblas.so.0 (0x00003fff9ff00000)
    libm.so.6 => /lib/powerpc64le-linux-gnu/libm.so.6 (0x00003fff9fe10000)
    libibverbs.so.1 => /usr/lib/libibverbs.so.1 (0x00003fff9fde0000)
    libopen-rte.so.12 => /usr/lib/libopen-rte.so.12 (0x00003fff9fd30000)
    libopen-pal.so.13 => /usr/lib/libopen-pal.so.13 (0x00003fff9fc60000)
    libpthread.so.0 => /lib/powerpc64le-linux-gnu/libpthread.so.0 (0x00003fff9fc20000)
    /lib64/ld64.so.2 (0x0000000058cf0000)
    libgfortran.so.3 => /usr/lib/powerpc64le-linux-gnu/libgfortran.so.3 (0x00003fff9fae0000)
    libgomp.so.1 => /usr/lib/powerpc64le-linux-gnu/libgomp.so.1 (0x00003fff9fa90000)
    libdl.so.2 => /lib/powerpc64le-linux-gnu/libdl.so.2 (0x00003fff9fa60000)
    libhwloc.so.5 => /usr/lib/powerpc64le-linux-gnu/libhwloc.so.5 (0x00003fff9f9f0000)
    libutil.so.1 => /lib/powerpc64le-linux-gnu/libutil.so.1 (0x00003fff9f9c0000)
    libgcc_s.so.1 => /lib/powerpc64le-linux-gnu/libgcc_s.so.1 (0x00003fff9f990000)
    libnuma.so.1 => /usr/lib/powerpc64le-linux-gnu/libnuma.so.1 (0x00003fff9f960000)
    libltdl.so.7 => /usr/lib/powerpc64le-linux-gnu/libltdl.so.7 (0x00003fff9f930000)
Unlike LD_LIBRARY_PATH, you need to include the name of the library in LD_PRELOAD, so: export LD_PRELOAD=/home/hpl-2.2/openblas-2.19/OpenBLAS-0.2.19/libopenblas.so.0 (and it does look a bit strange that ldd shows separate entries for libopenblas.so.0 and libblas.so.3, although you mentioned that both link to the same file).
If I look at the Makefile_ppc.txt attached earlier, it uses both -lblas and -lopenblas. Taking out -lblas will fix the issue of it being undefined which library actually gets used.
Users of Ubuntu 16.04 on POWER8 may also want to take note of Ubuntu bug #1641241 here: https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/1641241, describing a misbehaviour of the hardware lock elision code included in recent versions of glibc that is apparently specific to the POWER platform (and worked around by the update linked at the end of that page). (Found via a bug report by bhart in https://github.com/tensorflow/tensorflow/issues/5482.)
I did an export LD_PRELOAD for both versions of openblas - 2.18 and 2.19. Below is the ldd for 2.19 :
ldd ./xhpl
    linux-vdso64.so.1 => (0x00003fff87de0000)
    /home/hpl-2.2/openblas-2.19/OpenBLAS-0.2.19/libopenblas.so.0 (0x00003fff871a0000)
    libblas.so.3 => /usr/lib/libblas.so.3 (0x00003fff87100000)
    libmpi.so.12 => /usr/lib/libmpi.so.12 (0x00003fff86fd0000)
    libc.so.6 => /lib/powerpc64le-linux-gnu/libc.so.6 (0x00003fff86df0000)
    libm.so.6 => /lib/powerpc64le-linux-gnu/libm.so.6 (0x00003fff86d00000)
    libpthread.so.0 => /lib/powerpc64le-linux-gnu/libpthread.so.0 (0x00003fff86cc0000)
    libgfortran.so.3 => /usr/lib/powerpc64le-linux-gnu/libgfortran.so.3 (0x00003fff86b80000)
    libgomp.so.1 => /usr/lib/powerpc64le-linux-gnu/libgomp.so.1 (0x00003fff86b30000)
    libibverbs.so.1 => /usr/lib/libibverbs.so.1 (0x00003fff86b00000)
    libopen-rte.so.12 => /usr/lib/libopen-rte.so.12 (0x00003fff86a50000)
    libopen-pal.so.13 => /usr/lib/libopen-pal.so.13 (0x00003fff86980000)
    /lib64/ld64.so.2 (0x0000000044b20000)
    libgcc_s.so.1 => /lib/powerpc64le-linux-gnu/libgcc_s.so.1 (0x00003fff86950000)
    libdl.so.2 => /lib/powerpc64le-linux-gnu/libdl.so.2 (0x00003fff86920000)
    libhwloc.so.5 => /usr/lib/powerpc64le-linux-gnu/libhwloc.so.5 (0x00003fff868b0000)
    libutil.so.1 => /lib/powerpc64le-linux-gnu/libutil.so.1 (0x00003fff86880000)
    libnuma.so.1 => /usr/lib/powerpc64le-linux-gnu/libnuma.so.1 (0x00003fff86850000)
    libltdl.so.7 => /usr/lib/powerpc64le-linux-gnu/libltdl.so.7 (0x00003fff86820000)
Both versions 2.18 and 2.19 seem to work now after using LD_PRELOAD, but I see that 2.19 is taking approximately 2.4x the time to complete compared to 2.18. Please find the results below:
2.18:
T/V       N     NB   P  Q  Time   Gflops
WR11R2C4  2000  140  1  2  2.44   2.190e+00
2.19:
T/V       N     NB   P  Q  Time   Gflops
WR11R2C4  2000  140  1  2  5.98   8.922e-01
I collected the perf data and annotations for them, please find them below :
Samples: 293K of event 'cycles:ppp', Event count (approx.): 293830000000
Overhead  Command  Shared Object  Symbol
1.01% xhpl mca_pml_ob1.so [.] mca_pml_ob1_iprobe
       │          START_RPCC();
       │
       │          /* thread has to wait */
       │          while(job[current].working[mypos][CACHE_LINE_SIZE * bufferside] == 0) {YIELDING;};
       │1dad54:   add    r10,r26,r20
       │1dad58:   rldicr r10,r10,3,60
       │1dad5c:   ldx    r9,r30,r10
  0.00 │1dad60:   cmpdi  cr7,r9,0
       │1dad64:   bne    cr7,1dad9c <inner_thread+0x65c>
       │1dad68:   nop
       │1dad6c:   ori    r2,r2,0
  7.92 │1dad70:   nop
       │1dad74:   nop
       │1dad78:   nop
 10.82 │1dad7c:   nop
  6.36 │1dad80:   nop
       │1dad84:   nop
 17.26 │1dad88:   nop
 23.68 │1dad8c:   nop
       │1dad90:   ldx    r9,r30,r10
  3.85 │1dad94:   cmpdi  cr7,r9,0
       │1dad98:   beq    cr7,1dad70 <inner_thread+0x630>
       │
       │          STOP_RPCC(waiting2);
Not sure how to read the perf data (do you have the 2.18 values for comparison?). Are these results reproducible (and was the workload etc. on the machine the same during both runs)? If the values are stable, it could be that the changed thresholds mentioned above are not favorable for the matrix sizes in this particular benchmark. Perhaps @grisuthedragon has benchmark results from his machine easily available?
@martin-frbg Here are the perf data for 2.18 for comparison :
Samples: 301K of event 'cycles:ppp', Event count (approx.): 301774000000
Overhead  Command  Shared Object  Symbol
Yes it's reproducible.
Can you fix the conflicting libblas and libopenblas dependencies? So far what I see is a mistake in building HPL, nothing more.
Is it conceivable that you built 0.2.18 with different options, something more like the NUM_THREADS=1 USE_OPENMP=0 that grisuthedragon recommended above for hpl ? (No libgomp and no reference to threading in its perf results would explain less overhead...)
Most likely /usr/lib/libblas.so.3 is the Ubuntu-supplied OpenBLAS 0.2.18 built with OpenMP, and without CBLAS or LAPACK...
# update-alternatives --list
Should confirm it
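A hedged example of that check (assuming the alternatives group is registered under the name libblas.so.3, as is usual on Ubuntu):

```
update-alternatives --list libblas.so.3      # list the registered BLAS alternatives
update-alternatives --display libblas.so.3   # show which one is currently active
```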
@martin-frbg Here is the result of my benchmark (gcc 4.8.5, glibc 2.17, CentOS 7.3, kernel 4.8 [from Fedora 25], current OpenBLAS 0.2.20dev, OpenMPI 1.8).
I optimized the HPL.dat for my machine, now having the following:
HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out output file name (if any)
6 device out (6=stdout,7=stderr,file)
4 # of problems sizes (N)
10240 20480 30720 40960 30 34 35 Ns
1 # of NBs
96 32 64 96 128 160 192 224 256 NBs
0 PMAP process mapping (0=Row-,1=Column-major)
1 # of process grids (P x Q)
4 Ps
5 Qs
16.0 threshold
1 # of panel fact
2 1 2 PFACTs (0=left, 1=Crout, 2=Right)
1 # of recursive stopping criterium
2 4 NBMINs (>= 1)
1 # of panels in recursion
2 NDIVs
1 # of recursive panel fact.
2 1 2 RFACTs (0=left, 1=Crout, 2=Right)
1 # of broadcast
0 BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1 # of lookahead depth
0 DEPTHs (>=0)
2 SWAP (0=bin-exch,1=long,2=mix)
64 swapping threshold
0 L1 in (0=transposed,1=no-transposed) form
0 U in (0=transposed,1=no-transposed) form
1 Equilibration (0=no,1=yes)
8 memory alignment in double (> 0)
and for the largest experiment (N = 40960) I get:
OMP_NUM_THREADS=1 mpirun --allow-run-as-root -np 20 ./xhpl
...
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR00R2R2 40960 96 4 5 107.79 4.251e+02
HPL_pdgesv() start time Wed Jan 4 21:05:28 2017
HPL_pdgesv() end time Wed Jan 4 21:07:16 2017
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0028599 ...... PASSED
================================================================================
...
which is quite good compared to the pure DGEMM performance (490 GFlop/s) obtained by the DGEMM benchmark of OpenBLAS.
Again, check with ldd whether you use the Fedora-supplied /usr/lib64/libopenblas(p/o).so, the one you built, or both. It does not go anywhere if you mix random combinations of libraries into the picture.
@brada4 Please find the results of the update-alternatives --list command below, for both the LD_PRELOAD and LD_LIBRARY_PATH setups:
/usr/lib/libblas/libblas.so.3 /usr/lib/openblas-base/libblas.so.3
update-alternatives: error: no alternatives for libopenblas.so.0
@martin-frbg I have used just the make command to use default options for both 2.18 & 2.19.
@grisuthedragon Hi, can you please give it a try on Ubuntu 16.04 once?
The main idea is to avoid linking to the system BLAS (remove the -lblas option), use -L/where/openblas/is/built -lopenblas, and check with ldd that you actually test the OpenBLAS build that you intended to test.
@rakshithprakash I do not have Ubuntu 16.04 running on this machine. Furthermore, IBM suggests RHEL/CentOS on this type of machine.
@grisuthedragon On the IBM site I find a contrary statement.... This problem will not be fixed by switching to Red Hat, since the system default BLAS will be linked in addition to OpenBLAS.
@brada4 Removing the -lblas option doesn't seem to work for me. Please find the error below :
mpicc -L/home/hpl-2.2/openblas-2.19/OpenBLAS-0.2.19/ -lopenblas -L/opt/ibm/lib/ -lm -R/opt/ibm/lib -o /home/hpl-2.2/hpl-2.2/bin/ppc64/xhpl HPL_pddriver.o HPL_pdinfo.o HPL_pdtest.o /home/hpl-2.2/hpl-2.2/lib/ppc64/libhpl.a
/home/hpl-2.2/hpl-2.2/lib/ppc64/libhpl.a(HPL_idamax.o): In function `HPL_idamax':
HPL_idamax.c:(.text+0x38): undefined reference to `idamax_'
/home/hpl-2.2/hpl-2.2/lib/ppc64/libhpl.a(HPL_dgemv.o): In function `HPL_dgemv':
HPL_dgemv.c:(.text+0xa8): undefined reference to `dgemv_'
HPL_dgemv.c:(.text+0x12c): undefined reference to `dgemv_'
/home/hpl-2.2/hpl-2.2/lib/ppc64/libhpl.a(HPL_dcopy.o): In function `HPL_dcopy':
HPL_dcopy.c:(.text+0x3c): undefined reference to `dcopy_'
/home/hpl-2.2/hpl-2.2/lib/ppc64/libhpl.a(HPL_daxpy.o): In function `HPL_daxpy':
HPL_daxpy.c:(.text+0x44): undefined reference to `daxpy_'
/home/hpl-2.2/hpl-2.2/lib/ppc64/libhpl.a(HPL_dscal.o): In function `HPL_dscal':
HPL_dscal.c:(.text+0x3c): undefined reference to `dscal_'
/home/hpl-2.2/hpl-2.2/lib/ppc64/libhpl.a(HPL_dtrsv.o): In function `HPL_dtrsv':
HPL_dtrsv.c:(.text+0xc0): undefined reference to `dtrsv_'
/home/hpl-2.2/hpl-2.2/lib/ppc64/libhpl.a(HPL_dger.o): In function `HPL_dger':
HPL_dger.c:(.text+0x74): undefined reference to `dger_'
HPL_dger.c:(.text+0xbc): undefined reference to `dger_'
/home/hpl-2.2/hpl-2.2/lib/ppc64/libhpl.a(HPL_dgemm.o): In function `HPL_dgemm':
HPL_dgemm.c:(.text+0xd8): undefined reference to `dgemm_'
HPL_dgemm.c:(.text+0x17c): undefined reference to `dgemm_'
/home/hpl-2.2/hpl-2.2/lib/ppc64/libhpl.a(HPL_dtrsm.o): In function `HPL_dtrsm':
HPL_dtrsm.c:(.text+0xf4): undefined reference to `dtrsm_'
HPL_dtrsm.c:(.text+0x1c0): undefined reference to `dtrsm_'
collect2: error: ld returned 1 exit status
Makefile:76: recipe for target 'dexe.grd' failed
make[2]: *** [dexe.grd] Error 1
make[2]: Leaving directory '/home/hpl-2.2/hpl-2.2/testing/ptest/ppc64'
Make.top:64: recipe for target 'build_tst' failed
make[1]: *** [build_tst] Error 2
make[1]: Leaving directory '/home/hpl-2.2/hpl-2.2'
Makefile:72: recipe for target 'build' failed
make: *** [build] Error 2
And my makefile is:
SHELL        = /bin/sh
#
CD           = cd
CP           = cp
LN_S         = ln -s
MKDIR        = mkdir
RM           = /bin/rm -f
TOUCH        = touch
#
ARCH         = ppc64
#
TOPdir       = /home/hpl-2.2/hpl-2.2
INCdir       = $(TOPdir)/include
BINdir       = $(TOPdir)/bin/$(ARCH)
LIBdir       = $(TOPdir)/lib/$(ARCH)
#
HPLlib       = $(LIBdir)/libhpl.a
#
MPdir        =
MPinc        =
MPlib        =
#
LAdir        =
LAinc        =
LAlib        =
#
F2CDEFS      = -DAdd_ -DF77_INTEGER=int -DStringSunStyle
#
HPL_INCLUDES = -I$(INCdir) -I$(INCdir)/$(ARCH) $(LAinc) $(MPinc)
HPL_LIBS     = $(HPLlib) $(LAlib) $(MPlib)
#
HPL_OPTS     =
#
HPL_DEFS     = $(F2CDEFS) $(HPL_OPTS) $(HPL_INCLUDES)
#
export OMPI_CFLAGS:=
CC           = mpicc
CCNOOPT      = $(HPL_DEFS) -m64
CCFLAGS      = $(HPL_DEFS) -m64 -O3 -mcpu=power8 -mtune=power8
LINKER       = mpicc
LINKFLAGS    = -L/home/hpl-2.2/openblas-2.19/OpenBLAS-0.2.19/ -lopenblas -L/opt/ibm/lib/ -lm -R/opt/ibm/lib
ARCHIVER     = ar
ARFLAGS      = r
RANLIB       = echo
I tried adding another -L/home/hpl-2.2/openblas-2.19/OpenBLAS-0.2.19/ -lblas and it didn't work. I can get the same makefile to compile by using the -lblas option in LAlib.
Please put the "-lopenblas" in the LAlib list where the -lblas was - the libhpl.a depends on it and the sequence within the library list matters.
@rakshithprakash
Or, if you do not have OpenBLAS in the default search path of your compiler, put
-L/home/hpl-2.2/openblas-2.19/OpenBLAS-0.2.19/ -lopenblas
into the LAlib variable, where /home/hpl-2.2/openblas-2.19/OpenBLAS-0.2.19/ is the place where you have a compiled version of OpenBLAS. If the linker uses the shared library in this case, you may have to add /home/hpl-2.2/openblas-2.19/OpenBLAS-0.2.19/ to the LD_LIBRARY_PATH environment variable.
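Putting that together, a sketch of the relevant Make.ppc64 lines (based on the paths already used in this thread; adjust to your tree) might look like:

```
LAdir  = /home/hpl-2.2/openblas-2.19/OpenBLAS-0.2.19
LAinc  =
LAlib  = -L$(LAdir) -lopenblas
```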
@brada4 If you're using the IBM XL compilers + ESSL + CUDA, then IBM support told me the other way around. But no more of that here. ;-)
It compiles now after adding the entire path in the LAlib variable, but I do not see that path in the ldd output:
ldd ./xhpl
    linux-vdso64.so.1 => (0x00003fff83f10000)
    libopenblas.so.0 => /usr/lib/libopenblas.so.0 (0x00003fff83500000)
    libmpi.so.12 => /usr/lib/libmpi.so.12 (0x00003fff833d0000)
    libc.so.6 => /lib/powerpc64le-linux-gnu/libc.so.6 (0x00003fff831f0000)
    libm.so.6 => /lib/powerpc64le-linux-gnu/libm.so.6 (0x00003fff83100000)
    libpthread.so.0 => /lib/powerpc64le-linux-gnu/libpthread.so.0 (0x00003fff830c0000)
    libgfortran.so.3 => /usr/lib/powerpc64le-linux-gnu/libgfortran.so.3 (0x00003fff82f80000)
    libibverbs.so.1 => /usr/lib/libibverbs.so.1 (0x00003fff82f50000)
    libopen-rte.so.12 => /usr/lib/libopen-rte.so.12 (0x00003fff82ea0000)
    libopen-pal.so.13 => /usr/lib/libopen-pal.so.13 (0x00003fff82dd0000)
    /lib64/ld64.so.2 (0x00000000446b0000)
    libgcc_s.so.1 => /lib/powerpc64le-linux-gnu/libgcc_s.so.1 (0x00003fff82da0000)
    libdl.so.2 => /lib/powerpc64le-linux-gnu/libdl.so.2 (0x00003fff82d70000)
    libhwloc.so.5 => /usr/lib/powerpc64le-linux-gnu/libhwloc.so.5 (0x00003fff82d00000)
    libutil.so.1 => /lib/powerpc64le-linux-gnu/libutil.so.1 (0x00003fff82cd0000)
    libnuma.so.1 => /usr/lib/powerpc64le-linux-gnu/libnuma.so.1 (0x00003fff82ca0000)
    libltdl.so.7 => /usr/lib/powerpc64le-linux-gnu/libltdl.so.7 (0x00003fff82c70000)
But using export LD_LIBRARY_PATH I can see the output for both 2.18 & 2.19.
2.18:
T/V       N     NB   P  Q  Time   Gflops
WR11R2C4  2000  140  1  2  1.81   2.948e+00
2.19:
T/V       N     NB   P  Q  Time   Gflops
WR11R2C4  2000  140  1  2  0.35   1.512e+01
Probably they add ESSL as an alternative for libblas.so.3 and all works well by default. You could try it that way with OpenBLAS too: 'make install' will install /opt/OpenBLAS/lib/libopenblas.so, then run update-alternatives --install /usr/lib/libblas.so.3 libblas.so.3 /opt/OpenBLAS/lib/libopenblas.so 1, then update-alternatives --config libblas.so.3. After that you can easily switch between BLAS implementations as you go forward without hard-coding any implementation.
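As a concrete sketch of that sequence (paths as suggested above; alternative names and install prefixes may differ on your distribution):

```
make install PREFIX=/opt/OpenBLAS    # installs /opt/OpenBLAS/lib/libopenblas.so
update-alternatives --install /usr/lib/libblas.so.3 libblas.so.3 /opt/OpenBLAS/lib/libopenblas.so 1
update-alternatives --config libblas.so.3
```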
So if I read your most recent results correctly 2.19 is now performing better than 2.18 (Gflops went from 2.948 to 15.12 for that test) ?
@brada4 I do not think that they use ESSL as an alternative for libblas.so.3, because ESSL is designed to work with the XL compiler and therefore its Fortran symbols do not have the underscore at the end. So installing ESSL as an alternative would break all applications.
Just build HPL against -lblas and update alternatives. It is the easiest way
Hi, I was running HPL-2.2 with OpenBLAS 2.19 and 2.20 and could see that the benchmark never exits, even after running overnight, for a very small problem size such as 200. I cross-verified it on 2.18 and could see that it completes in less than a second. Please find the command that I'm using:
mpirun -np 2 -bind-to-core --allow-run-as-root --mca btl sm,self,tcp xhpl
My guest configuration is as below :
Number of cores: 2
SMT mode: 8
Memory: 16 GB
OS: Ubuntu 16.04 LTS
Kernel version: 4.4.0-21-generic
I verified the same thing on x86 and could see that it is working fine.
After looking at the perf data on 2.19, I could observe that 90% of the time is spent in inner_thread.
Attaching the perf data of 2.19 :
The annotations of inner_thread look like this:
annotations.txt
I could observe the same behavior on another Ubuntu machine as well, but it works fine on RHEL.