OpenMathLib / OpenBLAS

OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version.
http://www.openblas.net
BSD 3-Clause "New" or "Revised" License

Openblas 2.19 and above is not working on Ubuntu 16.04 for Power 8 #1037

Closed rakshithprakash closed 6 years ago

rakshithprakash commented 7 years ago

Hi, I was running HPL-2.2 with openblas 2.19 and 2.20 and saw that the benchmark never exits, even after running overnight with a very small problem size such as 200. I cross-verified on 2.18 and it completes in less than a second. Please find the command that I'm using:

mpirun -np 2 -bind-to-core --allow-run-as-root --mca btl sm,self,tcp xhpl

My guest configuration is as below:

Number of cores: 2
SMT mode: 8
Memory: 16GB
OS: Ubuntu 16.04 LTS
Kernel version: 4.4.0-21-generic

I verified the same thing on x86 and could see that it is working fine.

After looking at the perf data on 2.19, I could observe that 90% of the time is spent in inner_thread.

Attaching the perf data of 2.19:

2 19_profile

The annotations of inner_thread look like this:

annotations.txt

I could observe the same behavior on another Ubuntu machine as well, but it works fine on RHEL.

martin-frbg commented 7 years ago

Most of the POWER8-specific changes since 0.2.18 appear to have happened in a narrow timespan between April 19 and May 22. While I cannot possibly comment on the POWER8 assembly, I understand that wernsaar also adjusted some thresholds for thread creation in the GEMM functions, which may simply have made thread contention more likely, and 8310d4d3 also dropped the ALLOC_SHM define from Makefile.power without explanation (though it may have been spurious all along).

Unless somebody else comes up with a better idea, I wonder if you could try a snapshot from somewhere in the middle of what appears to have been the crucial period (say 0551e57 from April 26), and/or see whether limiting the number of threads created on each node via OMP_NUM_THREADS has any influence?

Also do the Ubuntu and RHEL machines you mention use the exact same binary, or was OpenBLAS built separately on each (implying different compiler versions and/or options in use) ?
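A rough sketch of those two experiments (the commit hash is the one mentioned above; the paths and the thread limit are illustrative, and the mpirun invocation is the one from this thread):

        cd OpenBLAS                    # existing source checkout
        git checkout 0551e57           # snapshot from the suspected period
        make clean && make
        export LD_LIBRARY_PATH=$PWD:$LD_LIBRARY_PATH
        export OMP_NUM_THREADS=1       # limit threads per MPI rank
        mpirun -np 2 -bind-to-core --allow-run-as-root --mca btl sm,self,tcp xhpl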

brada4 commented 7 years ago

Did you use any flags when compiling OpenBLAS? Also, the full gcc and gfortran versions are more relevant than the kernel version (normally one assumes the kernel shipped with the distribution, fully patched, or anything in between).

Attach to the frozen process with a debugger and dump all backtraces:

$ script
$ gdb
> attach <pid>
> thread apply all backtrace
..... here is the interesting output
> detach
> quit
$ quit

And attach typescript file here.

rakshithprakash commented 7 years ago

I compiled OpenBLAS using just the make command (default options) on both Ubuntu and RHEL, set export LD_LIBRARY_PATH=, and ran HPL with the following command: mpirun -np 2 -bind-to-core --allow-run-as-root --mca btl sm,self,tcp xhpl

I have used GCC and Gfortran version 5.3.1 on both Ubuntu and RHEL.

I also tried to collect the traces, but I see the following error: [ No Source Available ]

Am I missing something? Do I need to connect any debuggers, given that it is a guest machine on KVM? Are there any other ways of collecting the traces?

martin-frbg commented 7 years ago

You may need to build a debuggable version of OpenBLAS and HPL first to get any meaningful backtraces with gdb.
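A minimal sketch of such a debug build (assuming the stock OpenBLAS Makefile knobs and the Make.ppc64 shown later in this thread; adjust to taste):

        # rebuild OpenBLAS with debug info
        make clean && make DEBUG=1
        # and rebuild HPL with -g, e.g. in Make.ppc64:
        # CCFLAGS = $(HPL_DEFS) -m64 -O0 -g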

brada4 commented 7 years ago

Just a quick peek into the problem - can you compare CPUID on compile machine and run machine?

brada4 commented 7 years ago

Does it work single-threaded and/or without the MPI binding options? OpenBLAS by default spins up pthreads for all available CPUs, or the compile-time detected number of CPUs, whichever is smaller (set OPENBLAS_NUM_THREADS to less if needed). Maybe MPI binding/affinity somehow hurts the default build, which does not try to bind processors. You can start gdb in the build root directory where all source files are in place (sort of).
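For example, a minimal single-threaded check without core binding could look like this (a sketch; OPENBLAS_NUM_THREADS caps the threads OpenBLAS spawns per process):

        export OPENBLAS_NUM_THREADS=1
        mpirun -np 2 --allow-run-as-root --mca btl sm,self,tcp xhpl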

grisuthedragon commented 7 years ago

@rakshithprakash Can you provide me your make.inc from the HPL benchmark such that I can recompile it easily ?

rakshithprakash commented 7 years ago

@brada4 Attaching the traces :

backtraces.txt

rakshithprakash commented 7 years ago

@brada4 Both the compile machine and the run machine are the same in my case. I also removed the binding from the mpi command and did a run, but I'm seeing the same issue again. Please find the command that I used: mpirun --allow-run-as-root --mca btl sm,self,tcp xhpl

rakshithprakash commented 7 years ago

@grisuthedragon Please find the attached Makefile

Makefile_ppc.txt

grisuthedragon commented 7 years ago

I tested it with the current development version of OpenBLAS on an IBM POWER8+ running CentOS 7.3, and everything works fine with the HPL benchmark.

I compiled OpenBLAS using

 make NUM_THREADS=1 USE_OPENMP=0 

because for the HPL benchmark it is quite common to use only the parallelization coming from the MPI processes.

martin-frbg commented 7 years ago

Probably related to #660; running with only one thread is bound to avoid any deadlocks from multithreading.

grisuthedragon commented 7 years ago

Even with multithreading enabled in OpenBLAS, the HPL code works fine without running into a deadlock on my machine.

brada4 commented 7 years ago

I was looking for "t a a bt" (i.e. a backtrace from all threads).

The current backtrace looks like the OpenMP-enabled system OpenBLAS (symlinked to /usr/lib/libblas.so.3 via update-alternatives). Are you sure HPL is linked against the freshly built OpenBLAS? (check with ldd)
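For example (a sketch, run from the directory containing xhpl):

        ldd ./xhpl | grep -i blas   # should point at the freshly built libopenblas, not /usr/lib/libblas.so.3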

rakshithprakash commented 7 years ago

@brada4 Please find the attached backtrace for all threads :

backtraces_allthreads.txt

I also looked at the ldd and could see that libopenblas is linked to 2.19 version :

libopenblas.so.0 => /home/hpl-2.2/openblas-2.19/OpenBLAS-0.2.19/libopenblas.so.0 (0x00003fffb4790000)

brada4 commented 7 years ago

I have doubts about the linked imports of xhpl (the program that the backtrace comes from). You can try LD_PRELOAD=...../openblas.so.0 xhpl (with the full path, in the hope of overriding the system library). A cleaner way would be to install an alternative for libblas.so.3 to work around the HPL build system mistakes.

#2  exec_blas._omp_fn.0 () at blas_server_omp.c:312
#3  0x00003fff867be8a4 in GOMP_parallel () from /usr/lib/powerpc64le-linux-gnu/libgomp.so.1
#4  0x00003fff86ecc7e4 in exec_blas (num=<optimized out>, queue=<optimized out>) at blas_server_omp.c:305
---Type <return> to continue, or q <return> to quit---
#5  0x00003fff86dfbee0 in gemm_driver (args=<optimized out>, range_m=<optimized out>, range_n=<optimized out>, sa=<optimized out>, sb=<optimized out>, mypos=0) at level3_thread.c:672
#6  0x00003fff86dfc1f4 in dgemm_thread_nt (args=<optimized out>, range_m=<optimized out>, range_n=<optimized out>, sa=<optimized out>, sb=<optimized out>, mypos=<optimized out>)
    at level3_thread.c:733
#7  0x00003fff87ba7cd0 in dgemm_ () from /usr/lib/libblas.so.3
#8  0x00000000100121f0 in HPL_dgemm ()

rakshithprakash commented 7 years ago

I did an export LD_PRELOAD=/home/hpl-2.2/openblas-2.19/OpenBLAS-0.2.19 and I'm seeing the following error:

ERROR: ld.so: object '/home/hpl-2.2/openblas-2.19/OpenBLAS-0.2.19' from LD_PRELOAD cannot be preloaded (cannot read file data): ignored.

And this is the ldd from export LD_LIBRARY_PATH=/home/hpl-2.2/openblas-2.19/OpenBLAS-0.2.19 :

ldd ./xhpl
        linux-vdso64.so.1 =>  (0x00003fffa0ef0000)
        libblas.so.3 => /usr/lib/libblas.so.3 (0x00003fffa0e50000)
        libmpi.so.12 => /usr/lib/libmpi.so.12 (0x00003fffa0d20000)
        libc.so.6 => /lib/powerpc64le-linux-gnu/libc.so.6 (0x00003fffa0b40000)
        libopenblas.so.0 => /home/hpl-2.2/openblas-2.19/OpenBLAS-0.2.19/libopenblas.so.0 (0x00003fff9ff00000)
        libm.so.6 => /lib/powerpc64le-linux-gnu/libm.so.6 (0x00003fff9fe10000)
        libibverbs.so.1 => /usr/lib/libibverbs.so.1 (0x00003fff9fde0000)
        libopen-rte.so.12 => /usr/lib/libopen-rte.so.12 (0x00003fff9fd30000)
        libopen-pal.so.13 => /usr/lib/libopen-pal.so.13 (0x00003fff9fc60000)
        libpthread.so.0 => /lib/powerpc64le-linux-gnu/libpthread.so.0 (0x00003fff9fc20000)
        /lib64/ld64.so.2 (0x0000000058cf0000)
        libgfortran.so.3 => /usr/lib/powerpc64le-linux-gnu/libgfortran.so.3 (0x00003fff9fae0000)
        libgomp.so.1 => /usr/lib/powerpc64le-linux-gnu/libgomp.so.1 (0x00003fff9fa90000)
        libdl.so.2 => /lib/powerpc64le-linux-gnu/libdl.so.2 (0x00003fff9fa60000)
        libhwloc.so.5 => /usr/lib/powerpc64le-linux-gnu/libhwloc.so.5 (0x00003fff9f9f0000)
        libutil.so.1 => /lib/powerpc64le-linux-gnu/libutil.so.1 (0x00003fff9f9c0000)
        libgcc_s.so.1 => /lib/powerpc64le-linux-gnu/libgcc_s.so.1 (0x00003fff9f990000)
        libnuma.so.1 => /usr/lib/powerpc64le-linux-gnu/libnuma.so.1 (0x00003fff9f960000)
        libltdl.so.7 => /usr/lib/powerpc64le-linux-gnu/libltdl.so.7 (0x00003fff9f930000)

martin-frbg commented 7 years ago

Unlike LD_LIBRARY_PATH, you need to include the name of the library in LD_PRELOAD, so: export LD_PRELOAD=/home/hpl-2.2/openblas-2.19/OpenBLAS-0.2.19/libopenblas.so.0 (and it does look a bit strange that ldd shows separate entries for libopenblas.so.0 and a libblas.so.3, although you mentioned that both link to the same file).

brada4 commented 7 years ago

If I look at the Makefile_ppc.txt attached earlier, it uses both -lblas and -lopenblas. Taking out -lblas will fix the issue of the undefined result.

martin-frbg commented 7 years ago

Users of Ubuntu 16.04 on POWER8 may also want to take note of Ubuntu bug #1641241 here: https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/1641241, describing a misbehaviour of the hardware lock elision code included in recent versions of glibc that is apparently specific to the POWER platform (and worked around by the update linked at the end of that page). (Found via a bug report by bhart in https://github.com/tensorflow/tensorflow/issues/5482.)
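A quick way to check which glibc is actually installed (a sketch; compare against the fixed package version given in the Launchpad report):

        dpkg -s libc6 | grep -i '^version'
        ldd --version | head -n1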

rakshithprakash commented 7 years ago

I did an export LD_PRELOAD for both versions of openblas - 2.18 and 2.19. Below is the ldd for 2.19 :

ldd ./xhpl
        linux-vdso64.so.1 =>  (0x00003fff87de0000)
        /home/hpl-2.2/openblas-2.19/OpenBLAS-0.2.19/libopenblas.so.0 (0x00003fff871a0000)
        libblas.so.3 => /usr/lib/libblas.so.3 (0x00003fff87100000)
        libmpi.so.12 => /usr/lib/libmpi.so.12 (0x00003fff86fd0000)
        libc.so.6 => /lib/powerpc64le-linux-gnu/libc.so.6 (0x00003fff86df0000)
        libm.so.6 => /lib/powerpc64le-linux-gnu/libm.so.6 (0x00003fff86d00000)
        libpthread.so.0 => /lib/powerpc64le-linux-gnu/libpthread.so.0 (0x00003fff86cc0000)
        libgfortran.so.3 => /usr/lib/powerpc64le-linux-gnu/libgfortran.so.3 (0x00003fff86b80000)
        libgomp.so.1 => /usr/lib/powerpc64le-linux-gnu/libgomp.so.1 (0x00003fff86b30000)
        libibverbs.so.1 => /usr/lib/libibverbs.so.1 (0x00003fff86b00000)
        libopen-rte.so.12 => /usr/lib/libopen-rte.so.12 (0x00003fff86a50000)
        libopen-pal.so.13 => /usr/lib/libopen-pal.so.13 (0x00003fff86980000)
        /lib64/ld64.so.2 (0x0000000044b20000)
        libgcc_s.so.1 => /lib/powerpc64le-linux-gnu/libgcc_s.so.1 (0x00003fff86950000)
        libdl.so.2 => /lib/powerpc64le-linux-gnu/libdl.so.2 (0x00003fff86920000)
        libhwloc.so.5 => /usr/lib/powerpc64le-linux-gnu/libhwloc.so.5 (0x00003fff868b0000)
        libutil.so.1 => /lib/powerpc64le-linux-gnu/libutil.so.1 (0x00003fff86880000)
        libnuma.so.1 => /usr/lib/powerpc64le-linux-gnu/libnuma.so.1 (0x00003fff86850000)
        libltdl.so.7 => /usr/lib/powerpc64le-linux-gnu/libltdl.so.7 (0x00003fff86820000)

Both the versions 2.18 & 2.19 seem to work now after using LD_PRELOAD, but I see that 2.19 is taking approximately 2.4x the time to complete compared to 2.18. Please find the results below:

2.18 :

WR11R2C4 2000 140 1 2 2.44 2.190e+00

2.19 :

WR11R2C4 2000 140 1 2 5.98 8.922e-01

I collected the perf data and annotations for them, please find them below :

Samples: 293K of event 'cycles:ppp', Event count (approx.): 293830000000
Overhead  Command  Shared Object  Symbol

martin-frbg commented 7 years ago

Not sure how to read the perf data (do you have the 2.18 values for comparison ?), are these results reproducible (and same workload etc on the machine during both runs) ? If the values are stable it could be that the changed thresholds mentioned above are not favorable for the matrix sizes in this particular benchmark. Perhaps @grisuthedragon has benchmark results from his machine easily available ?

rakshithprakash commented 7 years ago

@martin-frbg Here are the perf data for 2.18 for comparison :

Samples: 301K of event 'cycles:ppp', Event count (approx.): 301774000000
Overhead  Command  Shared Object  Symbol

Yes it's reproducible.

brada4 commented 7 years ago

Can you fix the conflicting libblas and libopenblas dependencies? So far what I see is a mistake in building HPL, nothing more.

martin-frbg commented 7 years ago

Is it conceivable that you built 0.2.18 with different options, something more like the NUM_THREADS=1 USE_OPENMP=0 that grisuthedragon recommended above for hpl ? (No libgomp and no reference to threading in its perf results would explain less overhead...)

brada4 commented 7 years ago

Most likely /usr/lib/libblas.so.3 is the Ubuntu-supplied OpenBLAS 0.2.18, built with OpenMP and without CBLAS or LAPACK... Running update-alternatives --list should confirm it.

grisuthedragon commented 7 years ago

@martin-frbg Here is the result of my benchmark (gcc 4.8.5, glibc 2.17, CentOS 7.3, kernel 4.8 [from Fedora 25], current OpenBLAS 0.2.20dev, OpenMPI 1.8).

I optimized the HPL.dat for my machine now having the following:

HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any)
6            device out (6=stdout,7=stderr,file)
4            # of problems sizes (N)
10240 20480 30720 40960 30 34 35  Ns
1            # of NBs
96 32 64 96 128 160 192 224 256      NBs
0            PMAP process mapping (0=Row-,1=Column-major)
1          # of process grids (P x Q)
4       Ps
5       Qs
16.0         threshold
1            # of panel fact
2 1 2        PFACTs (0=left, 1=Crout, 2=Right)
1            # of recursive stopping criterium
2 4          NBMINs (>= 1)
1            # of panels in recursion
2            NDIVs
1            # of recursive panel fact.
2 1 2        RFACTs (0=left, 1=Crout, 2=Right)
1            # of broadcast
0            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1            # of lookahead depth
0            DEPTHs (>=0)
2            SWAP (0=bin-exch,1=long,2=mix)
64           swapping threshold
0            L1 in (0=transposed,1=no-transposed) form
0            U  in (0=transposed,1=no-transposed) form
1            Equilibration (0=no,1=yes)
8            memory alignment in double (> 0)

and for the largest experiment (N = 40960) I get:

OMP_NUM_THREADS=1 mpirun --allow-run-as-root  -np 20 ./xhpl 
...
 ================================================================================
T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WR00R2R2       40960    96     4     5             107.79              4.251e+02
HPL_pdgesv() start time Wed Jan  4 21:05:28 2017

HPL_pdgesv() end time   Wed Jan  4 21:07:16 2017

--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=        0.0028599 ...... PASSED
================================================================================
...

which is quite good compared to the pure DGEMM performance (490 GFlop/s) obtained by the DGEMM benchmark of OpenBLAS.
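As a rough cross-check of that comparison: 4.251e+02 GFlop/s against the 490 GFlop/s DGEMM rate is 425.1 / 490 ≈ 0.87, i.e. roughly 87% of the pure DGEMM performance.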

brada4 commented 7 years ago

Again, check with ldd whether you use the Fedora-supplied /usr/lib64/libopenblas(p/o).so, the one you built, or both. It does not get anywhere if you mix random combinations of libraries into the picture.

rakshithprakash commented 7 years ago

@brada4 Please find below the results of the update-alternatives --list command, for both the LD_PRELOAD & LD_LIBRARY_PATH setups:

update-alternatives --list libblas.so.3

/usr/lib/libblas/libblas.so.3 /usr/lib/openblas-base/libblas.so.3

update-alternatives --list libopenblas.so.0

update-alternatives: error: no alternatives for libopenblas.so.0

rakshithprakash commented 7 years ago

@martin-frbg I have used just the make command to use default options for both 2.18 & 2.19.

rakshithprakash commented 7 years ago

@grisuthedragon Hi, can you please give it a try on Ubuntu 16.04 once?

brada4 commented 7 years ago

The main idea is to avoid linking to the system BLAS (remove the -lblas option), use -L/where/openblas/is/built -lopenblas instead, and check with ldd that you are actually testing the OpenBLAS build that you intended to test.

grisuthedragon commented 7 years ago

@rakshithprakash I do not have Ubuntu 16.04 running on this machine. Furthermore, IBM suggests RHEL/CentOS on this type of machine.

brada4 commented 7 years ago

@grisuthedragon On the IBM site I find the contrary statement.... This problem will not be fixed by switching to Red Hat, since the system default BLAS will be linked in addition to OpenBLAS.

rakshithprakash commented 7 years ago

@brada4 Removing the -lblas option doesn't seem to work for me. Please find the error below :

mpicc -L/home/hpl-2.2/openblas-2.19/OpenBLAS-0.2.19/ -lopenblas -L/opt/ibm/lib/ -lm -R/opt/ibm/lib -o /home/hpl-2.2/hpl-2.2/bin/ppc64/xhpl HPL_pddriver.o HPL_pdinfo.o HPL_pdtest.o /home/hpl-2.2/hpl-2.2/lib/ppc64/libhpl.a
/home/hpl-2.2/hpl-2.2/lib/ppc64/libhpl.a(HPL_idamax.o): In function `HPL_idamax':
HPL_idamax.c:(.text+0x38): undefined reference to `idamax_'
/home/hpl-2.2/hpl-2.2/lib/ppc64/libhpl.a(HPL_dgemv.o): In function `HPL_dgemv':
HPL_dgemv.c:(.text+0xa8): undefined reference to `dgemv_'
HPL_dgemv.c:(.text+0x12c): undefined reference to `dgemv_'
/home/hpl-2.2/hpl-2.2/lib/ppc64/libhpl.a(HPL_dcopy.o): In function `HPL_dcopy':
HPL_dcopy.c:(.text+0x3c): undefined reference to `dcopy_'
/home/hpl-2.2/hpl-2.2/lib/ppc64/libhpl.a(HPL_daxpy.o): In function `HPL_daxpy':
HPL_daxpy.c:(.text+0x44): undefined reference to `daxpy_'
/home/hpl-2.2/hpl-2.2/lib/ppc64/libhpl.a(HPL_dscal.o): In function `HPL_dscal':
HPL_dscal.c:(.text+0x3c): undefined reference to `dscal_'
/home/hpl-2.2/hpl-2.2/lib/ppc64/libhpl.a(HPL_dtrsv.o): In function `HPL_dtrsv':
HPL_dtrsv.c:(.text+0xc0): undefined reference to `dtrsv_'
/home/hpl-2.2/hpl-2.2/lib/ppc64/libhpl.a(HPL_dger.o): In function `HPL_dger':
HPL_dger.c:(.text+0x74): undefined reference to `dger_'
HPL_dger.c:(.text+0xbc): undefined reference to `dger_'
/home/hpl-2.2/hpl-2.2/lib/ppc64/libhpl.a(HPL_dgemm.o): In function `HPL_dgemm':
HPL_dgemm.c:(.text+0xd8): undefined reference to `dgemm_'
HPL_dgemm.c:(.text+0x17c): undefined reference to `dgemm_'
/home/hpl-2.2/hpl-2.2/lib/ppc64/libhpl.a(HPL_dtrsm.o): In function `HPL_dtrsm':
HPL_dtrsm.c:(.text+0xf4): undefined reference to `dtrsm_'
HPL_dtrsm.c:(.text+0x1c0): undefined reference to `dtrsm_'
collect2: error: ld returned 1 exit status
Makefile:76: recipe for target 'dexe.grd' failed
make[2]: *** [dexe.grd] Error 1
make[2]: Leaving directory '/home/hpl-2.2/hpl-2.2/testing/ptest/ppc64'
Make.top:64: recipe for target 'build_tst' failed
make[1]: *** [build_tst] Error 2
make[1]: Leaving directory '/home/hpl-2.2/hpl-2.2'
Makefile:72: recipe for target 'build' failed
make: *** [build] Error 2

And my Make file is:

----------------------------------------------------------------------

- shell --------------------------------------------------------------

----------------------------------------------------------------------

#
SHELL        = /bin/sh
#
CD           = cd
CP           = cp
LN_S         = ln -s
MKDIR        = mkdir
RM           = /bin/rm -f
TOUCH        = touch
#

----------------------------------------------------------------------

- Platform identifier ------------------------------------------------

----------------------------------------------------------------------

#
ARCH         = ppc64
#

----------------------------------------------------------------------

- HPL Directory Structure / HPL library ------------------------------

----------------------------------------------------------------------

#
TOPdir       = /home/hpl-2.2/hpl-2.2
INCdir       = $(TOPdir)/include
BINdir       = $(TOPdir)/bin/$(ARCH)
LIBdir       = $(TOPdir)/lib/$(ARCH)
#
HPLlib       = $(LIBdir)/libhpl.a
#

----------------------------------------------------------------------

- Message Passing library (MPI) --------------------------------------

----------------------------------------------------------------------

MPinc tells the C compiler where to find the Message Passing library header files, MPlib is defined to be the name of the library to be used. The variable MPdir is only used for defining MPinc and MPlib.

#
MPdir        =
MPinc        =
MPlib        =
#

----------------------------------------------------------------------

- Linear Algebra library (BLAS or VSIPL) -----------------------------

----------------------------------------------------------------------

LAinc tells the C compiler where to find the Linear Algebra library header files, LAlib is defined to be the name of the library to be used. The variable LAdir is only used for defining LAinc and LAlib.

#
LAdir        =
LAinc        =
LAlib        =
#

----------------------------------------------------------------------

- F77 / C interface --------------------------------------------------

----------------------------------------------------------------------

You can skip this section if and only if you are not planning to use a BLAS library featuring a Fortran 77 interface. Otherwise, it is necessary to fill out the F2CDEFS variable with the appropriate options. One and only one option should be chosen in each of the 3 following categories:

#

1) name space (How C calls a Fortran 77 routine)

#

-DAdd_ : all lower case and a suffixed underscore (Suns, Intel, ...), [default]

-DNoChange : all lower case (IBM RS6000),

-DUpCase : all upper case (Cray),

-DAdd__ : the FORTRAN compiler in use is f2c.

#

2) C and Fortran 77 integer mapping

#

-DF77_INTEGER=int : Fortran 77 INTEGER is a C int, [default]

-DF77_INTEGER=long : Fortran 77 INTEGER is a C long,

-DF77_INTEGER=short : Fortran 77 INTEGER is a C short.

#

3) Fortran 77 string handling

#

-DStringSunStyle : The string address is passed at the string location on the stack, and the string length is then passed as an F77_INTEGER after all explicit stack arguments, [default]

-DStringStructPtr : The address of a structure is passed by a Fortran 77 string, and the structure is of the form: struct {char *cp; F77_INTEGER len;},

-DStringStructVal : A structure is passed by value for each Fortran 77 string, and the structure is of the form: struct {char *cp; F77_INTEGER len;},

-DStringCrayStyle : Special option for Cray machines, which uses Cray fcd (fortran character descriptor) for interoperation.

#

F2CDEFS = -DAdd_ -DF77_INTEGER=int -DStringSunStyle

#

----------------------------------------------------------------------

- HPL includes / libraries / specifics -------------------------------

----------------------------------------------------------------------

#
HPL_INCLUDES = -I$(INCdir) -I$(INCdir)/$(ARCH) $(LAinc) $(MPinc)
HPL_LIBS     = $(HPLlib) $(LAlib) $(MPlib)
#

- Compile time options -----------------------------------------------

#

-DHPL_COPY_L force the copy of the panel L before bcast;

-DHPL_CALL_CBLAS call the cblas interface;

-DHPL_CALL_VSIPL call the vsip library;

-DHPL_DETAILED_TIMING enable detailed timers;

#

By default HPL will:

*) not copy L before broadcast,

*) call the BLAS Fortran 77 interface,

*) not display detailed timing information.

#
HPL_OPTS     =
#

----------------------------------------------------------------------

#
HPL_DEFS     = $(F2CDEFS) $(HPL_OPTS) $(HPL_INCLUDES)
#

----------------------------------------------------------------------

- Compilers / linkers - Optimization flags ---------------------------

----------------------------------------------------------------------

#

export OMPI_CFLAGS:=
CC           = mpicc

CCNOOPT = $(HPL_DEFS) -m64

CCFLAGS = $(HPL_DEFS) -m64 -O3 -mcpu=power8 -mtune=power8

LINKER = mpicc

LINKFLAGS = -L/home/hpl-2.2/openblas-2.19/OpenBLAS-0.2.19/ -lopenblas -L/opt/ibm/lib/ -lm -R/opt/ibm/lib

ARCHIVER     = ar
ARFLAGS      = r
RANLIB       = echo

I tried adding another -L/home/hpl-2.2/openblas-2.19/OpenBLAS-0.2.19/ -lblas and it didn't work. I can get the same Make file compiled by using the -lblas option in LAlib.

martin-frbg commented 7 years ago

Please put the "-lopenblas" in the LAlib list where the -lblas was; libhpl.a depends on it, and the sequence within the library list matters.

grisuthedragon commented 7 years ago

@rakshithprakash Or, if you do not have OpenBLAS in the default search path of your compiler, put -L/home/hpl-2.2/openblas-2.19/OpenBLAS-0.2.19/ -lopenblas into the LAlib variable, assuming /home/hpl-2.2/openblas-2.19/OpenBLAS-0.2.19/ is the place where you have a compiled version of OpenBLAS. If the linker picks up the shared library in this case, you may have to add /home/hpl-2.2/openblas-2.19/OpenBLAS-0.2.19/ to the LD_LIBRARY_PATH environment variable.
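A sketch of how that section of Make.ppc64 could then look (the path is the one used earlier in this thread; spacing is illustrative):

LAdir        = /home/hpl-2.2/openblas-2.19/OpenBLAS-0.2.19
LAinc        =
LAlib        = -L$(LAdir) -lopenblas

Since HPL_LIBS = $(HPLlib) $(LAlib) $(MPlib), this places -lopenblas after libhpl.a on the link line, so the dgemm_/idamax_ references from the archive get resolved.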

@brada4 If you're using the IBM XL compilers + ESSL + CUDA, then IBM support told me the other way around. But no more on that here. ;-)

rakshithprakash commented 7 years ago

It compiles now after adding the entire path in the LAlib variable, but I do not see that path in the ldd output.

ldd ./xhpl
        linux-vdso64.so.1 =>  (0x00003fff83f10000)
        libopenblas.so.0 => /usr/lib/libopenblas.so.0 (0x00003fff83500000)
        libmpi.so.12 => /usr/lib/libmpi.so.12 (0x00003fff833d0000)
        libc.so.6 => /lib/powerpc64le-linux-gnu/libc.so.6 (0x00003fff831f0000)
        libm.so.6 => /lib/powerpc64le-linux-gnu/libm.so.6 (0x00003fff83100000)
        libpthread.so.0 => /lib/powerpc64le-linux-gnu/libpthread.so.0 (0x00003fff830c0000)
        libgfortran.so.3 => /usr/lib/powerpc64le-linux-gnu/libgfortran.so.3 (0x00003fff82f80000)
        libibverbs.so.1 => /usr/lib/libibverbs.so.1 (0x00003fff82f50000)
        libopen-rte.so.12 => /usr/lib/libopen-rte.so.12 (0x00003fff82ea0000)
        libopen-pal.so.13 => /usr/lib/libopen-pal.so.13 (0x00003fff82dd0000)
        /lib64/ld64.so.2 (0x00000000446b0000)
        libgcc_s.so.1 => /lib/powerpc64le-linux-gnu/libgcc_s.so.1 (0x00003fff82da0000)
        libdl.so.2 => /lib/powerpc64le-linux-gnu/libdl.so.2 (0x00003fff82d70000)
        libhwloc.so.5 => /usr/lib/powerpc64le-linux-gnu/libhwloc.so.5 (0x00003fff82d00000)
        libutil.so.1 => /lib/powerpc64le-linux-gnu/libutil.so.1 (0x00003fff82cd0000)
        libnuma.so.1 => /usr/lib/powerpc64le-linux-gnu/libnuma.so.1 (0x00003fff82ca0000)
        libltdl.so.7 => /usr/lib/powerpc64le-linux-gnu/libltdl.so.7 (0x00003fff82c70000)

But using export LD_LIBRARY_PATH I can see the output for both 2.18 & 2.19.

2.18:

================================================================================
T/V                N    NB     P     Q               Time                 Gflops
WR11R2C4        2000   140     1     2               1.81              2.948e+00

2.19 :

================================================================================
T/V                N    NB     P     Q               Time                 Gflops
WR11R2C4        2000   140     1     2               0.35              1.512e+01

brada4 commented 7 years ago

Probably they add ESSL as an alternative to libblas.so.3 and all works well by default. You could try that way with OpenBLAS too: 'make install' will install /opt/OpenBLAS/lib/libopenblas.so, then run update-alternatives --install /usr/lib/libblas.so.3 libblas.so.3 /opt/OpenBLAS/lib/libopenblas.so 1, then update-alternatives --config libblas.so.3, and then you can easily switch between BLAS implementations as you go forward without hard-coding any implementation.

martin-frbg commented 7 years ago

So if I read your most recent results correctly 2.19 is now performing better than 2.18 (Gflops went from 2.948 to 15.12 for that test) ?

grisuthedragon commented 7 years ago

@brada4 I do not think that they use ESSL as an alternative for libblas.so.3, because ESSL is designed to work with the XL compiler and therefore the Fortran symbols do not have the underscore at the end. So installing ESSL as an alternative would break all applications.
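One way to see that naming difference is to inspect the exported symbols (a sketch; /opt/OpenBLAS is the default 'make install' prefix mentioned above, and a library built for the XL convention would lack the trailing underscore):

        nm -D /opt/OpenBLAS/lib/libopenblas.so | grep -w dgemm_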

brada4 commented 7 years ago

Just build HPL against -lblas and update the alternatives. It is the easiest way.