AmpereComputing / HPL-on-Ampere-Altra

Apache License 2.0

Missing information in README / getting this to run following instructions #3

Closed by geerlingguy 1 year ago

geerlingguy commented 1 year ago

One thing I had to dig to find: sudo ldconfig has to be run before the README step "Ensure successful installation of openmpi by executing the following commands."

Edit: I also had to run sudo apt install -y gfortran before compiling MPI. Could the README include explicit instructions that assume default paths? E.g. wget [mpi download URL], then ./configure, make -j 4 all, make install, and finally sudo ldconfig.
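The order suggested above can be sketched as a small shell function. This is only an illustration of the sequence, not the README's procedure: the `run` and `build_openmpi` helpers are hypothetical names, the tarball URL is left for the caller to supply, and the script assumes the archive unpacks into a directory named after the tarball. `sudo` is added to `make install` on the assumption that the default `/usr/local` prefix is used from a non-root account.

```shell
#!/bin/sh
# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: build Open MPI with default paths.
# $1 is the tarball URL, which this thread does not specify.
# gfortran is installed first so that configure can detect a Fortran compiler;
# ldconfig runs last to refresh the shared-library cache.
build_openmpi() {
    url="$1"
    tarball="${url##*/}"
    srcdir="${tarball%.tar.*}"   # assumption: archive unpacks into this directory
    run sudo apt install -y gfortran &&
    run wget "$url" &&
    run tar xf "$tarball" &&
    run cd "$srcdir" &&
    run ./configure &&
    run make -j 4 all &&
    run sudo make install &&
    run sudo ldconfig
}
```

Calling `DRY_RUN=1 build_openmpi <url>` prints the eight steps without running them, which is a cheap way to review the sequence before committing to a build.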

After that, when trying to make HPL with the provided Makefile, I got the error:

mpifort -DAdd_ -DF77_INTEGER=int -DStringSunStyle  -I/opt/hpl-2.3/include -I/opt/hpl-2.3/include/Altramax_oracleblis -I/opt/MyBlisDir/include/altramax  -fomit-frame-pointer -O3 -funroll-loops -W -Wall -o /opt/hpl-2.3/bin/Altramax_oracleblis/xhpl HPL_pddriver.o         HPL_pdinfo.o           HPL_pdtest.o /opt/hpl-2.3/lib/Altramax_oracleblis/libhpl.a -L/opt/MyBlisDir/lib/altramax -lblis 
--------------------------------------------------------------------------
No underlying compiler was specified in the wrapper compiler data file
(e.g., mpicc-wrapper-data.txt)
--------------------------------------------------------------------------
make[6]: *** [Makefile:76: dexe.grd] Error 1
make[6]: Leaving directory '/opt/hpl-2.3/testing/ptest/Altramax_oracleblis'
make[5]: *** [Make.top:68: build_tst] Error 2
make[5]: Leaving directory '/opt/hpl-2.3'
make[4]: *** [Makefile:73: build] Error 2
make[4]: Leaving directory '/opt/hpl-2.3'
make[3]: *** [Make.top:54: build_src] Error 2
make[3]: Leaving directory '/opt/hpl-2.3'
make[2]: *** [Makefile:72: build] Error 2
make[2]: Leaving directory '/opt/hpl-2.3'
make[1]: *** [Make.top:54: build_src] Error 2
make[1]: Leaving directory '/opt/hpl-2.3'
make: *** [Makefile:72: build] Error 2

geerlingguy commented 1 year ago

Ah, that could be because the Fortran compiler wrapper is not working:

root@adlink-ampere:/opt/hpl-2.3# mpifort --version
--------------------------------------------------------------------------
No underlying compiler was specified in the wrapper compiler data file
(e.g., mpicc-wrapper-data.txt)
--------------------------------------------------------------------------

I installed gfortran with sudo apt install -y gfortran, but I'm still getting that error.

geerlingguy commented 1 year ago

I had to install gfortran and then also recompile MPI; now mpifort is working. I'm adding that to the suggestions in the original comment.
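A quick way to tell whether an Open MPI build needs to be redone after installing gfortran is to look for the exact error text shown earlier in the wrapper's output. The sketch below assumes only that message string; `wrapper_is_broken` is a hypothetical helper name, and it reads the wrapper output on stdin so it makes no assumption about the wrapper's exit code.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the text on stdin contains the Open MPI
# "no underlying compiler" error quoted in this thread.
wrapper_is_broken() {
    grep -q "No underlying compiler was specified"
}

# Example use (left commented out so that sourcing this file has no side effects):
#   if mpifort --version 2>&1 | wrapper_is_broken; then
#       echo "mpifort has no Fortran compiler configured: rebuild Open MPI with gfortran installed"
#   fi
```

Because the wrapper's compiler is fixed when Open MPI is configured, installing gfortran afterwards does not repair an existing install, which matches what was observed above.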

geerlingguy commented 1 year ago

Now I'm running into:

./xhpl: error while loading shared libraries: libblis.so.4: cannot open shared object file: No such file or directory

The file seems to be there...

# ls /opt/MyBlisDir/lib/altramax
libblis.a  libblis.so  libblis.so.4

And I can confirm I compiled with make arch=Altramax_oracleblis -j

amperelu commented 1 year ago

I am not very familiar with BLIS, but in general I use LD_LIBRARY_PATH to tell the loader where to search for .so files, especially when developing applications:

export LD_LIBRARY_PATH=/opt/MyBlisDir/lib/altramax:$LD_LIBRARY_PATH
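Entries in LD_LIBRARY_PATH are separated by colons, and an empty entry (for example a trailing colon left behind when the variable was previously unset) is treated by the loader as the current directory. A minimal sketch of a helper that prepends a directory without creating empty or duplicate entries; `prepend_ld_path` is a hypothetical name, not something provided by BLIS or Open MPI.

```shell
#!/bin/sh
# Hypothetical helper: prepend directory $1 to LD_LIBRARY_PATH.
# Skips the directory if it is already listed, and avoids a trailing colon
# when LD_LIBRARY_PATH is unset or empty.
prepend_ld_path() {
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$1:"*)
            ;;  # already present, nothing to do
        *)
            LD_LIBRARY_PATH="$1${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
            export LD_LIBRARY_PATH
            ;;
    esac
}
```

For the paths in this thread, that would be `prepend_ld_path /opt/MyBlisDir/lib/altramax` before running `./xhpl`.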

dneary commented 1 year ago

I have not used Fortran since I was in college, and have not tried compiling HPL. Maybe @kokrysa knows more about the MPL stuff, as it relates to AI?

rbapat-ampere commented 1 year ago

@geerlingguy, thanks for the recommendations.

As for the problem with BLIS, try:

export LD_LIBRARY_PATH=/usr/local/lib:/opt/MyBlisDir/lib/altramax:$LD_LIBRARY_PATH

One way to check whether every shared library the binary needs can be resolved is to run ldd on it. For example:

$ ldd xhpl
        linux-vdso.so.1 (0x0000ffffb21af000)
        libblis.so.4 => not found
        libmpi.so.40 => /usr/local/lib/libmpi.so.40 (0x0000ffffb1fc0000)
        libc.so.6 => /lib/aarch64-linux-gnu/libc.so.6 (0x0000ffffb1e10000)
        /lib/ld-linux-aarch64.so.1 (0x0000ffffb2176000)
        libopen-rte.so.40 => /usr/local/lib/libopen-rte.so.40 (0x0000ffffb1d40000)
        libopen-pal.so.40 => /usr/local/lib/libopen-pal.so.40 (0x0000ffffb1c20000)
        libm.so.6 => /lib/aarch64-linux-gnu/libm.so.6 (0x0000ffffb1b80000)

As the snippet above shows, ldd did not find libblis.so.4, which I believe is the issue you are facing. If you now run the export command above,

export LD_LIBRARY_PATH=/usr/local/lib:/opt/MyBlisDir/lib/altramax:$LD_LIBRARY_PATH

you should see something like this:

$ ldd xhpl
        linux-vdso.so.1 (0x0000ffffb9706000)
        libblis.so.4 => /opt/MyBlisDir/lib/altramax/libblis.so.4 (0x0000ffffb94d0000)
        libmpi.so.40 => /usr/local/lib/libmpi.so.40 (0x0000ffffb9380000)
        libc.so.6 => /lib/aarch64-linux-gnu/libc.so.6 (0x0000ffffb91b0000)
        /lib/ld-linux-aarch64.so.1 (0x0000ffffb96cd000)
        libm.so.6 => /lib/aarch64-linux-gnu/libm.so.6 (0x0000ffffb9110000)
        libgomp.so.1 => /lib/aarch64-linux-gnu/libgomp.so.1 (0x0000ffffb90b0000)
        libopen-rte.so.40 => /usr/local/lib/libopen-rte.so.40 (0x0000ffffb8fe0000)
        libopen-pal.so.40 => /usr/local/lib/libopen-pal.so.40 (0x0000ffffb8ec0000)

Hope this helps.
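The ldd check described above can be reduced to a one-line filter that prints only the unresolved libraries. This is a small sketch that assumes the `name => not found` output format shown in this thread; `missing_libs` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read `ldd` output on stdin and print the name of each
# library that the loader could not resolve.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Example use (commented out so that sourcing this file runs nothing):
#   ldd ./xhpl | missing_libs
```

An empty result means every dependency resolved with the current LD_LIBRARY_PATH; any name printed still needs its directory added to the search path.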

geerlingguy commented 1 year ago

@rbapat-ampere - Indeed that was it! I am getting 985 Gflops now at 270W power consumption.