HomerReid / scuff-em

A comprehensive and full-featured computational physics suite for boundary-element analysis of electromagnetic scattering, fluctuation-induced phenomena (Casimir forces and radiative heat transfer), nanophotonics, RF device engineering, electrostatics, and more. Includes a core library with C++ and python APIs as well as many command-line applications.
http://www.homerreid.com/scuff-em
GNU General Public License v2.0

Parallel computing by SCUFF #156

Open wmkwmkwmk opened 6 years ago

wmkwmkwmk commented 6 years ago

Hi Homer,

Long time no see! How are you?

I have a question about running SCUFF with mpirun. I am not sure whether I have turned on the parallel computation option of SCUFF, or how to set it.

My server is a 48-core computer. However, when I run a program on it, it seems the server does not give me its full computational power, even though I am the only one using it. (The CPU occupancy flips between 4800% and 100%, but most of the time it is 100%.)

Here are my questions:

- Is it normal for a scuff-cas3D program?
- Is there any command or setting to request full computational power for a single program?

BTW, I already set the computation environment to MPI mode by writing the following lines into my ".bashrc" file (obtained from our technician):

source /opt/intel/bin/compilervars.sh intel64
source $MKLROOT/bin/mklvars.sh intel64
export PATH=/usr/local/bin:/opt/mpich/bin:$PATH
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/openblas/lib:opt/mpich/lib:/opt/hdf5/lib
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/mpich/lib:/opt/hdf5/lib

I can run "mpicxx, mpifort, mpicc, mpiexec" in the terminal.

I am really a beginner with Linux; I hope I have expressed my questions clearly. Thank you!

Regards, Mingkang

HomerReid commented 6 years ago

The most likely culprit here is that you have successfully compiled SCUFF-EM with support for multithreading, but you have linked the executable codes with a non-multithreaded implementation of the Basic Linear Algebra Subroutines (BLAS) library.

You can test that this is the case by monitoring the .log file produced by whatever code you are running (i.e. scuff-neq.log, scuff-scatter.log, etc.) and comparing what the code says it is doing against the number of CPU cores being used on your system. If you observe 4800% CPU usage when SCUFF-EM says something like "Assembling BEM matrix..." but then this drops down to just 100% for linear-algebra operations (i.e. "LU-Factorizing," "LU-Solving," etc.) then that means you have a multithreaded SCUFF but a non-multithreaded BLAS.
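A concrete way to do this (just a sketch using standard Linux tools; the log-file name follows the pattern above, so for scuff-cas3D it should be scuff-cas3D.log) is to follow the log in one terminal while watching CPU usage in another:

% tail -f scuff-cas3D.log    # terminal 1: see what SCUFF-EM says it is doing
% top                        # terminal 2: watch CPU usage of the running process

With 48 cores, a fully multithreaded phase should show close to 4800% CPU in top, while a single-threaded BLAS phase will sit near 100%.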

Another way to investigate this is to type (at the Linux command prompt)

% ldd scuff-neq

This will print out all the shared libraries that scuff-neq is linked against. You want to see a reference to a multithreaded BLAS library. For example, when I do this I get:

hikari /home/homer % ldd scuff-neq
    linux-vdso.so.1 =>  (0x00007ffe9cbd0000)
    ....
    ....
    libopenblas.so.0 => /home/homer/codes/lib/libopenblas.so.0 (0x00007f7e06dd3000)
    ....
    ....

where .... stands for other output that I have omitted here. This shows me that scuff-neq is linked against the version of openblas that I built and installed on my filesystem, which is what I want. What I would not want to see is something like

hikari /home/homer % ldd scuff-neq
    linux-vdso.so.1 =>  (0x00007ffe9cbd0000)
    ....
    ....
    libblas.so.3 => /usr/lib/libblas.so.3
    ....
    ....

This would indicate that scuff-neq is linked against a default system BLAS library that is most likely not multithreaded.
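As a quick shortcut (just a convenience one-liner using standard tools, nothing specific to SCUFF-EM), you can filter the ldd output for BLAS-related entries:

% ldd scuff-neq | grep -i -e blas -e mkl

If this prints a line pointing at OpenBLAS or MKL, you are linked against a multithreaded BLAS; if it points at /usr/lib/libblas.so.3, you are probably not.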

I see that you have already made an effort to build SCUFF with a multithreaded BLAS, by (a) sourcing the Intel MKL setup script, and also (b) including /opt/openblas/lib in your LD_LIBRARY_PATH.

These are good steps! However, they are probably not sufficient on their own, and moreover they are not consistent with each other---the MKL linear-algebra library is distinct from OpenBLAS, so you will want to choose just one or the other and stick with that.
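Whichever one you pick, note that both libraries read standard environment variables to decide how many threads to use; these are not SCUFF-EM settings, but it is worth checking that nothing in your environment is pinning the thread count to 1. For example:

# if linked against the parallel MKL
export MKL_NUM_THREADS=48
export OMP_NUM_THREADS=48

# or, if linked against a multithreaded OpenBLAS
export OPENBLAS_NUM_THREADS=48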

If you have the MKL installed on your system, that is probably the best choice. Here's a configure script that has worked for me in the past to set up SCUFF-EM for compilation with the Intel compilers and linking against the Intel math and linear-algebra libraries:

#!/bin/bash

# set up the Intel MKL and MPI environments
. mklvars.sh intel64 mod lp64
. mpivars.sh

# path to Intel compilers/math libraries installation folder
export INTELROOT=$HOME/codes/intel

export PATH=${INTELROOT}/bin:${PATH}

# compile with the Intel compilers and link against the multithreaded (parallel) MKL
export CPPFLAGS="-I/usr/include/mpi"
export CFLAGS="-O3 -mkl=parallel"
export CCFLAGS="-O3 -mkl=parallel"
export CXXFLAGS="-O3 -mkl=parallel"
export CC=icc
export CXX=icpc

./configure --prefix=${HOME}/scuff-em-installation

I would suggest going back to your SCUFF source directory, running make distclean, then executing the above script, followed by make -j 48 and make install. Then repeat the above diagnostics to check that SCUFF is properly linked against a multithreaded BLAS.
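Concretely, the rebuild sequence would look something like this (the directory path and the script name configure-mkl.sh are placeholders for your own locations):

% cd /path/to/scuff-em     # your SCUFF-EM source directory
% make distclean           # wipe the previous build configuration
% ./configure-mkl.sh       # the configure script shown above, saved to a file
% make -j 48               # build in parallel on all 48 cores
% make install             # install into the --prefix chosen by configure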

Feel free to post here any error messages or other output you may get, and feel free to keep the issue open by asking further questions. If the issue has been resolved to your satisfaction, please close it.

IntelligentElectric commented 6 years ago

Hi Reid,

Can scuff-em do MPI on a cluster, i.e. across the nodes?

Has anyone successfully done it? I tried it, but it doesn't work.

xygao96 commented 4 years ago

> Hi Reid,
>
> Can scuff-em do MPI on a cluster, i.e. across the nodes?
>
> Has anyone successfully done it? I tried it, but it doesn't work.

Hi,

Have you successfully done it? I am also trying to use scuff-em on a cluster, but I do not know if it will work across different nodes.

Xingyu