NOAA-EMC / WW3

WAVEWATCH III

Pthreads + SCOTCH #891

Open JessicaMeixner-NOAA opened 1 year ago

JessicaMeixner-NOAA commented 1 year ago

When building SCOTCH with Pthreads on Orion, the model will build, but WW3 will then fail at runtime (see details in #885). Turning off Pthreads resolves the issue, so for now we are not using Pthreads. This issue is to keep track of the problem so we can eventually return and see if Pthreads can be turned back on.

There is no expectation that this will be resolved soon; this issue is made for tracking purposes only.

aronroland commented 1 year ago

Hi All,

There is a new SCOTCH version; who will update it? I think that for the initial commit to develop we should use the latest.

Cheers

Aron


aronroland commented 1 year ago

Hi @JessicaMeixner-NOAA, @MatthewMasarik-NOAA

It came to my mind that there is a possibility that the environment settings of the HPC systems differ in some way. Did you compare `ulimit -a` on both machines? There are quite a few more settings linked to MPI, and maybe even to the threading behavior. Maybe you can compare with @thesser1 ...

MatthewMasarik-NOAA commented 1 year ago

Hi @aronroland,

> it came to my mind, that there is a possibility that your environmental settings of the HPCF different in a certain way, did you compared ulimit -a on the both machines and there are quite more settings linked to mpi and maybe even the threading behavior. Maybe, u can adjust with @thesser1 ...

I definitely agree. @JessicaMeixner-NOAA and I have previously tagged @aliabdolali and @thesser1 regarding their job card settings. Ali showed me `ccmake`, which I'm currently using to verify the cmake flags passed. Any environment settings you guys found useful would be a great help to compare against our job cards.

thesser1 commented 1 year ago

I am running with the following `ulimit -a`:

```
core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 513915
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 16384
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) unlimited
cpu time               (seconds, -t) unlimited
max user processes              (-u) 4096
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
```
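For comparison, limits like these can be set at the top of a job card before launching the model. A minimal sketch; which specific limits matter here is an assumption on my part, not something confirmed in this thread:

```shell
#!/bin/bash
# Hypothetical job-card fragment: raise (where the hard limit permits)
# the limits most often implicated in threaded MPI runs, then record
# the effective values in the job log for later comparison.
ulimit -s unlimited 2>/dev/null   # stack size; threaded codes often need more
ulimit -c unlimited 2>/dev/null   # core file size, for post-mortem debugging
ulimit -a                         # print all effective limits to the job log
```

Logging `ulimit -a` from inside the job itself matters because batch systems can impose different limits than an interactive login shell.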


MatthewMasarik-NOAA commented 1 year ago

Thank you @thesser1! This is very helpful. The `ulimit -a` settings you and @aronroland mentioned are new to me; I'll include that in our job card.

Here is the `ccmake` output for a SCOTCH build (attached as an image).

Are you guys setting INTSIZE?

thesser1 commented 1 year ago

```
BUILD_LIBESMUMPS        ON
BUILD_LIBSCOTCHMETIS    ON
BUILD_PTSCOTCH          ON
CMAKE_BUILD_TYPE        Release
CMAKE_INSTALL_PREFIX    /p/work2/thesser1/code_management/tools/scotch_test/install/scotch-v7.0.3
INCLUDE_INSTALL_DIR     include/
INSTALL_METIS_HEADERS   ON
INTSIZE
LIBRARY_INSTALL_DIR     lib/
MPI_THREAD_MULTIPLE     ON
THREADS                 ON
USE_BZ2                 ON
USE_LZMA                ON
USE_ZLIB                ON
```
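If it helps for comparison, a cache like the one above would typically come from a configure step along these lines. This is a sketch, not Ty's actual command; the source and install paths are placeholders, and only the `-D` options mirror the `ccmake` output:

```shell
# Hypothetical configure/build/install sequence reproducing the cached
# values shown above (paths are placeholders).
cmake -S scotch-v7.0.3 -B build \
      -DCMAKE_BUILD_TYPE=Release \
      -DCMAKE_INSTALL_PREFIX=/path/to/install/scotch-v7.0.3 \
      -DBUILD_PTSCOTCH=ON \
      -DTHREADS=ON \
      -DMPI_THREAD_MULTIPLE=ON
cmake --build build -j 8
cmake --install build
```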


thesser1 commented 1 year ago

But it looks like threads are on with mine. This is the build straight from the NOAA SCOTCH build shell from the other day.


MatthewMasarik-NOAA commented 1 year ago

Yes, that's very interesting. I thought your threads would be off. I found that supplying those flags to the cmake call was not actually overriding the settings in the scotch/CMakeLists.txt, and I had to edit that file to get threads turned off. I've got a lot to follow up on. Greatly appreciate it, Ty.
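One possibility worth ruling out (an assumption on my part, not verified here) is the CMake cache: once an option has been cached, a later `-D` against the same build directory can be masked by logic in CMakeLists.txt. Configuring in a fresh build tree makes the override unambiguous:

```shell
# Hypothetical check: configure SCOTCH in a clean build directory so a
# -DTHREADS=OFF override cannot be masked by a stale CMakeCache.txt.
rm -rf build-nothreads
cmake -S scotch-v7.0.3 -B build-nothreads -DTHREADS=OFF   # path is a placeholder
grep '^THREADS' build-nothreads/CMakeCache.txt            # confirm the cached value
```

If the grep still shows `THREADS:BOOL=ON` even from a fresh tree, then the CMakeLists.txt really is forcing the value and editing it, as you did, is the workaround.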

aliabdolali commented 1 year ago

From the SCOTCH manual; might be helpful:

> … to create distributed graphs in parallel. Since this task involves concurrent MPI communications, the MPI library must support the MPI_THREAD_MULTIPLE level. In order to take advantage of these features, the "-DSCOTCH_PTHREAD_MPI" flag must be set, in addition to the "-DSCOTCH_PTHREAD" flag. These two flags are completely independent from the "-DCOMMON_PTHREAD_FILE" flag, which can be set independently from the others.
>
> Note that if you compile Scotch with the "-DSCOTCH_PTHREAD_MPI" flag, you will have to initialize your communication subsystem by using the MPI_Init_thread() MPI call instead of MPI_Init(), and the provided thread support level value returned by the routine must be checked carefully to assert it is indeed MPI_THREAD_MULTIPLE.
>
> Note also that since PT-Scotch calls Scotch routines when operating on a single process, setting "-DSCOTCH_PTHREAD" but not "-DSCOTCH_PTHREAD_MPI" will still allow multiple threads to be used on each MPI process, without interfering with MPI itself. In this case, the MPI thread level MPI_THREAD_FUNNELED will be sufficient.
>
> The compilation flags used to manage threads are the following:
>
> - "-DSCOTCH_PTHREAD" is mandatory to enable multi-threaded algorithms in Scotch and/or PT-Scotch. It has to be used in conjunction with the "-DCOMMON_PTHREAD" flag that enables thread management at the lower levels of the Scotch implementation.
>
> - "-DSCOTCH_PTHREAD_MPI" enables some algorithms of PT-Scotch that may make concurrent calls to the MPI communication subsystem. It has to be used in conjunction with the "-DSCOTCH_PTHREAD" flag (hence also with the "-DCOMMON_PTHREAD" flag). Alternately, the compilation flag "-DSCOTCH_MPI_ASYNC_COLL" can be used to replace threaded synchronous communication routines by non-threaded asynchronous communication routines.
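Tying the manual's preprocessor flags back to the CMake discussion above: my understanding (an assumption, not confirmed in this thread) is that the `THREADS` and `MPI_THREAD_MULTIPLE` CMake options normally set these defines for you, but they can also be forced explicitly through the compiler flags:

```shell
# Hypothetical configure line passing the manual's preprocessor flags
# directly, rather than relying on the THREADS/MPI_THREAD_MULTIPLE
# CMake options to set them (source path is a placeholder).
cmake -S scotch-v7.0.3 -B build \
      -DCMAKE_C_FLAGS="-DCOMMON_PTHREAD -DSCOTCH_PTHREAD -DSCOTCH_PTHREAD_MPI"
```

Passing them explicitly would at least make it unambiguous which defines the compiled library was built with.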

MatthewMasarik-NOAA commented 1 year ago

Thanks @aliabdolali, I've been following the PT-Scotch user manual closely.

MatthewMasarik-NOAA commented 1 year ago

@thesser1, @aliabdolali, or @aronroland: are you guys compiling with Intel? If so, do you know if it's Intel64?

thesser1 commented 1 year ago

Yes, it is Intel, and yes, it is Intel64.


MatthewMasarik-NOAA commented 1 year ago

Okay, thanks Ty.