GlobalArrays / ga

Partitioned Global Address Space (PGAS) library for distributed arrays
http://hpc.pnl.gov/globalarrays/

check space left on the /dev/shm filesystem #256

Closed edoapra closed 2 years ago

edoapra commented 2 years ago

Follow-up on issue #254.
Not sure if this is the right way to detect insufficient space on /dev/shm.
It works only on Linux.

edoapra commented 2 years ago

Sample output when trying to create a single GA allocation larger than the size of /dev/shm:

[edo@deception02 build_edotests]$ srun -n1 df -h /dev/shm
Filesystem      Size  Used Avail Use% Mounted on
tmpfs           504G   12K  504G   1% /dev/shm
[edo@deception02 build_edotests]$ mpirun -np 2 global/testing/testc.x 69084377926
[0] nodesize 2 init /dev/shm size 515864  bsize 4096  nodesize 2 
[0] /dev/shm filesize 0 filesize*np 0 initial devshm space 515864 current /dev/shm space 515864 
[0] /dev/shm filesize 0 space left 515864 
[0] /dev/shm filesize 0 filesize*np 0 initial devshm space 515864 current /dev/shm space 515864 
[0] /dev/shm filesize 0 space left 515864 
Max Integer size = 9223372036854775807
Allocating GA ...
[0] /dev/shm filesize 527071 filesize*np 527071 initial devshm space 515864 current /dev/shm space 515864 
[0] /dev/shm fs has size 515864 new shm area has size 527071 need to increase /dev/shm by 11206 Mbytes
check_devshm: /dev/shm out of space: Success
[0] Received an Error in Communication: (-1) check_devshm: /dev/shm out of space
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI COMMUNICATOR 3 DUP FROM 0
with errorcode -1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
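
For readers who want to see what such a check involves, here is a minimal sketch in C. It is not the code from this pull request: the helper names `fs_avail_mb`, `shortfall_mb` and `check_space` are hypothetical, and the use of `statvfs` is an assumption (the `bsize 4096` in the log only suggests that some statfs-style call is being made). The sketch queries the free space of a filesystem in Mbytes and compares it with the size of a requested allocation, which is the comparison the log reports (515864 available versus 527071 requested).

```c
#include <stdio.h>
#include <sys/statvfs.h>

/* Hypothetical helper, not GA's check_devshm: space available to
 * unprivileged users on the filesystem holding `path`, in Mbytes.
 * Returns -1 if the filesystem cannot be queried. */
long long fs_avail_mb(const char *path)
{
    struct statvfs st;
    if (statvfs(path, &st) != 0)
        return -1;
    /* f_bavail is a block count in units of f_frsize bytes. */
    unsigned long long bytes =
        (unsigned long long)st.f_bavail * (unsigned long long)st.f_frsize;
    return (long long)(bytes / (1024ULL * 1024ULL));
}

/* Mbytes missing for a request of need_mb; 0 if the request fits. */
long long shortfall_mb(long long avail_mb, long long need_mb)
{
    return need_mb > avail_mb ? need_mb - avail_mb : 0;
}

/* Returns 0 if the request fits, 1 if it does not,
 * and -1 if the filesystem could not be queried. */
int check_space(const char *path, long long need_mb)
{
    long long avail = fs_avail_mb(path);
    if (avail < 0) {
        perror("statvfs"); /* errno is set here, so perror is appropriate */
        return -1;
    }
    long long missing = shortfall_mb(avail, need_mb);
    if (missing > 0) {
        /* Logical error, errno is not meaningful: report with fprintf. */
        fprintf(stderr,
                "%s has %lld Mbytes free, request needs %lld Mbytes "
                "(short by %lld Mbytes)\n",
                path, avail, need_mb, missing);
        return 1;
    }
    return 0;
}
```

A caller would invoke something like `check_space("/dev/shm", need_mb)` before creating the shared-memory segment and abort on a non-zero result. One design note: the trailing `: Success` in the log line `check_devshm: /dev/shm out of space: Success` is what a perror-style report prints when `errno` is 0, which is why the sketch reserves `perror` for the failed system call and uses `fprintf` for the out-of-space condition.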