flatironinstitute / FMM3D

Flatiron Institute Fast Multipole Libraries --- This codebase is a set of libraries to compute N-body interactions governed by the Laplace and Helmholtz equations, to a specified precision, in three dimensions, on a multi-core shared-memory machine.
https://fmm3d.readthedocs.io

MATLAB code crashes with larger number of densities #15

Open bobbielf2 opened 3 years ago

bobbielf2 commented 3 years ago

Hi,

I run the code below in MATLAB and it crashes. The crash happens reliably whenever the number of densities nd is large enough. It only happens if I compile the code with OpenMP; the single-threaded version works fine.

OS: macOS Catalina 10.15.6
MATLAB version: R2019b or R2020b
FMM3D library compiled with the default make.inc.macos.gnu options

Additionally, I tried similar things on an Ubuntu system and modified the OMP_STACKSIZE variable as described here. This somewhat alleviates the issue, but it still crashes when nd is big enough. (However, I am not too familiar with Linux, so I could have done something wrong here.)

ns = 4000;

nd = 200;
srcinfo.nd = nd;

pg = 1;

srcinfo.sources = rand(3,ns);
srcinfo.charges = rand(nd,ns);

eps = 1e-5;

U = lfmm3d(eps,srcinfo,pg);

ahbarnett commented 3 years ago

Just some more data. I see the same crash on MATLAB R2017a on Ubuntu 16.04 (i7, gcc 9, OpenMP, 8 threads). It's not the total data size, since ns=1e5; nd=10; doesn't crash, and neither does ns=1e5; nd=65; even though it uses 16 GB or so. ns=1e4; nd=1e2 crashes. It's not a strict limit on nd either, since ns=1e2; nd=1e4; runs without crashing.

maxNumCompThreads(1) prevents all such crashes.

zgimbutas commented 3 years ago

I can replicate the bug by running the Fortran test driver test_lfmm3d_vec.f. Most likely OMP_STACKSIZE is set incorrectly. On my machine (Ubuntu 20.04 LTS, gcc 9.3.0), I have the following system defaults:

export OMP_DISPLAY_ENV=TRUE

$ make -f test_lfmm3d_vec.make -j8
gfortran -fPIC -O3 -funroll-loops -march=native -fopenmp -std=legacy -o int2-lfmm3d-vec test_lfmm3d_vec.o ../../src/Common/hkrand.o ../../src/Common/dlaran.o ../../src/Common/prini.o ../../src/Common/rotgen.o ../../src/Common/legeexps.o ../../src/Common/rotviarecur.o ../../src/Common/yrecursion.o ../../src/Laplace/l3dterms.o ../../src/Laplace/l3dtrans.o ../../src/Laplace/laprouts3d.o ../../src/Laplace/lapkernels.o ../../src/Laplace/lfmm3d.o ../../src/Laplace/lfmm3dwrap_vec.o ../../src/Laplace/lwtsexp_sep1.o ../../src/Laplace/lwtsexp_sep2.o ../../src/Laplace/lpwrouts.o ../../src/Laplace/lndiv.o ../../src/Common/rotproj.o ../../src/Common/tree_lr_3d.o ../../src/Common/dfft.o ../../src/Common/fmmcommon.o
./int2-lfmm3d-vec

OPENMP DISPLAY ENVIRONMENT BEGIN
  _OPENMP = '201511'
  OMP_DYNAMIC = 'FALSE'
  OMP_NESTED = 'FALSE'
  OMP_NUM_THREADS = '8'
  OMP_SCHEDULE = 'DYNAMIC'
  OMP_PROC_BIND = 'FALSE'
  OMP_PLACES = ''
  OMP_STACKSIZE = '0'
  OMP_WAIT_POLICY = 'PASSIVE'
  OMP_THREAD_LIMIT = '4294967295'
  OMP_MAX_ACTIVE_LEVELS = '2147483647'
  OMP_CANCELLATION = 'FALSE'
  OMP_DEFAULT_DEVICE = '0'
  OMP_MAX_TASK_PRIORITY = '0'
  OMP_DISPLAY_AFFINITY = 'FALSE'
  OMP_AFFINITY_FORMAT = 'level %L thread %i affinity %A'
OPENMP DISPLAY ENVIRONMENT END

Testing suite for lfmm3d_vec Requested precision = 0.5000E-09 testing source to source interaction: charges output: potentials

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:

#0  0x154e708c9d01 in ???
#1  0x154e708c8ed5 in ???
#2  0x154e7052520f in ???
#3  0x5619dc1ea098 in ???
#4  0xffffffffffffffff in ???

make: *** [test_lfmm3d_vec.make:54: all] Segmentation fault (core dumped)


ahbarnett commented 3 years ago

Hi Zydrunas - good to hear from you!

OK, but how come FMM3D puts so much on the stack? Surely any variable-sized arrays go on the heap, via malloc etc.?

FWIW, MATLAB (libiomp5) doesn't like the env var changing the stack size:

OMP: Warning #181: OMP_STACKSIZE: ignored because KMP_STACKSIZE has been defined

ulimit -s unlimited didn't stop the matlab segfault.
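
For reference, a small sketch for inspecting the settings involved, assuming a POSIX-like shell. `ulimit -s` reports the shell's own stack limit, which applies to a process's main thread; under Intel's runtime the worker-thread stacks are governed by `KMP_STACKSIZE` instead, which would be consistent with `ulimit -s unlimited` not helping here. The `<unset>` placeholder is purely illustrative.

```shell
# Soft stack limit of the current shell, in KiB (or "unlimited").
echo "main-thread stack limit: $(ulimit -s)"

# Stack-size variables that a program launched from this shell would inherit.
echo "OMP_STACKSIZE=${OMP_STACKSIZE:-<unset>}"
echo "KMP_STACKSIZE=${KMP_STACKSIZE:-<unset>}"
```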

Ok, off to bed now, Alex

zgimbutas commented 3 years ago

Thanks, Alex!

The default OMP and KMP (that's Intel) stack size is something like 4 MB per thread.

That is not much: all private variables and all local variables in called subroutines (all of them combined!) go onto the private thread stacks. KMP overrides OMP.
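
A minimal sketch of raising the per-thread stack size before launching a program, assuming a POSIX-like shell. The value `1G` is an arbitrary illustration, not a recommended setting. Because the Intel variable takes precedence when both are defined, the sketch sets both to the same value.

```shell
# Illustrative value only; tune it to the problem size.
STACK=1G

export OMP_STACKSIZE="$STACK"   # standard OpenMP variable
export KMP_STACKSIZE="$STACK"   # Intel runtime variable; wins over OMP_STACKSIZE
export OMP_DISPLAY_ENV=TRUE     # ask the runtime to print its settings at startup

echo "OMP_STACKSIZE=$OMP_STACKSIZE KMP_STACKSIZE=$KMP_STACKSIZE"

# Launch the program from this same shell so it inherits the variables, e.g.
#   ./int2-lfmm3d-vec
```

The variables are read when the OpenMP runtime starts, so they need to be set before the process is launched. Whether MATLAB keeps a `KMP_STACKSIZE` exported this way or replaces it with its own value at startup is not established in this thread.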

google: omp stack size

https://software.intel.com/content/www/us/en/develop/articles/openmp-stacksize-common-error.html
https://stackoverflow.com/questions/13264274/why-segmentation-fault-is-happening-in-this-openmp-code

google: omp_stacksize and kmp_stacksize

http://www.bgu.ac.il/intel_fortran_docs/compiler_f/main_for/mergedProjects/optaps_for/common/optaps_par_var.htm
https://www.mathworks.com/matlabcentral/answers/447978-omp-warning-181-omp_stacksize-ignored-because-kmp_stacksize-has-been-defined

On a Mac, you may also want to experiment a bit with the compiler flags for the single-threaded stack size.

Zydrunas


zgimbutas commented 3 years ago

One can also try to track down big local variables that might be causing this (temporary vectorized multipole expansions?)
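
A back-of-the-envelope estimate of how such a temporary would scale with nd. This assumes one double-complex (16-byte) expansion array of shape nd x (nterms+1) x (2*nterms+1) and a hypothetical expansion order nterms=20. Neither the shape nor the order is confirmed in this thread, and the 4 MB budget is the approximate default quoted earlier.

```shell
# Bytes for one double-complex array of shape nd x (nterms+1) x (2*nterms+1).
# Arguments: $1 = nd, $2 = nterms. The shape is an assumption for illustration.
expansion_bytes() {
    echo $(( $1 * ($2 + 1) * (2 * $2 + 1) * 16 ))
}

default_stack=$(( 4 * 1024 * 1024 ))   # roughly 4 MB per thread

for nd in 10 65 200; do
    bytes=$(expansion_bytes "$nd" 20)   # nterms=20 is hypothetical
    echo "nd=$nd: $bytes bytes per array (per-thread stack: $default_stack bytes)"
done
```

Under these assumptions a single array at nd=200 takes about 2.6 MiB, so two such temporaries on one thread's stack would already exceed a 4 MiB limit, while at nd=10 the same array is well under 1 MiB.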


mrachh commented 3 years ago

Thanks for bringing this to our notice. Thanks Alex and Zydrunas for the additional data and suggestions too.

I think I know the cause of the issue. It is in the local memory allocation in list4 processing, where the arrays are allocated inside an OpenMP loop rather than outside. This might also explain some issues we are having when compiling with Intel compilers on Windows.

Hopefully fixing that resolves the bug.

ahbarnett commented 7 months ago

@mrachh is this fixed - can it be closed?