flatironinstitute / FMM3D

Flatiron Institute Fast Multipole Libraries --- This codebase is a set of libraries to compute N-body interactions governed by the Laplace and Helmholtz equations, to a specified precision, in three dimensions, on a multi-core shared-memory machine.
https://fmm3d.readthedocs.io

Fortran kills the python process #10

Closed: helange23 closed this issue 3 years ago.

helange23 commented 4 years ago

I am trying to use the code in a larger system, and the problem is that when an error occurs, the Fortran code calls "exit" or "stop", which kills the entire Python process that calls the FMM. The FMM failing for certain inputs is not a problem as long as I can catch an exception.

Another issue is that for certain inputs the FMM code simply segfaults (I have saved the inputs that crashed the code, in case they are helpful). Again, the segfault kills the entire system.

I installed the code as described in the Read the Docs documentation. Am I doing something wrong?
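Until the library itself stops calling exit/stop, one common workaround is to run the FMM call in a child process, so that a hard exit or a segfault only kills the child and the parent can raise an ordinary exception. The sketch below assumes a POSIX system (it uses the `fork` start method, which works on the reporter's Ubuntu); `call_isolated` and `IsolatedCallError` are hypothetical helper names, not part of FMM3D.

```python
import multiprocessing as mp


class IsolatedCallError(RuntimeError):
    """Raised when the worker process fails instead of returning a result."""


def _worker(func, args, conn):
    # Runs in the child. A Fortran STOP/EXIT or a segfault here ends only
    # this child process; the parent just sees the pipe close.
    try:
        conn.send(("ok", func(*args)))
    except Exception as exc:  # ordinary Python exceptions are forwarded as text
        conn.send(("err", repr(exc)))
    finally:
        conn.close()


def call_isolated(func, *args):
    """Run func(*args) in a forked child process and return its result.

    Raises IsolatedCallError if func raises, or if the child dies without
    sending a result (e.g. Fortran STOP, which may exit with status 0, or SIGSEGV).
    """
    ctx = mp.get_context("fork")  # POSIX only; func and args are inherited, not pickled
    parent_conn, child_conn = ctx.Pipe(duplex=False)  # (receive end, send end)
    proc = ctx.Process(target=_worker, args=(func, args, child_conn))
    proc.start()
    child_conn.close()  # parent keeps only the read end, so EOF means "child is gone"
    try:
        status, payload = parent_conn.recv()
    except EOFError:
        proc.join()
        raise IsolatedCallError(
            f"worker died without a result (exit code {proc.exitcode})"
        ) from None
    finally:
        parent_conn.close()
    proc.join()
    if status == "err":
        raise IsolatedCallError(f"worker raised {payload}")
    return payload
```

With the arguments from the reporter's script, the call would look like `call_isolated(lfmm.lfmm3d_st_c_g_vec, eps, source, charges, target, nd, nsource, ntarg)`. Detecting failure by "no result arrived" rather than by exit code is deliberate, since a Fortran `stop` can exit with status 0. The cost is a fork plus pickling the output arrays on every call, which may matter if the FMM is called many times on small problems.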

mrachh commented 4 years ago

Apologies for the issue.

Yes, if you could send your inputs, that would be helpful. Could you also specify the operating system you are working with, the Python version, the Fortran compiler used to build the Python library, and any extra compile flags used (e.g. whether FAST_KER=ON was used)?

We will update the code shortly so that the fortran code exits softly on catching an error.
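The environment details requested above can be gathered with a few standard-library calls. This is only a sketch: `environment_report` is a hypothetical helper, and it assumes that the `gfortran` found on `PATH` is the compiler that built the Python module. Build options such as FAST_KER cannot be recovered this way and still need to be reported by hand.

```python
import platform
import shutil
import subprocess


def environment_report():
    """Collect OS, Python version, and gfortran version (None if not on PATH)."""
    report = {
        "os": platform.platform(),
        "python": platform.python_version(),
        "gfortran": None,
    }
    gfortran = shutil.which("gfortran")
    if gfortran is not None:
        out = subprocess.run([gfortran, "--version"], capture_output=True, text=True)
        lines = out.stdout.splitlines()
        # The first line typically reads like "GNU Fortran (...) 7.5.0"
        report["gfortran"] = lines[0] if lines else None
    return report
```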

helange23 commented 4 years ago

Thanks a lot for the quick response.

OS: Ubuntu 18.04.4 LTS
Python version: 3.7.6
GFortran version: 7.5

I used "make python" to install the packages without any additional flags. A pickled dictionary that contains the inputs that caused the segfault can be found here. The code I used is below:

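import pickle

import lfmm3d_fortran as lfmm

# Load the saved inputs that trigger the segfault
with open('failed.dict', 'rb') as f:
    d = pickle.load(f)
charges = d['input']   # charge strengths, shape (nd, nsource); avoids shadowing the builtin input()
source = d['source']   # source locations, shape (3, nsource)
target = d['target']   # target locations, shape (3, ntarg)

nd = charges.shape[0]  # number of charge vectors evaluated at once
eps = 1E-2             # requested precision
nsource = source.shape[1]
ntarg = target.shape[1]

# st = sources and targets, c = charges, g = potential and gradient.
# Returns potential and gradient at the sources, then at the targets.
_, src_grad, output, trg_grad = lfmm.lfmm3d_st_c_g_vec(eps, source, charges, target,
                                                       nd, nsource, ntarg)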

With eps = 1E-5 the code runs fine, but with eps = 1E-2 it segfaults.
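To see where a crash like this happens, Python's standard `faulthandler` module can dump the Python traceback when the process receives SIGSEGV (equivalently, call `faulthandler.enable()` at the top of the script or set `PYTHONFAULTHANDLER=1`). It reports Python frames only; locating the fault inside the Fortran code would need a debugger such as gdb on a build with debug symbols. The sketch runs the script in a child interpreter so the caller survives; `run_with_faulthandler` is a hypothetical helper name.

```python
import subprocess
import sys


def run_with_faulthandler(script_args):
    """Run a Python script in a child interpreter with faulthandler enabled.

    If the child segfaults, faulthandler writes 'Fatal Python error: Segmentation fault'
    and the Python-level stack to stderr before the child dies; the parent keeps running.
    """
    return subprocess.run(
        [sys.executable, "-X", "faulthandler", *script_args],
        capture_output=True,
        text=True,
    )
```

For example, `run_with_faulthandler(["repro.py"])`, where `repro.py` is a hypothetical file holding the script above, returns a `CompletedProcess` whose `returncode` is negative on a signal and whose `stderr` contains the traceback.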

mrachh commented 4 years ago

Hi, thanks for bringing this to our notice. There was indeed an issue with our code, which has now been fixed in a patch. We are making longer-term changes that will eliminate all hard exits from the Fortran code; in the meantime, none of the hard exits should be triggered except in extreme circumstances.

As a side note, your source distribution contains many repeated locations; in particular, 228 sources sit at a single point. We recommend using distinct source locations. A few repeated locations may be fine, but more than 40 sources at the same location can cause performance issues.

Let me know if you are still running into any difficulties.
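Following the advice about repeated sources, one can check for coincident sources before calling the FMM and, optionally, merge them. The slowdown is presumably because an adaptive tree cannot separate coincident points by subdividing boxes. Merging is a swapped-in technique, not something the maintainers suggest: because the potential is linear in the charges, k sources at one point with charges q_1, ..., q_k produce the same field at any other point as a single source carrying their sum. The sketch below uses NumPy; `max_coincident_sources` and `merge_coincident_sources` are hypothetical helpers, not FMM3D functions. It only catches exact duplicates (points differing by roundoff are kept apart) and makes no claim about values requested at the duplicated source locations themselves.

```python
import numpy as np


def max_coincident_sources(source):
    """Largest number of sources sharing one exact location; source has shape (3, nsource)."""
    _, counts = np.unique(source.T, axis=0, return_counts=True)
    return int(counts.max())


def merge_coincident_sources(source, charges):
    """Collapse exactly coincident sources into one, summing their charges.

    source: (3, nsource) locations; charges: (nd, nsource) strengths.
    Returns (merged_source (3, m), merged_charges (nd, m), inverse), where
    inverse[i] is the merged column that original source i was folded into.
    """
    uniq, inverse = np.unique(source.T, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)  # keep it 1-D across NumPy versions
    merged_t = np.zeros((uniq.shape[0], charges.shape[0]), dtype=charges.dtype)
    np.add.at(merged_t, inverse, charges.T)  # unbuffered, so repeated indices accumulate
    return uniq.T.copy(), merged_t.T.copy(), inverse
```

A typical guard, using the maintainers' rough threshold, would be `if max_coincident_sources(source) > 40:` followed by a call to `merge_coincident_sources(source, charges)` before running the FMM on the merged arrays.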