Unidata / netcdf4-python

netcdf4-python: python/numpy interface to the netCDF C library
http://unidata.github.io/netcdf4-python
MIT License

Importing netCDF4 / python v3.9 crashes debuggers with segfault #1063

Open rickgrubin opened 3 years ago

rickgrubin commented 3 years ago

Version

Environment

% python3 --version
Python 3.9.1

% brew info python@3.9
python@3.9: stable 3.9.1 (bottled)
Interpreted, interactive, object-oriented programming language
https://www.python.org/
/usr/local/Cellar/python@3.9/3.9.1_3 (3,901 files, 63.9MB) *
  Poured from bottle on 2020-12-30 at 18:54:46
From: https://github.com/Homebrew/homebrew-core/blob/HEAD/Formula/python@3.9.rb

IDEs

Steps to reproduce

Both IDEs (whether running in a console window or a debugger) report:

/usr/local/bin/python3.9 <path to python file>
[grinch:30560] *** Process received signal ***
[grinch:30560] Signal: Segmentation fault: 11 (11)
[grinch:30560] Signal code: Address not mapped (1)
[grinch:30560] Failing at address: 0x30
[grinch:30560] *** End of error message ***

Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)

This issue does not occur when running from the command line, either with or without pdb.

I'm not completely convinced that this is a netCDF4-python issue, but do suspect that it's related to numpy or multiprocessing or both.

--

netcdf4-python-build.txt

jswhit commented 3 years ago

I've got nothing - but I suspect that you are right that it has something to do with the MPI libraries that get loaded when you import netCDF4. Can you confirm that this works with a non-MPI build of netCDF4?

jswhit commented 3 years ago

You could also test this theory by trying to import mpi4py to see if that crashes the IDE.

rickgrubin commented 3 years ago

A one-liner:

import mpi4py

succeeds within both IDEs. So does just:

import numpy

After uninstalling mpi4py, unsetting MPI_INCDIR, and building / installing against the same (prior) netCDF / HDF5 libraries as noted originally, the one-liner:

import netCDF4

succeeds within both IDEs.

My test driver code, which imports netCDF4 built without MPI support, runs successfully both within and without the debugger.

If I'm following along correctly, it seems that mpi4py and netCDF4 are not playing well with each other.
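The import-by-import checks above can be automated. What follows is a minimal sketch, not part of the original report: it imports each module in a fresh interpreter so that a segfault kills only the child process, and it passes CPython's `-X faulthandler` option so that a Python-level traceback is written to stderr on a fatal signal. The helper names `try_import` and `report` are hypothetical.

```python
import subprocess
import sys


def try_import(module, python=sys.executable, timeout=60):
    """Import `module` in a fresh interpreter; return (return code, stderr)."""
    # -X faulthandler makes CPython dump a traceback on fatal signals such as SIGSEGV.
    proc = subprocess.run(
        [python, "-X", "faulthandler", "-c", f"import {module}"],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    # On POSIX, a negative return code is the number of the signal that
    # terminated the child (for example -11 for SIGSEGV).
    return proc.returncode, proc.stderr


def report(modules):
    """Return one summary line per module."""
    lines = []
    for name in modules:
        code, _err = try_import(name)
        lines.append(f"{name}: return code {code}")
    return lines
```

Running `print("\n".join(report(["numpy", "mpi4py", "netCDF4"])))` from the IDE's run or debug configuration, and again from a terminal, would show which import fails in which context without taking down the parent process.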

jswhit commented 3 years ago

OK, that's interesting. So netCDF4 imports mpi4py, and apparently that causes the crash (but only in the IDE, not in the terminal).

jswhit commented 3 years ago

Trying to reproduce this on my laptop, but I've never used either of these IDEs. I've got a conda environment configured with the mpi-enabled netcdf4-python. What should I do to trigger this crash in vscode?

rickgrubin commented 3 years ago

Within VSCode, you'll need to install the python extension 2020.12.424452561 (VS Python extension). Extensions are installed by clicking on the icon in the left panel that looks like Tetris blocks. Once installed, you can set your conda path by clicking on the gear icon and scrolling a bit to Python: Conda Path

My test file is a one-liner:

import netCDF4

jswhit commented 3 years ago

OK, I ran this test script

import netCDF4, numpy, mpi4py
print('netcdf4-python version: %s'%netCDF4.__version__)
print('HDF5 lib version:       %s'%netCDF4.__hdf5libversion__)
print('netcdf lib version:     %s'%netCDF4.__netcdf4libversion__)
print('mpi4py version:         %s'%mpi4py.__version__)
print('numpy  version:         %s'%numpy.__version__)
print('parallel features:      %s'%bool(netCDF4.__has_parallel4_support__))

from within vs code and got

(base) Jeffs-MacBook-Pro:~ jwhitaker$ /Users/jwhitaker/opt/anaconda3/envs/mpi/bin/python /Users/jwhitaker/vscode_test.py
netcdf4-python version: 1.5.5.1
HDF5 lib version:       1.10.6
netcdf lib version:     4.7.4
mpi4py version:         3.0.3
numpy  version:         1.19.2
parallel features:      True

so no crash. I guess it's something environment dependent.

rickgrubin commented 3 years ago

Thanks for giving it a go; we do have different environments:

netcdf4-python version: 1.5.5.1
HDF5 lib version:       1.12.0
netcdf lib version:     4.7.4
mpi4py version:         3.0.3
numpy  version:         1.19.4
parallel features:      True

This configuration will run within VSCode without using the debugger, but will fail within the debugger. It fails both within and without the debugger in PyCharm.

I do not install netCDF, HDF5, and OpenMPI via brew, but rather build each from source; we'll have a lot of environment differences.

At least there's another data point for you.
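Since the same build behaves differently in a terminal, in VSCode, and in PyCharm, one way to narrow things down is to capture what each launcher hands to the interpreter and compare. The following is a rough sketch under that assumption; `snapshot`, `save`, and `diff_environ` are hypothetical helper names, and nothing here is specific to netCDF4.

```python
import json
import os
import sys


def snapshot():
    """Collect interpreter details that often differ between a terminal and an IDE."""
    return {
        "executable": sys.executable,
        "version": sys.version,
        "sys_path": list(sys.path),
        "environ": dict(os.environ),
    }


def save(path):
    """Write the current snapshot to `path` as JSON."""
    with open(path, "w") as fh:
        json.dump(snapshot(), fh, indent=2, sort_keys=True)


def diff_environ(a, b):
    """Return {name: (value_in_a, value_in_b)} for variables that differ."""
    keys = set(a) | set(b)
    return {k: (a.get(k), b.get(k)) for k in sorted(keys) if a.get(k) != b.get(k)}
```

Calling `save()` once from the terminal and once from each IDE, then passing the two `"environ"` dictionaries to `diff_environ`, would surface differences in variables such as `PATH` or `DYLD_LIBRARY_PATH` that can change which shared libraries get loaded.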

jswhit commented 3 years ago

I'm using python 3.8.5 also.

jswhit commented 3 years ago

It works within the debugger in VSCode for me. Any particular reason you can't use conda?

rickgrubin commented 3 years ago

Thanks for the update.

We will not be using conda; we have requirements / restrictions for operational environments that make it very difficult (and sometimes impossible) to have unapproved components installed.

As we're constrained to the versions of components listed prior, and we're not using the same versions of components to demonstrate success or failure, there's not much else to say, it seems.

jswhit commented 3 years ago

OK, understood. Not sure how to proceed from here. Please do report back if you are able to track this down.

jswhit commented 3 years ago

One random thought - you could try editing _netCDF4.pyx and changing

cimport mpi4py.MPI as MPI

to

import mpi4py.MPI as MPI

rickgrubin commented 3 years ago

It doesn't seem to be that simple; making the change you suggest generates compile-time errors, requiring other code fragments, such as:

from mpi4py.libmpi cimport MPI_Comm, MPI_Info, MPI_Comm_dup, MPI_Info_dup, \

and

ctypedef MPI.Comm Comm

to be changed (I changed cimport to import, and ctypedef to typedef), which then generated other errors.

If we figure out what's going on, will update this ticket. In the meantime, we'll modify our code / debug paradigm to deal with the issue, as we cannot use the components / versions you are using, and the IDEs provide other valuable functionality.

jswhit commented 3 years ago

OK, thanks for trying. My guess is that there is some inconsistency with the C runtimes being used for the different libraries (hdf5, netcdf, mpi) and the python extension modules.
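One way to probe that guess is to list which dynamic libraries each compiled piece links against and look for mismatches, such as two different MPI or C runtime libraries. The sketch below is an assumption-laden illustration: it shells out to `otool -L` on macOS and `ldd` elsewhere, both of which must be installed, and the helper names are hypothetical.

```python
import subprocess
import sys


def linkage_command(shared_object, platform=sys.platform):
    """Build the command that lists the libraries a shared object links against."""
    if platform == "darwin":
        return ["otool", "-L", shared_object]  # macOS
    return ["ldd", shared_object]  # Linux and similar systems


def linked_libraries(shared_object):
    """Run the platform tool and return its output as a list of lines."""
    result = subprocess.run(
        linkage_command(shared_object),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()
```

Applying `linked_libraries` to the compiled netCDF4 and mpi4py extension modules, and to the netCDF and HDF5 shared libraries they were built against, would show whether they all resolve to the same MPI installation.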