Unidata / netcdf4-python

netcdf4-python: python/numpy interface to the netCDF C library
http://unidata.github.io/netcdf4-python
MIT License

Core dump when opening OPeNDAP dataset after first creating local file #982

Status: Open. knutfrode opened this issue 4 years ago.

knutfrode commented 4 years ago

I encounter the following problem on Ubuntu 18.04 with conda and Python 3.6 to 3.8 (not with Python 2). The libnetcdf version is 4.7.1, and it is pinned because I also need GDAL in the same environment.

The following lines:

from netCDF4 import Dataset
d1 = Dataset('tmp.nc', 'w')
d1.close()
d2 = Dataset('http://thredds.met.no/thredds/dodsC/meps25epsarchive/2019/11/26/meps_mbr0_extracted_2_5km_20191126T00Z.nc')
print(d2.variables['time'])

give a core dump:

<class 'netCDF4._netCDF4.Variable'>
float64 time(time)
    long_name: time
    standard_name: time
    units: seconds since 1970-01-01 00:00:00 +00:00
    _ChunkSizes: 1
unlimited dimensions: time
current shape = (0,)
filling on, default _FillValue of 9.969209968386869e+36 used
Segmentation fault (core dumped)

If I omit creating the local file (d1), the output is as expected:

<class 'netCDF4._netCDF4.Variable'>
float64 time(time)
    long_name: time
    standard_name: time
    units: seconds since 1970-01-01 00:00:00 +00:00
    _ChunkSizes: 1
unlimited dimensions: time
current shape = (67,)
filling off

The same happens with two different OPeNDAP datasets that have time as an unlimited dimension. For another dataset where time has a fixed size, there is no core dump.
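
In both runs the telltale difference is the length of the unlimited time dimension: 67 when the local file is skipped, 0 after it. A minimal sketch of a check for that symptom, assuming a netCDF4-style object whose `dimensions` mapping holds objects supporting `isunlimited()` and `len()` (as `netCDF4.Dataset` and `netCDF4.Dimension` do). The function names are hypothetical, not part of netCDF4, and only root-group dimensions are inspected.

```python
from typing import Any, Dict, List


def unlimited_dim_lengths(ds: Any) -> Dict[str, int]:
    """Map each unlimited dimension of ds to its current length.

    Assumes ds.dimensions maps names to objects supporting isunlimited()
    and len(), as netCDF4.Dataset / netCDF4.Dimension do.
    """
    return {
        name: len(dim)
        for name, dim in ds.dimensions.items()
        if dim.isunlimited()
    }


def empty_unlimited_dims(ds: Any) -> List[str]:
    """Names of unlimited dimensions that currently report length 0,
    i.e. the 'current shape = (0,)' symptom seen after creating tmp.nc."""
    return [name for name, n in unlimited_dim_lengths(ds).items() if n == 0]
```

With the OP's script run up to `d2 = Dataset(...)`, `empty_unlimited_dims(d2)` would be expected to contain `'time'` in the failing environment and to be empty in the working one, which lets you detect the problem before the interpreter exits.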

Thus the problem can be formulated as:

Here is the backtrace:

$ gdb --args python fail2.py 
GNU gdb (Ubuntu 8.1-0ubuntu3.2) 8.1.0.20180409-git
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from python...done.
(gdb) r
Starting program: /home/knutfd/miniconda2/envs/opendrift_p36/bin/python fail2.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff3853700 (LWP 11144)]
[New Thread 0x7ffff3052700 (LWP 11145)]
[New Thread 0x7ffff0851700 (LWP 11146)]
[New Thread 0x7fffec050700 (LWP 11147)]
[New Thread 0x7fffe984f700 (LWP 11148)]
[New Thread 0x7fffe704e700 (LWP 11149)]
[New Thread 0x7fffe484d700 (LWP 11150)]
[New Thread 0x7fffe14bf700 (LWP 11151)]
[Thread 0x7fffe14bf700 (LWP 11151) exited]
<class 'netCDF4._netCDF4.Variable'>
float64 time(time)
    long_name: time
    standard_name: time
    units: seconds since 1970-01-01 00:00:00 +00:00
    _ChunkSizes: 1
unlimited dimensions: time
current shape = (0,)
filling on, default _FillValue of 9.969209968386869e+36 used

Thread 1 "python" received signal SIGSEGV, Segmentation fault.
0x00007ffff65d93c9 in nclistfree ()
   from /home/knutfd/miniconda2/envs/opendrift_p36/lib/python3.6/site-packages/netCDF4/../../../libnetcdf.so.15
(gdb) bt
#0  0x00007ffff65d93c9 in nclistfree ()
   from /home/knutfd/miniconda2/envs/opendrift_p36/lib/python3.6/site-packages/netCDF4/../../../libnetcdf.so.15
#1  0x00007ffff661884c in nc4_nc4f_list_del ()
   from /home/knutfd/miniconda2/envs/opendrift_p36/lib/python3.6/site-packages/netCDF4/../../../libnetcdf.so.15
#2  0x00007ffff661eb1b in NC4_abort ()
   from /home/knutfd/miniconda2/envs/opendrift_p36/lib/python3.6/site-packages/netCDF4/../../../libnetcdf.so.15
#3  0x00007ffff65d0274 in nc_abort ()
   from /home/knutfd/miniconda2/envs/opendrift_p36/lib/python3.6/site-packages/netCDF4/../../../libnetcdf.so.15
#4  0x00007ffff664288c in NCD2_close ()
   from /home/knutfd/miniconda2/envs/opendrift_p36/lib/python3.6/site-packages/netCDF4/../../../libnetcdf.so.15
#5  0x00007ffff65d02e6 in nc_close ()
   from /home/knutfd/miniconda2/envs/opendrift_p36/lib/python3.6/site-packages/netCDF4/../../../libnetcdf.so.15
#6  0x00007ffff679d573 in __pyx_pw_7netCDF4_8_netCDF4_7Dataset_15_close ()
   from /home/knutfd/miniconda2/envs/opendrift_p36/lib/python3.6/site-packages/netCDF4/_netCDF4.cpython-36m-x86_64-linux-gnu.so
#7  0x00007ffff679ea4d in __pyx_tp_dealloc_7netCDF4_8_netCDF4_Dataset ()
   from /home/knutfd/miniconda2/envs/opendrift_p36/lib/python3.6/site-packages/netCDF4/_netCDF4.cpython-36m-x86_64-linux-gnu.so
#8  0x00005555556a0b2d in delete_garbage (old=<optimized out>, collectable=<optimized out>)
    at /home/conda/feedstock_root/build_artifacts/python_1573054930886/work/Modules/gcmodule.c:865
#9  collect ()
    at /home/conda/feedstock_root/build_artifacts/python_1573054930886/work/Modules/gcmodule.c:1016
#10 0x0000555555740a1a in _PyGC_CollectNoFail ()
    at /home/conda/feedstock_root/build_artifacts/python_1573054930886/work/Modules/gcmodule.c:1626
#11 0x00005555557004f8 in PyImport_Cleanup ()
    at /home/conda/feedstock_root/build_artifacts/python_1573054930886/work/Python/import.c:431
#12 0x0000555555765c91 in Py_FinalizeEx ()
    at /home/conda/feedstock_root/build_artifacts/python_1573054930886/work/Python/pylifecycle.c:608
#13 0x0000555555770f6c in Py_Main ()
    at /home/conda/feedstock_root/build_artifacts/python_1573054930886/work/Modules/main.c:830
#14 0x0000555555638cde in main ()
    at /home/conda/feedstock_root/build_artifacts/python_1573054930886/work/Programs/python.c:69
#15 0x00007ffff77e6b97 in __libc_start_main (main=0x555555638bf0 <main>, argc=2, argv=0x7fffffffda48, 
    init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffda38)
    at ../csu/libc-start.c:310
#16 0x0000555555721242 in _start () at ../sysdeps/x86_64/elf/start.S:103
knutfrode commented 4 years ago

We found that the problem does not occur with python-netcdf4=1.5.1.2 and libnetcdf=4.6.2.

jswhit commented 4 years ago

I can't reproduce this on my macOS machine with libnetcdf 4.6.2, 4.7.0, or 4.7.2. I don't see how the netcdf4-python version would matter, since the segfault is in the netcdf-c library.

knutfrode commented 4 years ago

I forgot to mention that I used conda-forge (libnetcdf=4.7.1 is not available in the anaconda channel). The problem should be reproducible by:

conda create -n fail -c conda-forge libnetcdf=4.7.1 netcdf4=1.5.3
conda activate fail

and then these lines:

from netCDF4 import Dataset
d1 = Dataset('tmp.nc', 'w')
d1.close()
d2 = Dataset('http://thredds.met.no/thredds/dodsC/meps25epsarchive/2019/11/26/meps_mbr0_extracted_2_5km_20191126T00Z.nc')
print(d2.variables['time'])

These lines do not produce a core dump this time. However, time is an empty variable instead of having length 67, which is the correct length and is what I get in this environment:

conda create -n nofail -c conda-forge libnetcdf=4.6.2 netcdf4=1.5.1

jswhit commented 4 years ago

For anyone trying to run the OP's test script: you have to change the date in the filename, since older files age off.
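
Since the date appears twice in the URL (directory path and file name), a small helper avoids mismatched edits. A minimal sketch, assuming the archive keeps the `YYYY/MM/DD/meps_mbr0_extracted_2_5km_YYYYMMDDT00Z.nc` layout seen in the OP's URL; the function name is hypothetical and only the 00Z file from the thread is covered.

```python
from datetime import date

# Archive root copied from the OP's reproducer URL. The server layout may
# have changed since the issue was filed.
MEPS_ARCHIVE = "http://thredds.met.no/thredds/dodsC/meps25epsarchive"


def meps_member0_url(day: date) -> str:
    """Hypothetical helper: build the OPeNDAP URL of the 00Z MEPS member-0
    file for `day`, following the YYYY/MM/DD/..._YYYYMMDDT00Z.nc pattern
    used in this issue."""
    return (
        f"{MEPS_ARCHIVE}/{day:%Y/%m/%d}/"
        f"meps_mbr0_extracted_2_5km_{day:%Y%m%d}T00Z.nc"
    )
```

For example, `meps_member0_url(date.today() - timedelta(days=1))` (with `timedelta` also imported from `datetime`) gives a recent file name; whether a given day is still available depends on the server's retention.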

jswhit commented 4 years ago

Confirmed on macOS with Anaconda. It must be specific to the Anaconda packages, though: if I build netcdf 4.7.1 and netcdf4-python 1.5.3 myself, it works fine.

jswhit commented 4 years ago

From the traceback, it looks like the netcdf-c library may be trying to access memory that has already been deallocated by the Python garbage collector.

jswhit commented 4 years ago

The error also disappears if I upgrade to conda-forge libnetcdf 4.7.3 (but this requires building netcdf4-python from source, since the conda-forge package is pinned to 4.7.1 for some reason).
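
Taken together, the reports point at the libnetcdf build: conda-forge 4.7.1 misbehaves, while 4.6.2 and 4.7.3 do not. A minimal sketch of a startup check that flags that one version, assuming you want a warning rather than a hard failure. The helper names and the `SUSPECT_LIBNETCDF` set are hypothetical, and because a self-built 4.7.1 worked, a match only means the build deserves a closer look.

```python
import re
from typing import Tuple

# Versions reported in this thread (conda-forge builds):
#   libnetcdf 4.7.1        -> empty unlimited dimension / segfault at exit
#   libnetcdf 4.6.2, 4.7.3 -> reported as working
SUSPECT_LIBNETCDF = {(4, 7, 1)}


def parse_libversion(text: str) -> Tuple[int, ...]:
    """Extract (major, minor, patch) from strings such as '4.7.1'
    or '4.7.1 of Oct  4 2019 ...'."""
    m = re.match(r"\s*(\d+)\.(\d+)\.(\d+)", text)
    if m is None:
        raise ValueError(f"unrecognised libnetcdf version string: {text!r}")
    return tuple(int(g) for g in m.groups())


def is_suspect_libnetcdf(text: str) -> bool:
    """True if the version matches one reported as affected in this thread."""
    return parse_libversion(text) in SUSPECT_LIBNETCDF
```

In a real environment the version string can be taken from `netCDF4.__netcdf4libversion__`, for example `is_suspect_libnetcdf(netCDF4.__netcdf4libversion__)`.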

dopplershift commented 4 years ago

For binary compatibility, all packages in conda-forge are pinned to an exact version of libnetcdf, and they migrate in lockstep. You might be able to request a bump here: https://github.com/conda-forge/conda-forge-pinning-feedstock/issues

gauteh commented 4 years ago

It seems that this is no longer an issue with:

netcdf4                   1.5.3           nompi_py38hd35fb8e_102    conda-forge