Open shoyer opened 10 years ago
Hi,
I also suffer the same bug, reproducible with this very simple script:
issue261.py:

import netCDF4 as nc
for i in xrange(1, 33):
    print(i)
    d = nc.Dataset('issue261.nc')

with issue261.nc generated this way:

ncgen -b -k netCDF-4 issue261.cdl

issue261.cdl:

netcdf issue261 {
dimensions:
    one = 1 ;
variables:
    string v(one) ;
}
segfault trace in gdb:

gdb python issue261.py
GNU gdb 6.8-debian
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu"...
(gdb) run
Starting program: python issue261.py
[Thread debugging using libthread_db enabled]
[New Thread 0x7f26661c26e0 (LWP 13946)]
[New Thread 0x41d1f950 (LWP 13949)]
[New Thread 0x42520950 (LWP 13950)]
[New Thread 0x42d21950 (LWP 13951)]
1
2
3
4
5
6
7
8
9
10

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f26661c26e0 (LWP 13946)]
0x00007f26606259e4 in H5F_addr_decode () from libhdf5.so.9
Current language:  auto; currently asm
(gdb) where
at nc4file.c:1900
use_parallel=<value optimized out>, mpidata=<value optimized out>, dispatch=0x7f2661a72320, nc_file=0x25a0df0) at nc4file.c:2261
at dfile.c:1777
defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:3265
closeit=1, flags=0x7fff6e1d5d60) at Python/pythonrun.c:1371
(gdb)
Using keepweakref=True when opening the Dataset eliminates the segfault for me.
import netCDF4 as nc
for i in xrange(1, 33):
    print(i)
    d = nc.Dataset('issue261.nc', keepweakref=True)
This suggests that the garbage collector is not triggering the Dataset __dealloc__ method, and that some internal data structures inside the HDF5 and/or netCDF libraries overflow when too many files are left open. I guess there are two possible solutions:
1) figure out why the dataset is not going out of scope (where is the reference being kept?), fix that so the files do get closed.
2) file a netCDF bug report, since the segfaults should not happen when opening 33 files. This will require reproducing the segfault in a simple C program.
Of course, addressing both of these at the same time is probably a good idea.
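For the first approach, the standard library can help track down where the extra reference is being kept. The sketch below uses a hypothetical FakeDataset stand-in (not netCDF4.Dataset, so it runs anywhere) to show how a weakref plus gc.get_referrers reveals the structure keeping an object alive and preventing its finalizer from running:

```python
import gc
import weakref

# FakeDataset is a hypothetical stand-in for netCDF4.Dataset; the point is
# only to show how to find the reference that keeps an object alive.
class FakeDataset(object):
    pass

d = FakeDataset()
hidden = [d]              # simulates an unexpected extra reference
ref = weakref.ref(d)

del d                     # the obvious name is gone, but the object survives
gc.collect()
assert ref() is not None  # still alive: something else holds a reference

# gc.get_referrers names the structures keeping the object alive
assert any(h is hidden for h in gc.get_referrers(ref()))

hidden.pop()              # drop the hidden reference
gc.collect()
assert ref() is None      # now __dealloc__/__del__ would have run
```

Running this against a real Dataset (replacing FakeDataset) would show whether the reference is held by a module-level cache, a cycle, or something else.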
Of course, using the Python context manager also avoids the segfault (by making sure the file is closed).
import netCDF4 as nc
for i in xrange(1, 51):
    print(i)
    with nc.Dataset('issue261.nc') as f:
        print(f)
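To make explicit why the context manager guarantees the close: a with block is equivalent to a try/finally, so close() runs even when reading raises. StubDataset below is a hypothetical stand-in for netCDF4.Dataset so the sketch runs without netCDF4 installed:

```python
# StubDataset is a hypothetical stand-in for netCDF4.Dataset.
class StubDataset(object):
    def __init__(self, path):
        self.path = path
        self.closed = False
    def __enter__(self):
        return self
    def __exit__(self, exc_type, exc_val, exc_tb):
        self.close()
        return False            # do not swallow exceptions
    def close(self):
        self.closed = True

# the `with` form...
with StubDataset('issue261.nc') as f:
    pass                        # read variables here
assert f.closed

# ...is equivalent to this explicit try/finally
g = StubDataset('issue261.nc')
try:
    pass                        # read variables here
finally:
    g.close()                   # runs even if an exception was raised
assert g.closed
```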
I have been unable to reproduce the problem in a simple C program (so far).
The traceback provided by @jdemaria looks similar to one discussed on the h5py list:
https://groups.google.com/forum/#!msg/h5py/3v0oBQ3SVkk/qsCwQnfTxuEJ
Hi, thanks for your quick answer! I understand from the h5py discussion that the source of the problem is not in the netCDF C library but a threading bug in h5py; am I wrong?
That's what it sounds like, but it happens for me even when OMP_NUM_THREADS=1. I may try recompiling hdf5 without threading enabled and see if that makes a difference.
The segfault occurs even when hdf5 is compiled with the "threadsafe" option.
It also occurs if the "with nogil" wrapper around the netCDF library calls is removed, so it does not look to be a thread-related issue.
As described here: https://github.com/Unidata/netcdf4-python/issues/218#issuecomment-43287973
The segmentation faults appear when attempting to read array values from a netCDF4.Variable with dtype=str when previous datasets were not closed.
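Given that, a practical workaround is to close each dataset before opening the next one; contextlib.closing from the standard library does this for any object with a close() method. The sketch uses a hypothetical StrDataset stand-in so it runs without netCDF4 or the test file:

```python
from contextlib import closing

# StrDataset is a hypothetical stand-in for a netCDF4.Dataset holding a
# dtype=str variable; the real fix is simply to close each Dataset before
# opening the next one.
class StrDataset(object):
    def __init__(self, path):
        self.path = path
        self.closed = False
    def close(self):
        self.closed = True

seen = []
for i in range(33):
    # closing() calls d.close() on exit, even if reading the variable raises
    with closing(StrDataset('issue261.nc')) as d:
        seen.append(d)

assert all(d.closed for d in seen)
```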
Here is a Travis log that should (in principle) be sufficient for reproducing this. When I have time, I will attempt to make a simpler test case: https://travis-ci.org/shoyer/xray/jobs/25466389#L120