Closed: josephhardinee closed this issue 10 years ago.
I'm not seeing the number of file descriptors increase when reading UF files. I'll check a few other file types and see if I detect any leaks:
import os
import pyart

def n_fd():
    return len(os.listdir("/proc/%d/fd" % os.getpid()))

print "Start of script:", n_fd()
f = open('test.uf', 'r')
print "With file open:", n_fd()
f.close()
print "After closing file:", n_fd()

radar = pyart.io.read_rsl('test.uf')
print "After reading radar file:", n_fd()
radar2 = pyart.io.read_rsl('sample_uf.uf')
print "After reading another radar file:", n_fd()

# force an OSError: Too many open files
"""
f = []
for i in range(4000):
    f.append(open('test.uf', 'r'))
    print n_fd()
"""
Results:

Start of script: 4
With file open: 5
After closing file: 4
After reading radar file: 4
gzip: stdout: Broken pipe
After reading another radar file: 4
@josephhardinee Are you reading or writing CF/Radial (or netCDF) files? These do seem to leave their associated file descriptors around until collected, but the garbage collector typically keeps the count below the OS limit (a snippet for querying that limit follows the results below). You can always force a garbage collection at the end of a loop.
import os
import gc
import pyart

def n_fd():
    return len(os.listdir("/proc/%d/fd" % os.getpid()))

f = []
for i in range(4000):
    fname = 'fake_data/test_%i.nc' % (i)
    f.append(pyart.io.read_cfradial(fname))
    # gc.collect()  # uncomment to keep to ~4 open file descriptors
    print fname, n_fd()
Results:
fake_data/test_3986.nc 12
fake_data/test_3987.nc 13
fake_data/test_3988.nc 14
fake_data/test_3989.nc 15
fake_data/test_3990.nc 16
fake_data/test_3991.nc 17
fake_data/test_3992.nc 18
fake_data/test_3993.nc 19
fake_data/test_3994.nc 13
fake_data/test_3995.nc 14
fake_data/test_3996.nc 15
fake_data/test_3997.nc 16
fake_data/test_3998.nc 17
fake_data/test_3999.nc 18
With the gc.collect() line, the script runs slower but the number of file descriptors doesn't change:
fake_data/test_3991.nc 4
fake_data/test_3992.nc 4
fake_data/test_3993.nc 4
fake_data/test_3994.nc 4
fake_data/test_3995.nc 4
fake_data/test_3996.nc 4
fake_data/test_3997.nc 4
fake_data/test_3998.nc 4
fake_data/test_3999.nc 4
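As an aside, the OS limit mentioned above can be queried from within the script itself. A small illustrative snippet (not part of the scripts in this thread):

import resource

# Soft and hard limits on open file descriptors for this process;
# crossing the soft limit is what raises "Too many open files".
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print "fd limit (soft, hard):", soft, hard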
I'm reading in UF files and not actually writing anything out. In just a few minutes I'll hopefully have a minimal code sample for you that reproduces the problem. I'm watching the total open file handle count on my computer, and it increases steadily until the program crashes, then drops back to the original value.
Are we disposing of the RSL object? Something like rsl_close?
By the way, a temporary workaround might be to use the multiprocessing module: if a file is opened in a worker process, it may get cleaned up when the worker finishes.
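Something along these lines (a rough sketch of the idea, untested against this particular bug; the glob pattern and the returned value are placeholders):

import glob
import multiprocessing
import pyart

def process_file(fname):
    # Runs in a worker process; any file descriptor RSL leaks is
    # released by the OS when that worker exits.
    radar = pyart.io.read_rsl(fname)
    return fname  # return something small, not the whole radar object

if __name__ == '__main__':
    # maxtasksperchild=1 recycles each worker after a single file, so
    # leaked descriptors cannot accumulate in any one process.
    pool = multiprocessing.Pool(processes=2, maxtasksperchild=1)
    results = pool.map(process_file, glob.glob('data/*.uf'))  # placeholder pattern
    pool.close()
    pool.join()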
Sorry, it is a little ugly, but you can find a quick code sample at http://pastebin.com/e3cLZHnE
You'll need to replace the globbing to point it at a selection of UF files, but in my case, if you watch /proc/sys/fs/file-nr you can see the total number of open file descriptors keep climbing until the program crashes. Adding a .copy() at the end of the last line did not fix it either (I figured maybe a dangling reference was stopping the data from being cleared).
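For readers who skip the link, the script boils down to something like this (a rough reconstruction, not the pastebin verbatim; 'data/*.uf' is a placeholder pattern):

import glob
import os
import pyart

def n_fd():
    return len(os.listdir("/proc/%d/fd" % os.getpid()))

# Loop over a directory of UF files; on the affected setup the open
# descriptor count climbs steadily until the process hits the OS limit.
for fname in glob.glob('data/*.uf'):
    radar = pyart.io.read_rsl(fname)
    print fname, n_fd()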
RSL should be closing the file itself; all Py-ART is doing is providing the filename. The RslFile object doesn't have a close method, but it does have a __dealloc__ method (needed by Cython) which frees the underlying RSL radar structure by calling RSL_free_radar. The garbage collector should take care of freeing the RslFile object once it drops out of scope.
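For anyone unfamiliar with the Cython side, the pattern described above looks roughly like this (an illustrative sketch, not Py-ART's actual source; Radar here is RSL's C struct):

cdef class RslFile:
    cdef Radar *_Radar  # pointer to the underlying RSL radar structure

    def __dealloc__(self):
        # Called by Cython when the Python object is garbage collected.
        if self._Radar is not NULL:
            RSL_free_radar(self._Radar)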
Yeah, RSL_free_radar is what I was thinking of.
Watching the memory usage, it does not appear to increase much, so I believe the radar structure is getting freed properly. It's just the file descriptor that appears to stay open.
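One way to watch both numbers at once from inside a loop (a sketch, not the script actually used here):

import os
import resource

def n_fd():
    return len(os.listdir("/proc/%d/fd" % os.getpid()))

def peak_rss_kb():
    # Peak resident set size of this process; ru_maxrss is reported
    # in kilobytes on Linux.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

print "open fds:", n_fd(), "| peak RSS (kB):", peak_rss_kb()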
What OS is this on, and what version of RSL?
Fedora 19; RSL versions 1.43 and 1.44 both show it.
Can you send over a sample file or two?
I decided to re-download, compile, and install RSL again, just in case. It seems to be more stable now; I'm going to let a much longer test run and will let you know if it is still an issue. Maybe something got stealth-fixed in RSL. In any case, here is an example file: https://www.dropbox.com/s/yiqocw8obw65wym/ifloods_npol1_20130502_011232_uf
It also prints some warnings about unknown data types, but that is not particularly important.
I didn't have a problem running the script you provided on 4000 copies of that file; watching "ls /proc/$PID/fd" only showed a half dozen or so file descriptors in use at any given time. I'm running RSL version 1.44, so maybe something got fixed in that version. It might also have something to do with the gzip: stdout: Broken pipe message that RSL outputs when reading the file. I suspect RSL is not handling the decompression quite right, but I haven't tracked down what is causing that message.
The good news is that I found two bugs in rsl.py while trying to reproduce the error.
So I left it running for a while and all appears good now. I'll go ahead and close the issue.
I'm not 100% sure and haven't had time to dig into the C code yet, but I think the read_rsl code leaves a reference to the file handle open. I have some code that loops through 4000+ radar files, and after a couple hundred of them, Python crashes with an error about too many open file handles. I may be wrong, but this is here as a placeholder until I can figure out what is going wrong.