ARM-DOE / pyart

The Python-ARM Radar Toolkit. A data model driven interactive toolkit for working with weather radar data.
https://arm-doe.github.io/pyart/

BUG: File Handles #159

Closed by josephhardinee 10 years ago

josephhardinee commented 10 years ago

I'm not 100% sure and haven't had time to dig into the C code yet, but I think the read_rsl code leaves a reference to the file handle open. I have some code that loops through 4000+ radar files, and after a couple hundred of them, Python crashes with an error about too many open file handles. I may be wrong, but this issue is here as a placeholder until I can figure out what is going wrong.
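A minimal sketch of the kind of loop that triggers the crash (the directory and glob pattern here are hypothetical, not from the original script):

import glob

import pyart

# Hypothetical reproduction: read several thousand UF files in one process.
# If each read leaks a file descriptor, the per-process limit (often 1024)
# is hit after a few hundred iterations.
for fname in sorted(glob.glob('radar_data/*.uf')):
    radar = pyart.io.read_rsl(fname)
    # ... work with the radar object, then let it drop out of scope ...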

jjhelmus commented 10 years ago

I'm not seeing the number of file descriptors increase when reading UF files. I'll check a few other file types and see if I can detect any leaks:

import os

import pyart

def n_fd():
    # Count this process's open file descriptors (Linux-only).
    return len(os.listdir("/proc/%d/fd" % os.getpid()))

print("Start of script:", n_fd())

f = open('test.uf', 'r')
print("With file open:", n_fd())

f.close()
print("After closing file:", n_fd())

radar = pyart.io.read_rsl('test.uf')
print("After reading radar file", n_fd())

radar2 = pyart.io.read_rsl('sample_uf.uf')
print("After reading another radar file", n_fd())

# force an OSError: Too many open files
"""
f = []
for i in range(4000):
    f.append(open('test.uf', 'r'))
    print(n_fd())
"""

Results:

Start of script: 4
With file open: 5
After closing file: 4
After reading radar file 4

gzip: stdout: Broken pipe
After reading another radar file 4

jjhelmus commented 10 years ago

@josephhardinee Are you reading or writing Cf/Radial (or netCDF) files? These seem to leave their associated file descriptors open until the objects are collected, but the garbage collector typically keeps the number below the OS limit. You can always force a garbage collection at the end of each loop iteration.

import gc
import os

import pyart

def n_fd():
    # Count this process's open file descriptors (Linux-only).
    return len(os.listdir("/proc/%d/fd" % os.getpid()))

f = []
for i in range(4000):
    fname = 'fake_data/test_%i.nc' % i
    f.append(pyart.io.read_cfradial(fname))
    # gc.collect()       # uncomment to keep to ~4 open file descriptors
    print(fname, n_fd())

Results:

fake_data/test_3986.nc 12
fake_data/test_3987.nc 13
fake_data/test_3988.nc 14
fake_data/test_3989.nc 15
fake_data/test_3990.nc 16
fake_data/test_3991.nc 17
fake_data/test_3992.nc 18
fake_data/test_3993.nc 19
fake_data/test_3994.nc 13
fake_data/test_3995.nc 14
fake_data/test_3996.nc 15
fake_data/test_3997.nc 16
fake_data/test_3998.nc 17
fake_data/test_3999.nc 18

With the gc.collect() line uncommented, the script runs slower, but the number of open file descriptors doesn't change:

fake_data/test_3991.nc 4
fake_data/test_3992.nc 4
fake_data/test_3993.nc 4
fake_data/test_3994.nc 4
fake_data/test_3995.nc 4
fake_data/test_3996.nc 4
fake_data/test_3997.nc 4
fake_data/test_3998.nc 4
fake_data/test_3999.nc 4

josephhardinee commented 10 years ago

I'm reading in UF files and not actually writing anything out. In just a few minutes I'll hopefully have a minimal code sample for you that reproduces the problem. I'm watching the total open file handle count on my computer, and it increases steadily until the program crashes, then drops back to the original value.

scollis commented 10 years ago

Are we disposing of the RSL object? Something like rsl_close?

scollis commented 10 years ago

btw: a temporary workaround might be to use the multiprocessing module. If a file is opened in a worker process, it should get destroyed when the worker completes, as in the sketch below.
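A sketch of that workaround (the function name, glob pattern, and pool settings are illustrative, not from this thread):

import glob
import multiprocessing

import pyart

def count_rays(fname):
    # Runs in a worker process; any file descriptor the read leaks is
    # reclaimed by the OS when the worker exits.
    radar = pyart.io.read_rsl(fname)
    return radar.nrays  # return something small, not the whole Radar object

if __name__ == '__main__':
    files = sorted(glob.glob('radar_data/*.uf'))
    # maxtasksperchild=1 gives each task a fresh worker, so leaked handles
    # never accumulate in one long-lived process.
    with multiprocessing.Pool(processes=2, maxtasksperchild=1) as pool:
        ray_counts = pool.map(count_rays, files)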

josephhardinee commented 10 years ago

Sorry, it's a little ugly, but you can find a quick code sample at http://pastebin.com/e3cLZHnE

You'll need to replace the globbing to point it at a selection of UF files, but in my case, if you watch /proc/sys/fs/file-nr you can see the total number of open file descriptors keep climbing until the program crashes. Adding a .copy() at the end of the last line did not fix it either (I figured maybe a dangling reference was stopping the data from being cleared).
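For reference, /proc/sys/fs/file-nr holds three fields: allocated file handles, allocated-but-unused handles, and the system-wide maximum. A small helper to read the first field (the function name is ours, not from the pastebin):

def system_open_fds():
    # System-wide count of allocated file handles on Linux: the first
    # field of /proc/sys/fs/file-nr (allocated, unused, maximum).
    with open('/proc/sys/fs/file-nr') as fh:
        allocated, unused, maximum = fh.read().split()
    return int(allocated)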

jjhelmus commented 10 years ago

RSL should be closing the file itself; all Py-ART does is provide the filename. The RslFile object doesn't have a close method, but it does have a __dealloc__ method (required by Cython) which frees the underlying RSL radar structure by calling RSL_free_radar. The garbage collector should take care of freeing the RslFile object once it drops out of scope.
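A pure-Python sketch of that lifetime pattern (Wrapper is a stand-in for the Cython extension type, not the real RslFile):

import gc
import weakref

class Wrapper:
    # In the real Cython class the cleanup hook is __dealloc__, which
    # calls RSL_free_radar on the underlying C structure.
    def __del__(self):
        print("underlying C structure freed")

obj = Wrapper()
ref = weakref.ref(obj)
del obj       # last reference dropped: CPython's refcounting frees it here
gc.collect()  # only needed if the object was caught in a reference cycle
assert ref() is None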

scollis commented 10 years ago

Yeah.. RSL_free_radar is what I was thinking of.

josephhardinee commented 10 years ago

Watching the memory usage, it does not appear to increase much, so I believe the radar structure is getting properly freed. It's just the file descriptor that appears to stay open.
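A quick diagnostic for that situation (Linux-only; the helper name is ours) is to map each open descriptor to the path it points at:

import os

def open_fd_targets():
    # Map each open file descriptor of this process to its target path,
    # handy for spotting which radar file is being leaked.
    fd_dir = "/proc/%d/fd" % os.getpid()
    return {fd: os.readlink(os.path.join(fd_dir, fd))
            for fd in os.listdir(fd_dir)}

print(open_fd_targets())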

jjhelmus commented 10 years ago

What OS is this on, and what version of RSL?

josephhardinee commented 10 years ago

Fedora 19, with both RSL 1.43 and 1.44.

jjhelmus commented 10 years ago

Can you send over a sample file or two?

josephhardinee commented 10 years ago

So I decided to re-download, compile, and install RSL again, because why not. It seems more stable now; I'm going to let a much longer test run and I'll let you know if it is still an issue. Maybe something got stealth-fixed in RSL. In any case, here is an example file: https://www.dropbox.com/s/yiqocw8obw65wym/ifloods_npol1_20130502_011232_uf

It also prints some warnings about unknown data types, but that is not particularly important.

jjhelmus commented 10 years ago

I didn't have a problem running the script you provided on 4000 copies of that file; watch "ls /proc/$PID/fd" only showed a half dozen or so file descriptors in use at any given time. I'm running RSL version 1.44, so maybe something got fixed in that version. It might also have something to do with the gzip: stdout: Broken pipe message that RSL prints when reading the file. I suspect RSL is not handling the decompression quite right, but I haven't tracked down what causes that message.

The good news is that I found two bugs in rsl.py while trying to reproduce the error.

josephhardinee commented 10 years ago

So I left it running for a while and all appears good now. I'll go ahead and close the issue.