google-code-export / pydicom

Automatically exported from code.google.com/p/pydicom

Too many open files exception #68

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Thanks for the great piece of software!

I have one issue. I use this library for an anonymization tool I wrote. It
works great, but at some point I get a "too many open files" exception:

  File "c:\Python25\lib\site-packages\dicom\filereader.py", line 410, in
read_file
  File "c:\python25\lib\site-packages\dicom\filebase.py", line 152, in
DicomFile
IOError: [Errno 24] Too many open files

This occurs on Windows XP, Vista, and 7, where the number of file handles a
process may hold open is apparently limited. I have reviewed my code: it only
uses dicom.read_file() and dicom.write_file() and does not open any other
files. The parts that read and write the files run sequentially, so there
should never be many files open at once.

I also reviewed filereader.py and filewriter.py, and from what I can see
those functions open() and close() the files in the correct way. I added
"del fp" in filereader.py and filewriter.py just after the close() there, to
find out whether brutally deleting the objects closes them at the OS level,
but I was not successful. The exception persists.

The problem occurs when I read and write many files. I have no exact number,
but from my tests it seems that the exception is thrown once more than about
2000 files have been read and written. If I let my beast eat a smaller number
of files, everything is fine.

It is of course possible that my own code using dicom is just brainless, but
I am pretty sure that the methods are called correctly and in the right
order.
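
Roughly, the pattern is just this (a simplified sketch; input_paths,
output_paths and anonymize_dataset are placeholders for my actual code):

import dicom

for in_path, out_path in zip(input_paths, output_paths):
    dataset = dicom.read_file(in_path)
    anonymize_dataset(dataset)           # placeholder: strips identifiers
    dicom.write_file(out_path, dataset)
# after roughly 2000 files on Windows, read_file raises
# IOError: [Errno 24] Too many open files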

Original issue reported on code.google.com by andreas....@gmail.com on 9 Dec 2009 at 3:31

GoogleCodeExporter commented 9 years ago
[deleted comment]
GoogleCodeExporter commented 9 years ago
This might have to do with a limited number of low-level open file handles.
You could try something like the following (not tested):

in your application:

import os
import dicom
from dicom.filebase import DicomFileLike

for file_path in my_files:
    os_id = os.open(file_path, os.O_BINARY | os.O_RDONLY)
    fd = os.fdopen(os_id)
    dataset = dicom.read_file(DicomFileLike(fd))
    # ... work with the dataset ...
    os.close(os_id)

You might also have to close the OS-level file handles of your output files.
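
For the output side, perhaps something along these lines (again not tested,
and it assumes write_file accepts a wrapped file object -- which may not
hold in the current code):

for file_path in my_output_files:
    os_id = os.open(file_path, os.O_BINARY | os.O_WRONLY | os.O_CREAT)
    fd = os.fdopen(os_id, 'wb')
    dicom.write_file(DicomFileLike(fd), dataset)
    fd.flush()       # push buffered data down before closing the handle
    os.close(os_id)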

Original comment by DaniN...@gmail.com on 10 Dec 2009 at 3:18

GoogleCodeExporter commented 9 years ago
Thanks for the suggestion! It did improve things a bit, but I still get the 
problem
when writing files. I did not find out how to apply the same scheme to writing,
because write_file() also uses close() which then leads to a bad file 
descriptor OSError.

For reading I used the following code:

for filename in my_files:
  os_id = os.open(str(filename), os.O_BINARY | os.O_RDONLY)
  fd = os.fdopen(os_id)
  image = dicom.read_file(DicomFileLike(fd))
  os.close(os_id)

Another question: does the Python garbage collector play a role here?

Original comment by andreas....@gmail.com on 17 Dec 2009 at 11:22

GoogleCodeExporter commented 9 years ago
A little clarification (the last comment was a little messed up):

My little program reads all DICOM files twice: first for some analysis, and
then for actually doing something. I cannot keep all the data in memory,
because we're talking about gigabytes here. So first all DICOM files are
loaded once (via read_file()), then some calculation is done, and then the
files are opened again and a modified version is written to disk. I've
noticed the following behaviour:

1 for file in all_files:
2   image = dicom.read_file(file)
3 
4 do_some_stuff()
5
6 for file in all_files:
7   image = dicom.read_file(file)
8   do_something_with_the_image(image)
9   dicom.write_file(new_file, image)

Observations:
1. If I only use the suggested fix in line 2, the behaviour gets better, 
   but I still get a too many open files error, only later than before.
2. If I use the fix in line 2 and line 7, I get bad file descriptors
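
For reference, this is the kind of double close that produces a bad file
descriptor error (a minimal standalone example, not taken from my tool):

import os

os_id = os.open('some_file.dcm', os.O_RDONLY)  # plus os.O_BINARY on Windows
fd = os.fdopen(os_id)
fd.close()       # closing the file object also closes the OS-level handle
os.close(os_id)  # OSError: [Errno 9] Bad file descriptor -- double close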

Original comment by andreas....@gmail.com on 17 Dec 2009 at 1:43

GoogleCodeExporter commented 9 years ago
I've pushed some code to the repository which might help (revision
0b5842e071) -- it should handle these kinds of issues better. With this new
code, if you supply a filename (reading or writing), it should always close
the file, even when exceptions occur. If you pass a file object (reading only
so far), then you are responsible for closing it.
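
The idea is the usual try/finally guard. A sketch of the pattern only (not
the exact code in the revision; parse_dataset is a placeholder):

if isinstance(filename_or_obj, basestring):  # caller passed a filename
    fp = open(filename_or_obj, 'rb')
    we_opened_it = True
else:                                        # caller passed a file object
    fp = filename_or_obj
    we_opened_it = False
try:
    dataset = parse_dataset(fp)              # placeholder for the actual reading
finally:
    if we_opened_it:
        fp.close()                           # closed even if parsing raises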

In your code above, is all_files a list of file names, or a list of file
objects?

Another thought: I'm not sure about os.open, os.fdopen, etc. When I have used
a file object, I have simply used the object returned by the builtin open()
(not os.open).

You asked about garbage collection -- I don't think that should matter. But
maybe there is some latency between Python closing the object and Windows
actually freeing the resource. Perhaps a flush command of some kind is
needed?
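
Something like this, perhaps (untested; out_path and data are placeholders,
just to show where a flush/fsync could go):

import os

fp = open(out_path, 'wb')
try:
    fp.write(data)
    fp.flush()                # push Python's buffers down to the OS
    os.fsync(fp.fileno())     # ask the OS to commit its own buffers
finally:
    fp.close()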

I will modify write_file so it also accepts a file object, then maybe there
will be more options to explore...

Original comment by darcymason@gmail.com on 18 Dec 2009 at 5:58

GoogleCodeExporter commented 9 years ago
Thanks, I'll try the new code next week, when I'm back in the office.

To answer your question: all_files is a list of filenames, not file objects.
I like the idea of not having to care about file handling. Calling
read_file() and write_file() with a file name rather than a file object is
exactly what I want. :-)

open(): I guess there is some kind of latency on the Windows platform that
produces this error when using many files. I would think that close() does
what it should do and that the OS is slow in noticing that action. But that
is just a wild guess...

Garbage collection: I tried calling gc.collect() after each file was
close()d, but it did not change anything.

Original comment by andreas....@gmail.com on 18 Dec 2009 at 9:36

GoogleCodeExporter commented 9 years ago
fd=open(); fd.close() indeed seems to do what one would expect, so there is
no "obvious" need for accessing low-level file handles via os methods. I
could not verify my suspicion when testing, and got feedback to the same
effect from c.l.p. (comp.lang.python).

Original comment by DaniN...@gmail.com on 18 Dec 2009 at 11:14