Tinche / aiofiles

File support for asyncio
Apache License 2.0
2.76k stars 150 forks source link

Possible bug: unclosed file warning #165

Open ErikKalkoken opened 1 year ago

ErikKalkoken commented 1 year ago

I am currently building a queue, which stores it's content in a binary file and I think I stumbled over a bug.

After canceling an asyncio task that is still reading from a binary file I am getting the following warning (with tracemalloc enabled):

/usr/lib/python3.11/asyncio/base_events.py:1907: ResourceWarning: unclosed file <_io.BufferedReader name='/tmp/tmpf3t6la35/queue.dat'>
  handle = self._ready.popleft()
Object allocated at (most recent call last):
  File "/usr/lib/python3.11/concurrent/futures/thread.py", lineno 58
    result = self.fn(*self.args, **self.kwargs)

The code for reading the file in the task is this:

try:
    async with aiofiles.open(self._data_path, "rb") as fp:
        data = await fp.read()
except FileNotFoundError:
    return []

I was expecting that the aiofiles context manager would close the file, but that does not appear to be the case. I do not know much about how aiofiles works, but from what I understand all I/O operation happen in a separate thread, so maybe those threads are not waited for when the context manager closes?

Btw: I saw the same warning for the write buffer in other code, so this appears to be a systematic issue.

Unfortunately, I was not able to reproduce this issue in a minimal code example, so instead I am linking to the branch with the unit test, which reproduces this issue every time.

As I workaround I turned off buffering. That works for me, but I guess that might not be a solution for everyone. This seams like a serious bug to me, so I am hoping someone can look at this.

Tinche commented 1 year ago

So here's the thing.

The entire async ecosystem is built with cancellation in mind; cancelling asyncio tasks and writing async code so that it handles cancellation is a routine thing. Threads don't really have a concept of cancellation though; as far as I'm aware threads cannot be reliably and portably cancelled (but I admit not knowing a whole lot about them though). This is one of the big advantages async tasks have over threads ;) So fundamentally, as long as we use threads we can't really handle this case well.

That said, looking at the code, the async context manager will try to close the file on __aexit__. Is the default thread pool getting shutdown before this happens in some way?

ErikKalkoken commented 1 year ago

That is a good point. However, I do not think the underlying thread needs to be canceled, but rather it should be waiting upon to complete in __aexit__.