sugibuchi opened 4 days ago
As with the partner issue, there are pros and cons. Generally, we want to prevent data loss, so even if this is happening during interpreter shutdown (as opposed to normal object cleanup), we ought to try to write to remote. However, it is possible that the filesystem attribute is gone, or indeed that the thread/loop are gone or shut down.

So perhaps the thing to do is to try to establish whether the write is possible, and at least do it with a timeout. I'm not sure if that's possible! The atexit hook you propose prevents the hanging, but guarantees data loss. Actually, any call to `sync()` should check whether the thread/loop is still valid; maybe that's a good start.
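The "do it with a timeout" idea could be sketched roughly as follows. This is a hypothetical helper, not part of fsspec's API: it submits a coroutine to the dedicated event-loop thread but gives up after a timeout instead of blocking forever when the loop can no longer respond.

```python
import asyncio
import concurrent.futures
import threading

# Hypothetical helper (not in fsspec): run a coroutine on the event-loop
# thread, but fail after `timeout` seconds instead of hanging forever.
def sync_with_timeout(loop, coro, timeout=5.0):
    future = asyncio.run_coroutine_threadsafe(coro, loop)
    try:
        return future.result(timeout=timeout)
    except concurrent.futures.TimeoutError:
        future.cancel()
        raise RuntimeError("event loop did not respond; shutdown in progress?")

# A loop running on a daemon thread, similar to what fsspec.asyn maintains.
loop = asyncio.new_event_loop()
threading.Thread(target=loop.run_forever, daemon=True).start()

async def add(a, b):
    return a + b

print(sync_with_timeout(loop, add(1, 2)))  # 3
```

A timeout trades a possible hang for possible data loss on slow writes, so the value would need to be generous.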
Thank you very much, and I understand your point.
A moderate approach is probably to encourage developers of filesystems to consider this problem and implement something special if necessary. If a file class depends on a vital resource that can be destroyed before the GC of a file object using it, the developer of the file class should ensure the file is closed beforehand, for example by using `atexit` (plus a `WeakSet`, etc.).
And,

> Actually, any call to `sync()` should check whether the thread/loop is still valid, maybe that's a good start.
I totally agree. Unfortunately, it does not look easy to find the thread where a given event loop is running (AFAIK, there is no public API for this in asyncio). But at least we can check the liveness of `fsspec.asyn.iothread[0]` in `fsspec.asyn.sync()`. It would reduce the chance of deadlocks.
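A sketch of such a liveness check, using stand-ins for fsspec's module-level state rather than fsspec itself (the guard logic is the proposal; the surrounding scaffolding is invented):

```python
import asyncio
import threading

# Stand-ins for fsspec.asyn's module-level state: one event loop running on
# a dedicated daemon thread, referenced from one-element lists.
loop = [None]      # mimics fsspec.asyn.loop
iothread = [None]  # mimics fsspec.asyn.iothread

def _start():
    loop[0] = asyncio.new_event_loop()
    iothread[0] = threading.Thread(target=loop[0].run_forever, daemon=True)
    iothread[0].start()

def sync(coro):
    # Proposed guard: refuse to block if the loop's thread is dead, instead
    # of parking forever on a future that nothing will ever complete.
    if iothread[0] is None or not iothread[0].is_alive():
        coro.close()  # avoid a "coroutine was never awaited" warning
        raise RuntimeError("event-loop thread is not running")
    fut = asyncio.run_coroutine_threadsafe(coro, loop[0])
    return fut.result()

_start()

async def hello():
    return "hello"

print(sync(hello()))  # hello
```

The check is racy (the thread could die between the check and the wait), so it reduces rather than eliminates the deadlock window, as noted above.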
https://github.com/fsspec/filesystem_spec/blob/2024.9.0/fsspec/spec.py#L2055-L2057
We have had this `close()` call for years in `__del__()` of `AbstractBufferedFile`. I am unsure whether this is a real concern, since we have a long history of this practice. But calling `close()` in `__del__()`, which needs to access file systems or other resources, can cause problems in real-world situations.

The main concern is the timing at which `__del__()` is called. This is frequently not under our control (some badly behaved frameworks do not close file objects but keep holding references to them). It can happen at the very last moment of the Python interpreter's shutdown sequence. This can be a problem, particularly in async filesystems using event loops.

Let me demonstrate with a toy example.
This code just (1) creates a new dummy file instance and (2) puts it into `cache`. However, the execution of this code gets stuck when exiting.

To investigate why, add the following `print()` to `__del__()` of the file class.

As you can see, the event loop is marked as "running", but the daemon thread hosting it has stopped. This means the file object was garbage-collected after the interpreter terminated the thread. The `sync()` call will never return, as the thread running the event loop has stopped.

The root cause in this example is `cache.add()`, which creates a reference from the global `cache` object to the file object. We should not do this, but we can accidentally end up with this kind of reference chain from global objects to file objects in real-world situations. It leads to unexpected deadlocks that are difficult to investigate.

I have two proposals:
1. Remove the `close()` call from `AbstractBufferedFile.__del__()`. Instead, reimplement it in `__del__()` in each concrete file class, if it is guaranteed that the class can execute `close()` at any moment in the Python interpreter's lifecycle.
2. Use `atexit` to explicitly close the event loop before entering the shutdown sequence.

We immediately got an exception this time. But I believe this is much better than the deadlock.