Closed gerritholl closed 4 years ago
Thanks for reporting. I don't immediately have any thoughts, but I hope I can get around to trying it myself. I cannot understand the origin of the NameError (`open` is in builtins, after all) or why the target file-system should matter. It may be a good idea to move the final save of the cache metadata from `__del__` to a `weakref.finalize` call.
In addition, the metadata probably doesn't need saving in every case, as it probably didn't change; and in any case, operations in `__del__`/finalize should probably be surrounded by a try/except.
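As a minimal sketch of what the `weakref.finalize` approach could look like (the class and method names here are illustrative stand-ins, not fsspec's actual internals):

```python
import weakref


class CachingFilesystemSketch:
    """Illustrative stand-in for a filesystem that persists cache metadata."""

    def __init__(self):
        self.cached_files = [{}]
        # Register a finalizer instead of relying on __del__: it runs at
        # most once, and finalizers fire before interpreter teardown, so
        # builtins such as open() are still available.  The callback must
        # not hold a reference to self, or the object would never be
        # collected (a staticmethod accessed via self is a plain function).
        self._finalizer = weakref.finalize(
            self, self._save_cache_safely, self.cached_files
        )

    @staticmethod
    def _save_cache_safely(cached_files):
        # Guard the final save anyway: an error raised during a late
        # garbage collection would otherwise just be printed and ignored.
        try:
            pass  # ... serialize cached_files metadata to disk here ...
        except Exception:
            pass


fs = CachingFilesystemSketch()
finalizer = fs._finalizer
del fs  # on CPython, the finalizer fires here rather than at shutdown
```

Because `weakref.finalize` guarantees at-most-once execution, a second close or a garbage collection after an explicit save becomes a no-op.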
I'm not 100% sure if it's due to the target file system, but it's the only remotely relevant difference I could think of. Let me know if you want me to try other variations in case you can't reproduce it yourself (yet).
I don't know if it's related/useful information, but when instead of reading directly I try to read with h5netcdf, I'm getting some additional information:
Using

```python
import fsspec, h5netcdf
of = fsspec.open(
    "blockcache::s3://noaa-goes16/ABI-L1b-RadF/2019/321/14/OR_ABI-L1b-RadF-M6C01_G16_s20193211440283_e20193211449591_c20193211450047.nc",
    target_protocol="s3",
    cache_storage="/data/gholl/cache/fscache/blockcache",
    s3={"anon": True},
    target_options={"anon": True})
with of as f:
    with h5netcdf.File(f) as hf:
        print(hf["Rad"][400, 300])
```
I get:
```
/data/gholl/miniconda3/envs/py38b/lib/python3.8/site-packages/aiohttp/helpers.py:107: DeprecationWarning: "@coroutine" decorator is deprecated since Python 3.8, use "async def" instead
  def noop(*args, **kwargs):  # type: ignore
/data/gholl/miniconda3/envs/py38b/lib/python3.8/site-packages/aiohttp/connector.py:964: DeprecationWarning: The loop argument is deprecated since Python 3.8, and scheduled for removal in Python 3.10.
  hosts = await asyncio.shield(self._resolve_host(
/data/gholl/miniconda3/envs/py38b/lib/python3.8/site-packages/aiohttp/locks.py:21: DeprecationWarning: The loop argument is deprecated since Python 3.8, and scheduled for removal in Python 3.10.
  self._event = asyncio.Event(loop=loop)
/data/gholl/miniconda3/envs/py38b/lib/python3.8/site-packages/fsspec/caching.py:54: ResourceWarning: unclosed file <_io.BufferedRandom name='/tmp/tmpnm_ifnpp/12ebe5cb08aea5e1bc4f90fab4be1ddbe4a23972f4d183f4f829a2649ac48d0a'>
  self.cache = self._makefile()
ResourceWarning: Enable tracemalloc to get the object allocation traceback
1023
Traceback (most recent call last):
  File "mwe91.py", line 10, in <module>
    print(hf["Rad"][400, 300])
  File "/data/gholl/miniconda3/envs/py38b/lib/python3.8/site-packages/h5netcdf/core.py", line 811, in __exit__
    self.close()
  File "/data/gholl/miniconda3/envs/py38b/lib/python3.8/site-packages/h5netcdf/core.py", line 803, in close
    self._h5file.close()
  File "/data/gholl/miniconda3/envs/py38b/lib/python3.8/site-packages/h5py/_hl/files.py", line 443, in close
    h5i.dec_ref(id_)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5i.pyx", line 150, in h5py.h5i.dec_ref
  File "h5py/defs.pyx", line 1234, in h5py.defs.H5Idec_ref
  File "h5py/h5fd.pyx", line 169, in h5py.h5fd.H5FD_fileobj_write
  File "h5py/h5fd.pyx", line 175, in h5py.h5fd.H5FD_fileobj_truncate
AttributeError: 'S3File' object has no attribute 'truncate'
Exception ignored in: <function AbstractBufferedFile.__del__ at 0x7f87c8bfa5f0>
Traceback (most recent call last):
  File "/data/gholl/miniconda3/envs/py38b/lib/python3.8/site-packages/fsspec/spec.py", line 1493, in __del__
  File "/data/gholl/miniconda3/envs/py38b/lib/python3.8/site-packages/fsspec/implementations/cached.py", line 338, in <lambda>
  File "/data/gholl/miniconda3/envs/py38b/lib/python3.8/site-packages/fsspec/implementations/cached.py", line 386, in <lambda>
  File "/data/gholl/miniconda3/envs/py38b/lib/python3.8/site-packages/fsspec/implementations/cached.py", line 354, in close_and_update
  File "/data/gholl/miniconda3/envs/py38b/lib/python3.8/site-packages/fsspec/implementations/cached.py", line 386, in <lambda>
  File "/data/gholl/miniconda3/envs/py38b/lib/python3.8/site-packages/fsspec/implementations/cached.py", line 152, in save_cache
NameError: name 'open' is not defined
Exception ignored in: <function File.close at 0x7f87c3f109b0>
Traceback (most recent call last):
  File "/data/gholl/miniconda3/envs/py38b/lib/python3.8/site-packages/h5netcdf/core.py", line 802, in close
  File "/data/gholl/miniconda3/envs/py38b/lib/python3.8/site-packages/h5netcdf/core.py", line 794, in flush
  File "/data/gholl/miniconda3/envs/py38b/lib/python3.8/site-packages/h5netcdf/core.py", line 531, in _create_dim_scales
  File "/data/gholl/miniconda3/envs/py38b/lib/python3.8/site-packages/h5netcdf/core.py", line 374, in _h5group
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/data/gholl/miniconda3/envs/py38b/lib/python3.8/site-packages/h5py/_hl/group.py", line 264, in __getitem__
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5o.pyx", line 190, in h5py.h5o.open
ValueError: Not a location (invalid file ID)
```
I cannot reproduce my earlier diagnosis that it's due to the file system. Within the same file system, I get the problem for all directories, except one where I had originally started the cache using `filecache`. I will edit the original bug report and title to reflect this.
OK, so I think that the original ignored exception would be avoided by:
```diff
--- a/fsspec/spec.py
+++ b/fsspec/spec.py
@@ -1501,7 +1501,8 @@ class AbstractBufferedFile(io.IOBase):
         return self.mode in {"wb", "ab"} and not self.closed

     def __del__(self):
-        self.close()
+        if not self.closed:
+            self.close()

     def __str__(self):
```
or
```diff
--- a/fsspec/implementations/cached.py
+++ b/fsspec/implementations/cached.py
@@ -330,6 +331,8 @@ class CachingFileSystem(AbstractFileSystem):
     def close_and_update(self, f, close):
         """Called when a file is closing, so store the set of blocks"""
+        if self.closed:
+            return
         path = self._strip_protocol(f.path)
         c = self.cached_files[-1][path]
```
This is because, when using the context manager, the file is cleanly closed; but on interpreter shutdown, the instance is garbage-collected, and this called close a second time, by which point some system modules had already been unloaded.
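The effect of the first patch can be illustrated with a toy class (hypothetical, not fsspec's actual `AbstractBufferedFile`):

```python
calls = []  # record close() invocations so the behaviour is observable


class BufferedFileSketch:
    """Toy file object whose __del__ skips an already-closed file."""

    def __init__(self):
        self.closed = False

    def close(self):
        calls.append("close")
        self.closed = True

    def __del__(self):
        # Without this guard, garbage collection at interpreter shutdown
        # would call close() a second time, after module globals such as
        # the builtin open() may already have been torn down.
        if not self.closed:
            self.close()


f = BufferedFileSketch()
f.close()  # clean close, e.g. from a context manager's __exit__
del f      # __del__ sees closed=True and does not close again
```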
As for `AttributeError: 'S3File' object has no attribute 'truncate'`: this should never happen in read mode. The default mode for `h5netcdf.File` is apparently `"a"`, so I would try with `"r"`. S3 files cannot be truncated.
Right — if I manually delete the file object before interpreter shutdown, there is no error message:
```python
import fsspec
of = fsspec.open("blockcache://anaconda-public-datasets/gdelt/csv/20150906.export.csv",
                 mode="rt", target_protocol="s3",
                 cache_storage="/tmp/cache-test", target_options={"anon": True})
with of as f:
    x = f.read(10)
print(x)
del f
```
which would make sense because then the deletion happens when there is no interpreter shutdown going on.
> there is no error message
OK, good, so at least one thing is cleared up. I'm not sure how we can test against this, but both changes I suggested, each of which fixes this independently, are reasonable and can be kept: it should always be possible to call close multiple times, but `__del__` shouldn't call it unnecessarily.
Sorry, that should have been `f.closed` in `close_and_update`. I see this working:
```python
In [1]: import fsspec, h5netcdf
   ...: of = fsspec.open(
   ...:     "blockcache::s3://noaa-goes16/ABI-L1b-RadF/2019/321/14/OR_ABI-L1b-RadF-M6C01_G16_s20193211440283_e20193211449591_c20193211450047.nc",
   ...:     s3={"anon": True},
   ...: )
   ...: with of as f:
   ...:     with h5netcdf.File(f, mode='r') as hf:
   ...:         print(hf["Rad"][400, 300])
   ...:
1023
```
I'm still getting
```
Exception ignored in: <function AbstractBufferedFile.__del__ at 0x7f21cec4a820>
Traceback (most recent call last):
  File "/data/gholl/miniconda3/envs/py38b/lib/python3.8/site-packages/fsspec/spec.py", line 1510, in __del__
  File "/data/gholl/miniconda3/envs/py38b/lib/python3.8/site-packages/fsspec/implementations/cached.py", line 326, in <lambda>
  File "/data/gholl/miniconda3/envs/py38b/lib/python3.8/site-packages/fsspec/implementations/cached.py", line 376, in <lambda>
  File "/data/gholl/miniconda3/envs/py38b/lib/python3.8/site-packages/fsspec/implementations/cached.py", line 342, in close_and_update
  File "/data/gholl/miniconda3/envs/py38b/lib/python3.8/site-packages/fsspec/implementations/cached.py", line 376, in <lambda>
  File "/data/gholl/miniconda3/envs/py38b/lib/python3.8/site-packages/fsspec/implementations/cached.py", line 154, in save_cache
NameError: name 'open' is not defined
```
in a more complicated context. I will try to produce another MCVE to see if it's the same issue.
Since this happens during `__del__`, it is likely that the interpreter is busy cleaning up, and that's why the builtin `open` has disappeared. It would be better to use `weakref.finalize` rather than `__del__` to do the last cache save, but a try/except during `__del__` would be fine too.
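A sketch of the try/except variant, with an illustrative save step (the class and path below are made up for the example, not fsspec's actual code):

```python
import os
import tempfile


class CacheSaverSketch:
    """Toy object that writes cache metadata when it is destroyed."""

    def __init__(self, path):
        self.path = path

    def save_cache(self):
        # In fsspec this uses the builtin open(); during interpreter
        # shutdown that name can already be gone, raising NameError.
        with open(self.path, "w") as fh:
            fh.write("metadata")

    def __del__(self):
        # Swallow errors from a save attempted during late garbage
        # collection; a failed best-effort save should not be noisy.
        try:
            self.save_cache()
        except Exception:
            pass


path = os.path.join(tempfile.mkdtemp(), "cache-metadata")
saver = CacheSaverSketch(path)
del saver  # save_cache() runs inside the guarded __del__
```

The drawback of the try/except alone is that it hides all failures, including real bugs, which is why `weakref.finalize` (at-most-once, before teardown) is the stronger fix.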
When I use a blockcache, I get a `NameError` exception ignored in `AbstractBufferedFile.__del__`. With this code:
I get:
I don't get this if the cache was originally created using `filecache`, and thus the cache file is not sparse.
I'm using fsspec 0.8.1 on openSUSE 15.0, Python 3.8.5, Linux kernel 4.12.14.