Open joseph-long opened 3 years ago
I am investigating. Certainly this was known to work in the past, see https://github.com/intake/intake-astro
This is fixed by the following change
--- a/fsspec/implementations/local.py
+++ b/fsspec/implementations/local.py
@@ -292,6 +292,8 @@ class LocalFileOpener(io.IOBase):
         return self.f.__iter__()
     def __getattr__(self, item):
+        if item == "raw":
+            raise AttributeError
         return getattr(self.f, item)
astropy tries to get the .raw attribute of the file object, presumably worried about buffering or memory mapping outside of its control, but the seek calls are now applied to the wrapper object and the underlying object separately.
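To make the effect of that patch concrete, here is a minimal sketch using a hypothetical stand-in class (DelegatingOpener is not fsspec's LocalFileOpener, and hide_raw is an illustrative flag). It only shows that raising AttributeError from __getattr__ makes a hasattr(obj, "raw") check fail, which is presumably the kind of check astropy performs.

```python
import os
import tempfile


class DelegatingOpener:
    """Hypothetical wrapper that forwards unknown attributes to self.f."""

    def __init__(self, f, hide_raw=False):
        self.f = f
        self.hide_raw = hide_raw

    def __getattr__(self, item):
        # __getattr__ is only called when normal attribute lookup fails.
        if self.hide_raw and item == "raw":
            raise AttributeError(item)
        return getattr(self.f, item)


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "data.bin")
    with open(path, "wb") as out:
        out.write(b"x" * 16)

    with open(path, "rb") as f:
        # Without the guard, the buffered reader's .raw leaks through.
        exposes_raw = hasattr(DelegatingOpener(f), "raw")
        # With the guard, callers probing for .raw see nothing.
        hidden_raw = hasattr(DelegatingOpener(f, hide_raw=True), "raw")
```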
I don't know if there's a good reason for hiding the raw attribute in general like this, or really why astropy is trying to use it.
Interesting. Thanks for investigating so quickly! Do you think you'll incorporate that change? Or should I subclass LocalFileOpener/System and make that change for my application?
Only hint I can find as to why Astropy does this is https://github.com/astropy/astropy/blob/main/astropy/io/fits/util.py#L382-L392
def fileobj_open(filename, mode):
    """
    A wrapper around the `open()` builtin.
    This exists because `open()` returns an `io.BufferedReader` by default.
    This is bad, because `io.BufferedReader` doesn't support random access,
    which we need in some cases. We must call open with buffering=0 to get
    a raw random-access file reader.
    """
    return open(filename, mode, buffering=0)
But fsspec takes pains to support seek, if I understand correctly?
The astropy comment doesn't make much sense to me: binary-mode files are always seekable, and io.BufferedReader.seek exists.
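A quick standard-library check of that point, as a sketch on a throwaway temporary file (the file name and contents are made up for illustration). It shows that the default binary open() returns a seekable io.BufferedReader whose .raw is an io.FileIO, and that buffering=0 returns the io.FileIO directly.

```python
import io
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "data.bin")
    with open(path, "wb") as out:
        out.write(bytes(range(256)))  # byte i has value i

    # Default binary open: a buffered reader wrapping a raw FileIO.
    with open(path, "rb") as buffered:
        buffered_type = type(buffered)
        raw_type = type(buffered.raw)
        buffered_seekable = buffered.seekable()
        buffered.seek(200)  # random access through the buffer
        byte_at_200 = buffered.read(1)

    # buffering=0, as in astropy's fileobj_open: the raw FileIO itself.
    with open(path, "rb", buffering=0) as unbuffered:
        unbuffered_type = type(unbuffered)
```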
I think it's numpy's fromfile method that's probably at fault, and astropy uses .raw to determine whether the file is appropriate to pass to this function or not (in which case it uses frombuffer(f.read()) instead). So in the first version, numpy was doing something deeper, and the seek location between the low-level file handle and the buffered wrapper got mixed up. I can't tell where or why. It should be fine to extract the raw file handle for those that want to - I'd rather not forbid it.
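For what it's worth, the two positions can already disagree with nothing but the standard library involved. The sketch below (temporary file, 80 bytes chosen arbitrarily as a stand-in header length) reads a few bytes through the buffered object and then compares its logical position with the position of the underlying raw handle. It illustrates the general hazard only; it does not show where astropy or numpy go wrong.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "data.bin")
    with open(path, "wb") as out:
        out.write(b"\0" * (1 << 20))  # 1 MiB, larger than typical buffers

    with open(path, "rb") as f:
        f.read(80)
        # Position as seen by users of the buffered object.
        logical_pos = f.tell()
        # Position of the raw handle, which has read ahead to fill the
        # buffer; the exact value depends on the buffer size.
        raw_pos = f.raw.tell()
```

Any code that reads from f.raw directly at this point would start from raw_pos rather than from byte 80.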
Interesting. Thanks for looking into it. How would you suggest working around this? Can I register my own subclass as the LocalFileSystem fallback, or should I monkeypatch, or something else?
I think it should be raised as an issue with astropy. I have tried, and numpy.fromfile works just fine with (buffered) file objects created by fsspec - so I don't know why the check is necessary, and fsspec shouldn't need to block it. (not that I understand how the file position pointers became out of sync)
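A sketch of the two read strategies mentioned in this thread, np.fromfile on an open file object versus np.frombuffer(f.read()). It uses the builtin open() rather than fsspec to stay dependency-light, and the 80-byte header is a made-up stand-in, not a real FITS header, so it shows the expected behaviour of a well-behaved buffered file rather than reproducing the bug.

```python
import os
import tempfile

import numpy as np

HEADER = b"H" * 80  # illustrative ASCII preamble, not a real FITS header
values = np.arange(8, dtype="<f8")

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "data.bin")
    with open(path, "wb") as out:
        out.write(HEADER)
        out.write(values.tobytes())

    # Strategy 1: hand the (buffered) file object to numpy after the header.
    with open(path, "rb") as f:
        f.read(len(HEADER))
        via_fromfile = np.fromfile(f, dtype="<f8", count=values.size)

    # Strategy 2: read the remaining bytes in Python, then reinterpret them.
    with open(path, "rb") as f:
        f.read(len(HEADER))
        via_frombuffer = np.frombuffer(f.read(), dtype="<f8")
```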
It appears np.fromfile calls os.fspath on the file. I cannot tell where from (this happens in C code), but it suggests that maybe numpy is re-opening the file for its own use, and that's why the location pointer doesn't update.
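The consequence of that hypothesis can be sketched without numpy. PathAwareWrapper below is a hypothetical class that both behaves like a file and answers os.fspath (whether fsspec's wrapper does the same is an assumption here). Anything that re-opens the file by path starts at byte 0, regardless of how far the wrapper has already been read.

```python
import os
import tempfile


class PathAwareWrapper:
    """Hypothetical file wrapper that also advertises a filesystem path."""

    def __init__(self, path):
        self.path = path
        self.f = open(path, "rb")

    def __fspath__(self):
        return self.path

    def __getattr__(self, item):
        return getattr(self.f, item)


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "data.bin")
    with open(path, "wb") as out:
        out.write(b"header" + b"payload")

    wrapper = PathAwareWrapper(path)
    wrapper.read(6)  # consume the "header" part
    position_in_wrapper = wrapper.tell()

    # A consumer that treats the object as a path gets a fresh handle.
    reopened_path = os.fspath(wrapper)
    with open(reopened_path, "rb") as fresh:
        position_in_fresh = fresh.tell()
        first_bytes_seen = fresh.read(6)

    # A plain file object is not path-like, so os.fspath rejects it.
    try:
        os.fspath(wrapper.f)
        plain_file_is_pathlike = True
    except TypeError:
        plain_file_is_pathlike = False

    wrapper.close()
```

If the data reader starts from byte 0 while the caller believes it is positioned after the header, header bytes get interpreted as floats, which would look like garbled values.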
When opening the same local file via open() and via fsspec.open(), I get garbled floating point values from the latter when the library loads it. Confusingly, the ASCII header of the file seems to come through fine, so it's at least partially loading the file... (I have implemented an iRODS interface through fsspec, and don't see the issue when opening files from iRODS, so as far as I know this is specific to local paths with fsspec.)
When read through fsspec instead of regular open(), the astropy.io.fits.open function is able to parse the header (a sort of ASCII preamble) but the associated data are garbled. I'm trying to discourage astropy.io.fits from trying to memory map or seek or anything (for compatibility with other fsspec backends), hence the mode='readonly', memmap=False in the fits.open call.
I'm totally stumped, but I can reproduce with a short script, included below. It's possible this is an issue that Astropy needs to fix, but I can't figure out what would conceivably be different about the file proxy object that would let it read the header and not the data...
To reproduce:
repro.py
Create a virtual env
Compare output from two approaches to open ex.fits, e.g.
Here's the same from my own test with iRODS / the irods_fsspec backend