atmtools / typhon

Tools for atmospheric research
http://www.radiativetransfer.org/
MIT License

Read from remote file system #395

Closed gerritholl closed 2 years ago

gerritholl commented 2 years ago

Since #374, typhon partially supports the fsspec protocol. However, we cannot read files from a remote file system, even when the backend may support this in principle. For example:

import datetime
import s3fs
from typhon.files.fileset import FileSet

fs = FileSet(
        path=(
            "noaa-goes16/GLM-L2-LCFA/{year}/{doy}/{hour}/"
            "OR_GLM-L2-LCFA_G16_s{year}{doy}{hour}{minute}{second}*_"
            "e{end_year}{end_doy}{end_hour}{end_minute}{end_second}*_c*.nc"),
        fs=s3fs.S3FileSystem(anon=True))

finf = fs.find_closest(datetime.datetime(2021, 11, 10, 10))
fs.read(finf)

fails with

Traceback (most recent call last):
  File "/data/gholl/checkouts/protocode/glm-fileset-remote.py", line 16, in <module>
    fs.read(finf)
  File "/data/gholl/checkouts/typhon/typhon/files/handlers/common.py", line 114, in wrapper
    return method(*args, **kwargs)
  File "/data/gholl/checkouts/typhon/typhon/files/fileset.py", line 2707, in read
    data = self.handler.read(decompressed_file, **read_args)
  File "/data/gholl/checkouts/typhon/typhon/files/handlers/common.py", line 114, in wrapper
    return method(*args, **kwargs)
  File "/data/gholl/checkouts/typhon/typhon/files/handlers/common.py", line 670, in read
    self._ensure_local_filesystem(file_info)
  File "/data/gholl/checkouts/typhon/typhon/files/handlers/common.py", line 260, in _ensure_local_filesystem
    raise NotImplementedError(
NotImplementedError: File handler NetCDF4 can only read from local file system, not from S3FileSystem

However, reading directly from S3 is possible in principle:

import xarray

sfs = s3fs.S3FileSystem(anon=True)
fp = sfs.open(finf.path)  # finf comes from the example above
ds = xarray.open_dataset(fp)

Satpy also supports reading files directly from s3fs, for example, for Advanced Baseline Imager (ABI) files.

It would be nice if it were possible to use the power of routines such as FileSet.icollect(...) while reading directly from a remote filesystem.

gerritholl commented 2 years ago

Currently, the NetCDF4 FileHandler in typhon opens the file directly with netCDF4.Dataset. Although xarray can read from a remote filesystem, it can only do so with the h5netcdf backend. For typhon to support reading NetCDF files from a remote filesystem, it too would have to use h5netcdf.File. Changing that may have some unintended side effects.
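
To illustrate the idea, here is a minimal sketch of reading a variable through an fsspec filesystem with h5netcdf.File. It is not typhon code: the helper read_variable, the demo file, and the variable name flash_energy are made up for this example, and a small local file stands in for a remote one so that the snippet runs without network access.

```python
import os
import tempfile

import fsspec
import h5netcdf
import numpy as np


def read_variable(fs, path, name):
    """Read one variable from a NetCDF4 file via an fsspec filesystem.

    Hypothetical helper, not part of typhon. The file is opened as a
    file-like object, so `fs` may be a local or a remote filesystem.
    """
    with fs.open(path, "rb") as fp:
        # h5netcdf.File accepts a file-like object as well as a path.
        with h5netcdf.File(fp, "r") as nc:
            # Slicing loads the data into memory before the file is closed.
            return nc.variables[name][:]


# Write a tiny demo file; it stands in for a file on a remote filesystem.
tmpdir = tempfile.mkdtemp()
demo_path = os.path.join(tmpdir, "demo.nc")
with h5netcdf.File(demo_path, "w") as nc:
    nc.dimensions = {"x": 3}
    var = nc.create_variable("flash_energy", ("x",), float)
    var[:] = np.array([1.0, 2.0, 3.0])

# Read it back through the fsspec interface instead of by file name.
local_fs = fsspec.filesystem("file")
values = read_variable(local_fs, demo_path, "flash_energy")
```

Under the same assumptions, passing s3fs.S3FileSystem(anon=True) and finf.path from the example above instead of the local filesystem and demo path should read from the bucket, given a variable name that exists in that file.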