atmtools / typhon

Tools for atmospheric research
http://www.radiativetransfer.org/
MIT License
58 stars 33 forks source link

Partial support for filesystems in FileSet #374

Closed gerritholl closed 3 years ago

gerritholl commented 4 years ago

This PR adds partial support for fsspec AbstractFileSystem implementation in FileSet, in particular for finding files. An example using sf3s:

import s3fs
from typhon.files.fileset import FileSet
abi_fileset = FileSet(
        path="noaa-goes16/ABI-L1b-RadF/{year}/{doy}/{hour}/OR_ABI-L1b-RadF-M6C02_G16_s{year}{doy}{hour}{minute}{second}*_e{end_year}{end_doy}{end_hour}{end_minute}{end_second}*_c*.nc",                                                                                                 
        name="abi",
        fs=s3fs.S3FileSystem(anon=True))
for f in abi_fileset.find("2019-11-18T05:30", "2019-11-18T06:15"):
    print(f)

Gives:

noaa-goes16/ABI-L1b-RadF/2019/322/05/OR_ABI-L1b-RadF-M6C02_G16_s20193220530285_e20193220539593_c20193220540030.nc
noaa-goes16/ABI-L1b-RadF/2019/322/05/OR_ABI-L1b-RadF-M6C02_G16_s20193220540285_e20193220549593_c20193220550041.nc
noaa-goes16/ABI-L1b-RadF/2019/322/05/OR_ABI-L1b-RadF-M6C02_G16_s20193220550285_e20193220559593_c20193220600041.nc
noaa-goes16/ABI-L1b-RadF/2019/322/06/OR_ABI-L1b-RadF-M6C02_G16_s20193220600285_e20193220609593_c20193220610030.nc
noaa-goes16/ABI-L1b-RadF/2019/322/06/OR_ABI-L1b-RadF-M6C02_G16_s20193220610285_e20193220619593_c20193220620040.nc
gerritholl commented 4 years ago

I don't know why it still fails on Windows despite replacing / by os.sep.

gerritholl commented 4 years ago

This still fails on Windows (I don't know why) and it doesn't work as intended with S3FileSystem (it seems globbing behaves differently there): fs_s3.glob("noaa-goes16/ABI-L1b-RadF/2020/045/*") returns nothing even though fs_s3.ls("noaa-goes16/ABI-L1b-RadF/2020/045/") does, and fs_s3.glob("noaa-goes16/ABI-L1b-RadF/2020/*") is very slow, suggesting there is something recursive going on.

gerritholl commented 4 years ago

Finding files is extremely slow due to globbing in s3fs being extremely slow, probably due to the problem reported at https://github.com/dask/s3fs/issues/378.

gerritholl commented 4 years ago

Note that at present the s3fs example only works with s3fs 0.5.0 and not with the latest released s3fs 0.5.1, due to the bug reported at https://github.com/dask/s3fs/issues/378. Hopefully that problem will be fixed so that this searching works with s3fs 0.5.2 or 0.6.

gerritholl commented 4 years ago

Can anyone with access to a Windows Python installation shed light on why the tests may be failing on Windows? Did the tests actually succeed on Windows prior to this PR?

olemke commented 4 years ago

Hi @gerritholl! I can test it on a Windows machine, but not before some time later next week since I don't have a setup ready. However, the tests were working before this PR as you can see in the build log: https://github.com/atmtools/typhon/actions/runs/288892462

gerritholl commented 3 years ago

Meanwhile https://github.com/dask/s3fs/pull/379 was merged so the typhon functionality and the documented example should work with the latest s3fs master or the next release.

olemke commented 3 years ago

This seems to be caused by inconsistent handling of path separators. One place I found to be inconsistent is in

https://github.com/gerritholl/typhon/blob/af22c48f0721e2961388a9305cc1cde26a7cf6a1/typhon/files/fileset.py#L1328-L1329

self.file_system.glob returns a path with / separators while regex uses the \\ version. There might be other places where this causes a problem, but this is all I had time to look at now.

gerritholl commented 3 years ago

Ah, that makes sense. If we search on a remote file system like this PR intends to support, then the assumption that the file system uses os.sep as a separator is no longer guaranteed to be true. I will try to think of a solution.

gerritholl commented 3 years ago

With all the commits to make it work on Windows it has become regrettably somewhat ugly, and perhaps breaking backward compatibility despite tests passing. Can Windows users please comment if it breaks their workflow or not?

olemke commented 3 years ago

I don't know if there are any Windows users of Typhon at the moment, so don't expect an answer to your question. We mainly added Windows as a supported platform because Typhon is a Python-only package and it seemed like the right thing to do to be good Python citizens. Thanks for putting in the work to keep this cross-platform. From my side this is ready to be merged. I'll have a chat with Lukas if he's also fine with it when he's back next week. I don't think you'll need to add John to PRs in the future since he has moved on to other things.

gerritholl commented 3 years ago

I thought I had write access here, but apparently not (anymore?) as no "merge" button shows up ("Only those with write access to this repository can merge pull requests.).

olemke commented 3 years ago

Merges can only be done by maintainers which are currently Lukas and me.