Closed provinzkraut closed 6 months ago
After reviewing the code, it's not clear to my why the AsyncFileSystem
is patched in the first place, as it doesn't seem to be used here.
Another note, maybe the patching should be opt-in for users, as it might be unexpected that (transient) dependencies break other installed packages simply by importing something.
Dear @provinzkraut ,
the patching has a long story you could break down to the following problems:
What do you think about index caching? And would you like to work on that as well on fsspec side? The PR currently needs to be fixed.
[1] https://opendata.dwd.de/climate_environment/CDC/observations_germany/climate/1_minute/precipitation/historical/ [2] https://github.com/fsspec/filesystem_spec/pull/895
Thanks for the detailed explanation @gutzbenj!
One thing I don't fully understand though is this part:
this doesn't exist in fsspec so we had to patch it
You're patching the HTTPFileSystem
, but from briefly looking through the codebase, I don't see anything from preventing you from just using your own HTTPFileSystem
subclass without monkeypatching the library. At a first glance, this seems like a convenient solution. Is there a reason why this can't be done?
The reason lays here: https://github.com/fsspec/filesystem_spec/blob/1dfed9d9c23aa30b19bad536a578cda6addb7de0/fsspec/spec.py#L149
The self.dircache
component is set in the base class but can't be configured from subclasses. Anyhow as the caching doesn't seem to work atm, I will remove the monkeypatch and look into working on the PR at fsspec itself.
The monkeypatching done for
AsyncFileSystem
currently breaks 3rd partyAsyncFileSystem
because it doesn't use cooperative inheritance, so not__init__
s in a classes MRO are called when subclassingAsyncFileSystem
, breaking filesystems like adlfs.This PR fixes that by adding the missing
super().__init__
call.