earthobservations / wetterdienst

Open weather data for humans.
https://wetterdienst.readthedocs.io/
MIT License
349 stars 54 forks source link

Fix `AsyncFileSystem` monkeypatching breaking 3rd-party file systems #1241

Closed provinzkraut closed 6 months ago

provinzkraut commented 6 months ago

The monkeypatching done for AsyncFileSystem currently breaks 3rd party AsyncFileSystem because it doesn't use cooperative inheritance, so not __init__s in a classes MRO are called when subclassing AsyncFileSystem, breaking filesystems like adlfs.

This PR fixes that by adding the missing super().__init__ call.

provinzkraut commented 6 months ago

After reviewing the code, it's not clear to my why the AsyncFileSystem is patched in the first place, as it doesn't seem to be used here.

Another note, maybe the patching should be opt-in for users, as it might be unexpected that (transient) dependencies break other installed packages simply by importing something.

gutzbenj commented 6 months ago

Dear @provinzkraut ,

the patching has a long story you could break down to the following problems:

What do you think about index caching? And would you like to work on that as well on fsspec side? The PR currently needs to be fixed.

[1] https://opendata.dwd.de/climate_environment/CDC/observations_germany/climate/1_minute/precipitation/historical/ [2] https://github.com/fsspec/filesystem_spec/pull/895

provinzkraut commented 6 months ago

Thanks for the detailed explanation @gutzbenj!

One thing I don't fully understand though is this part:

this doesn't exist in fsspec so we had to patch it

You're patching the HTTPFileSystem, but from briefly looking through the codebase, I don't see anything from preventing you from just using your own HTTPFileSystem subclass without monkeypatching the library. At a first glance, this seems like a convenient solution. Is there a reason why this can't be done?

gutzbenj commented 6 months ago

The reason lays here: https://github.com/fsspec/filesystem_spec/blob/1dfed9d9c23aa30b19bad536a578cda6addb7de0/fsspec/spec.py#L149 The self.dircache component is set in the base class but can't be configured from subclasses. Anyhow as the caching doesn't seem to work atm, I will remove the monkeypatch and look into working on the PR at fsspec itself.