fsspec / filesystem_spec

A specification that Python filesystems should adhere to.
BSD 3-Clause "New" or "Revised" License

Inconsistent use of protocol specific options #1192

Open agrinh opened 1 year ago

agrinh commented 1 year ago

Setting protocol-specific options has been a convenient way to override the default options for each protocol. For example, the Azure Blob Storage implementation behaves peculiarly and requires setting anon=False to use the credentials in the environment (https://github.com/fsspec/adlfs/issues/348).

So for paths provided by an application, we might do:

fsspec.open(..., az={"anon": False})

This option is ignored for local paths and used for az:// protocol URLs, which lets us configure defaults per protocol. Unfortunately, this doesn't work with http(s) protocol URLs, since the kwargs are forwarded directly to aiohttp; see, e.g., https://github.com/fsspec/filesystem_spec/blob/561428ca18a9865d8f63fe188a590d791ec52c92/fsspec/implementations/http.py#L826
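To make the asymmetry concrete, here is a toy model, not fsspec's actual code. It assumes a simplified routing step in which the options keyed by the URL's own protocol are unpacked and every other kwarg (including `az={...}`) is passed through unchanged; the backend then decides what to do with it. `get_protocol`, `route_kwargs`, `ToyLocalFS`, `ToyHTTPFS`, `toy_open` and `fake_get` are all hypothetical names, and `fake_get` merely stands in for a request function that accepts only a fixed set of keywords, like the aiohttp call linked above.

```python
# Toy model of single-URL kwarg routing. Every name here is hypothetical;
# this is not fsspec's real code.


def get_protocol(url: str) -> str:
    # "az://container/blob" -> "az"; plain paths are treated as local ("file").
    return url.split("://", 1)[0] if "://" in url else "file"


def route_kwargs(url: str, **kwargs) -> dict:
    # Unpack the options keyed by this URL's own protocol; all other kwargs,
    # including dicts keyed by *other* protocols, are passed through as-is.
    own = kwargs.pop(get_protocol(url), {})
    return {**own, **kwargs}


def fake_get(url: str, *, headers=None, timeout=None) -> str:
    # Stand-in for an HTTP client call that only accepts known keywords.
    return f"GET {url}"


class ToyLocalFS:
    def __init__(self, **storage_options):
        # Unknown options are stored but never used, so az={...} is harmless.
        self.storage_options = storage_options

    def open(self, path: str) -> str:
        return f"open {path}"


class ToyHTTPFS:
    def __init__(self, **storage_options):
        # Extra options are forwarded verbatim to every request.
        self.request_kwargs = storage_options

    def open(self, url: str) -> str:
        return fake_get(url, **self.request_kwargs)


BACKENDS = {"file": ToyLocalFS, "http": ToyHTTPFS, "https": ToyHTTPFS}


def toy_open(url: str, **kwargs) -> str:
    backend = BACKENDS[get_protocol(url)]
    return backend(**route_kwargs(url, **kwargs)).open(url)


print(toy_open("/tmp/data.csv", az={"anon": False}))  # open /tmp/data.csv
try:
    toy_open("https://example.com/data.csv", az={"anon": False})
except TypeError as err:
    print("TypeError:", err)  # the stray 'az' keyword reaches fake_get()
```

In this model the routing step treats both backends identically; the only difference is whether the backend tolerates options it does not recognise.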

Either way, I'm happy to contribute code if we can agree on a solution.

martindurant commented 1 year ago

To answer part of your question: yes, you can use protocol-specific arguments to configure the HTTP backend:

>>> of = fsspec.open("http://google.com", http={"encoded": True})
>>> of.fs.encoded
True

(This is exactly equivalent to fsspec.open("http://google.com", encoded=True))

The second part, not passing on options that might have been intended for other backends, is undefined behaviour. I can see how it can be convenient, but

The intended use was originally only for multi-component URLs like "simplecache::http://server/path", where we know the two protocols involved, and can find the args to send to each; any "extra" kwargs always go to the foremost component, in this case simplecache.
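The routing described above for chained URLs can be modelled in a few lines. This is a simplified sketch, not fsspec's implementation: `split_chain` and `route_chain` are hypothetical names, and the parsing ignores edge cases such as Windows drive letters. Each component receives the kwargs keyed by its own protocol, and whatever is left over is merged into the foremost (leftmost) component.

```python
# Simplified model of kwarg routing for "::"-chained URLs (hypothetical names).


def split_chain(urlpath: str) -> list[tuple[str, str]]:
    # "simplecache::http://server/path"
    #   -> [("simplecache", "simplecache"), ("http", "http://server/path")]
    bits = urlpath.split("::")
    out = []
    for i, bit in enumerate(bits):
        if "://" in bit:
            protocol = bit.split("://", 1)[0]
        elif i < len(bits) - 1:
            protocol = bit  # a bare leading component such as "simplecache"
        else:
            protocol = "file"  # a trailing plain path is treated as local
        out.append((protocol, bit))
    return out


def route_chain(urlpath: str, **kwargs) -> list[tuple[str, dict]]:
    chain = split_chain(urlpath)
    # Each component takes the options keyed by its own protocol...
    per_component = [(proto, dict(kwargs.pop(proto, {}))) for proto, _ in chain]
    # ...and every remaining kwarg goes to the foremost component.
    per_component[0][1].update(kwargs)
    return per_component


print(route_chain("simplecache::http://server/path",
                  http={"encoded": True}, cache_storage="/tmp/cache"))
# [('simplecache', {'cache_storage': '/tmp/cache'}), ('http', {'encoded': True})]
```

With a single-component URL the "foremost component" is the only one, so under this model `route_chain("https://example.com/f", az={"anon": False})` hands the whole `az` dict to the HTTP backend, which is the behaviour reported in the issue.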

agrinh commented 1 year ago

@martindurant Thanks for the quick reply!

I understand; neither is a great option. Perhaps the best option is the opposite, i.e. allowing (or requiring) per-protocol defaults in a dedicated argument that passes kwargs down only to the relevant implementation? Something like:

fsspec.open(..., protocol_defaults={"az": {"anon": False}, "http": {"encoded": True}})
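Until something like this exists, an application could approximate it on its own side. The sketch below is hypothetical (`url_protocols` and `defaults_for` are not fsspec APIs, and the URL parsing is simplified): it keeps only the entries of a `protocol_defaults` mapping whose protocol actually appears in the URL and returns them as protocol-keyed kwargs, relying on the behaviour shown earlier in the thread that a kwarg named after the URL's own protocol is unpacked for that backend.

```python
# Hypothetical application-side filter approximating `protocol_defaults`.


def url_protocols(urlpath: str) -> list[str]:
    # Protocols named in a possibly "::"-chained URL; plain paths count as "file".
    bits = urlpath.split("::")
    protocols = []
    for i, bit in enumerate(bits):
        if "://" in bit:
            protocols.append(bit.split("://", 1)[0])
        elif i < len(bits) - 1:
            protocols.append(bit)
        else:
            protocols.append("file")
    return protocols


def defaults_for(urlpath: str, protocol_defaults: dict[str, dict]) -> dict[str, dict]:
    # Keep only defaults for protocols this URL actually uses, so an "az"
    # entry is never passed along when opening an HTTP or local path.
    used = url_protocols(urlpath)
    return {p: dict(opts) for p, opts in protocol_defaults.items() if p in used}


PROTOCOL_DEFAULTS = {"az": {"anon": False}, "http": {"encoded": True}}
print(defaults_for("az://container/blob.csv", PROTOCOL_DEFAULTS))  # {'az': {'anon': False}}
print(defaults_for("/tmp/local.csv", PROTOCOL_DEFAULTS))           # {}
```

An application would then call `fsspec.open(url, **defaults_for(url, PROTOCOL_DEFAULTS))`. Note that in this sketch `http` and `https` are distinct keys, so an `https://` URL would need its own entry.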

I realize this is a bit less convenient, but the current behaviour, where protocol-specific arguments meant for one backend make it down to other implementations, is fairly confusing.

martindurant commented 1 year ago

That is a possible solution, but we could not disallow az= directly now, as it is already in use; at least, not without a proper deprecation. I'm not convinced that the longer form would be very popular.