Open umashankark opened 3 years ago
This _strip_protocol() implementation handles such inputs:
STORE_SUFFIX = ".dfs.core.windows.net"
if not path.startswith("abfs://"):
    path = path.lstrip("/")  # lstrip returns a new string, so keep the result
    path = "abfs://" + path
ops = infer_storage_options(path)
if "username" in ops:
    if ops.get("username", None):
        ops["path"] = ops["username"] + ops["path"]
elif ops.get("host", None):
    if ops["host"].count(STORE_SUFFIX) == 0:
        # no store-suffix, so this is the container name
        ops["path"] = ops["host"] + ops["path"]
return ops["path"]
Please let me know if a PR can be created with the above change.
The above would be a welcome improvement. I would revise line #2 as follows to support using the "az://" as well.
if not path.startswith(("abfs://", "az://")):
It would be great if you included a unit test to validate the use case included in the proposed fix as well.
Sure, @hayesgb. Will raise a PR.
to address a related issue - translating between Spark and Pandas/Dask - any objection to adding an alias abfss in addition to az?
I was thinking about this a little more. fsspec implements a _get_kwargs_from_url which might be ideal here: https://github.com/intake/filesystem_spec/blob/ee22435bc57bd9158103415c5fc58c3cbdddebf2/fsspec/spec.py#L199
I’m open to adding it. Just curious as to why we would need "abfss://" in addition to the existing protocols?
On Jul 26, 2021, at 10:20 PM, Cody @.***> wrote:
to address a related issue - translating between Spark and Pandas/Dask - any objection to adding an alias abfss in addition to az?
I was thinking about this a little more. fsspec implements a _get_kwargs_from_url which might be ideal here: https://github.com/intake/filesystem_spec/blob/ee22435bc57bd9158103415c5fc58c3cbdddebf2/fsspec/spec.py#L199
Can you please elaborate on this idea?
I added this into master with #271. Also @lostmygithubaccount -- I added the ability to register the abfss protocol by importing the package into the local namespace, but it will take a PR to fsspec to have the abfss protocol registered there.
Out of curiosity, is there an interest in using adlfs for Spark with Azure, or is this more about improving cross-code compatibility between Dask and Spark?
Thanks for the update, @hayesgb.
AzureBlobFileSystem._strip_protocol('abfs://container/path-part/file')
-> returns: 'container/path-part/file'
AzureBlobFileSystem._strip_protocol('abfs://container@account.dfs.core.windows.net/path-part/file')
-> returns: 'account.dfs.core.windows.net/path-part/file', where 'container/path-part/file' needs to be returned.
Supporting the above return pattern will help applications (say, those that work with both Spark and fsspec) use the same URL for data access.
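For anyone who wants to check the expected return pattern without installing adlfs, here is a minimal standalone sketch. It is not the adlfs implementation: the function name strip_protocol is hypothetical, and it uses the standard library's urllib.parse.urlsplit as a stand-in for fsspec's infer_storage_options. It assumes the container is either the host part (abfs://container/...) or the user part of container@account.dfs.core.windows.net.

```python
from urllib.parse import urlsplit

STORE_SUFFIX = ".dfs.core.windows.net"
PROTOCOLS = ("abfs://", "az://")


def strip_protocol(path: str) -> str:
    """Hypothetical stand-in for the proposed _strip_protocol logic.

    Uses urlsplit instead of fsspec's infer_storage_options (an assumption
    made so the sketch runs with the standard library only).
    """
    if not path.startswith(PROTOCOLS):
        # Bare paths are treated as abfs:// URLs.
        path = "abfs://" + path.lstrip("/")
    parts = urlsplit(path)
    if parts.username:
        # container@account.dfs.core.windows.net: the container is the user part.
        return parts.username + parts.path
    if parts.netloc and STORE_SUFFIX not in parts.netloc:
        # No store suffix, so the host part is the container name.
        return parts.netloc + parts.path
    return parts.path


# Both URL styles should resolve to the same container-relative path.
urls = [
    "abfs://container/path-part/file",
    "abfs://container@account.dfs.core.windows.net/path-part/file",
    "az://container/path-part/file",
    "container/path-part/file",
]
results = [strip_protocol(u) for u in urls]
print(results)
```

If the logic matches the intent of the proposal, every entry in results should be 'container/path-part/file', which is the property a unit test for the real method would want to assert.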