fsspec / filesystem_spec

A specification that python filesystems should adhere to.
BSD 3-Clause "New" or "Revised" License
1.04k stars 362 forks source link

Fix parsing s3 accesspoint url #1743

Closed wang2bo2 closed 2 weeks ago

wang2bo2 commented 3 weeks ago

Parsing S3 access point URI, like s3://arn:aws:s3:us-west-2:1234:accesspoint/abc/123.jpg, causes problems in infer_storage_options for the :accesspoint was parsed as port number. This PR addresses that.

martindurant commented 3 weeks ago

Whilst this appears to do the right thing, I wonder if you see any way that this can be isolated in s3fs rather than here?

martindurant commented 2 weeks ago

ping @wang2bo2

wang2bo2 commented 2 weeks ago

Yes, we could remove the use of infer_storage_options and duplicate the extraction of url_query here https://github.com/fsspec/s3fs/blob/f3f63cbfbfe71a4355abd63cafd8c678c4a5a0af/s3fs/core.py#L394

In the current infer_storage_options implementation, anything after the last : would be excluded from getting into options["host"] https://github.com/fsspec/filesystem_spec/blob/9a161714f0bbfe44ee769f259420f2f7db975471/fsspec/utils.py#L97 even if the non-int port error is caught and handled. So I thought it's good to fix that too.

martindurant commented 2 weeks ago

I think it would make sense to look for "accesspoint" in s3fs, but still use to infer_storage_options in the common case. Perhaps the accesspoint branch could simply modify the URL into something that makes sense. An ARN description isn't very much like a URL anyway...

martindurant commented 1 week ago

Closing in favour of https://github.com/fsspec/s3fs/pull/912