fsspec / gcsfs

Pythonic file-system interface for Google Cloud Storage
http://gcsfs.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

`unstrip_protocol` not implemented correctly #619

Open alexdmiller opened 5 months ago

alexdmiller commented 5 months ago

fs.unstrip_protocol should return a URI that starts with gs://..., but instead it returns gcs://..., which is not a valid GCS URI.

martindurant commented 5 months ago

At the point of running unstrip, we no longer know what protocol prefix was used in the original string, of course. "gcs" and "gs" are both allowed by GCSFS, and the former is by far the more popular with users (see also examples in the documentation). The aim of unstrip is to produce a URL which fsspec will recognise, so either is "valid". I believe our use of full URLs of the style "gcs://bucket/path/file" may predate Google's.
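To illustrate why the original prefix is unknown by the time unstrip runs, here is a minimal sketch. It is a simplified model, not the gcsfs or fsspec source: `ACCEPTED_PROTOCOLS` and `strip_protocol` are hypothetical names, and the sketch assumes only what is stated above, namely that both "gcs" and "gs" are accepted.

```python
# Hypothetical stand-in for the prefixes GCSFS accepts (per the comment above).
ACCEPTED_PROTOCOLS = ("gcs", "gs")


def strip_protocol(url):
    """Drop an accepted protocol prefix, returning the bare bucket/key path."""
    for proto in ACCEPTED_PROTOCOLS:
        prefix = f"{proto}://"
        if url.startswith(prefix):
            return url[len(prefix):]
    # No recognised prefix: return the input unchanged.
    return url


from_gcs = strip_protocol("gcs://bucket/path/file")
from_gs = strip_protocol("gs://bucket/path/file")

# Both spellings collapse to the same stripped path, so nothing in the
# result records which prefix the caller originally used.
print(from_gcs == from_gs)
```

Once both inputs reduce to the same string, any function that re-adds a prefix has to pick one by convention.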

alexdmiller commented 5 months ago

Thanks for your reply!

> "gcs" and "gs" are both allowed by GCSFS, and the former is by far the more popular with users

How are you measuring that? I believe gs:// is more popular amongst the general public.

> ...(see also examples in the documentation)

Do you mean the fsspec docs or the Google docs? I can't find a single mention of gcs:// in the Google docs.

> The aim of unstrip is to produce a URL which fsspec will recognise

I believe the aim should also be to produce URIs that are digestible by other tools. Currently, fsspec produces bespoke non-standard URIs not recognized by other tools, including Google's official gsutil CLI:

$ gsutil ls gcs://my-bucket
InvalidUrlError: Unrecognized scheme "gcs".

> I believe our use of full URLs of the style "gcs://bucket/path/file" may predate Google's.

That may be true, and I apologize for not having the context on fsspec. I'm just a new client of the library trying to print out URIs which can be consumed by others on my team.

I should say that fsspec is an absolutely lovely library to work with and I'm such a fan. That's why I so badly want this little kink to be ironed out. Thanks for your hard work!

martindurant commented 4 months ago

> I believe gs:// is more popular amongst the general public.

but not among our user base, for obvious reasons. Would you be prepared to switch the order of the gcsfs.core.GCSFileSystem.protocol for yourself at runtime? You could propose this as a PR and we can maybe get a feeling for whether this is disruptive for people here.
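A rough sketch of what switching the order at runtime could look like. To keep it runnable without gcsfs installed, it uses a hypothetical stand-in class, `FakeGCSFileSystem`, and it assumes that unstrip prepends the first entry of the `protocol` tuple when the name carries no recognised prefix. With the real library, the analogous assignment would target gcsfs.core.GCSFileSystem.protocol; check the installed version's behaviour before relying on it.

```python
class FakeGCSFileSystem:
    """Simplified stand-in for illustration; not the real gcsfs class."""

    # Assumed ordering at the time of this issue: "gcs" listed first.
    protocol = ("gcs", "gs")

    def unstrip_protocol(self, name):
        # Leave the name alone if it already carries an accepted prefix.
        for proto in self.protocol:
            if name.startswith(f"{proto}://"):
                return name
        # Otherwise prepend the first listed protocol (assumed behaviour).
        return f"{self.protocol[0]}://{name}"


fs = FakeGCSFileSystem()
before = fs.unstrip_protocol("my-bucket/path/file")

# Runtime override: list "gs" first so it becomes the default prefix.
FakeGCSFileSystem.protocol = ("gs", "gcs")
after = fs.unstrip_protocol("my-bucket/path/file")

print(before)  # gcs://my-bucket/path/file
print(after)   # gs://my-bucket/path/file
```

Because the override is on the class attribute, it affects every instance in the process, which is also why changing the default in the library itself could be disruptive for existing users.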

alexdmiller commented 4 months ago

Sure, PR is here: https://github.com/fsspec/gcsfs/pull/620