Open alexdmiller opened 5 months ago
At the point of running unstrip, we of course no longer know what protocol prefix was used in the original string. "gcs" and "gs" are both allowed by GCSFS, and the former is by far the more popular with users (see also examples in the documentation). The aim of unstrip is to produce a URL which fsspec will recognise, so either is "valid". I believe our use of full URLs of the style "gcs://bucket/path/file" may predate Google's.
Thanks for your reply!
"gcs" and "gs" are both allowed by GCSFS, and the former is by far the more popular with users
How are you measuring that? I believe gs:// is more popular amongst the general public.
gcs://: 10.3k results
gs://: 473k results
...(see also examples in the documentation)
Do you mean the fsspec docs or the Google docs? I can't find a single mention of gcs:// in the Google docs.
The aim of unstrip is to produce a URL which fsspec will recognise
I believe the aim should also be to produce URIs that are digestible by other tools. Currently, fsspec produces bespoke non-standard URIs not recognized by other tools, including Google's official gsutil CLI:
$ gsutil ls gcs://my-bucket
InvalidUrlError: Unrecognized scheme "gcs".
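For anyone who needs to hand these URIs to tools that only accept gs://, the scheme can be rewritten before printing. This is only a minimal sketch of a workaround, not something fsspec or gcsfs provides; the helper name to_gs_uri is hypothetical.

```python
def to_gs_uri(uri: str) -> str:
    """Rewrite a leading gcs:// scheme to gs://; leave anything else unchanged.

    Hypothetical helper for illustration only, not part of fsspec or gcsfs.
    """
    prefix = "gcs://"
    if uri.startswith(prefix):
        return "gs://" + uri[len(prefix):]
    return uri


if __name__ == "__main__":
    print(to_gs_uri("gcs://my-bucket/path/file"))
```

Strings that already start with gs://, or that use some other scheme, pass through untouched, so the helper should be safe to apply to any URI before it is shared.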
I believe our use of full URLs of the style "gcs://bucket/path/file" may predate google's.
That may be true, and I apologize for not having the context on fsspec. I'm just a new client of the library trying to print out URIs which can be consumed by others on my team.
I should say that fsspec is an absolutely lovely library to work with and I'm such a fan. That's why I so badly want this little kink to be ironed out. Thanks for your hard work!
I believe gs:// is more popular amongst the general public.
but not among our user base, for obvious reasons.
Would you be prepared to switch the order of the gcsfs.core.GCSFileSystem.protocol for yourself at runtime? You could propose this as a PR and we can maybe get a feeling for whether this is disruptive for people here.
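To illustrate what switching the order would do, here is a rough sketch using a stand-in class rather than gcsfs itself. It assumes that unstrip_protocol prepends the first entry of the protocol tuple when the path has no recognised prefix, which is what this thread implies. SketchFileSystem is hypothetical, and the real implementation may differ in detail.

```python
class SketchFileSystem:
    """Hypothetical stand-in for a filesystem class; NOT the real gcsfs code."""

    # Assumed ordering under discussion: "gcs" first, "gs" second.
    protocol = ("gcs", "gs")

    def unstrip_protocol(self, name: str) -> str:
        # Accept either a single protocol string or a tuple of them.
        protos = (self.protocol,) if isinstance(self.protocol, str) else self.protocol
        # A path that already carries a known prefix is returned unchanged.
        for proto in protos:
            if name.startswith(f"{proto}://"):
                return name
        # Otherwise the FIRST protocol in the tuple becomes the prefix.
        return f"{protos[0]}://{name}"


fs = SketchFileSystem()
before = fs.unstrip_protocol("bucket/path/file")

# Switch the order at runtime, at class level, so existing instances see it too.
SketchFileSystem.protocol = ("gs", "gcs")
after = fs.unstrip_protocol("bucket/path/file")

print(before)
print(after)
```

With the real library, the analogous change would presumably be to assign the reordered tuple to gcsfs.core.GCSFileSystem.protocol before calling unstrip_protocol, but that is untested here.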
Sure, PR is here: https://github.com/fsspec/gcsfs/pull/620
fs.unstrip_protocol should return a URI that starts with gs://..., but instead returns gcs://..., which is not a valid GCS URI.