Closed scsmithr closed 2 months ago
This works with the updated version. No changes are required, right?
glaredb=> create external table def from gcs options ( bucket = 'vrongmeal-public-test', location = 'userdata1.parquet' );
CREATE TABLE
If you're testing this locally, it's probably working because it's picking up application default credentials. Removing ~/.config/gcloud/application_default_credentials.json would probably make this fail with the failure to hit the metadata server.
I added a NullCredentailProvider to resolve this.
There's a tiny issue, and it's to do with GCS. Since the object store always sends the header Authorization: Bearer ... (even when the token is empty), the first request for a table fails. Once the table has been accessed without the header, subsequent requests pass. This is probably because GCS caches objects that have already been accessed, so we can't guarantee that it works every time.
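To make the header problem concrete, here is a minimal sketch of the behaviour we want: leave out the Authorization header entirely when there is no token, rather than sending an empty bearer value. The function name build_headers and the tuple-based header list are hypothetical and for illustration only; this is not the object_store API.

```rust
/// Hypothetical helper (not part of object_store): builds the headers
/// for a request to a GCS object. `token` is `None` or empty when the
/// object is public and no credentials are available.
fn build_headers(token: Option<&str>) -> Vec<(String, String)> {
    let mut headers = Vec::new();
    // Attach the header only when there is a non-empty token to send.
    // An empty "Authorization: Bearer " value is what made the first
    // request fail in the case described above.
    if let Some(t) = token.map(str::trim).filter(|t| !t.is_empty()) {
        headers.push(("Authorization".to_string(), format!("Bearer {}", t)));
    }
    headers
}
```

With this shape, build_headers(None) and build_headers(Some("")) both produce no Authorization header, so an anonymous request looks the same on the first access as on any later one.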
Created an issue (and submitted a PR) on object store: https://github.com/apache/arrow-rs/issues/4417
Waiting on DF (DataFusion) updates. Right now we don't have a great way to manage these deps, so we're blocked until DF moves. In the future we might want to think about it more in depth.
We'll have to wait for another release, I guess. The latest object_store release doesn't have this.
A similar problem exists with S3. We should fix that as well.
@vrongmeal is this one still blocked?
Moving off of current & next sprint (will come after 0.5.0)
@vrongmeal @greyscaled Wanted to check in on this... I think this is probably good now?
I'll check this today. I think GCS should be good. Doubtful about S3 (and Azure, this issue precedes Azure support)
GCS works (raising a PR for NULL Credentials):
> select * from 'gs://vrongmeal-public-test/data.csv';
┌───────┬─────────┐
│ id    │ name    │
│ ──    │ ──      │
│ Int64 │ Utf8    │
╞═══════╪═════════╡
│ 1     │ vaibhav │
│ 2     │ sean    │
│ 3     │ grey    │
└───────┴─────────┘
Need to make a similar change upstream for S3 and Azure.
Context
Outside google cloud:
Inside google cloud:
The object_store crate tries to dial the metadata service when it is not given application default credentials or a service account. For publicly readable objects this is unnecessary, but the version we're using (0.5.6) does not have a way of disabling that dial.
Version 0.6 does allow slotting in a custom authenticator, which we could use. However, datafusion is stuck on 0.5.6 right now, so we're blocked for now.
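A rough sketch of the fallback described above, and where an opt-out for public objects could slot in. Everything here is hypothetical: the Config fields, the anonymous flag, and the order between service account and application default credentials are assumptions for illustration, not the crate's actual types or logic.

```rust
/// Where credentials would come from for a GCS request (hypothetical).
#[derive(Debug, PartialEq)]
enum CredentialSource {
    ServiceAccount,
    ApplicationDefault,
    /// No credentials at all; only works for public objects.
    Anonymous,
    /// Dial the metadata service; fails outside Google Cloud.
    MetadataServer,
}

/// Hypothetical configuration, not the object_store builder.
struct Config {
    service_account_key: Option<String>,
    application_default_credentials: Option<String>,
    /// Assumed opt-out flag for public buckets.
    anonymous: bool,
}

fn resolve(cfg: &Config) -> CredentialSource {
    if cfg.service_account_key.is_some() {
        CredentialSource::ServiceAccount
    } else if cfg.application_default_credentials.is_some() {
        CredentialSource::ApplicationDefault
    } else if cfg.anonymous {
        // The missing piece in 0.5.6: a way to skip the dial below.
        CredentialSource::Anonymous
    } else {
        CredentialSource::MetadataServer
    }
}
```

Without the anonymous branch, a config with no credentials always ends up at MetadataServer, which is the dial that fails outside Google Cloud.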
Expected
Actual
Impact