Open ozen opened 1 month ago
Hey @ozen, thanks for reporting. This is indeed slightly quirky at the moment, but the error message hints at the workaround: creating a GCS type secret should make this work. Note that you should not need fsspec here.
import duckdb
duckdb.sql("CREATE SECRET gcs1 (TYPE GCS)")
duckdb.sql("SELECT * FROM read_csv('gcs://bucket/file.csv')")
Also note that using fsspec with authentication will currently not work at all. Part of the IO is handled by the kernel using its internal cloud storage libraries, while the other part is handled through DuckDB. This means that any auth you configure through fsspec will not be propagated to the kernel.
Either way I will look into removing the need for the empty gcs secret here.
@samansmink thanks for the detailed answer.
From an enterprise standpoint, there are considerable differences between using HMAC keys with the interoperability layer and using the standard methods of GCP authentication. I think not every user will simply be able to use HMAC keys. fsspec provides a way to use GCP authentication schemes.
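For context, the HMAC-key route mentioned above could look roughly like the sketch below. It only builds and registers a GCS secret that carries an HMAC key pair. The helper name `gcs_hmac_secret_sql`, the secret name `gcs_hmac` and the placeholder credentials are made up for illustration. The `KEY_ID` and `SECRET` parameters are the ones I understand DuckDB's GCS secret type to accept, so please check them against the docs for your version.

```python
def gcs_hmac_secret_sql(name: str, key_id: str, secret: str) -> str:
    """Build a CREATE SECRET statement for GCS HMAC credentials.

    `name` is inserted as a bare SQL identifier, so pass a trusted value.
    """
    def quote(value: str) -> str:
        # Double any single quotes so the value is a valid SQL string literal.
        return "'" + value.replace("'", "''") + "'"

    return (
        f"CREATE OR REPLACE SECRET {name} "
        f"(TYPE GCS, KEY_ID {quote(key_id)}, SECRET {quote(secret)})"
    )


if __name__ == "__main__":
    import duckdb  # imported here so the helper itself has no dependencies

    # Placeholder credentials (hypothetical); replace with a real HMAC key pair.
    statement = gcs_hmac_secret_sql("gcs_hmac", "<hmac-key-id>", "<hmac-secret>")
    duckdb.sql(statement)
    print("registered secret with:", statement.split(" (")[0])
```

Once such a secret exists, reads from GCS paths should pick it up without fsspec being involved, which is exactly why this route does not help users who are only allowed to use the standard GCP authentication methods.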
Is there any way to move the IO from the kernel to duckdb?
Well, I think it may have worked accidentally before, but only on public data. I don't really see how authentication would've worked there.
Is there any way to move the IO from the kernel to duckdb?
Yes! This is actually what the peeps over at the delta-kernel-rs project are working on right now. Currently DuckDB relies on the kernel to do IO for things like metadata reads, deletion vector reads, checkpoints, etc. However, the idea is that the kernel will support APIs in the future that let DuckDB do all IO itself. This will also allow us to remove the convoluted code in https://github.com/duckdb/duckdb_delta/blob/24d9b782b1da7676e4c8aae7b9d7650cb035276c/src/functions/delta_scan.cpp#L115 that we now require.
With that, we will be able to support using fsspec for delta cleanly.
@samansmink Thank you again for the detailed explanation. Great to hear that!
Using fsspec filesystems used to work for me when using delta_scan. Now it doesn't, and the reason appears to be the version upgrade.
This code works as expected:
This code used to work, but now raises an exception:
The exception:
I think register_filesystem must have priority over builtin filesystems.