AlexR2D2 / metabase_duckdb_driver

Metabase DuckDB Driver shipped as 3rd party plugin
Apache License 2.0
70 stars, 21 forks

Loading files from S3 (or any HTTP location) via httpfs #18

Closed: etoulas closed this issue 9 months ago

etoulas commented 9 months ago

Hi,

I would be interested in using the httpfs extension for DuckDB: https://duckdb.org/docs/extensions/httpfs#running-queries-over-s3

With the extension, it is possible to do the following:

SELECT column_a FROM 'https://domain.tld/file.parquet';
SELECT * FROM 's3://bucket/file.parquet?s3_access_key_id=accessKey&s3_secret_access_key=secretKey';

Is this already included in this driver?

If not, how can it be integrated?

Thanks

AlexR2D2 commented 9 months ago

Hi, at least this works in the Metabase query editor (in the latest release):

SELECT title, question FROM 'https://huggingface.co/datasets/duorc/resolve/refs%2Fconvert%2Fparquet/ParaphraseRC/test/0000.parquet';

etoulas commented 9 months ago

Thanks @AlexR2D2.

On my first try I got an error, but the message also suggested a fix.

I had to execute INSTALL httpfs; first. After that, your query worked.

DuckDB looked in the user directory for the extension, so I guess it has to be installed before it can be used?


AlexR2D2 commented 9 months ago

Yes, you must first install and load the extension. For example:

INSTALL httpfs;
LOAD httpfs;
-- now you can use the httpfs extension

You can read more about extensions here
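
To see whether the install and load steps have taken effect in the current session, the extension state can be queried directly. This is a minimal sketch, assuming it is run from the same connection (for example the Metabase query editor) and that the DuckDB version in use exposes the duckdb_extensions() table function:

```sql
-- List the install/load state of the httpfs extension for this session.
-- installed = true  -> the extension binary is present locally
-- loaded    = true  -> the extension is active in this connection
SELECT extension_name, installed, loaded
FROM duckdb_extensions()
WHERE extension_name = 'httpfs';
```

If installed is false, INSTALL httpfs; has not been run yet for this DuckDB installation; if installed is true but loaded is false, LOAD httpfs; (or autoloading on first use) is still pending.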

etoulas commented 9 months ago

From the docs:

If an extension is not already available locally, it will be installed from the official extension repository (extensions.duckdb.org).

This was achieved with INSTALL httpfs.

Autoloadable extensions are loaded on first use.

The LOAD httpfs was implicit when I tried the example query you provided.

Amazing duck 🦆
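
For the S3 case from the original question, the steps in this thread could be combined roughly as follows. This is only a sketch under assumptions: the bucket, file name, region, and keys are placeholders, and it uses DuckDB's s3_* session settings rather than the URL query parameters shown in the question. Whether these settings persist across queries depends on how the driver manages its connections, which this thread does not cover.

```sql
-- One-time download of the extension (needs network access).
INSTALL httpfs;
-- Explicit load; recent DuckDB versions may autoload it on first use instead.
LOAD httpfs;

-- Placeholder credentials and region (assumptions, replace with real values).
SET s3_region = 'us-east-1';
SET s3_access_key_id = 'accessKey';
SET s3_secret_access_key = 'secretKey';

-- Query a Parquet file in the (hypothetical) bucket.
SELECT * FROM 's3://bucket/file.parquet' LIMIT 10;
```

Keeping the keys in session settings avoids embedding them in every query string, which may matter if queries are saved or shared in Metabase.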