duckdb / duckdb-web

DuckDB website and documentation
https://duckdb.org
MIT License

Issue found on page 'Attach to a DuckDB Database over HTTPS or S3' #2995

Closed marchelbling closed 5 months ago

marchelbling commented 6 months ago

Please describe the problem you encountered in the DuckDB documentation and include the "Page URL" link shown below. Note: only create an issue if you wish to report a problem with the DuckDB documentation. For questions about DuckDB or the use of certain DuckDB features, use GitHub Discussions, Stack Overflow, or Discord.

Page URL: https://duckdb.org/docs/guides/network_cloud_storage/duckdb_over_https_or_s3

There is some discrepancy in how the httpfs extension is documented:

Also, the "Autoloading extensions" section states:

> DuckDB will automatically install and load the httpfs extension. No explicit INSTALL or LOAD statements are required.

I believe the "duckdb_over_https_or_s3" page is outdated.

There is also something that confuses me.

I can confirm from experience that the default duckdb behavior for "autoinstall" and "autoload" is the one documented.

Testing settings on a fresh duckdb install

I start from a fresh environment, e.g. a Docker container (`docker run --rm -it ubuntu:22.04`), and install `duckdb` with:

```
$ apt-get -qq update && apt-get install -y wget zip && wget https://github.com/duckdb/duckdb/releases/download/v1.0.0/duckdb_cli-linux-aarch64.zip && unzip duckdb_cli-linux-aarch64.zip && ./duckdb
...
root@b41e8ca80c44:/# ./duckdb
v1.0.0 1f98600c2c
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
```

If I check the default settings and try to read a remote file over HTTPS:

```
D select current_setting('autoinstall_known_extensions'), current_setting('autoload_known_extensions');
┌─────────────────────────────────────────────────┬──────────────────────────────────────────────┐
│ current_setting('autoinstall_known_extensions') │ current_setting('autoload_known_extensions') │
│                     boolean                     │                   boolean                    │
├─────────────────────────────────────────────────┼──────────────────────────────────────────────┤
│ true                                            │ true                                         │
└─────────────────────────────────────────────────┴──────────────────────────────────────────────┘
D select extension_name, installed, loaded from duckdb_extensions() where extension_name = 'httpfs';
┌────────────────┬───────────┬─────────┐
│ extension_name │ installed │ loaded  │
│    varchar     │  boolean  │ boolean │
├────────────────┼───────────┼─────────┤
│ httpfs         │ false     │ false   │
└────────────────┴───────────┴─────────┘
D describe select * from 'https://raw.githubusercontent.com/duckdb/duckdb-web/main/data/weather.csv';
┌─────────────┬─────────────┬─────────┬─────────┬─────────┬─────────┐
│ column_name │ column_type │  null   │   key   │ default │  extra  │
│   varchar   │   varchar   │ varchar │ varchar │ varchar │ varchar │
├─────────────┼─────────────┼─────────┼─────────┼─────────┼─────────┤
│ column0     │ VARCHAR     │ YES     │         │         │         │
│ column1     │ BIGINT      │ YES     │         │         │         │
│ column2     │ BIGINT      │ YES     │         │         │         │
│ column3     │ DOUBLE      │ YES     │         │         │         │
│ column4     │ DATE        │ YES     │         │         │         │
└─────────────┴─────────────┴─────────┴─────────┴─────────┴─────────┘
D select extension_name, installed, loaded from duckdb_extensions() where extension_name = 'httpfs';
┌────────────────┬───────────┬─────────┐
│ extension_name │ installed │ loaded  │
│    varchar     │  boolean  │ boolean │
├────────────────┼───────────┼─────────┤
│ httpfs         │ true      │ true    │
└────────────────┴───────────┴─────────┘
```

The autoinstall and autoload for the `httpfs` extension have just worked.
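The same before/after check can be scripted from a client, which makes it easier to compare clients. This is only a rough sketch: `httpfs_state` and `autoload_worked` are hypothetical helper names, and `con` is assumed to be any connection object whose `execute(sql)` returns something with `fetchone()` and `fetchall()`, as the connection returned by `duckdb.connect()` in the Python client does.

```python
# Query taken from the CLI session in this issue.
HTTPFS_STATE_SQL = (
    "SELECT installed, loaded FROM duckdb_extensions() "
    "WHERE extension_name = 'httpfs'"
)


def httpfs_state(con):
    """Return (installed, loaded) for the httpfs extension on `con`."""
    row = con.execute(HTTPFS_STATE_SQL).fetchone()
    if row is None:
        raise LookupError("httpfs is not listed by duckdb_extensions()")
    return bool(row[0]), bool(row[1])


def autoload_worked(con, url):
    """Record the httpfs state before and after touching a remote file."""
    before = httpfs_state(con)
    safe_url = url.replace("'", "''")  # escape quotes inside the SQL literal
    con.execute(f"DESCRIBE SELECT * FROM '{safe_url}'").fetchall()
    after = httpfs_state(con)
    return before, after


# Usage (assumes the `duckdb` Python package and network access):
#   import duckdb
#   print(autoload_worked(
#       duckdb.connect(),
#       "https://raw.githubusercontent.com/duckdb/duckdb-web/main/data/weather.csv",
#   ))
```

On a fresh install with the default settings, the CLI session above suggests the first pair should be `(False, False)` and the second `(True, True)`.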

I can confirm that this works with the Python client as well.

The go-duckdb client (tested on the latest build), however, requires me to set these settings explicitly, e.g.:

2024/06/05 15:08:27 exec bootquery "SET s3_endpoint = 's3.us-east-1.amazonaws.com'": Catalog Error: Setting with name "s3_endpoint" is not in the catalog, but it exists in the httpfs extension.

Please try installing and loading the httpfs extension by running:
INSTALL httpfs;
LOAD httpfs;

Alternatively, consider enabling auto-install and auto-load by running:
SET autoinstall_known_extensions=1;
SET autoload_known_extensions=1;
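The error points at an ordering problem: `SET s3_endpoint` only resolves once httpfs is available. Below is a minimal sketch of building the boot statements in a safe order, using the two routes the error message itself suggests. The helper names `httpfs_boot_queries` and `run_boot_queries` are hypothetical, and whether the auto-install route is enough in a given client build is an assumption to verify.

```python
def httpfs_boot_queries(s3_endpoint, explicit=True):
    """Build boot statements so httpfs is available before its settings are set.

    explicit=True  -> INSTALL/LOAD httpfs up front.
    explicit=False -> enable auto-install/auto-load instead.
    """
    if explicit:
        prelude = ["INSTALL httpfs", "LOAD httpfs"]
    else:
        prelude = [
            "SET autoinstall_known_extensions=1",
            "SET autoload_known_extensions=1",
        ]
    endpoint = s3_endpoint.replace("'", "''")  # escape quotes in the literal
    return prelude + [f"SET s3_endpoint = '{endpoint}'"]


def run_boot_queries(con, queries):
    """Run each statement in order on any connection with an execute() method."""
    for query in queries:
        con.execute(query)


# Usage (assumes the `duckdb` Python package):
#   import duckdb
#   run_boot_queries(duckdb.connect(),
#                    httpfs_boot_queries("s3.us-east-1.amazonaws.com"))
```

The same ordered list could be passed as boot queries to another client; only the way the statements are executed would differ.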

Do you have an idea why that is the case? This confuses me: I assumed the defaults are baked into libduckdb.a, so I would expect the same behavior regardless of the language I use.

Thanks a lot and congrats on the 1.0.0 release :)

carlopi commented 6 months ago

Some context, in particular around go-duckdb, is here: https://github.com/marcboeker/go-duckdb/issues/228

szarnyasg commented 5 months ago

The deprecated auto-loadable comment in duckdb_over_https_or_s3 was resolved via a8fda1e.

marchelbling-aqemia commented 5 months ago

It seems I cannot close the issue myself, but I think we're done here. Thank you!