Closed mehd-io closed 2 weeks ago
It sounds like autoinstall_known_extensions is perhaps not set?
Can you provide the output for select * from duckdb_settings()? I think that will be relevant here.
@Tishj it is set, I'm using other extensions (httpfs, aws) and often rely on autoload.
v0.10.2 1601d94f94
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
D select * from duckdb_settings() where name='autoinstall_known_extensions';
┌──────────────────────────────┬─────────┬─────────────────────────────────────────────────────────────────────────────────────────────────┬────────────┬─────────┐
│ name │ value │ description │ input_type │ scope │
│ varchar │ varchar │ varchar │ varchar │ varchar │
├──────────────────────────────┼─────────┼─────────────────────────────────────────────────────────────────────────────────────────────────┼────────────┼─────────┤
│ autoinstall_known_extensions │ true │ Whether known extensions are allowed to be automatically installed when a query depends on them │ BOOLEAN │ GLOBAL │
└──────────────────────────────┴─────────┴─────────────────────────────────────────────────────────────────────────────────────────────────┴────────────┴─────────┘
D create or replace table oslo as select bf_source, confidence, area_in_meters, country_iso, st_geomfromwkb(geometry) as geom from 's3://us-west-2.opendata.source.coop/vida/google-microsoft-open-buildings/geoparquet/by_country/country_iso=NOR/NOR.parquet' where st_dwithin(st_geomfromwkb(geometry), st_point(10.7409424, 59.9135533), .1);
Catalog Error: Scalar Function with name "st_dwithin" is not in the catalog, but it exists in the spatial extension.
Please try installing and loading the spatial extension:
INSTALL spatial;
LOAD spatial;
I think everything is working as intended, there are these lines:
static constexpr const char *AUTOLOADABLE_EXTENSIONS[] = {
"aws", "azure", "autocomplete", "excel", "fts", "httpfs", "inet",
"icu", "json", "parquet", "sqlite_scanner", "sqlsmith", "postgres_scanner", "tpcds",
"tpch"}; // END_OF_AUTOLOADABLE_EXTENSIONS
at https://github.com/duckdb/duckdb/blob/v0.10.2/src/include/duckdb/main/extension_entries.hpp#L351C1-L355C1 that specify the list of extensions for which autoloading will actually work.
In current main, the line is here: https://github.com/duckdb/duckdb/blob/main/src/include/duckdb/main/extension_entries.hpp#L366C1-L370C1, but still no iceberg nor spatial.
The idea of the current code is that there are two sets: the set of known functions, and the set of extensions that can be autoloaded.
To trigger autoloading a function has to be known (e.g., not added in a later extension version) AND the extension has to be in the list. [edit: also, autoloading needs to be enabled, but that's the case in most stable deployments]
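The two-set rule described above can be sketched roughly as follows. This is a simplified illustration, not DuckDB's actual lookup code: the struct, the function names (FindOwningExtension, IsAutoloadable, Classify) and the three-entry KNOWN_FUNCTIONS table are hypothetical stand-ins, while the AUTOLOADABLE list is copied from the v0.10.2 snippet quoted earlier.

```cpp
#include <cassert>
#include <string>

// Hypothetical, trimmed-down stand-in for the "known functions" table.
// The real table in extension_entries.hpp is far larger.
struct FunctionEntry {
    const char *function;
    const char *extension;
};

static const FunctionEntry KNOWN_FUNCTIONS[] = {
    {"st_dwithin", "spatial"},
    {"iceberg_scan", "iceberg"},
    {"read_json", "json"},
};

// Extensions for which autoloading is allowed (list quoted from v0.10.2).
static const char *const AUTOLOADABLE[] = {
    "aws", "azure", "autocomplete", "excel", "fts", "httpfs", "inet",
    "icu", "json", "parquet", "sqlite_scanner", "sqlsmith", "postgres_scanner", "tpcds",
    "tpch"};

enum class LookupResult { AUTOLOAD, EXPLICIT_LOAD_REQUIRED, UNKNOWN_FUNCTION };

// Returns the extension that owns the function, or "" if it is not known.
std::string FindOwningExtension(const std::string &function_name) {
    assert(!function_name.empty());
    for (const auto &entry : KNOWN_FUNCTIONS) {
        if (function_name == entry.function) {
            return entry.extension;
        }
    }
    return "";
}

// True if the extension appears in the autoloadable list.
bool IsAutoloadable(const std::string &extension) {
    for (const char *name : AUTOLOADABLE) {
        if (extension == name) {
            return true;
        }
    }
    return false;
}

// Autoloading needs all three conditions: the function is known, its
// extension is in the autoloadable list, and autoloading is enabled.
LookupResult Classify(const std::string &function_name, bool autoload_enabled) {
    const std::string extension = FindOwningExtension(function_name);
    if (extension.empty()) {
        return LookupResult::UNKNOWN_FUNCTION;
    }
    if (autoload_enabled && IsAutoloadable(extension)) {
        return LookupResult::AUTOLOAD;
    }
    return LookupResult::EXPLICIT_LOAD_REQUIRED;
}
```

Under these assumptions, Classify("st_dwithin", true) lands in EXPLICIT_LOAD_REQUIRED: the function is known (which is why the error message can name the spatial extension), but spatial is not in the autoloadable list.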
Strictly speaking I don't see any bug in the implementation, but if it was not clear to you (as someone working with duckdb) then there is likely room for improvement in the error messages, the [developer] docs, or elsewhere.
Expanding the list of known to be autoloadable extensions can be considered.
Thanks for the explanation, @carlopi, it's much clearer! I think from a user's point of view this is pretty confusing, as there is currently no way of knowing whether an extension will autoload unless you look at the source code. So, at a minimum, as you suggested, the list should probably be referenced in the docs.
Regarding the list itself, what's the reasoning behind not having all DuckDB-supported extensions autoload? That was my initial expectation. In keeping with the philosophy of making things simpler, most (basic/new) users shouldn't have to care about extensions, as most of the time they would be using supported ones.
The rationale is that extensions currently marked as autoloadable are tested so that if LOAD x statements are ignored, everything keeps working, so LOAD statements are basically superfluous.
Spatial, I think, still needs to be explicitly loaded in a few cases: say, when reading a DB file that stores spatial data, autoloading can't yet be triggered (I think, or in some other cases like it).
We can change this policy (and change the test infrastructure a bit to reflect that, basically marking in a few places that tests do not support the full-autoloading mode); I can also see the case for more autoloading.
On the docs side, the difference is sort of explained here: https://duckdb.org/docs/extensions/overview#extension-types, where there are these 3 categories: built-in, autoloadable, and explicitly loadable extensions. The text can be improved, and input is very welcome. (This is orthogonal to whether spatial is autoloadable or not, given there will always be third-party extensions that require an explicit load.)
I think this will be reviewed docs side in https://github.com/duckdb/duckdb-web/issues/2874.
Going forward we might want to move more core extensions to being autoloadable, but this is currently working as intended and so closing this.
Possibly it could be raised on the respective repos, or more generally a discussion would also be great.
What happens?
Spatial and Iceberg extensions should auto-load as their functions are in the extension_entries.hpp but they aren't.
To Reproduce
On DuckDB 0.10.2 CLI
For the Spatial extension (data is from a public bucket), st_geomfromwkb is listed in extension_entries.hpp.
For Iceberg, iceberg_scan is listed in extension_entries.hpp.
duckdb_settings()
OS:
MacOS
DuckDB Version:
0.10.2
DuckDB Client:
CLI
Full Name:
Mehdi Ouazza
Affiliation:
MotherDuck
What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.
I have not tested with any build
Did you include all relevant data sets for reproducing the issue?
Yes
Did you include all code required to reproduce the issue?
Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?