duckdb / duckdb-node

Auto Install/Load extensions is causing schema inference discrepancies #124

Closed rahulj51 closed 1 month ago

rahulj51 commented 1 month ago

We use DuckDB to infer Parquet file schemas. The latest update (1.1.1) changed the default behavior of Parquet schema inference when the spatial extension is installed. The problem is that the Node.js module automatically installs and loads extensions. Is there a way to disable or override this behavior? We do not want the spatial extension to be installed. I see that the DUCKDB_EXTENSION_AUTOINSTALL_DEFAULT flag is set by default. Is there a way to unset it (maybe using an environment variable)?

freakone commented 1 month ago

Additionally, even if we use SQL to disable the configuration options, the spatial extension still gets loaded.

This is after we start the DB:

[
  {
    extension_name: 'spatial',
    installed: true,
    loaded: false,
    description: 'Geospatial extension that adds support for working with spatial data and functions'
  }
]

then we set the configuration params and confirm:

[ { "current_setting('autoload_known_extensions')": false } ]
[ { "current_setting('autoinstall_known_extensions')": false } ]

then we read the parquet file:

DESCRIBE select * from read_parquet([${files}],...

and after that:

[
  {
    extension_name: 'spatial',
    installed: true,
    loaded: true,
    description: 'Geospatial extension that adds support for working with spatial data and functions'
  }
]
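
To make the before/after comparison above easier to repeat, here is a minimal sketch that reports which extensions flipped to loaded between two snapshots. It assumes the snapshots are row arrays shaped like the outputs above (for example from a query against `duckdb_extensions()`); the helper name `newlyLoaded` is hypothetical and not part of the DuckDB API.

```javascript
// Hypothetical helper: list extensions whose `loaded` flag went from
// false to true between two snapshots of extension rows.
function newlyLoaded(before, after) {
  const loadedBefore = new Set(
    before.filter((row) => row.loaded).map((row) => row.extension_name)
  );
  return after
    .filter((row) => row.loaded && !loadedBefore.has(row.extension_name))
    .map((row) => row.extension_name);
}

// Rows copied from the outputs in this comment (description omitted).
const before = [{ extension_name: 'spatial', installed: true, loaded: false }];
const after = [{ extension_name: 'spatial', installed: true, loaded: true }];

console.log(newlyLoaded(before, after)); // [ 'spatial' ]
```

Taking one snapshot before the `read_parquet` call and one after it, then diffing them this way, shows whether the read itself triggered the load.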

carlopi commented 1 month ago

This is a problem on the DuckDB side, to be fixed there: the autoloading in the Parquet extension does not properly check the configuration (and the API around that does not help avoid such problems, which should be improved).

I am not aware of any proper workaround in the current version, barring something weird and heavy-handed like placing a file that needs admin privileges to be read at ~/.duckdb/extensions/v1.1.1/PLATFORM_NAME/spatial.duckdb_extension (I would then expect LOAD to fail silently).

Maxxen commented 1 month ago

The spatial extension will no longer be auto-loaded when reading Parquet. Additionally, there is a new option

SET enable_geoparquet_conversion = false

that you can set to disable GeoParquet conversion even when spatial is loaded.
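
For anyone applying this from Node.js, below is a rough sketch that issues the settings named in this thread before any Parquet read. It is an illustration under assumptions, not the module's documented usage: `run` is a hypothetical async function that executes one SQL statement (for instance a promisified wrapper around your connection's query method), and `enable_geoparquet_conversion` is only recognized by DuckDB builds that include the change described above.

```javascript
// Settings named in this thread. `enable_geoparquet_conversion` exists
// only in DuckDB builds that include the fix described above.
const SETTINGS = {
  autoinstall_known_extensions: false,
  autoload_known_extensions: false,
  enable_geoparquet_conversion: false,
};

// Turn a { name: boolean } map into `SET name = value` statements.
function toSetStatements(settings) {
  return Object.entries(settings).map(
    ([name, value]) => `SET ${name} = ${value ? 'true' : 'false'}`
  );
}

// `run` is a hypothetical (sql) => Promise supplied by the caller.
async function applySettings(run, settings = SETTINGS) {
  for (const sql of toSetStatements(settings)) {
    await run(sql); // sequential, so every setting is applied before any read
  }
}

// Dry run with a stub runner that only records the SQL it receives.
const issued = [];
applySettings(async (sql) => {
  issued.push(sql);
}).then(() => console.log(issued.join(';\n')));
```

Passing the runner in as a parameter keeps the sketch independent of any particular client API; swap the stub for a function that forwards to your real connection.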