Hello! Have you tried setting a memory limit? All memory allocated by the R-Tree should be tracked by DuckDB and respect the memory limit parameter.
Are you running with a disk-backed database file or entirely in memory? You may want to have a look at the spilling-to-disk section as well.
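For reference, both can be set from the CLI before building the index; a minimal sketch, where the path is just a placeholder:
D SET memory_limit = '10GB';
D SET temp_directory = '/path/to/scratch'; -- lets temporary data spill to disk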
Well, with memory_limit set to 10G I get:
D set memory_limit = '10G';
D load spatial;
D CREATE INDEX nodes_pt_idx ON nodes USING RTREE (pt);
33% ▕███████████████████▊ ▏ INTERNAL Error: Attempted to dereference unique_ptr that is NULL!
This error signals an assertion failure within DuckDB. This usually occurs due to unexpected conditions or errors in the program's logic.
For more information, see https://duckdb.org/docs/dev/internal_errors
I'll try disk spilling. The DB is a file, not in memory.
OK, I get the same error with disk spilling (i.e. `SET tmp_dir = '...'`).
I'm unable to reproduce this on DuckDB v1.1.0 on macOS in disk-backed mode (although without a memory limit).
D load spatial;
D CREATE TABLE nodes AS SELECT * FROM ST_ReadOSM('poland-latest.osm.pbf') WHERE kind = 'node';
100% ▕████████████████████████████████████████████████████████████▏
D ALTER TABLE nodes ADD COLUMN pt GEOMETRY;
D UPDATE nodes SET pt = ST_Point2D(lat, lon);
100% ▕████████████████████████████████████████████████████████████▏
D CREATE INDEX nodes_pt_idx ON nodes USING RTREE (pt);
100% ▕████████████████████████████████████████████████████████████▏
D
I'll see if I can mimic your setup and reproduce the error.
Hm, OK, it seems to have finished just fine with the memory limit set to 15G.
I'm on Linux btw.
If I set a 10gb limit I get a proper out-of-memory error before I even get to the index creation.
D SET memory_limit = '10gb';
D load spatial;
D CREATE TABLE nodes AS SELECT * FROM ST_ReadOSM('poland-latest.osm.pbf') WHERE kind = 'node';
100% ▕████████████████████████████████████████████████████████████▏
D ALTER TABLE nodes ADD COLUMN pt GEOMETRY;
D UPDATE nodes SET pt = ST_Point2D(lat, lon);
73% ▕███████████████████████████████████████████▊ ▏ Out of Memory Error: failed to pin block of size 256.0 KiB (9.3 GiB/9.3 GiB used)
Nonetheless, an INTERNAL Error is never good. I'll look into setting up a Linux box to dive into this further tomorrow.
Yes, I guess even when you set memory_limit = '1G' it should still finish, though maybe it could take more time.
Not necessarily: the index itself must still be able to fit in memory. With the 217,708,491 nodes in the Poland extract you're looking at 217,708,491 * 40 bytes ≈ 8.7 GB of raw index buffers, not including any additional bookkeeping structures in DuckDB.
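A quick way to sanity-check that estimate against the table itself, assuming the same rough 40 bytes of index buffer per row:
D SELECT count(*) AS nodes, round(count(*) * 40 / 1e9, 1) AS approx_index_gb FROM nodes;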
Hm, ok, but shouldn't it handle data larger than RAM with temp disk space?
Unfortunately, this is currently a limitation of all indexes in DuckDB. While they are lazily loaded from disk once you open a database, they will not unload from memory, which is why you need to be able to keep the entire index in memory when you first create it. That said, their memory is still tracked by DuckDB, meaning that they should respect the memory_limit setting, and DuckDB will attempt to unload other temporary memory to make space for them.
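If it helps with debugging, the memory DuckDB is tracking can be inspected per component with the duckdb_memory() metadata function (a sketch; the exact tags vary between versions):
D SELECT tag, memory_usage_bytes FROM duckdb_memory() ORDER BY memory_usage_bytes DESC;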
I've pushed a fix for the INTERNAL Error: Attempted to dereference unique_ptr that is NULL error in #420.
I downloaded the OSM PBF file for Poland from here: https://download.geofabrik.de/europe.html
I created a DuckDB table like this:
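Presumably the same spatial pipeline as in the reproduction above; the exact file and column names here are assumed:
D LOAD spatial;
D CREATE TABLE nodes AS SELECT * FROM ST_ReadOSM('poland-latest.osm.pbf') WHERE kind = 'node';
D ALTER TABLE nodes ADD COLUMN pt GEOMETRY;
D UPDATE nodes SET pt = ST_Point2D(lat, lon);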
My DB size is around 6.5 GB. Now I try to do:
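Presumably the R-Tree index creation shown earlier, i.e.:
D CREATE INDEX nodes_pt_idx ON nodes USING RTREE (pt);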
I see a progress bar; it goes up to around 28%, and then I get a "killed" message, probably because Linux decided the process was using too much memory. Indeed, memory usage rises to 18 GB when I watch htop at the same time (I have 32 GB of RAM). I thought DuckDB was able to handle tasks larger than memory? I'm using DuckDB v1.1.1.
BTW, it seems I get the same thing when creating a full-text-search index (https://duckdb.org/docs/extensions/full_text_search.html); maybe it's an issue with index creation in general?
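For comparison, creating a full-text-search index with that extension looks roughly like this; the table and column names below are placeholders, not necessarily the ones actually used:
D INSTALL fts;
D LOAD fts;
D PRAGMA create_fts_index('documents', 'id', 'body');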