duckdb / duckdb

DuckDB is an analytical in-process SQL database management system
http://www.duckdb.org
MIT License

Appender assert error on very big maps and large number of rows #11502

Open solomspd opened 3 months ago

solomspd commented 3 months ago

What happens?

When using the Appender API to insert more than 12110447 rows that each contain a map above a certain size, the following assertion error appears:

duckdb/src/storage/table/column_segment.cpp:112: void duckdb::ColumnSegment::Scan(duckdb::ColumnScanState&, duckdb::idx_t, duckdb::Vector&, duckdb::idx_t, bool): Assertion `result.GetVectorType() == VectorType::FLAT_VECTOR' failed.

The error occurs under these conditions:

I have tried flushing the appender both intermittently and after every single insertion, but the error still occurs.
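For readers unfamiliar with the pattern, "flushing intermittently" means calling `flush` every N rows inside the insert loop. The sketch below shows that loop in plain std Rust. `RowSink` and `CountingSink` are hypothetical stand-ins for `duckdb::Appender`, used only so the loop runs without a database, and the flush interval is an arbitrary parameter. This is an illustration of the idea, not the reporter's actual test code.

```rust
/// Hypothetical stand-in for an appender: accepts rows and can be flushed.
trait RowSink {
    fn append_row(&mut self, row: &str);
    fn flush(&mut self);
}

/// Hypothetical mock sink that only counts calls, so no database is needed.
#[derive(Default)]
struct CountingSink {
    rows: usize,
    flushes: usize,
}

impl RowSink for CountingSink {
    fn append_row(&mut self, _row: &str) {
        self.rows += 1;
    }
    fn flush(&mut self) {
        self.flushes += 1;
    }
}

/// Appends `total` copies of `row`, flushing after every `every` rows,
/// plus one final flush for any remainder that did not fill an interval.
fn append_with_periodic_flush<S: RowSink>(sink: &mut S, row: &str, total: usize, every: usize) {
    assert!(every > 0, "flush interval must be positive");
    for i in 1..=total {
        sink.append_row(row);
        if i % every == 0 {
            sink.flush();
        }
    }
    // Flush the tail rows left over after the last full interval.
    if total % every != 0 {
        sink.flush();
    }
}

fn main() {
    let mut sink = CountingSink::default();
    append_with_periodic_flush(&mut sink, "{'key1'='value1'}", 10, 4);
    println!("rows = {}, flushes = {}", sink.rows, sink.flushes);
}
```

Flushing after every single insertion corresponds to `every = 1`; per the report, neither variant avoids the assertion.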

To Reproduce

To reproduce the error, we need to use the Appender API. For this purpose, I'm using the Rust bindings.

Create a Cargo project and add the duckdb crate:

cargo init --bin
cargo add duckdb --features bundled

We insert the map as a string for brevity, and because the Rust bindings do not yet support binding maps natively. Place the following Rust code in main.rs to produce the error:

use duckdb::{params, Connection};

fn main() {
    // Open (or create) an on-disk database file.
    let conn = Connection::open("test.db").unwrap();
    conn.execute_batch(
        "
    CREATE TABLE map_table(
        map MAP(VARCHAR, VARCHAR),
    );
    ",
    )
    .unwrap();
    let mut appender = conn.appender("map_table").unwrap();
    // A few rows above the 12110447-row threshold at which the assertion fires.
    let max = 12110458;
    for _ in 0..max {
        // The map is passed as a string rather than a native map binding.
        // `let _` discards any error returned by append_row.
        let _ = appender.append_row(params![
            // map
            "{'key1'='value1','key2'='value2','key3'='value3','key4'='value4','key5'='value5','key6'='value6','key7'='value7','key8'='value8','key9'='value9','key10'='value10','key11'='value11','key12'='value12','key13'='value13','key14'='value14','key15'='value15','key16'='value16','key17'='value17','key18'='value18','key19'='value19','key20'='value20','key21'='value21','key22'='value22','key23'='value23','key24'='value24','key25'='value25','key26'='value26'}"
        ]);
    }
    appender.flush();
}

Run the code with

cargo run --release

This will produce the following error:

duckdb/src/storage/table/column_segment.cpp:112: void duckdb::ColumnSegment::Scan(duckdb::ColumnScanState&, duckdb::idx_t, duckdb::Vector&, duckdb::idx_t, bool): Assertion `result.GetVectorType() == VectorType::FLAT_VECTOR' failed.
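The 26-entry map string in the reproduction follows a simple pattern, so it can be generated instead of typed out by hand, which makes it easy to vary the map size when probing the "certain size" threshold. The sketch below is plain std Rust; the helper name `map_literal` is hypothetical and not part of the duckdb crate. It produces the same `{'keyN'='valueN',...}` string that the repro passes to `append_row`.

```rust
/// Builds a string of the form {'key1'='value1',...,'keyN'='valueN'},
/// matching the map string used in the reproduction.
/// Hypothetical helper for illustration; not part of the duckdb crate.
fn map_literal(n: usize) -> String {
    let entries: Vec<String> = (1..=n)
        .map(|i| format!("'key{i}'='value{i}'"))
        .collect();
    // `{{` and `}}` emit literal braces around the joined entries.
    format!("{{{}}}", entries.join(","))
}

fn main() {
    let literal = map_literal(26);
    println!("{} bytes: {}", literal.len(), literal);
}
```

The generated string could be built once before the insert loop and passed to `append_row` in place of the hand-written literal; whether a smaller `n` avoids the assertion has not been tested here.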

OS:

Arch Linux, x64

DuckDB Version:

0.10.1

DuckDB Client:

Rust

Full Name:

Abdelsalam ElTamawy

Affiliation:

the American University in Cairo

Have you tried this on the latest nightly build?

I have tested with a release build (and could not test with a nightly build)

Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

szarnyasg commented 3 months ago

Hi @solomspd, thanks for the detailed issue report. Unfortunately, I have not yet been able to reproduce the issue. On an AWS EC2 instance, I get:

$ uname -a
Linux ip-172-31-1-25 6.5.0-1014-aws #14~22.04.1-Ubuntu SMP Thu Feb 15 15:27:06 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
$ cargo run --release
...
    Finished release [optimized] target(s) in 2m 02s
     Running `target/release/iss11502`
Hello, world!

solomspd commented 3 months ago

Thank you for the prompt response, @szarnyasg.

I've done some follow-up testing on both on-prem and cloud machines and can reliably reproduce the error. I even spun up an Ubuntu EC2 t2.xlarge instance to create an environment similar to the one you're testing in, but I still see the error.

I have also confirmed the error is reproduced on both release and debug builds.

I've reproduced the error on the following machines I have on hand:

Arch linux

$ uname -a
Linux solom-pc 6.8.2-arch2-1 #1 SMP PREEMPT_DYNAMIC Thu, 28 Mar 2024 17:06:35 +0000 x86_64 GNU/Linux
$ rustc --version
rustc 1.77.1
$  cat /proc/cpuinfo | grep 'model name'
Ryzen 9 7950X3D

NixOS 24.05

$ uname -a
Linux solom-framework 6.8.3 #1-NixOS SMP PREEMPT_DYNAMIC Wed Apr  3 13:32:51 UTC 2024 x86_64 GNU/Linux
$ rustc --version
rustc 1.76.0
$ cat /proc/cpuinfo | grep 'model name'
i7-1260P

I've also reproduced it on the following cloud machines:

Linode Premium CPU Debian 11 VPS

$ uname -a
Linux localhost 5.10.0-28-amd64 #1 SMP Debian 5.10.209-2 (2024-01-31) x86_64 GNU/Linux
$ rustc --version
rustc 1.77.1
$  cat /proc/cpuinfo | grep 'model name'
AMD EPYC 7713

AWS EC2 t2.xlarge Ubuntu

$ uname -a
Linux ip-172-31-7-221 6.5.0-1014-aws #14~22.04.1-Ubuntu SMP Thu Feb 15 15:27:06 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
$ rustc --version
rustc 1.75.0
$  cat /proc/cpuinfo | grep 'model name'
Xeon(R) CPU E5-2686 v4

Could you try setting up a clean EC2 environment and see whether you can reproduce it on your end with the code snippet above? I've confirmed that it happens on every configuration I tried.

github-actions[bot] commented 2 weeks ago

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 30 days.