ShiKaiWi closed this issue 1 year ago.
Actually, I can't reproduce this error. After digging into the codebase, I found the following:
Maybe this problem is caused by altering the schema?
@ShiKaiWi I reproduced this error.
curl --location --request POST 'http://127.0.0.1:5000/sql' \
--header 'Content-Type: application/json' \
-H 'x-ceresdb-access-tenant: test' \
--data-raw '{
"query": "CREATE TABLE `demo` (`name` string TAG NULL, `value` double NOT NULL, `t` timestamp NOT NULL, TIMESTAMP KEY(t)) ENGINE=Analytic with (enable_ttl='\''false'\'')"
}'
The table schema is:
CREATE TABLE `demo` (`t` timestamp NOT NULL, `tsid` uint64 NOT NULL, `name` string TAG, `value` double NOT NULL, PRIMARY KEY(t,tsid), TIMESTAMP KEY(t)) ENGINE=Analytic WITH(arena_block_size='2097152', compaction_strategy='default', compression='ZSTD', enable_ttl='false', num_rows_per_row_group='8192', segment_duration='', storage_format='COLUMNAR', ttl='7d', update_mode='OVERWRITE', write_buffer_size='33554432')
`name` is null.
curl --location --request POST 'http://127.0.0.1:5000/sql' \
--header 'Content-Type: application/json' \
-H 'x-ceresdb-access-tenant: test' \
--data-raw '{
"query": "INSERT INTO demo(t, value) VALUES(1651737067000, 100)"
}'
curl --location --request POST 'http://127.0.0.1:5000/sql' \
--header 'Content-Type: application/json' \
--header 'x-ceresdb-access-tenant: test' \
--data-raw '{
"query": "select `t`, count(distinct name) from demo group by `t`"
}'
Here is the stacktrace:
ERRO [common_util/src/panic.rs:42] thread 'ceres-bg' panicked 'called `Result::unwrap()` on an `Err` value: InvalidArgumentError("Column 'COUNT(DISTINCT demo.name)[count distinct]' is declared as non-nullable but contains null values")' at "/Users/michael/.cargo/git/checkouts/arrow-datafusion-b9eb4f789f8bda1f/d84ea9c/datafusion/core/src/physical_plan/repartition.rs:178"
0: backtrace::backtrace::libunwind::trace
at /Users/michael/.cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.66/src/backtrace/mod.rs:66:5
backtrace::backtrace::trace_unsynchronized
at /Users/michael/.cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.66/src/backtrace/mod.rs:66:5
backtrace::backtrace::trace
at /Users/michael/.cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.66/src/backtrace/mod.rs:53:14
backtrace::capture::Backtrace::create
at /Users/michael/.cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.66/src/capture.rs:176:9
backtrace::capture::Backtrace::new
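The error message comes from arrow's column validation: a column whose field is declared non-nullable must not contain any null slots. Below is a minimal plain-Rust sketch of that kind of check (illustrative only, no arrow dependency; the `Column` struct and `validate` function are hypothetical names, not the actual arrow API). The key point is that upstream this check returns a recoverable `Err`, which is then `unwrap()`ed inside `repartition.rs`, turning it into the panic above.

```rust
// Hypothetical stand-in for arrow's nullability validation of a record
// batch column. Names here are illustrative, not the real arrow API.
struct Column {
    name: &'static str,
    nullable: bool,
    values: Vec<Option<f64>>,
}

fn validate(col: &Column) -> Result<(), String> {
    // A non-nullable column must not contain any null slot.
    if !col.nullable && col.values.iter().any(Option::is_none) {
        return Err(format!(
            "Column '{}' is declared as non-nullable but contains null values",
            col.name
        ));
    }
    Ok(())
}

fn main() {
    // Mirrors the failing case from the panic message: the aggregate
    // output column is declared non-nullable but holds a null.
    let col = Column {
        name: "COUNT(DISTINCT demo.name)[count distinct]",
        nullable: false,
        values: vec![Some(0.0), None],
    };
    // The check itself only returns Err; the panic happens because the
    // caller unwraps this Result instead of propagating it.
    assert!(validate(&col).is_err());
}
```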
I submitted a bug issue to datafusion: https://github.com/apache/arrow-datafusion/issues/4040
I tried to reproduce this problem with this file: https://github.com/apache/arrow-datafusion/blob/97b3a4b37f54aaa52f8705db3e57b15ee98c24a7/datafusion-examples/examples/memtable.rs#L39
Changes:
1 file changed, 6 insertions(+), 8 deletions(-)
datafusion-examples/examples/memtable.rs | 14 ++++++--------
modified datafusion-examples/examples/memtable.rs
@@ -36,14 +36,12 @@ async fn main() -> Result<()> {
// Register the in-memory table containing the data
ctx.register_table("users", Arc::new(mem_table))?;
- let dataframe = ctx.sql("SELECT * FROM users;").await?;
+ let dataframe = ctx
+ .sql("SELECT id,count(distinct bank_account) From users group by id;")
+ .await?;
timeout(Duration::from_secs(10), async move {
- let result = dataframe.collect().await.unwrap();
- let record_batch = result.get(0).unwrap();
-
- assert_eq!(1, record_batch.column(0).len());
- dbg!(record_batch.columns());
+ dataframe.show().await.unwrap();
})
.await
.unwrap();
@@ -56,8 +54,8 @@ fn create_memtable() -> Result<MemTable> {
}
fn create_record_batch() -> Result<RecordBatch> {
- let id_array = UInt8Array::from(vec![1]);
- let account_array = UInt64Array::from(vec![9000]);
+ let id_array = UInt8Array::from(vec![1, 2]);
+ let account_array = UInt64Array::from(vec![None, Some(1)]);
Ok(RecordBatch::try_new(
get_schema(),
Then execute this demo with cargo run --example memtable, and it will output:
+----+------------------------------------+
| id | COUNT(DISTINCT users.bank_account) |
+----+------------------------------------+
| 2 | 1 |
| 1 | 0 |
+----+------------------------------------+
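The `0` for group `id = 1` is expected SQL behavior: `COUNT(DISTINCT col)` ignores NULLs, and that group's only `bank_account` value is NULL. A minimal sketch of that semantics in plain Rust (the `count_distinct` helper is illustrative, not DataFusion's implementation):

```rust
use std::collections::HashSet;

// Illustrative model of COUNT(DISTINCT col): NULLs (None) are skipped,
// then the remaining values are deduplicated and counted.
fn count_distinct(values: &[Option<u64>]) -> usize {
    values
        .iter()
        .filter_map(|v| *v) // drop NULLs, matching SQL semantics
        .collect::<HashSet<_>>()
        .len()
}

fn main() {
    assert_eq!(count_distinct(&[None]), 0); // group id = 1 in the output above
    assert_eq!(count_distinct(&[Some(1)]), 1); // group id = 2
    assert_eq!(count_distinct(&[Some(1), Some(1), None]), 1);
}
```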
It works without panicking. Maybe we need to narrow this problem down to check whether it is a usage issue on our side or an upstream issue.
Describe this problem
It seems there is currently no check for inserting a null value into a not-null column, and datafusion may panic when processing such a case. Here is the stacktrace:
The table schema is:
Steps to reproduce
Expected behavior
It should not panic.
Additional Information
At least two things we need to fix:
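One of the fixes would presumably be rejecting such writes before they reach the query engine. A minimal sketch of a pre-insert null check, assuming a simplified row representation (the `ColumnSchema` struct and `validate_row` function are hypothetical, not the actual CeresDB API):

```rust
// Hypothetical pre-insert validation: reject a row that puts NULL into
// a NOT NULL column before it is ever written.
#[derive(Debug)]
struct ColumnSchema {
    name: &'static str,
    nullable: bool,
}

fn validate_row(schema: &[ColumnSchema], row: &[Option<i64>]) -> Result<(), String> {
    for (col, value) in schema.iter().zip(row) {
        if value.is_none() && !col.nullable {
            return Err(format!(
                "column '{}' is NOT NULL but got a null value",
                col.name
            ));
        }
    }
    Ok(())
}

fn main() {
    // Simplified version of the `demo` table above: `name` is a nullable
    // TAG, while `t` and `value` are NOT NULL.
    let schema = [
        ColumnSchema { name: "t", nullable: false },
        ColumnSchema { name: "name", nullable: true },
        ColumnSchema { name: "value", nullable: false },
    ];
    // INSERT INTO demo(t, value) VALUES(1651737067000, 100) leaves `name`
    // null, which is fine; a null `value` should be rejected instead.
    assert!(validate_row(&schema, &[Some(1651737067000), None, Some(100)]).is_ok());
    assert!(validate_row(&schema, &[Some(1651737067000), None, None]).is_err());
}
```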