StarRocks / starrocks

StarRocks, a Linux Foundation project, is a next-generation sub-second MPP OLAP database for full analytics scenarios, including multi-dimensional analytics, real-time analytics, and ad-hoc queries.
https://starrocks.io
Apache License 2.0
8.29k stars 1.68k forks source link

after insert overwrite,two tables have inconsistent data #38170

Open medivh666 opened 6 months ago

medivh666 commented 6 months ago

Version:3.2.1-79ee91d Shared-data deployment

The steps are as follows:

CREATE TABLE tb1 ( id varchar(128) NOT NULL COMMENT "", create_date date NOT NULL COMMENT "", user_id int(11) NOT NULL DEFAULT "0" COMMENT "", amount int(11) NOT NULL DEFAULT "0" COMMENT "", is_deleted int(11) NOT NULL DEFAULT "0" COMMENT "", status int(11) NOT NULL DEFAULT "0" COMMENT "" ) ENGINE=OLAP PRIMARY KEY(id, create_date, user_id) COMMENT "OLAP" PARTITION BY date_trunc('day', create_date) DISTRIBUTED BY HASH(user_id) PROPERTIES ( "replication_num" = "1", "datacache.enable" = "true", "storage_volume" = "builtin_storage_volume", "enable_async_write_back" = "false", "enable_persistent_index" = "true", "persistent_index_type" = "LOCAL", "compression" = "LZ4" );

//insert some data... 3000000+rows

CREATE TABLE tb2 ( create_date date NOT NULL COMMENT "", user_id int(11) NOT NULL DEFAULT "0" COMMENT "", is_deleted int(11) NOT NULL DEFAULT "0" COMMENT "", row_count int(11) NOT NULL DEFAULT "0" COMMENT "" ) ENGINE=OLAP PRIMARY KEY(create_date, user_id, is_deleted) COMMENT "OLAP" PARTITION BY date_trunc('month', create_date) DISTRIBUTED BY HASH(user_id) PROPERTIES ( "replication_num" = "1", "datacache.enable" = "true", "storage_volume" = "builtin_storage_volume", "enable_async_write_back" = "false", "enable_persistent_index" = "true", "persistent_index_type" = "LOCAL", "compression" = "LZ4" );

insert overwrite tb2 select create_date,user_id,is_deleted,count(*) as row_count from tb1 where create_date='2023-12-29' group by create_date,user_id,is_deleted;

The data is inconsistent when the amount of data is large (300w rows) select count(*) from tb1 where create_date='2023-12-29' ;
select sum(row_count) from tb2 where create_date='2023-12-29' ;

dbf32d6eeb09e4e72f369c9a6930c8b
github-actions[bot] commented 4 days ago

We have marked this issue as stale because it has been inactive for 6 months. If this issue is still relevant, removing the stale label or adding a comment will keep it active. Otherwise, we'll close it in 10 days to keep the issue queue tidy. Thank you for your contribution to StarRocks!