StarRocks / starrocks

StarRocks, a Linux Foundation project, is a next-generation sub-second MPP OLAP database for full analytics scenarios, including multi-dimensional analytics, real-time analytics, and ad-hoc queries.
https://starrocks.io
Apache License 2.0

Issue with Hyphenated client_id Values in Partition Names #43806

Open Peschido opened 5 months ago

Peschido commented 5 months ago

Describe the bug

I’ve encountered an issue with partitioning in StarRocks (version 3.2) that seems to involve handling hyphenated client_id values. I was hoping to get some insights or solutions from anyone who might have faced a similar issue.

Steps to Reproduce the Behavior:

1. Create a table partitioned by client_id:

    CREATE TABLE test.sr_ice5 (
        client_id string not null DEFAULT 'unknown',
        item_id string DEFAULT 'unknown',
        ts bigint DEFAULT 1,
        itemtype string DEFAULT 'unknown',
        labels string,
        has_z_info string,
        z_id string
    )
    PARTITION BY (client_id)
    PROPERTIES ("enable_persistent_index" = "true");

2. Insert data into test.sr_ice5, which includes rows with client_id values both with and without hyphens (e.g., "fruit-tree" and "fruittree").

3. Observe the error when inserting rows with a hyphenated client_id:

    Error: The row create partition failed since OK. Row: ['fruittree', 'xxxx', yyy, 'zzz']

4. Checking partitions using SHOW PARTITIONS reveals that the partition name for "fruit-tree" appears without the hyphen.

Expected Behavior:

Partitions are created accurately reflecting the hyphenated and non-hyphenated client_id values, without omitting hyphens from partition names. Two partitions should be created:

- one named pfruittree, with corresponding List = (('fruittree'))
- another named pfruit-tree, with corresponding List = (('fruit-tree'))

Real Behavior:

The system seems to omit hyphens from partition names, leading to errors when inserting data rows that correspond to the hyphenated client_id values. The INSERT INTO statement was not successful.

To Reproduce

CREATE TABLE test.sr_icetest (
    client_id STRING NOT NULL DEFAULT 'unknown',
    item_id STRING DEFAULT 'unknown',
    ts BIGINT DEFAULT "1",
    itemtype STRING DEFAULT 'unknown',
    labels STRING,
    has_z_info STRING,
    z_id STRING
)
PARTITION BY (client_id)
PROPERTIES (
    "enable_persistent_index" = "true"
);

INSERT INTO test.sr_icetest (client_id, item_id, ts, itemtype, labels, has_z_info, z_id)
VALUES ('fruit-tree', '001', 1622548800, 'typeA', 'label1', 'yes', 'z001');

INSERT INTO test.sr_icetest (client_id, item_id, ts, itemtype, labels, has_z_info, z_id)
VALUES ('fruittree', '002', 1622635200, 'typeB', 'label2', 'no', 'z002');

Version and Forum link

https://forum.starrocks.io/t/issue-with-hyphenated-client-id-values-in-partition-names/328/3

wyb commented 5 months ago

This is a bug. The partition names generated from "fruit-tree" and "fruittree" are the same, so the latter partition cannot be created. I will fix it.
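For readers following along, the collision can be illustrated with a small Python sketch. It is not StarRocks code: the naming rule in `hypothetical_partition_name` (prefix `p`, drop every character that is not a letter, digit, or underscore) is an assumption inferred only from the `pfruittree` name reported above, and both function names are made up for illustration.

```python
import re
from collections import defaultdict


def hypothetical_partition_name(value: str) -> str:
    # ASSUMPTION: mimics the observed behavior only (prefix "p" and strip
    # characters other than letters, digits, and underscores). The real
    # StarRocks naming logic may differ.
    return "p" + re.sub(r"[^0-9A-Za-z_]", "", value)


def find_collisions(values):
    # Group partition-column values by the name they would map to and
    # keep only the names claimed by more than one distinct value.
    by_name = defaultdict(list)
    for value in values:
        by_name[hypothetical_partition_name(value)].append(value)
    return {name: vals for name, vals in by_name.items() if len(vals) > 1}


collisions = find_collisions(["fruit-tree", "fruittree", "apple"])
print(collisions)  # {'pfruittree': ['fruit-tree', 'fruittree']}
```

Under that assumed rule, both client_id values map to `pfruittree`, which matches the symptom in the report: the second partition cannot be created because its name is already taken.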

Peschido commented 4 months ago

Thanks for taking it on! We are desperately waiting for it.