StarRocks / starrocks

The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance for multi-dimensional analytics, real-time analytics, and ad-hoc queries. A Linux Foundation project.
https://starrocks.io
Apache License 2.0

[inverted index] `xx match ''` returns different results with the English and Chinese parsers. #45403

Open chengqianli-git opened 6 months ago

chengqianli-git commented 6 months ago

Steps to reproduce the behavior (Required)

  1. CREATE TABLE '...'
  2. INSERT INTO '....'
  3. SELECT '....'

CREATE TABLE `duplicate_table_demo_datatype_not_replicated_all_varchar` (
  `AAA` datetime NOT NULL COMMENT "",
  `BBB` varchar(200) NOT NULL COMMENT "",
  `CCC` varchar(200) NOT NULL COMMENT "",
  `DDD` varchar(2000) NULL COMMENT "",
  `EEE` largeint(40) NULL COMMENT "",
  `FFF` decimal(20, 10) NULL COMMENT "",
  `GGG` varchar(200) NULL COMMENT "",
  `HHH` float NULL COMMENT "",
  `III` boolean NULL COMMENT "",
  `KKK` char(20) NULL COMMENT "",
  `LLL` varchar(65533) NULL COMMENT "",
  `MMM` varchar(20) NULL COMMENT "",
  `NNN` varbinary NULL COMMENT "",
  `OOO` tinyint(4) NULL COMMENT "",
  `PPP` datetime NULL COMMENT "",
  `QQQ` array<int(11)> NULL COMMENT "",
  `RRR` json NULL COMMENT "",
  `SSS` map<int(11),int(11)> NULL COMMENT "",
  `TTT` struct<a int(11), b int(11)> NULL COMMENT "",
  INDEX init_bitmap_index (`KKK`) USING BITMAP COMMENT '',
  INDEX idx (`DDD`) USING GIN("parser" = "chinese") COMMENT ''
) ENGINE=OLAP
DUPLICATE KEY(`AAA`, `BBB`, `CCC`)
PARTITION BY RANGE(`AAA`)
(PARTITION p1970 VALUES [("1970-01-01 00:00:00"), ("2000-01-01 00:00:00")),
PARTITION p2000 VALUES [("2000-01-01 00:00:00"), ("2030-01-01 00:00:00")))
DISTRIBUTED BY HASH(`AAA`, `BBB`) BUCKETS 3
ORDER BY(`AAA`, `BBB`, `CCC`, `DDD`)
PROPERTIES (
"bloom_filter_columns" = "MMM",
"compression" = "LZ4",
"fast_schema_evolution" = "true",
"replicated_storage" = "false",
"replication_num" = "3",
"unique_constraints" = "default_catalog.test_inverted_indexa4a35500_0e77_11ef_b07c_00163e21975a.duplicate_table_demo_datatype_not_replicated_all_varchar.GGG"
);

Insert five rows, then run:
select * from duplicate_table_demo_datatype_not_replicated_all_varchar where DDD match '';

Expected behavior (Required)

Real behavior (Required)

With the Chinese parser, the query returns 1 row. With the English parser, the same query returns 0 rows.
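
A reduced comparison might make the difference easier to isolate. The sketch below is not the reporter's original data: the table names `gin_parser_chinese` and `gin_parser_english`, the sample rows, and `"replication_num" = "1"` are assumptions chosen for a small test cluster. It only assumes the `USING GIN("parser" = ...)` syntax from the table definition above, and that inverted indexes are enabled on the cluster.

```sql
-- Hypothetical reduced repro: two tables that differ only in the GIN parser.
-- Table names, sample rows, and replication_num are illustrative assumptions.
CREATE TABLE gin_parser_chinese (
  `AAA` datetime NOT NULL COMMENT "",
  `DDD` varchar(2000) NULL COMMENT "",
  INDEX idx (`DDD`) USING GIN("parser" = "chinese") COMMENT ''
) ENGINE=OLAP
DUPLICATE KEY(`AAA`)
DISTRIBUTED BY HASH(`AAA`) BUCKETS 1
PROPERTIES (
"replicated_storage" = "false",
"replication_num" = "1"
);

CREATE TABLE gin_parser_english (
  `AAA` datetime NOT NULL COMMENT "",
  `DDD` varchar(2000) NULL COMMENT "",
  INDEX idx (`DDD`) USING GIN("parser" = "english") COMMENT ''
) ENGINE=OLAP
DUPLICATE KEY(`AAA`)
DISTRIBUTED BY HASH(`AAA`) BUCKETS 1
PROPERTIES (
"replicated_storage" = "false",
"replication_num" = "1"
);

-- Load the same illustrative rows into both tables,
-- including an empty string and a NULL in the indexed column.
INSERT INTO gin_parser_chinese VALUES
  ('2024-01-01 00:00:00', 'hello world'),
  ('2024-01-02 00:00:00', ''),
  ('2024-01-03 00:00:00', NULL);

INSERT INTO gin_parser_english VALUES
  ('2024-01-01 00:00:00', 'hello world'),
  ('2024-01-02 00:00:00', ''),
  ('2024-01-03 00:00:00', NULL);

-- Run the same empty-string match against each parser and compare row counts.
SELECT * FROM gin_parser_chinese WHERE DDD match '';
SELECT * FROM gin_parser_english WHERE DDD match '';
```

If the two queries return different row counts on identical data, the discrepancy comes from how each parser handles an empty match term rather than from the other columns or indexes in the original table.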

StarRocks version (Required)

github-actions[bot] commented 6 days ago

We have marked this issue as stale because it has been inactive for 6 months. If this issue is still relevant, removing the stale label or adding a comment will keep it active. Otherwise, we'll close it in 10 days to keep the issue queue tidy. Thank you for your contribution to StarRocks!