StarRocks / starrocks

The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance for multi-dimensional analytics, real-time analytics, and ad-hoc queries. A Linux Foundation project.
https://starrocks.io
Apache License 2.0
8.94k stars 1.8k forks source link

[Bug] mv based on jdbc catalog(MySQL) tables cannot be refreshed but information_schema.task_runs shows the mv is refreshed successfully #37714

Closed trikker closed 4 months ago

trikker commented 10 months ago

Steps to reproduce the behavior (Required)

MySQL create table and insert records:

root@testdb Mon Dec 25 15:10:00 2023>CREATE TABLE goods(
item_id1          INT,
item_name         varchar(50),
price             FLOAT,
primary key(item_id1)
);
Query OK, 0 rows affected (0.00 sec)

root@testdb Mon Dec 25 15:10:05 2023>INSERT INTO goods
VALUES
(1001,"apple",6.5),
(1002,"pear",8.0),
(1003,"potato",2.2);
Query OK, 3 rows affected (0.00 sec)
Records: 3  Duplicates: 0  Warnings: 0

root@testdb Mon Dec 25 15:10:20 2023>CREATE TABLE order_list(
order_id          INT,
client_id         INT,
item_id2          INT,
order_date        DATE,
primary key(order_id, client_id, item_id2)
) ;
Query OK, 0 rows affected (0.01 sec)

root@testdb Mon Dec 25 15:10:32 2023>INSERT INTO order_list
VALUES
(10001,101,1001,"2022-03-13"),
(10001,101,1002,"2022-03-13"),
(10002,103,1002,"2022-03-13"),
(10002,103,1003,"2022-03-14"),
(10003,102,1003,"2022-03-14"),
(10003,102,1001,"2022-03-14");
Query OK, 6 rows affected (0.00 sec)
Records: 6  Duplicates: 0  Warnings: 0

root@testdb Mon Dec 25 15:10:52 2023>SELECT
order_id,
sum(goods.price) as total
FROM order_list INNER JOIN goods ON goods.item_id1 = order_list.item_id2
GROUP BY order_id;
+----------+--------------------+
| order_id | total              |
+----------+--------------------+
|    10001 |               14.5 |
|    10002 | 10.200000047683716 |
|    10003 |  8.700000047683716 |
+----------+--------------------+
3 rows in set (0.01 sec)

starrocks create jdbc catalog:

mysql> CREATE EXTERNAL CATALOG jdbc1
    -> PROPERTIES
    -> (
    ->     "type"="jdbc",
    ->     "user"="root",
    ->     "password"="<MYSQL_PASSWORD>",
    ->     "jdbc_uri"="jdbc:mysql://<MYSQL_HOST>:3456",
    ->     "driver_url"="https://repo1.maven.org/maven2/mysql/mysql-connector-java/8.0.28/mysql-connector-java-8.0.28.jar",
    ->     "driver_class"="com.mysql.cj.jdbc.Driver"
    -> );
Query OK, 0 rows affected (2.02 sec)

mysql> show databases from jdbc1;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mytest             |
| sbtest             |
| testdb             |
+--------------------+
4 rows in set (0.53 sec)

starrocks create mv based on the previous MySQL tables:

mysql> CREATE MATERIALIZED VIEW jdbc1_order_mv
DISTRIBUTED BY HASH(`order_id`)
REFRESH ASYNC START('2023-12-25 11:30:00') EVERY (interval 1 minute)
AS SELECT
l.order_id,
sum(g.price) as total
FROM jdbc1.testdb.order_list l INNER JOIN jdbc1.testdb.goods g ON g.item_id1 = l.item_id2
GROUP BY l.order_id;
Query OK, 0 rows affected (1.12 sec)

mysql> select * from jdbc1_order_mv;
+----------+--------------------+
| order_id | total              |
+----------+--------------------+
|    10001 |               14.5 |
|    10002 | 10.200000047683716 |
|    10003 |  8.700000047683716 |
+----------+--------------------+
3 rows in set (0.01 sec)

# 看SQL是能走物化视图,这里看是能走的
mysql> explain SELECT
    -> l.order_id,
    -> sum(g.price) as total
    -> FROM jdbc1.testdb.order_list l INNER JOIN jdbc1.testdb.goods g ON g.item_id1 = l.item_id2
    -> GROUP BY l.order_id;
+------------------------------------+
| Explain String                     |
+------------------------------------+
| PLAN FRAGMENT 0                    |
|  OUTPUT EXPRS:1: order_id | 8: sum |
|   PARTITION: UNPARTITIONED         |
|                                    |
|   RESULT SINK                      |
|                                    |
|   2:EXCHANGE                       |
|                                    |
| PLAN FRAGMENT 1                    |
|  OUTPUT EXPRS:                     |
|   PARTITION: RANDOM                |
|                                    |
|   STREAM DATA SINK                 |
|     EXCHANGE ID: 02                |
|     UNPARTITIONED                  |
|                                    |
|   1:Project                        |
|   |  <slot 1> : 9: order_id        |
|   |  <slot 8> : 10: total          |
|   |                                |
|   0:OlapScanNode                   |
|      TABLE: jdbc1_order_mv         |
|      PREAGGREGATION: ON            |
|      partitions=1/1                |
|      rollup: jdbc1_order_mv        |
|      tabletRatio=2/2               |
|      tabletList=11674,11676        |
|      cardinality=4500              |
|      avgRowSize=2.0                |
|      MaterializedView: true        |
+------------------------------------+
30 rows in set (0.74 sec)

Above is all ok, then after I inserted new records to MySQL, I found starrocks cannot refresh mv.

MySQL inserted new records:

root@testdb Mon Dec 25 15:14:19 2023>insert into order_list values (10004,104,1004,"2022-03-16");
Query OK, 1 row affected (0.00 sec)

root@testdb Mon Dec 25 15:14:21 2023>insert into goods values(1004, "tomato", 3.5);
Query OK, 1 row affected (0.01 sec)

root@testdb Mon Dec 25 15:14:27 2023>SELECT
       order_id,
       sum(goods.price) as total
       FROM order_list INNER JOIN goods ON goods.item_id1 = order_list.item_id2
       GROUP BY order_id;
+----------+--------------------+
| order_id | total              |
+----------+--------------------+
|    10001 |               14.5 |
|    10002 | 10.200000047683716 |
|    10003 |  8.700000047683716 |
|    10004 |                3.5 |
+----------+--------------------+
4 rows in set (0.00 sec)

Wait for a few minutes, Starrocks information_schema showed that mv has been refreshed successfully but I found the data was still old.

mysql> select * from information_schema.tasks\G
*************************** 1. row ***************************
  TASK_NAME: mv-12096
CREATE_TIME: 2023-12-25 15:11:53
   SCHEDULE: PERIODICAL (START 2023-12-25T15:11 EVERY(1 MINUTES))
   DATABASE: testdb
 DEFINITION: insert overwrite `jdbc1_order_mv` SELECT `jdbc1`.`testdb`.`l`.`order_id`, sum(`jdbc1`.`testdb`.`g`.`price`) AS `total`
FROM `jdbc1`.`testdb`.`order_list` AS `l` INNER JOIN `jdbc1`.`testdb`.`goods` AS `g` ON `jdbc1`.`testdb`.`g`.`item_id1` = `jdbc1`.`testdb`.`l`.`item_id2`
GROUP BY `jdbc1`.`testdb`.`l`.`order_id`
EXPIRE_TIME: NULL
1 row in set (0.01 sec)

mysql> select * from information_schema.task_runs where task_name='mv-12096' order by CREATE_TIME desc limit 3\G
*************************** 1. row ***************************
     QUERY_ID: df911b09-a2f5-11ee-b17d-08c0eb564686
    TASK_NAME: mv-12096
  CREATE_TIME: 2023-12-25 15:18:59
  FINISH_TIME: 2023-12-25 15:19:00
        STATE: SUCCESS
     DATABASE: testdb
   DEFINITION: insert overwrite `jdbc1_order_mv` SELECT `jdbc1`.`testdb`.`l`.`order_id`, sum(`jdbc1`.`testdb`.`g`.`price`) AS `total`
FROM `jdbc1`.`testdb`.`order_list` AS `l` INNER JOIN `jdbc1`.`testdb`.`goods` AS `g` ON `jdbc1`.`testdb`.`g`.`item_id1` = `jdbc1`.`testdb`.`l`.`item_id2`
GROUP BY `jdbc1`.`testdb`.`l`.`order_id`
  EXPIRE_TIME: 2023-12-26 15:18:59
   ERROR_CODE: 0
ERROR_MESSAGE: NULL
     PROGRESS: 100%
EXTRA_MESSAGE: {"forceRefresh":false,"mvPartitionsToRefresh":[],"refBasePartitionsToRefreshMap":{},"basePartitionsToRefreshMap":{}}
   PROPERTIES: {"mvId":"12096","replication_num":"1","in_memory":"false"}
*************************** 2. row ***************************
     QUERY_ID: bbcdd4fe-a2f5-11ee-b17d-08c0eb564686
    TASK_NAME: mv-12096
  CREATE_TIME: 2023-12-25 15:17:59
  FINISH_TIME: 2023-12-25 15:18:00
        STATE: SUCCESS
     DATABASE: testdb
   DEFINITION: insert overwrite `jdbc1_order_mv` SELECT `jdbc1`.`testdb`.`l`.`order_id`, sum(`jdbc1`.`testdb`.`g`.`price`) AS `total`
FROM `jdbc1`.`testdb`.`order_list` AS `l` INNER JOIN `jdbc1`.`testdb`.`goods` AS `g` ON `jdbc1`.`testdb`.`g`.`item_id1` = `jdbc1`.`testdb`.`l`.`item_id2`
GROUP BY `jdbc1`.`testdb`.`l`.`order_id`
  EXPIRE_TIME: 2023-12-26 15:17:59
   ERROR_CODE: 0
ERROR_MESSAGE: NULL
     PROGRESS: 100%
EXTRA_MESSAGE: {"forceRefresh":false,"mvPartitionsToRefresh":[],"refBasePartitionsToRefreshMap":{},"basePartitionsToRefreshMap":{}}
   PROPERTIES: {"mvId":"12096","replication_num":"1","in_memory":"false"}
*************************** 3. row ***************************
     QUERY_ID: 980a8ef7-a2f5-11ee-b17d-08c0eb564686
    TASK_NAME: mv-12096
  CREATE_TIME: 2023-12-25 15:16:59
  FINISH_TIME: 2023-12-25 15:17:00
        STATE: SUCCESS
     DATABASE: testdb
   DEFINITION: insert overwrite `jdbc1_order_mv` SELECT `jdbc1`.`testdb`.`l`.`order_id`, sum(`jdbc1`.`testdb`.`g`.`price`) AS `total`
FROM `jdbc1`.`testdb`.`order_list` AS `l` INNER JOIN `jdbc1`.`testdb`.`goods` AS `g` ON `jdbc1`.`testdb`.`g`.`item_id1` = `jdbc1`.`testdb`.`l`.`item_id2`
GROUP BY `jdbc1`.`testdb`.`l`.`order_id`
  EXPIRE_TIME: 2023-12-26 15:16:59
   ERROR_CODE: 0
ERROR_MESSAGE: NULL
     PROGRESS: 100%
EXTRA_MESSAGE: {"forceRefresh":false,"mvPartitionsToRefresh":[],"refBasePartitionsToRefreshMap":{},"basePartitionsToRefreshMap":{}}
   PROPERTIES: {"mvId":"12096","replication_num":"1","in_memory":"false"}
3 rows in set (0.03 sec)

mysql> select * from jdbc1_order_mv;
+----------+--------------------+
| order_id | total              |
+----------+--------------------+
|    10003 |  8.700000047683716 |
|    10001 |               14.5 |
|    10002 | 10.200000047683716 |
+----------+--------------------+
3 rows in set (0.01 sec)

Even if I refreshed the mv manually, it's data is still old.

mysql> REFRESH MATERIALIZED VIEW jdbc1_order_mv WITH SYNC MODE;
+--------------------------------------+
| QUERY_ID                             |
+--------------------------------------+
| 06fe7d5a-a2f6-11ee-b17d-08c0eb564686 |
+--------------------------------------+
1 row in set (0.81 sec)

mysql> select * from jdbc1_order_mv;
+----------+--------------------+
| order_id | total              |
+----------+--------------------+
|    10003 |  8.700000047683716 |
|    10001 |               14.5 |
|    10002 | 10.200000047683716 |
+----------+--------------------+
3 rows in set (0.01 sec)

Query MySQL again:

root@testdb Mon Dec 25 15:21:02 2023>SELECT        order_id,        sum(goods.price) as total        FROM order_list INNER JOIN goods ON goods.item_id1 = order_list.item_id2        GROUP BY order_id;      +----------+--------------------+
| order_id | total              |
+----------+--------------------+
|    10001 |               14.5 |
|    10002 | 10.200000047683716 |
|    10003 |  8.700000047683716 |
|    10004 |                3.5 |
+----------+--------------------+
4 rows in set (0.01 sec)

The query SQL in starrocks returned old data too because it has been rewrote to use the mv.

mysql> SELECT `jdbc1`.`testdb`.`l`.`order_id`, sum(`jdbc1`.`testdb`.`g`.`price`) AS `total`
    -> FROM `jdbc1`.`testdb`.`order_list` AS `l` INNER JOIN `jdbc1`.`testdb`.`goods` AS `g` ON `jdbc1`.`testdb`.`g`.`item_id1` = `jdbc1`.`testdb`.`l`.`item_id2`
    -> GROUP BY `jdbc1`.`testdb`.`l`.`order_id`;
+----------+--------------------+
| order_id | total              |
+----------+--------------------+
|    10001 |               14.5 |
|    10002 | 10.200000047683716 |
|    10003 |  8.700000047683716 |
+----------+--------------------+
3 rows in set (0.75 sec)

After I drop the mv, the query returns the correct data.

mysql> drop materialized view jdbc1_order_mv;
Query OK, 0 rows affected (0.13 sec)

mysql> SELECT
jdbc1.testdb.order_list.order_id,
sum(jdbc1.testdb.goods.price) as total
FROM jdbc1.testdb.order_list INNER JOIN jdbc1.testdb.goods ON jdbc1.testdb.goods.item_id1 = jdbc1.testdb.order_list.item_id2
GROUP BY jdbc1.testdb.order_list.order_id;
+----------+--------------------+
| order_id | total              |
+----------+--------------------+
|    10001 |               14.5 |
|    10003 |  8.700000047683716 |
|    10004 |                3.5 |
|    10002 | 10.200000047683716 |
+----------+--------------------+
4 rows in set (0.46 sec)

Expected behavior (Required)

The mv data should be refreshed correctly.

Real behavior (Required)

The mv data was not refreshed actually but information_schema.task_runs showed it did.

StarRocks version (Required)

mysql> select current_version();
+-------------------+
| current_version() |
+-------------------+
| 3.2.1-79ee91d     |
+-------------------+
1 row in set (0.01 sec)
github-actions[bot] commented 4 months ago

We have marked this issue as stale because it has been inactive for 6 months. If this issue is still relevant, removing the stale label or adding a comment will keep it active. Otherwise, we'll close it in 10 days to keep the issue queue tidy. Thank you for your contribution to StarRocks!