StarRocks / starrocks

The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance for multi-dimensional analytics, real-time analytics, and ad-hoc queries. A Linux Foundation project.
https://starrocks.io
Apache License 2.0
8.82k stars 1.77k forks source link

Failed to use StarRocks' external table feature, reporting a null pointer exception #52031

Open HelloWorldwangjiacheng opened 3 days ago

HelloWorldwangjiacheng commented 3 days ago

中文版本

背景

  1. 目前有两个集群,A 和 B,都是存算分离模式。
  2. 集群 A 的机器为 192.168.1.67(FE follower + CN,starrocks-cn),192.168.1.139(FE leader + CN,starrocks-fe-2),和 192.168.1.148(FE Observer,starrocks-fe-4)。我启用了 FQDN。image
  3. 集群 B 的机器为 192.168.1.149(FE leader + CN)image
  4. 两个集群都可以用于包括但不限于创建数据库、创建普通表和外部表等任务。在普通表中,数据可以进行插入、删除、查询和更新(CRUD)

操作过程

  1. 我想试试 StarRocks 的外部表功能,特别是在两个 StarRocks 集群之间,看看文档中提到的实现集群间同步和读写分离是否可行。
  2. 我将在集群 B 中创建一个外部表,而在集群 A 中会有一个普通表。正如文档中提到的那样,但老实说,文档的表述有点绕,最好能有具体的指代。
  3. 集群 A 的建表语句:create database myDB; use myDB; CREATE TABLEt( idbigint(20) NOT NULL COMMENT "", warehouse_idvarchar(10) NULL COMMENT "", warehouse_namevarchar(255) NULL COMMENT "", spu_idvarchar(16) NULL COMMENT "", spu_namevarchar(255) NULL COMMENT "", sku_idbigint(20) NULL COMMENT "", sku_namevarchar(255) NULL COMMENT "", batch_idvarchar(16) NULL COMMENT "", batch_namevarchar(255) NULL COMMENT "", stock_countbigint(20) NULL COMMENT "", pricedecimal(10, 2) NULL COMMENT "", create_timedatetime NULL COMMENT "", create_namevarchar(64) NULL COMMENT "", create_bybigint(20) NULL COMMENT "", update_timedatetime NULL COMMENT "", update_namevarchar(64) NULL COMMENT "", update_bybigint(20) NULL COMMENT "" ) ENGINE=OLAP PRIMARY KEY(id) COMMENT "OLAP" DISTRIBUTED BY HASH(id);
  4. 集群 B 的建表语句:create database external; use external; CREATE EXTERNAL TABLEexternal_t_1( idbigint(20) NOT NULL COMMENT "", warehouse_idvarchar(10) NULL COMMENT "", warehouse_namevarchar(255) NULL COMMENT "", spu_idvarchar(16) NULL COMMENT "", spu_namevarchar(255) NULL COMMENT "", sku_idbigint(20) NULL COMMENT "", sku_namevarchar(255) NULL COMMENT "", batch_idvarchar(16) NULL COMMENT "", batch_namevarchar(255) NULL COMMENT "", stock_countbigint(20) NULL COMMENT "", pricedecimal(10, 2) NULL COMMENT "", create_timedatetime NULL COMMENT "", create_namevarchar(64) NULL COMMENT "", create_bybigint(20) NULL COMMENT "", update_timedatetime NULL COMMENT "", update_namevarchar(64) NULL COMMENT "", update_bybigint(20) NULL COMMENT "" ) ENGINE=OLAP PRIMARY KEY(id) COMMENT "OLAP" DISTRIBUTED BY HASH(id) PROPERTIES ( "host" = "192.68.1.139", "port" = "9020", "user" = "insert_user", "password" = "insert_password", "database" = "myDB", "table" = "t" ); 5.我现在有一个插入语句: INSERT INTO external.external_t_1 (id, warehouse_id, warehouse_name, spu_id, spu_name, sku_id, sku_name, batch_id, batch_name, stock_count, price, create_time, create_name, create_by, update_time, update_name, update_by) VALUES (1844260193068781575, '3300001', '杭州主仓', '1101', '一年级上学期语文', 11010101, '一年级上学期语文人民出版社简体中文', '1949009', '1949009', 313, 44.00, '2024-10-10 15:45:15', 'admin', 1, '2024-10-10 15:45:15', 'admin', 1);

6.执行后,报错但没有明显的原因。image 7.查询告警日志后,我发现了空指针异常。fe/log/fe.warn.log image

8.我觉得这可能是一个代码层面的问题。我们看看如何解决?会在下个版本中修复吗?

English Version

Background

  1. There are two clusters, A and B, both in a storage-compute separation mode

  2. The machines in cluster A are 192.168.1.67 (FE follower + CN,starrocks-cn), 192.168.1.139 (FE leader + CN,starrocks-fe-2), and 192.168.1.148 (FE Observer,starrocks-fe-4).I enabled FQDN.

  3. image

  4. The machine in cluster B is 192.168.1.149 (FE leader + CN)

  5. image

  6. Both clusters can be used for tasks including but not limited to creating databases, creating regular tables and external tables. In regular tables, data can be inserted, deleted, queried, and updated(CRUD)

Implementation process

  1. I want to try StarRocks' external tables, especially between two StarRocks clusters, to see if what's mentioned in the documentation about achieving inter-cluster synchronization and read-write separation works。
  2. I will create an external table in cluster B, and there will be a regular table in cluster A。Just as mentioned in the documentation, but to be honest, the wording in the documentation is a bit convoluted; it would be better to have specific references.
  3. Table Creation Statement in Cluster A: create database myDB; use myDB; CREATE TABLEt( idbigint(20) NOT NULL COMMENT "", warehouse_idvarchar(10) NULL COMMENT "", warehouse_namevarchar(255) NULL COMMENT "", spu_idvarchar(16) NULL COMMENT "", spu_namevarchar(255) NULL COMMENT "", sku_idbigint(20) NULL COMMENT "", sku_namevarchar(255) NULL COMMENT "", batch_idvarchar(16) NULL COMMENT "", batch_namevarchar(255) NULL COMMENT "", stock_countbigint(20) NULL COMMENT "", pricedecimal(10, 2) NULL COMMENT "", create_timedatetime NULL COMMENT "", create_namevarchar(64) NULL COMMENT "", create_bybigint(20) NULL COMMENT "", update_timedatetime NULL COMMENT "", update_namevarchar(64) NULL COMMENT "", update_bybigint(20) NULL COMMENT "" ) ENGINE=OLAP PRIMARY KEY(id) COMMENT "OLAP" DISTRIBUTED BY HASH(id);
  4. The table creation statement in cluster B create database external; use external; CREATE EXTERNAL TABLEexternal_t_1( idbigint(20) NOT NULL COMMENT "", warehouse_idvarchar(10) NULL COMMENT "", warehouse_namevarchar(255) NULL COMMENT "", spu_idvarchar(16) NULL COMMENT "", spu_namevarchar(255) NULL COMMENT "", sku_idbigint(20) NULL COMMENT "", sku_namevarchar(255) NULL COMMENT "", batch_idvarchar(16) NULL COMMENT "", batch_namevarchar(255) NULL COMMENT "", stock_countbigint(20) NULL COMMENT "", pricedecimal(10, 2) NULL COMMENT "", create_timedatetime NULL COMMENT "", create_namevarchar(64) NULL COMMENT "", create_bybigint(20) NULL COMMENT "", update_timedatetime NULL COMMENT "", update_namevarchar(64) NULL COMMENT "", update_bybigint(20) NULL COMMENT "" ) ENGINE=OLAP PRIMARY KEY(id) COMMENT "OLAP" DISTRIBUTED BY HASH(id) PROPERTIES ( "host" = "192.68.1.139", "port" = "9020", "user" = "insert_user", "password" = "insert_password", "database" = "myDB", "table" = "t" ); 7.I now have an insert statement: INSERT INTO external.external_t_1 (id, warehouse_id, warehouse_name, spu_id, spu_name, sku_id, sku_name, batch_id, batch_name, stock_count, price, create_time, create_name, create_by, update_time, update_name, update_by) VALUES (1844260193068781575, '3300001', '杭州主仓', '1101', '一年级上学期语文', 11010101, '一年级上学期语文人民出版社简体中文', '1949009', '1949009', 313, 44.00, '2024-10-10 15:45:15', 'admin', 1, '2024-10-10 15:45:15', 'admin', 1);

8.After execution, it reports an error with no obvious reason. image

11.After querying the alert logs, I found a null pointer exception image

  1. I feel like it's a code-level issue. Let's see how this can be resolved.
HelloWorldwangjiacheng commented 3 days ago

我找了一下3.3.3的源码,发现应该是哪里有没有判空 image

image

image

但是奇怪的是,如果是cn节点有问题,那为什么我在集群B的普通表的插入时可以做到的,能到表里。而且cn没什么问题,就是在插入外表的时候报错。