Closed. zhouhoo closed this issue 1 year ago.
I ran into the same problem. At first I thought it only appeared during binlog synchronization, but now I find this error keeps being reported.
Search before asking
Version
1.2.1
What's Wrong?
When using Routine Load to consume Kafka data, the error below occurs frequently:
2023-01-07 13:36:14,545 ERROR (Routine load task scheduler|36) [OlapTableSink.createLocation():397] register txn replica failed, txnId=-1, dbId=620560
2023-01-07 13:36:45,526 ERROR (Routine load task scheduler|36) [OlapTableSink.createLocation():397] register txn replica failed, txnId=-1, dbId=620560
2023-01-07 13:37:15,636 ERROR (Routine load task scheduler|36) [OlapTableSink.createLocation():397] register txn replica failed, txnId=-1, dbId=620560
2023-01-07 13:37:45,895 ERROR (Routine load task scheduler|36) [OlapTableSink.createLocation():397] register txn replica failed, txnId=-1, dbId=620560
2023-01-07 13:38:15,982 ERROR (Routine load task scheduler|36) [OlapTableSink.createLocation():397] register txn replica failed, txnId=-1, dbId=620560
2023-01-07 13:39:26,082 ERROR (Routine load task scheduler|36) [OlapTableSink.createLocation():397] register txn replica failed, txnId=-1, dbId=620560
What You Expected?
no error
How to Reproduce?
Create a routine load job and let it run. The CREATE statement is as follows:
-- Create the job
CREATE ROUTINE LOAD kafka_gxy_student ON gxy_students
-- load_properties: load description
COLUMNS TERMINATED BY ",",  -- column separator
COLUMNS(
    student_id, school_id, create_time, is_deleted, snow_flake_id, modified_time, user_id,
    school_name, academe_id, academe_name, major_id, major_name, major_field, classes_id,
    classes_name, username, student_number, gender, grade, bind_state, bind_time, mobile,
    family_name, family_mobile, family_address, family_province, family_city, family_area,
    birthday, nation, origin_province, origin_city, origin_area, card_no, age, is_employment,
    email, level, educational, is_practice, about_type, school_add_time, graduation_time,
    nationality, face, overseas, weixin, qq, bank_name, bank_account, practice_state,
    job_state, head_img, auth_code, backup, create_by, modified_by
)
PROPERTIES  -- common parameters of the routine load job
(
    "desired_concurrent_number" = "6",  -- job concurrency
    "strict_mode" = "false",            -- strict mode
    "format" = "json",                  -- format: json or csv
    -- The following parameters bound the execution of a single task; the task ends once
    -- any one of the thresholds is reached. Assuming roughly 500 B per row, one task per
    -- 100 MB or 10 seconds is desired; 100 MB should take about 10-20 seconds to process,
    -- which corresponds to roughly 200000 rows.
    "max_batch_interval" = "10",        -- maximum execution time of each sub-task
    "max_batch_rows" = "200000",        -- maximum number of rows read from Kafka per task
    "max_batch_size" = "104857600",     -- maximum amount of data read from Kafka per task, in bytes
    "max_error_number" = "1000",        -- allowed number of error rows
    "json_root" = "$.data",             -- take the records under the "data" field
    "strip_outer_array" = "true"        -- the data under $.data is a JSON array [{}], so flatten it
)
FROM KAFKA  -- Kafka connection info
(
    "kafka_broker_list" = "10.xx.xx.xx:9092",  -- broker nodes
    "kafka_topic" = "gxy_student",             -- topic name
    "kafka_partitions" = "0",
    "kafka_offsets" = "OFFSET_END"
);
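For context, with "json_root" = "$.data" and "strip_outer_array" = "true", each Kafka message is expected to carry the rows as a JSON array under a top-level "data" field. A minimal sketch of such a message (field values are made up for illustration; the remaining columns are omitted):

    {"data": [{"student_id": 1001, "school_id": 1, "username": "test_user", "gender": 1}]}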
The data from Kafka is imported correctly, but the error is still there.
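For reference, the job state and its error counters can be checked from a MySQL client with the standard SHOW ROUTINE LOAD statements (the job name below is the one created above):

    SHOW ROUTINE LOAD FOR kafka_gxy_student;
    SHOW ROUTINE LOAD TASK WHERE JobName = "kafka_gxy_student";

In this case the job keeps consuming data while the error above continues to be logged.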
Anything Else?
none
Are you willing to submit PR?
Code of Conduct