Spark insert into SR fails with a column count check error

xuzifu666 commented 5 months ago

Steps to reproduce the behavior (Required)

1. Create the StarRocks table:

CREATE TABLE bi_realtime_sr.da_brn_down_detail_test_01 (
  insert_hash_key varchar(30) NOT NULL,
  id bigint(20) NOT NULL AUTO_INCREMENT,
  day varchar(24) NULL,
  hour varchar(24) NULL,
  news_src varchar(40) NULL,
  doc_id varchar(50) NULL,
  imei_cnt bigint(20) NULL,
  cnt bigint(20) NULL,
  type_info varchar(40) NULL,
  title varchar(1000) NULL,
  url varchar(5000) NULL,
  test_addfield varchar(32) NULL,
  operation_ts bigint(20) NULL COMMENT "comparison key",
  operation_type varchar(65533) NULL COMMENT "key"
) ENGINE=OLAP
PRIMARY KEY(insert_hash_key)
DISTRIBUTED BY HASH(insert_hash_key)
PROPERTIES (
  "replication_num" = "3",
  "in_memory" = "false",
  "enable_persistent_index" = "true",
  "replicated_storage" = "true",
  "compression" = "LZ4"
);

2. Create the Spark table:

CREATE TABLE da_brn_down_detail_test_sr_02
USING starrocks
OPTIONS(
  "starrocks.table.identifier" = "bi_realtime_sr.da_brn_down_detail_test_01",
  "starrocks.fe.http.url" = "xxx:8030",
  "starrocks.fe.jdbc.url" = "jdbc:mysql://xxx:9030",
  "starrocks.user" = "root111",
  "starrocks.password" = "xxx"
);

3. Insert a row through the Spark table:

insert into da_brn_down_detail_test_sr_02 values('1', 1, '22', '1', '1', '1', 23, 22, 'info', '1', '1', '1', 22334455, '1');

The insert then fails with the following error:

Caused by: java.lang.RuntimeException: com.starrocks.data.load.stream.exception.StreamLoadFailException: Stream load failed because of error, db: bi_realtime_sr, table: da_brn_down_detail_test_01, label: spark-3f2979cb-d7b1-46a7-9ee6-26c653c969fd, responseBody: { "TxnId": 129318, "Label": "spark-3f2979cb-d7b1-46a7-9ee6-26c653c969fd", "Status": "Fail", "Message": "too many filtered rows", "NumberTotalRows": 1, "NumberLoadedRows": 0, "NumberFilteredRows": 1, "NumberUnselectedRows": 0, "LoadBytes": 41, "LoadTimeMs": 18, "BeginTxnTimeMs": 0, "StreamLoadPlanTimeMs": 1, "ReadDataTimeMs": 0, "WriteDataTimeMs": 17, "CommitAndPublishTimeMs": 0, "ErrorURL": "xxx" } errorLog: Error: Value count does not match column count: expected = 13, actual = 14. Column separator: '\t', Row delimiter: '\n'. Row: 1 1 22 1 1 1 23 22 info 1 1 1 223344551

at com.starrocks.data.load.stream.v2.StreamLoadManagerV2.AssertNotException(StreamLoadManagerV2.java:427)
at com.starrocks.data.load.stream.v2.StreamLoadManagerV2.flush(StreamLoadManagerV2.java:355)
at com.starrocks.connector.spark.sql.write.StarRocksDataWriter.commit(StarRocksDataWriter.java:90)
... 12 more

Caused by: com.starrocks.data.load.stream.exception.StreamLoadFailException: Stream load failed because of error, db: bi_realtime_sr, table: da_brn_down_detail_test_01, label: spark-3f2979cb-d7b1-46a7-9ee6-26c653c969fd, responseBody: { "TxnId": 129318, "Label": "spark-3f2979cb-d7b1-46a7-9ee6-26c653c969fd", "Status": "Fail", "Message": "too many filtered rows", "NumberTotalRows": 1, "NumberLoadedRows": 0, "NumberFilteredRows": 1, "NumberUnselectedRows": 0, "LoadBytes": 41, "LoadTimeMs": 18, "BeginTxnTimeMs": 0, "StreamLoadPlanTimeMs": 1, "ReadDataTimeMs": 0, "WriteDataTimeMs": 17, "CommitAndPublishTimeMs": 0, "ErrorURL": "xxx" } errorLog: Error: Value count does not match column count: expected = 13, actual = 14. Column separator: '\t', Row delimiter: '\n'. Row: 1 1 22 1 1 1 23 22 info 1 1 1 223344551

at com.starrocks.data.load.stream.DefaultStreamLoader.sendToSR(DefaultStreamLoader.java:339)
at com.starrocks.data.load.stream.DefaultStreamLoader.lambda$send$3(DefaultStreamLoader.java:170)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
... 3 more

Expected behavior (Required)

The insert succeeds.

Real behavior (Required)

The insert fails with a column count mismatch error (expected = 13, actual = 14).

StarRocks version (Required)

branch-3.2-fb01846

jaogoy commented 5 months ago

Well, the errorLog is really unclear; users can't easily see where the error starts. Developers should put some effort into improving it.

banmoy commented 5 months ago

Currently, when loading data into an AUTO_INCREMENT column such as id, you must set starrocks.write.properties.columns to include all columns. We will discuss how to make this simpler.
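
A minimal sketch of this workaround for the Spark SQL table in the report, assuming the option is accepted in the table's OPTIONS clause (not verified here against branch-3.2). The helper `build_spark_table_ddl` and the table name `da_brn_down_detail_test_sr_03` are hypothetical; the column names come from the StarRocks DDL above and the `xxx` placeholders are kept from the report.

```python
# Column names copied from the StarRocks DDL in the report, in table order.
COLUMNS = [
    "insert_hash_key", "id", "day", "hour", "news_src", "doc_id",
    "imei_cnt", "cnt", "type_info", "title", "url", "test_addfield",
    "operation_ts", "operation_type",
]


def build_spark_table_ddl(spark_table, options, columns):
    """Return a Spark SQL CREATE TABLE statement for the StarRocks connector.

    Adds starrocks.write.properties.columns so the load names every column
    explicitly. Hypothetical helper, not part of the connector.
    """
    opts = dict(options)  # copy so the caller's dict is left unchanged
    opts["starrocks.write.properties.columns"] = ",".join(columns)
    rendered = ",\n".join(f'  "{k}" = "{v}"' for k, v in opts.items())
    return f"CREATE TABLE {spark_table}\nUSING starrocks\nOPTIONS(\n{rendered}\n)"


ddl = build_spark_table_ddl(
    "da_brn_down_detail_test_sr_03",  # hypothetical new Spark table name
    {
        "starrocks.table.identifier": "bi_realtime_sr.da_brn_down_detail_test_01",
        "starrocks.fe.http.url": "xxx:8030",
        "starrocks.fe.jdbc.url": "jdbc:mysql://xxx:9030",
        "starrocks.user": "root111",
        "starrocks.password": "xxx",
    },
    COLUMNS,
)
print(ddl)  # in a Spark session, this string would be passed to spark.sql(ddl)
```

The same insert statement from the reproduction would then be run against the new Spark table.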

yangjiahao0036 commented 5 months ago

Try listing all the columns you want to insert in the "starrocks.columns" parameter. You can easily get all the columns of the DataFrame from Spark, for example with df.columns.
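
A minimal PySpark sketch of this suggestion, assuming a DataFrame `df` whose column names match the StarRocks table. The helper names `starrocks_write_options` and `write_to_starrocks` are hypothetical, and the connection options are expected to be the same ones used in the report.

```python
def starrocks_write_options(columns, base_options):
    """Return connector options with starrocks.columns naming every column."""
    opts = dict(base_options)  # copy so the caller's dict is left unchanged
    opts["starrocks.columns"] = ",".join(columns)
    return opts


def write_to_starrocks(df, base_options):
    """Append df to StarRocks, listing all of df's columns explicitly.

    df is expected to be a pyspark.sql.DataFrame; df.columns is its list
    of column names in order.
    """
    opts = starrocks_write_options(df.columns, base_options)
    df.write.format("starrocks").options(**opts).mode("append").save()
```

Here `base_options` would carry `starrocks.table.identifier`, `starrocks.fe.http.url`, `starrocks.fe.jdbc.url`, `starrocks.user` and `starrocks.password`, as in the Spark table definition above.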
