StarRocks / starrocks-connector-for-apache-flink

Apache License 2.0

[Bug] Writes to StarRocks fail occasionally #383

Open gohalo opened 2 months ago

gohalo commented 2 months ago

I got the following error messages from the Flink TaskManager:

2024-09-05 06:33:49,271 | ERROR | [StarRocks-Sink-Manager] | Transaction prepare failed, db: ods, table: ods_fin_cust_account_t_keep_acct_detail_ri, label: flink-49e51069-70b2-41ee-8c22-d40d07a6e3d3, [N]responseBody: {[N]    "Status": "TXN_IN_PROCESSING",[N]    "Label": "flink-49e51069-70b2-41ee-8c22-d40d07a6e3d3",[N]    "Message": "Transaction in processing, please retry later"[N]}[N]errorLog: null | com.starrocks.data.load.stream.TransactionStreamLoader.prepare(TransactionStreamLoader.java:220)
2024-09-05 06:33:49,272 | ERROR | [StarRocks-Sink-Manager] | TransactionTableRegion commit failed, db: ods, table: ods_fin_cust_account_t_keep_acct_detail_ri, label: flink-49e51069-70b2-41ee-8c22-d40d07a6e3d3 | com.starrocks.data.load.stream.v2.TransactionTableRegion.commit(TransactionTableRegion.java:257)
com.starrocks.data.load.stream.exception.StreamLoadFailException: Transaction prepare failed, db: ods, table: ods_fin_cust_account_t_keep_acct_detail_ri, label: flink-49e51069-70b2-41ee-8c22-d40d07a6e3d3, [N]responseBody: {[N]    "Status": "TXN_IN_PROCESSING",[N]    "Label": "flink-49e51069-70b2-41ee-8c22-d40d07a6e3d3",[N]    "Message": "Transaction in processing, please retry later"[N]}[N]errorLog: null
    at com.starrocks.data.load.stream.TransactionStreamLoader.prepare(TransactionStreamLoader.java:221) ~[flink-connector-starrocks-1.2.9_flink-1.17.jar:?]
    at com.starrocks.data.load.stream.v2.TransactionTableRegion.commit(TransactionTableRegion.java:247) ~[flink-connector-starrocks-1.2.9_flink-1.17.jar:?]
    at com.starrocks.data.load.stream.v2.StreamLoadManagerV2.lambda$init$0(StreamLoadManagerV2.java:191) ~[flink-connector-starrocks-1.2.9_flink-1.17.jar:?]
    at java.lang.Thread.run(Thread.java:750) [?:1.8.0_402]
2024-09-05 06:33:49,272 | ERROR | [StarRocks-Sink-Manager] | Failed to flush data for db: ods, table: ods_fin_cust_account_t_keep_acct_detail_ri after 0 times retry, the last exception is | com.starrocks.data.load.stream.v2.TransactionTableRegion.fail(TransactionTableRegion.java:285)
com.starrocks.data.load.stream.exception.StreamLoadFailException: Transaction prepare failed, db: ods, table: ods_fin_cust_account_t_keep_acct_detail_ri, label: flink-49e51069-70b2-41ee-8c22-d40d07a6e3d3, [N]responseBody: {[N]    "Status": "TXN_IN_PROCESSING",[N]    "Label": "flink-49e51069-70b2-41ee-8c22-d40d07a6e3d3",[N]    "Message": "Transaction in processing, please retry later"[N]}[N]errorLog: null
    at com.starrocks.data.load.stream.TransactionStreamLoader.prepare(TransactionStreamLoader.java:221) ~[flink-connector-starrocks-1.2.9_flink-1.17.jar:?]
    at com.starrocks.data.load.stream.v2.TransactionTableRegion.commit(TransactionTableRegion.java:247) ~[flink-connector-starrocks-1.2.9_flink-1.17.jar:?]
    at com.starrocks.data.load.stream.v2.StreamLoadManagerV2.lambda$init$0(StreamLoadManagerV2.java:191) ~[flink-connector-starrocks-1.2.9_flink-1.17.jar:?]
    at java.lang.Thread.run(Thread.java:750) [?:1.8.0_402]
2024-09-05 06:33:49,273 | ERROR | [StarRocks-Sink-Manager] | Stream load failed | com.starrocks.data.load.stream.v2.StreamLoadManagerV2.callback(StreamLoadManagerV2.java:340)
com.starrocks.data.load.stream.exception.StreamLoadFailException: Transaction prepare failed, db: ods, table: ods_fin_cust_account_t_keep_acct_detail_ri, label: flink-49e51069-70b2-41ee-8c22-d40d07a6e3d3, [N]responseBody: {[N]    "Status": "TXN_IN_PROCESSING",[N]    "Label": "flink-49e51069-70b2-41ee-8c22-d40d07a6e3d3",[N]    "Message": "Transaction in processing, please retry later"[N]}[N]errorLog: null
    at com.starrocks.data.load.stream.TransactionStreamLoader.prepare(TransactionStreamLoader.java:221) ~[flink-connector-starrocks-1.2.9_flink-1.17.jar:?]
    at com.starrocks.data.load.stream.v2.TransactionTableRegion.commit(TransactionTableRegion.java:247) ~[flink-connector-starrocks-1.2.9_flink-1.17.jar:?]
    at com.starrocks.data.load.stream.v2.StreamLoadManagerV2.lambda$init$0(StreamLoadManagerV2.java:191) ~[flink-connector-starrocks-1.2.9_flink-1.17.jar:?]
    at java.lang.Thread.run(Thread.java:750) [?:1.8.0_402]

And the corresponding log from the compute node (CN):

I0905 06:34:56.900581 1073155 transaction_mgr.cpp:190] new transaction manage request, id=6c4ee28b72465107-027adf3576108193, job_id=-1, txn_id: 67681501, label=flink-49e51069-70b2-41ee-8c22-d40d07a6e3d3, db=ods, tbl=ods_fin_cust_account_t_keep_acct_detail_ri op=begin
I0905 06:34:56.955199 1073155 transaction_stream_load.cpp:236] new transaction load request.id=6c4ee28b72465107-027adf3576108193, job_id=-1, txn_id: 67681501, label=flink-49e51069-70b2-41ee-8c22-d40d07a6e3d3, db=ods, tbl=ods_fin_cust_account_t_keep_acct_detail_ri
I0905 06:34:56.973167 1073155 stream_load_executor.cpp:67] begin to execute job. label=flink-49e51069-70b2-41ee-8c22-d40d07a6e3d3, txn_id: 67681501, query_id=6c4ee28b-7246-5107-027a-df3576108193
I0905 06:34:56.973213 1073155 plan_fragment_executor.cpp:83] Prepare(): query_id=6c4ee28b-7246-5107-027a-df3576108193 fragment_instance_id=6c4ee28b-7246-5107-027a-df3576108194 backend_num=0
I0905 06:34:56.976012 1072502 plan_fragment_executor.cpp:185] Open(): fragment_instance_id=6c4ee28b-7246-5107-027a-df3576108194
I0905 06:34:56.981921 1073152 transaction_mgr.cpp:241] new transaction manage request, id=6c4ee28b72465107-027adf3576108193, job_id=-1, txn_id: 67681501, label=flink-49e51069-70b2-41ee-8c22-d40d07a6e3d3, db=ods, tbl=ods_fin_cust_account_t_keep_acct_detail_ri op=prepare
I0905 06:34:56.988793 1073155 transaction_mgr.cpp:213] new transaction manage request, id=6c4ee28b72465107-027adf3576108193, job_id=-1, txn_id: 67681501, label=flink-49e51069-70b2-41ee-8c22-d40d07a6e3d3, db=ods, tbl=ods_fin_cust_account_t_keep_acct_detail_ri op=rollback
I0905 06:34:56.988806 1073155 transaction_mgr.cpp:368] Rollback transaction id=6c4ee28b72465107-027adf3576108193, job_id=-1, txn_id: 67681501, label=flink-49e51069-70b2-41ee-8c22-d40d07a6e3d3
I0905 06:34:56.989843 1073098 lake_service.cpp:334] Aborting transactions=[67681501] tablets=[]
I0905 06:34:56.989853 1073098 load_channel_mgr.cpp:280] Aborting load channel because transaction was aborted, load_id=6c4ee28b72465107-027adf3576108193 txn_id=67681501

This happens because of the following code: StarRocks expects the Flink connector to retry later when it returns TXN_IN_PROCESSING, but the connector actually makes only one attempt.

[image: screenshot of the referenced code]
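For illustration, here is a minimal sketch of the behaviour the "please retry later" message asks for: keep calling prepare while the response status is TXN_IN_PROCESSING, with a capped exponential backoff, instead of failing after a single attempt. It assumes the prepare call can be wrapped in a `Supplier` and that a successful prepare reports status "OK". `PrepareRetrySketch`, `PrepareResult`, `prepareWithRetry`, the attempt limit and the backoff values are all hypothetical names and numbers, not the connector's actual `TransactionStreamLoader` code.

```java
import java.util.function.Supplier;

// Hypothetical sketch, not the connector's real API: retry "prepare" while
// StarRocks answers TXN_IN_PROCESSING ("Transaction in processing, please retry later").
final class PrepareRetrySketch {

    // Assumed upper bound for the backoff between attempts.
    static final long MAX_BACKOFF_MS = 10_000L;

    // Minimal stand-in for the parsed "Status" field of the responseBody.
    static final class PrepareResult {
        final String status;
        PrepareResult(String status) { this.status = status; }
    }

    // Only TXN_IN_PROCESSING is treated as transient; every other status is returned as-is.
    static boolean isRetryable(String status) {
        return "TXN_IN_PROCESSING".equals(status);
    }

    // Calls prepare until the status is not retryable or maxAttempts is reached,
    // then returns the last result so the caller decides whether it is fatal.
    static PrepareResult prepareWithRetry(Supplier<PrepareResult> prepare,
                                          int maxAttempts,
                                          long initialBackoffMs) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long backoffMs = initialBackoffMs;
        int attempt = 1;
        PrepareResult result = prepare.get();
        while (isRetryable(result.status) && attempt < maxAttempts) {
            try {
                Thread.sleep(backoffMs);
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and stop retrying.
                Thread.currentThread().interrupt();
                return result;
            }
            backoffMs = Math.min(backoffMs * 2, MAX_BACKOFF_MS); // exponential, capped
            result = prepare.get();
            attempt++;
        }
        return result;
    }

    public static void main(String[] args) {
        // Simulated server: TXN_IN_PROCESSING twice, then OK.
        int[] calls = {0};
        PrepareResult r = prepareWithRetry(() -> {
            calls[0]++;
            return new PrepareResult(calls[0] < 3 ? "TXN_IN_PROCESSING" : "OK");
        }, 5, 10L);
        System.out.println(r.status + " after " + calls[0] + " attempts"); // prints "OK after 3 attempts"
    }
}
```

Retrying only on TXN_IN_PROCESSING keeps genuine failures (for example a bad label or schema error) failing fast, while the attempt limit still bounds how long a checkpoint can be held up by a transaction that never leaves the in-processing state.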