StarRocks / starrocks-connector-for-apache-flink

[BUG] StarRocksSink deadlocks after multiple retries #377

Open songpinru opened 2 months ago

songpinru commented 2 months ago

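I ran into a problem while using StarRocks. Flink was writing to StarRocks (SR) when SR failed for a period and could not accept writes. The Flink sink retried 3 times and still failed. I expected Flink to raise an error and then restart or fail at that point, but the job kept running normally: it stopped writing to SR and also stopped reading upstream data, leaving it stuck in a hung state.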

Job info: checkpointing is not enabled, sink.properties.format=json, and all other settings are left at their defaults.
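To make the expected behavior concrete, here is a minimal sketch of a sink whose background flush thread records its final failure so that the writer thread fails instead of blocking forever. It is not the connector's actual code, and it does not claim to identify the real cause of the hang. The class `FailFastSinkSketch`, the `Loader` interface, the `flushError` field, and the queue capacity of 1 are all hypothetical names and values chosen for illustration.

```java
import java.io.IOException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical illustration only; NOT the StarRocks connector's implementation.
class FailFastSinkSketch {
    /** Hypothetical stand-in for whatever performs the actual load. */
    interface Loader {
        void load(String batch) throws Exception;
    }

    // Small bounded queue so back-pressure is easy to observe (capacity is an assumption).
    private final BlockingQueue<String> queue = new ArrayBlockingQueue<>(1);
    // Last fatal error seen by the flush thread, read by the writer thread.
    private final AtomicReference<Throwable> flushError = new AtomicReference<>();

    FailFastSinkSketch(Loader loader, int maxRetries) {
        Thread flusher = new Thread(() -> {
            try {
                while (true) {
                    flushWithRetry(loader, queue.take(), maxRetries);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } catch (Throwable t) {
                // Retries are exhausted: remember the failure, then let this thread end.
                flushError.set(t);
            }
        }, "flusher");
        flusher.setDaemon(true);
        flusher.start();
    }

    // One initial attempt plus maxRetries retries, then rethrow the last failure.
    private static void flushWithRetry(Loader loader, String batch, int maxRetries) throws Exception {
        for (int attempt = 0; ; attempt++) {
            try {
                loader.load(batch);
                return;
            } catch (Exception e) {
                if (attempt >= maxRetries) {
                    throw e;
                }
            }
        }
    }

    /** Called on the writer (task) thread. */
    void write(String batch) throws Exception {
        while (true) {
            Throwable t = flushError.get();
            if (t != null) {
                throw new RuntimeException("async flush failed", t);
            }
            // A timed offer lets the loop re-check flushError; an untimed put()
            // would wait forever once the flush thread has died and the queue is full.
            if (queue.offer(batch, 50, TimeUnit.MILLISECONDS)) {
                return;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        FailFastSinkSketch sink = new FailFastSinkSketch(
                batch -> { throw new IOException("simulated stream load failure"); }, 3);
        try {
            for (int i = 0; i < 3; i++) {
                sink.write("batch-" + i);
            }
        } catch (RuntimeException e) {
            System.out.println("writer failed: " + e.getCause().getMessage());
        }
    }
}
```

In this sketch the writer re-checks the recorded error between timed `offer` calls, so a dead flush thread surfaces as an exception on the writer within a few writes. If the writer instead used an untimed `put` on a full queue with no consumer, it would block indefinitely, which would look like the hang described above. Whether the connector behaves that way is an assumption, not something this report confirms.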

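The logs are below. P.S.: I did not keep the logs from that incident, so I am using logs from another job instead. It also kept failing after 3 retries, and the Flink job did not report an error.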

2024-07-02 10:43:16,953 INFO  com.starrocks.connector.flink.manager.StarRocksStreamLoadVisitor [] - Start to join batch data: label[d0442073-7e85-48f1-8f47-aa3cab7ad15f].
2024-07-02 10:43:16,953 INFO  com.starrocks.connector.flink.manager.StarRocksStreamLoadVisitor [] - Executing stream load to: 'http://fe-c-ea043025e91a9d66-internal.starrocks.aliyuncs.com:8030/api/social/ods_social_ks_json/_stream_load', size: '78500', thread: 64
2024-07-02 10:43:16,974 WARN  com.starrocks.connector.flink.manager.StarRocksSinkManager   [] - Failed to flush batch data to StarRocks, retry times = 0
com.starrocks.connector.flink.manager.StarRocksStreamLoadFailedException: Failed to flush data to StarRocks, Error response: 
{"Status":"Fail","BeginTxnTimeMs":0,"Message":"Failed to parse json as array. error: Within strings, some characters must be escaped, we found unescaped characters","NumberUnselectedRows":0,"CommitAndPublishTimeMs":0,"Label":"d0442073-7e85-48f1-8f47-aa3cab7ad15f","LoadBytes":78500,"StreamLoadPlanTimeMs":1,"NumberTotalRows":0,"WriteDataTimeMs":9,"TxnId":6546603,"LoadTimeMs":10,"ReadDataTimeMs":0,"NumberLoadedRows":0,"NumberFilteredRows":0}
{}

    at com.starrocks.connector.flink.manager.StarRocksStreamLoadVisitor.doStreamLoad(StarRocksStreamLoadVisitor.java:116) ~[blob_p-895cea66edfff125d480c2434207bf2b18c87e89-a185b9cd92a59348133d3a2090b20e01:?]
    at com.starrocks.connector.flink.manager.StarRocksSinkManager.asyncFlush(StarRocksSinkManager.java:340) ~[blob_p-895cea66edfff125d480c2434207bf2b18c87e89-a185b9cd92a59348133d3a2090b20e01:?]
    at com.starrocks.connector.flink.manager.StarRocksSinkManager.lambda$startAsyncFlushing$0(StarRocksSinkManager.java:174) ~[blob_p-895cea66edfff125d480c2434207bf2b18c87e89-a185b9cd92a59348133d3a2090b20e01:?]
    at java.lang.Thread.run(Thread.java:750) [?:1.8.0_322]
2024-07-02 10:43:17,976 INFO  com.starrocks.connector.flink.manager.StarRocksStreamLoadVisitor [] - Start to join batch data: label[d0442073-7e85-48f1-8f47-aa3cab7ad15f].
2024-07-02 10:43:17,976 INFO  com.starrocks.connector.flink.manager.StarRocksStreamLoadVisitor [] - Executing stream load to: 'http://fe-c-ea043025e91a9d66-internal.starrocks.aliyuncs.com:8030/api/social/ods_social_ks_json/_stream_load', size: '78500', thread: 64
2024-07-02 10:43:17,994 WARN  com.starrocks.connector.flink.manager.StarRocksSinkManager   [] - Failed to flush batch data to StarRocks, retry times = 1
com.starrocks.connector.flink.manager.StarRocksStreamLoadFailedException: Failed to flush data to StarRocks, Error response: 
{"Status":"Fail","BeginTxnTimeMs":0,"Message":"Failed to parse json as array. error: Within strings, some characters must be escaped, we found unescaped characters","NumberUnselectedRows":0,"CommitAndPublishTimeMs":0,"Label":"d0442073-7e85-48f1-8f47-aa3cab7ad15f","LoadBytes":78500,"StreamLoadPlanTimeMs":0,"NumberTotalRows":0,"WriteDataTimeMs":7,"TxnId":6546605,"LoadTimeMs":8,"ReadDataTimeMs":0,"NumberLoadedRows":0,"NumberFilteredRows":0}
{}

    at com.starrocks.connector.flink.manager.StarRocksStreamLoadVisitor.doStreamLoad(StarRocksStreamLoadVisitor.java:116) ~[blob_p-895cea66edfff125d480c2434207bf2b18c87e89-a185b9cd92a59348133d3a2090b20e01:?]
    at com.starrocks.connector.flink.manager.StarRocksSinkManager.asyncFlush(StarRocksSinkManager.java:340) ~[blob_p-895cea66edfff125d480c2434207bf2b18c87e89-a185b9cd92a59348133d3a2090b20e01:?]
    at com.starrocks.connector.flink.manager.StarRocksSinkManager.lambda$startAsyncFlushing$0(StarRocksSinkManager.java:174) ~[blob_p-895cea66edfff125d480c2434207bf2b18c87e89-a185b9cd92a59348133d3a2090b20e01:?]
    at java.lang.Thread.run(Thread.java:750) [?:1.8.0_322]
2024-07-02 10:43:18,011 INFO  com.starrocks.connector.flink.manager.StarRocksSinkManager   [] - StarRocks interval Sinking triggered.
2024-07-02 10:43:18,791 INFO  com.starrocks.connector.flink.manager.StarRocksSinkManager   [] - StarRocks interval Sinking triggered.
2024-07-02 10:43:18,807 INFO  com.starrocks.connector.flink.manager.StarRocksSinkManager   [] - StarRocks interval Sinking triggered.
2024-07-02 10:43:18,807 INFO  com.starrocks.connector.flink.manager.StarRocksSinkManager   [] - Async stream load: db[social] table[ods_social_ks_comment] rows[50] bytes[26159] label[f71b437e-762e-4b90-ba05-b09d695a69fc].
2024-07-02 10:43:18,809 INFO  com.starrocks.connector.flink.manager.StarRocksStreamLoadVisitor [] - Start to join batch data: label[f71b437e-762e-4b90-ba05-b09d695a69fc].
2024-07-02 10:43:18,809 INFO  com.starrocks.connector.flink.manager.StarRocksStreamLoadVisitor [] - Executing stream load to: 'http://fe-c-ea043025e91a9d66-internal.starrocks.aliyuncs.com:8030/api/social/ods_social_ks_comment/_stream_load', size: '26210', thread: 63
2024-07-02 10:43:18,935 INFO  com.starrocks.connector.flink.manager.StarRocksSinkManager   [] - Async stream load finished: label[f71b437e-762e-4b90-ba05-b09d695a69fc].
2024-07-02 10:43:19,997 INFO  com.starrocks.connector.flink.manager.StarRocksStreamLoadVisitor [] - Start to join batch data: label[d0442073-7e85-48f1-8f47-aa3cab7ad15f].
2024-07-02 10:43:19,997 INFO  com.starrocks.connector.flink.manager.StarRocksStreamLoadVisitor [] - Executing stream load to: 'http://fe-c-ea043025e91a9d66-internal.starrocks.aliyuncs.com:8030/api/social/ods_social_ks_json/_stream_load', size: '78500', thread: 64
2024-07-02 10:43:20,011 INFO  com.starrocks.connector.flink.manager.StarRocksSinkManager   [] - StarRocks interval Sinking triggered.
2024-07-02 10:43:20,016 WARN  com.starrocks.connector.flink.manager.StarRocksSinkManager   [] - Failed to flush batch data to StarRocks, retry times = 2
com.starrocks.connector.flink.manager.StarRocksStreamLoadFailedException: Failed to flush data to StarRocks, Error response: 
{"Status":"Fail","BeginTxnTimeMs":0,"Message":"Failed to parse json as array. error: Within strings, some characters must be escaped, we found unescaped characters","NumberUnselectedRows":0,"CommitAndPublishTimeMs":0,"Label":"d0442073-7e85-48f1-8f47-aa3cab7ad15f","LoadBytes":78500,"StreamLoadPlanTimeMs":0,"NumberTotalRows":0,"WriteDataTimeMs":7,"TxnId":6546609,"LoadTimeMs":8,"ReadDataTimeMs":0,"NumberLoadedRows":0,"NumberFilteredRows":0}
{}

    at com.starrocks.connector.flink.manager.StarRocksStreamLoadVisitor.doStreamLoad(StarRocksStreamLoadVisitor.java:116) ~[blob_p-895cea66edfff125d480c2434207bf2b18c87e89-a185b9cd92a59348133d3a2090b20e01:?]
    at com.starrocks.connector.flink.manager.StarRocksSinkManager.asyncFlush(StarRocksSinkManager.java:340) ~[blob_p-895cea66edfff125d480c2434207bf2b18c87e89-a185b9cd92a59348133d3a2090b20e01:?]
    at com.starrocks.connector.flink.manager.StarRocksSinkManager.lambda$startAsyncFlushing$0(StarRocksSinkManager.java:174) ~[blob_p-895cea66edfff125d480c2434207bf2b18c87e89-a185b9cd92a59348133d3a2090b20e01:?]
    at java.lang.Thread.run(Thread.java:750) [?:1.8.0_322]
2024-07-02 10:43:20,791 INFO  com.starrocks.connector.flink.manager.StarRocksSinkManager   [] - StarRocks interval Sinking triggered.
2024-07-02 10:43:20,935 INFO  com.starrocks.connector.flink.manager.StarRocksSinkManager   [] - StarRocks interval Sinking triggered.
2024-07-02 10:43:20,935 INFO  com.starrocks.connector.flink.manager.StarRocksSinkManager   [] - Async stream load: db[social] table[ods_social_ks_comment] rows[132] bytes[68885] label[d0521fc5-89dc-4d96-9b5c-7a85457fb342].
2024-07-02 10:43:20,937 INFO  com.starrocks.connector.flink.manager.StarRocksStreamLoadVisitor [] - Start to join batch data: label[d0521fc5-89dc-4d96-9b5c-7a85457fb342].
2024-07-02 10:43:20,941 INFO  com.starrocks.connector.flink.manager.StarRocksStreamLoadVisitor [] - Executing stream load to: 'http://fe-c-ea043025e91a9d66-internal.starrocks.aliyuncs.com:8030/api/social/ods_social_ks_comment/_stream_load', size: '69018', thread: 63
2024-07-02 10:43:21,177 INFO  com.starrocks.connector.flink.manager.StarRocksSinkManager   [] - Async stream load finished: label[d0521fc5-89dc-4d96-9b5c-7a85457fb342].
2024-07-02 10:43:22,011 INFO  com.starrocks.connector.flink.manager.StarRocksSinkManager   [] - StarRocks interval Sinking triggered.
2024-07-02 10:43:22,791 INFO  com.starrocks.connector.flink.manager.StarRocksSinkManager   [] - StarRocks interval Sinking triggered.
2024-07-02 10:43:23,018 INFO  com.starrocks.connector.flink.manager.StarRocksStreamLoadVisitor [] - Start to join batch data: label[d0442073-7e85-48f1-8f47-aa3cab7ad15f].
2024-07-02 10:43:23,018 INFO  com.starrocks.connector.flink.manager.StarRocksStreamLoadVisitor [] - Executing stream load to: 'http://fe-c-ea043025e91a9d66-internal.starrocks.aliyuncs.com:8030/api/social/ods_social_ks_json/_stream_load', size: '78500', thread: 64
2024-07-02 10:43:23,036 WARN  com.starrocks.connector.flink.manager.StarRocksSinkManager   [] - Failed to flush batch data to StarRocks, retry times = 3
com.starrocks.connector.flink.manager.StarRocksStreamLoadFailedException: Failed to flush data to StarRocks, Error response: 
{"Status":"Fail","BeginTxnTimeMs":0,"Message":"Failed to parse json as array. error: Within strings, some characters must be escaped, we found unescaped characters","NumberUnselectedRows":0,"CommitAndPublishTimeMs":0,"Label":"d0442073-7e85-48f1-8f47-aa3cab7ad15f","LoadBytes":78500,"StreamLoadPlanTimeMs":0,"NumberTotalRows":0,"WriteDataTimeMs":7,"TxnId":6546611,"LoadTimeMs":8,"ReadDataTimeMs":0,"NumberLoadedRows":0,"NumberFilteredRows":0}
{}

    at com.starrocks.connector.flink.manager.StarRocksStreamLoadVisitor.doStreamLoad(StarRocksStreamLoadVisitor.java:116) ~[blob_p-895cea66edfff125d480c2434207bf2b18c87e89-a185b9cd92a59348133d3a2090b20e01:?]
    at com.starrocks.connector.flink.manager.StarRocksSinkManager.asyncFlush(StarRocksSinkManager.java:340) ~[blob_p-895cea66edfff125d480c2434207bf2b18c87e89-a185b9cd92a59348133d3a2090b20e01:?]
    at com.starrocks.connector.flink.manager.StarRocksSinkManager.lambda$startAsyncFlushing$0(StarRocksSinkManager.java:174) ~[blob_p-895cea66edfff125d480c2434207bf2b18c87e89-a185b9cd92a59348133d3a2090b20e01:?]
    at java.lang.Thread.run(Thread.java:750) [?:1.8.0_322]
2024-07-02 10:43:23,177 INFO  com.starrocks.connector.flink.manager.StarRocksSinkManager   [] - StarRocks interval Sinking triggered.
2024-07-02 10:43:23,177 INFO  com.starrocks.connector.flink.manager.StarRocksSinkManager   [] - Async stream load: db[social] table[ods_social_ks_comment] rows[116] bytes[60488] label[107e7e84-e340-4321-8916-2b05cc471494].
2024-07-02 10:43:23,179 INFO  com.starrocks.connector.flink.manager.StarRocksStreamLoadVisitor [] - Start to join batch data: label[107e7e84-e340-4321-8916-2b05cc471494].
2024-07-02 10:43:23,180 INFO  com.starrocks.connector.flink.manager.StarRocksStreamLoadVisitor [] - Executing stream load to: 'http://fe-c-ea043025e91a9d66-internal.starrocks.aliyuncs.com:8030/api/social/ods_social_ks_comment/_stream_load', size: '60605', thread: 63
2024-07-02 10:43:23,318 INFO  com.starrocks.connector.flink.manager.StarRocksSinkManager   [] - Async stream load finished: label[107e7e84-e340-4321-8916-2b05cc471494].
2024-07-02 10:43:24,011 INFO  com.starrocks.connector.flink.manager.StarRocksSinkManager   [] - StarRocks interval Sinking triggered.
2024-07-02 10:43:24,791 INFO  com.starrocks.connector.flink.manager.StarRocksSinkManager   [] - StarRocks interval Sinking triggered.
2024-07-02 10:43:25,319 INFO  com.starrocks.connector.flink.manager.StarRocksSinkManager   [] - StarRocks interval Sinking triggered.
2024-07-02 10:43:25,319 INFO  com.starrocks.connector.flink.manager.StarRocksSinkManager   [] - Async stream load: db[social] table[ods_social_ks_comment] rows[71] bytes[39524] label[8f9b7f0b-3a47-4ee3-a93a-e17938cd6886].
2024-07-02 10:43:25,321 INFO  com.starrocks.connector.flink.manager.StarRocksStreamLoadVisitor [] - Start to join batch data: label[8f9b7f0b-3a47-4ee3-a93a-e17938cd6886].
2024-07-02 10:43:25,321 INFO  com.starrocks.connector.flink.manager.StarRocksStreamLoadVisitor [] - Executing stream load to: 'http://fe-c-ea043025e91a9d66-internal.starrocks.aliyuncs.com:8030/api/social/ods_social_ks_comment/_stream_load', size: '39596', thread: 63
2024-07-02 10:43:25,527 INFO  com.starrocks.connector.flink.manager.StarRocksSinkManager   [] - Async stream load finished: label[8f9b7f0b-3a47-4ee3-a93a-e17938cd6886].
2024-07-02 10:43:26,011 INFO  com.starrocks.connector.flink.manager.StarRocksSinkManager   [] - StarRocks interval Sinking triggered.
2024-07-02 10:43:26,792 INFO  com.starrocks.connector.flink.manager.StarRocksSinkManager   [] - StarRocks interval Sinking triggered.
2024-07-02 10:43:27,528 INFO  com.starrocks.connector.flink.manager.StarRocksSinkManager   [] - StarRocks interval Sinking triggered.
2024-07-02 10:43:27,528 INFO  com.starrocks.connector.flink.manager.StarRocksSinkManager   [] - Async stream load: db[social] table[ods_social_ks_comment] rows[97] bytes[48808] label[3ce64d14-e276-4e6e-83a5-0c37793489bd].
2024-07-02 10:43:27,530 INFO  com.starrocks.connector.flink.manager.StarRocksStreamLoadVisitor [] - Start to join batch data: label[3ce64d14-e276-4e6e-83a5-0c37793489bd].
2024-07-02 10:43:27,530 INFO  com.starrocks.connector.flink.manager.StarRocksStreamLoadVisitor [] - Executing stream load to: 'http://fe-c-ea043025e91a9d66-internal.starrocks.aliyuncs.com:8030/api/social/ods_social_ks_comment/_stream_load', size: '48906', thread: 63
2024-07-02 10:43:27,712 INFO  com.starrocks.connector.flink.manager.StarRocksSinkManager   [] - Async stream load finished: label[3ce64d14-e276-4e6e-83a5-0c37793489bd].
2024-07-02 10:43:28,012 INFO  com.starrocks.connector.flink.manager.StarRocksSinkManager   [] - StarRocks interval Sinking triggered.
2024-07-02 10:43:28,792 INFO  com.starrocks.connector.flink.manager.StarRocksSinkManager   [] - StarRocks interval Sinking triggered.
2024-07-02 10:43:29,712 INFO  com.starrocks.connector.flink.manager.StarRocksSinkManager   [] - StarRocks interval Sinking triggered.