StarRocks / starrocks-connector-for-apache-flink

Apache License 2.0

[Bug] Can not sink when StarRocks deployed in `shared_data` mode #294

Closed: Jin-H closed this issue 10 months ago

Jin-H commented 10 months ago

StarRocks version

3.1.2

Connector version

1.2.8-1.13

Reproduce step

This may be a bug in the FE.

1. Deploy StarRocks in shared_data mode.
2. Write to StarRocks using the Flink connector.
3. The write fails with the error message: class com.starrocks.common.UserException: No backend alive.

java.lang.RuntimeException: com.starrocks.data.load.stream.exception.StreamLoadFailException: Transaction start failed, db: xxx, table: xxx, label: flink-ea913273-e5fb-4259-9c66-16b680409de2, responseBody: {
  "Status": "FAILED",
  "Message": "class com.starrocks.common.UserException: No backend alive."
}
    at com.starrocks.data.load.stream.v2.StreamLoadManagerV2.AssertNotException(StreamLoadManagerV2.java:427) ~[?:?]
    at com.starrocks.data.load.stream.v2.StreamLoadManagerV2.write(StreamLoadManagerV2.java:252) ~[?:?]
    at com.starrocks.connector.flink.table.sink.StarRocksDynamicSinkFunctionV2.invoke(StarRocksDynamicSinkFunctionV2.java:197) ~[?:?]
    at org.apache.flink.streaming.api.operators.StreamSink.processElement(StreamSink.java:54) ~[flink-dist_2.12-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
    at org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.pushToOperator(CopyingChainingOutput.java:71) ~[flink-dist_2.12-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
    at org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:46) ~[flink-dist_2.12-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
    at org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:26) ~[flink-dist_2.12-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
    at org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:50) ~[flink-dist_2.12-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
    at org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:28) ~[flink-dist_2.12-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
    at org.apache.flink.streaming.api.operators.StreamMap.processElement(StreamMap.java:38) ~[flink-dist_2.12-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
    at org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.pushToOperator(CopyingChainingOutput.java:71) ~[flink-dist_2.12-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
    at org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:46) ~[flink-dist_2.12-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
    at org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:26) ~[flink-dist_2.12-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
    at org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:50) ~[flink-dist_2.12-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
    at org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:28) ~[flink-dist_2.12-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
    at org.apache.flink.streaming.api.operators.StreamSourceContexts$ManualWatermarkContext.processAndCollect(StreamSourceContexts.java:317) ~[flink-dist_2.12-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
    at org.apache.flink.streaming.api.operators.StreamSourceContexts$WatermarkContext.collect(StreamSourceContexts.java:411) ~[flink-dist_2.12-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
    at com.kedacom.starrocks.source.DeviceSource.run(DeviceSource.java:35) ~[?:?]
    at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:104) ~[flink-dist_2.12-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
    at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:60) ~[flink-dist_2.12-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
    at org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:269) ~[flink-dist_2.12-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
Caused by: com.starrocks.data.load.stream.exception.StreamLoadFailException: Transaction start failed, db: xxx, table: xxx, label: flink-ea913273-e5fb-4259-9c66-16b680409de2, responseBody: {
  "Status": "FAILED",
  "Message": "class com.starrocks.common.UserException: No backend alive."
}
    at com.starrocks.data.load.stream.TransactionStreamLoader.doBegin(TransactionStreamLoader.java:153) ~[?:?]
    at com.starrocks.data.load.stream.TransactionStreamLoader.begin(TransactionStreamLoader.java:99) ~[?:?]
    at com.starrocks.data.load.stream.DefaultStreamLoader.send(DefaultStreamLoader.java:170) ~[?:?]
    at com.starrocks.data.load.stream.v2.TransactionTableRegion.streamLoad(TransactionTableRegion.java:331) ~[?:?]
    at com.starrocks.data.load.stream.v2.TransactionTableRegion.flush(TransactionTableRegion.java:228) ~[?:?]
    at com.starrocks.data.load.stream.v2.StreamLoadManagerV2.lambda$init$0(StreamLoadManagerV2.java:220) ~[?:?]
    at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_362]
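The trace shows the failure happening at the transaction begin step (TransactionStreamLoader.doBegin), before the connector streams any rows. One way to take Flink out of the picture is to send the Stream Load transaction begin request to the FE directly. The sketch below assumes the documented StarRocks Stream Load transaction endpoint /api/transaction/begin with label, db and table headers plus HTTP basic auth (the docs show it with curl --location-trusted). The host name, port 8030, credentials, db/table values and the "probe-" label prefix are placeholders, and the class and method names are hypothetical. It needs Java 11+ for java.net.http, unlike the Java 8 runtime in the trace.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.UUID;

/**
 * Hypothetical probe: issue the Stream Load transaction "begin" call straight
 * to the FE, bypassing Flink, to see whether the FE itself answers
 * "No backend alive." All connection values in main() are placeholders.
 */
public class TxnBeginProbe {

    // Builds POST http://<fe>:<port>/api/transaction/begin with basic auth
    // and the label/db/table headers used by the transaction interface.
    static HttpRequest buildBeginRequest(String feHost, int feHttpPort, String user,
                                         String password, String db, String table,
                                         String label) {
        String auth = Base64.getEncoder()
                .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
        URI uri = URI.create("http://" + feHost + ":" + feHttpPort + "/api/transaction/begin");
        return HttpRequest.newBuilder(uri)
                .header("Authorization", "Basic " + auth)
                .header("label", label)
                .header("db", db)
                .header("table", table)
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    public static void main(String[] args) throws Exception {
        HttpRequest req = buildBeginRequest("fe-host", 8030, "root", "", "xxx", "xxx",
                "probe-" + UUID.randomUUID());
        // Redirects are not followed (HttpClient default); a 3xx would show up
        // in the printed status code instead.
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> resp = client.send(req, HttpResponse.BodyHandlers.ofString());
        System.out.println(resp.statusCode() + " " + resp.body());
    }
}
```

If the printed body carries the same "Status": "FAILED" / "No backend alive." pair as the connector's responseBody, the problem lies on the FE side rather than in the connector.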
banmoy commented 10 months ago

Are you sure the backends are alive? What's the output of SHOW BACKENDS?

Jin-H commented 10 months ago

Only FE and CN nodes are deployed; there are no BEs:

StarRocks > SHOW PROC '/backends';
Empty set (0.00 sec)

StarRocks > show proc '/compute_nodes';
+---------------+----------------------------------------------------------------+---------------+--------+----------+----------+---------------------+---------------------+-------+----------------------+-----------------------+--------+------------------+----------+-------------------+------------+------------+----------------+-------------+----------+
| ComputeNodeId | IP                                                             | HeartbeatPort | BePort | HttpPort | BrpcPort | LastStartTime       | LastHeartbeat       | Alive | SystemDecommissioned | ClusterDecommissioned | ErrMsg | Version          | CpuCores | NumRunningQueries | MemUsedPct | CpuUsedPct | HasStoragePath | StarletPort | WorkerId |
+---------------+----------------------------------------------------------------+---------------+--------+----------+----------+---------------------+---------------------+-------+----------------------+-----------------------+--------+------------------+----------+-------------------+------------+------------+----------------+-------------+----------+
| 10004         | starrocks-be-s3-0.starrocks-be-s3.olap.svc.cluster.local | 9050          | 9060   | 8040     | 8060     | 2023-10-20 17:10:22 | 2023-10-22 15:44:51 | true  | false                | false                 |        | 3.1.2-4f3a2ee91b | 2        | 0                 | 0.86 %     | 0.0 %      | 9070           | 1           | true     |
| 10030         | starrocks-be-s3-1.starrocks-be-s3.olap.svc.cluster.local | 9050          | 9060   | 8040     | 8060     | 2023-10-20 17:12:12 | 2023-10-22 15:44:51 | true  | false                | false                 |        | 3.1.2-4f3a2ee91b | 2        | 0                 | 0.88 %     | 0.0 %      | 9070           | 2           | true     |
| 10055         | starrocks-be-s3-2.starrocks-be-s3.olap.svc.cluster.local | 9050          | 9060   | 8040     | 8060     | 2023-10-20 17:13:32 | 2023-10-22 15:44:51 | true  | false                | false                 |        | 3.1.2-4f3a2ee91b | 2        | 0                 | 1.37 %     | 0.0 %      | 9070           | 3           | true     |
+---------------+----------------------------------------------------------------+---------------+--------+----------+----------+---------------------+---------------------+-------+----------------------+-----------------------+--------+------------------+----------+-------------------+------------+------------+----------------+-------------+----------+
3 rows in set (0.00 sec)
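The output above explains the mismatch: the backend list is empty while three compute nodes are alive. A toy model of that situation, assuming (this is not confirmed in the thread) that the FE, when beginning the transaction, looked for a live node only among BEs. NodePicker and its methods are hypothetical and are not StarRocks FE code; the upstream fix may work differently.

```java
import java.util.List;
import java.util.Optional;

/**
 * Hypothetical model (NOT StarRocks source) of choosing a node for a
 * stream-load transaction, contrasting a BE-only lookup with one that also
 * accepts compute nodes (CNs) in shared_data mode.
 */
public class NodePicker {

    enum Kind { BACKEND, COMPUTE_NODE }

    static final class Node {
        final long id;
        final Kind kind;
        final boolean alive;

        Node(long id, Kind kind, boolean alive) {
            this.id = id;
            this.kind = kind;
            this.alive = alive;
        }
    }

    // Lookup matching the reported symptom: only alive BEs qualify.
    static Optional<Node> pickBackendOnly(List<Node> nodes) {
        return nodes.stream()
                .filter(n -> n.kind == Kind.BACKEND && n.alive)
                .findFirst();
    }

    // Lookup one would expect in shared_data mode: fall back to alive CNs.
    static Optional<Node> pickWithComputeNodes(List<Node> nodes, boolean sharedData) {
        Optional<Node> be = pickBackendOnly(nodes);
        if (be.isPresent() || !sharedData) {
            return be;
        }
        return nodes.stream()
                .filter(n -> n.kind == Kind.COMPUTE_NODE && n.alive)
                .findFirst();
    }

    // Mimics the FE's reply for the begin step under either lookup.
    static String beginTransaction(List<Node> nodes, boolean sharedData, boolean includeCn) {
        Optional<Node> chosen = includeCn
                ? pickWithComputeNodes(nodes, sharedData)
                : pickBackendOnly(nodes);
        return chosen.map(n -> "OK, node " + n.id).orElse("FAILED: No backend alive.");
    }

    public static void main(String[] args) {
        // Cluster from this thread: no BEs, three alive CNs.
        List<Node> cluster = List.of(
                new Node(10004, Kind.COMPUTE_NODE, true),
                new Node(10030, Kind.COMPUTE_NODE, true),
                new Node(10055, Kind.COMPUTE_NODE, true));
        System.out.println(beginTransaction(cluster, true, false)); // FAILED: No backend alive.
        System.out.println(beginTransaction(cluster, true, true));  // OK, node 10004
    }
}
```

Under this assumption the BE-only lookup fails even though the cluster is healthy, which matches both the connector error and the "this may be a bug in the FE" suspicion.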
Jin-H commented 10 months ago

Reported against StarRocks itself: https://github.com/StarRocks/starrocks/issues/33368

Jin-H commented 10 months ago

The PR has been merged.