datavane / tis

Support agile DataOps Based on Flink, DataX and Flink-CDC, Chunjun with Web-UI
https://tis.pub
Apache License 2.0

Flink JobMaster logs WARN messages when a Flink K8S cluster is started with an attached load balancer service #310

Closed baisui1981 closed 6 months ago

baisui1981 commented 6 months ago

Symptom

When a Flink cluster is started with an attached load balancer service, the Flink JobMaster logs the following WARN messages:

2024-04-01 08:37:27,035 INFO  org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - Starting the resource manager.
2024-04-01 08:37:27,046 INFO  org.apache.flink.runtime.resourcemanager.slotmanager.FineGrainedSlotManager [] - Starting the slot manager.
2024-04-01 08:37:27,048 INFO  org.apache.flink.runtime.security.token.DefaultDelegationTokenManager [] - Starting tokens update task
2024-04-01 08:37:27,048 WARN  org.apache.flink.runtime.security.token.DefaultDelegationTokenManager [] - No tokens obtained so skipping notifications
2024-04-01 08:37:27,048 WARN  org.apache.flink.runtime.security.token.DefaultDelegationTokenManager [] - Tokens update task not started because either no tokens obtained or none of the tokens specified its renewal date
2024-04-01 08:37:28,801 INFO  org.apache.flink.kubernetes.KubernetesResourceManagerDriver  [] - Recovered 0 pods from previous attempts, current attempt id is 1.
2024-04-01 08:37:28,801 INFO  org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - Recovered 0 workers from previous attempt.
2024-04-01 08:37:42,495 WARN  org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint   [] - Unhandled exception
java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcherImpl.read0(Native Method) ~[?:1.8.0_402]
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) ~[?:1.8.0_402]
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) ~[?:1.8.0_402]
        at sun.nio.ch.IOUtil.read(IOUtil.java:192) ~[?:1.8.0_402]
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379) ~[?:1.8.0_402]
        at org.apache.flink.shaded.netty4.io.netty.buffer.PooledByteBuf.setBytes(PooledByteBuf.java:256) ~[flink-dist-tis-1.18.1.jar:tis-1.18.1]
        at org.apache.flink.shaded.netty4.io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1132) ~[flink-dist-tis-1.18.1.jar:tis-1.18.1]
        at org.apache.flink.shaded.netty4.io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:357) ~[flink-dist-tis-1.18.1.jar:tis-1.18.1]
        at org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:151) [flink-dist-tis-1.18.1.jar:tis-1.18.1]

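Root cause analysis

According to an analysis posted online, this WARN message appears because the load balancer service keeps sending RST packets, which the REST endpoint reports as "Connection reset by peer": http://apache-flink.370.s1.nabble.com/flink-1-12-0-k8s-session-td10814.html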

Solution

A temporary workaround is to raise the log level of the class org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint to ERROR in the log4j2 configuration.
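
A minimal sketch of that log4j2 change, assuming the cluster uses Flink's properties-style Log4j 2 configuration (for example conf/log4j-console.properties; the exact file depends on how the cluster is deployed). The logger id `dispatcher_rest` is a made-up label for illustration; only the class name comes from the log output above.

```properties
# Hypothetical logger id "dispatcher_rest"; any unique id works.
# Raises the threshold so the WARN-level "Unhandled exception" entries
# from DispatcherRestEndpoint are no longer written to the log.
logger.dispatcher_rest.name = org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint
logger.dispatcher_rest.level = ERROR
```

This only hides the message; the load balancer still resets its connections, and any other WARN-level output from this class is suppressed as well.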

baisui1981 commented 6 months ago

Fixed.