apache / linkis

Apache Linkis builds a computation middleware layer to facilitate connection, governance and orchestration between the upper applications and the underlying data engines.
https://linkis.apache.org/
Apache License 2.0

Stopped Engines in enginesWaitForHeartbeat are not cleaned up #352

Closed wForget closed 2 years ago

wForget commented 4 years ago

Describe the bug: enginesWaitForHeartbeat contains Engines that have already stopped. EngineManagerImpl keeps scanning the state of such an Engine.
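The behavior described above can be illustrated with a minimal Python sketch. This is not Linkis code (Linkis is written in Scala) and it is not the fix that shipped; it only shows one way a heartbeat wait list could evict an engine once the registry repeatedly reports that the instance is gone. `HeartbeatWaitList`, `NoInstanceExists`, `probe`, and `max_failures` are hypothetical names introduced for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


class NoInstanceExists(Exception):
    """Hypothetical stand-in for 'the registry no longer lists this instance'."""


@dataclass
class HeartbeatWaitList:
    # probe(instance) returns a state string, or raises NoInstanceExists
    # when the instance cannot be found. Assumed interface, not a Linkis API.
    probe: Callable[[str], str]
    # Number of consecutive failed probes tolerated before eviction (assumed policy).
    max_failures: int = 3
    _failures: Dict[str, int] = field(default_factory=dict, init=False)

    def add(self, instance: str) -> None:
        """Start tracking an engine instance such as 'host:port'."""
        self._failures.setdefault(instance, 0)

    def waiting(self) -> List[str]:
        """Return the instances still being tracked, sorted for stable output."""
        return sorted(self._failures)

    def scan_once(self) -> List[str]:
        """Probe every tracked engine once and return the instances evicted."""
        evicted = []
        # Iterate over a copy because entries may be deleted during the scan.
        for instance in list(self._failures):
            try:
                self.probe(instance)
                self._failures[instance] = 0  # healthy probe resets the counter
            except NoInstanceExists:
                self._failures[instance] += 1
                if self._failures[instance] >= self.max_failures:
                    # Stop scanning an engine that is consistently missing.
                    del self._failures[instance]
                    evicted.append(instance)
        return evicted
```

Without the eviction branch, a stopped engine would stay in the list and be probed on every scan, which matches the repeated lookups and timeouts shown in the log of this report.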

To Reproduce: Unknown

Screenshots
2020-04-26 10:32:20.586 [INFO ] [BDP-Default-Scheduler-Thread-7 ] c.w.w.l.r.s.e.EurekaRPCServerLoader (42) [info] - Need a ServiceInstance(sparkEngine, test-host:36550), but cannot find in DiscoveryClient refresh list.
2020-04-26 10:32:21.086 [INFO ] [BDP-Default-Scheduler-Thread-7 ] c.w.w.l.r.s.e.EurekaRPCServerLoader (42) [info] - Need a ServiceInstance(sparkEngine, test-host:36550), but cannot find in DiscoveryClient refresh list.
2020-04-26 10:32:22.189 [INFO ] [BDP-Default-Scheduler-Thread-7 ] c.w.w.l.r.s.e.EurekaRPCServerLoader (42) [info] - Need a ServiceInstance(sparkEngine, test-host:36550), but cannot find in DiscoveryClient refresh list.
2020-04-26 10:32:23.793 [INFO ] [BDP-Default-Scheduler-Thread-7 ] c.w.w.l.r.s.e.EurekaRPCServerLoader (42) [info] - Need a ServiceInstance(sparkEngine, test-host:36550), but cannot find in DiscoveryClient refresh list.
2020-04-26 10:32:25.896 [INFO ] [BDP-Default-Scheduler-Thread-7 ] c.w.w.l.r.s.e.EurekaRPCServerLoader (42) [info] - Need a ServiceInstance(sparkEngine, test-host:36550), but cannot find in DiscoveryClient refresh list.
2020-04-26 10:32:27.999 [INFO ] [BDP-Default-Scheduler-Thread-7 ] c.w.w.l.r.s.e.EurekaRPCServerLoader (42) [info] - Need a ServiceInstance(sparkEngine, test-host:36550), but cannot find in DiscoveryClient refresh list.
2020-04-26 10:32:30.102 [INFO ] [BDP-Default-Scheduler-Thread-7 ] c.w.w.l.r.s.e.EurekaRPCServerLoader (42) [info] - Need a ServiceInstance(sparkEngine, test-host:36550), but cannot find in DiscoveryClient refresh list.
2020-04-26 10:32:32.206 [INFO ] [BDP-Default-Scheduler-Thread-7 ] c.w.w.l.r.s.e.EurekaRPCServerLoader (42) [info] - Need a ServiceInstance(sparkEngine, test-host:36550), but cannot find in DiscoveryClient refresh list.
2020-04-26 10:32:34.309 [INFO ] [BDP-Default-Scheduler-Thread-7 ] c.w.w.l.r.s.e.EurekaRPCServerLoader (42) [info] - Need a ServiceInstance(sparkEngine, test-host:36550), but cannot find in DiscoveryClient refresh list.
2020-04-26 10:32:36.412 [INFO ] [BDP-Default-Scheduler-Thread-7 ] c.w.w.l.r.s.e.EurekaRPCServerLoader (42) [info] - Need a ServiceInstance(sparkEngine, test-host:36550), but cannot find in DiscoveryClient refresh list.
2020-04-26 10:32:38.516 [INFO ] [BDP-Default-Scheduler-Thread-7 ] c.w.w.l.r.s.e.EurekaRPCServerLoader (42) [info] - Need a ServiceInstance(sparkEngine, test-host:36550), but cannot find in DiscoveryClient refresh list.
2020-04-26 10:32:40.619 [INFO ] [BDP-Default-Scheduler-Thread-7 ] c.w.w.l.r.s.e.EurekaRPCServerLoader (42) [info] - Need a ServiceInstance(sparkEngine, test-host:36550), but cannot find in DiscoveryClient refresh list.
2020-04-26 10:32:42.722 [INFO ] [BDP-Default-Scheduler-Thread-7 ] c.w.w.l.r.s.e.EurekaRPCServerLoader (42) [info] - Need a ServiceInstance(sparkEngine, test-host:36550), but cannot find in DiscoveryClient refresh list.
2020-04-26 10:32:44.825 [INFO ] [BDP-Default-Scheduler-Thread-7 ] c.w.w.l.r.s.e.EurekaRPCServerLoader (42) [info] - Need a ServiceInstance(sparkEngine, test-host:36550), but cannot find in DiscoveryClient refresh list.
2020-04-26 10:32:46.928 [INFO ] [BDP-Default-Scheduler-Thread-7 ] c.w.w.l.r.s.e.EurekaRPCServerLoader (42) [info] - Need a ServiceInstance(sparkEngine, test-host:36550), but cannot find in DiscoveryClient refresh list.
2020-04-26 10:32:49.031 [INFO ] [BDP-Default-Scheduler-Thread-7 ] c.w.w.l.r.s.e.EurekaRPCServerLoader (42) [info] - Need a ServiceInstance(sparkEngine, test-host:36550), but cannot find in DiscoveryClient refresh list.
2020-04-26 10:32:51.135 [INFO ] [BDP-Default-Scheduler-Thread-7 ] c.w.w.l.r.s.e.EurekaRPCServerLoader (42) [info] - Need a ServiceInstance(sparkEngine, test-host:36550), but cannot find in DiscoveryClient refresh list.
2020-04-26 10:32:53.238 [INFO ] [BDP-Default-Scheduler-Thread-7 ] c.w.w.l.r.s.e.EurekaRPCServerLoader (42) [info] - Need a ServiceInstance(sparkEngine, test-host:36550), but cannot find in DiscoveryClient refresh list.
2020-04-26 10:32:55.341 [INFO ] [BDP-Default-Scheduler-Thread-7 ] c.w.w.l.r.s.e.EurekaRPCServerLoader (42) [info] - Need a ServiceInstance(sparkEngine, test-host:36550), but cannot find in DiscoveryClient refresh list.
2020-04-26 10:32:57.444 [INFO ] [BDP-Default-Scheduler-Thread-7 ] c.w.w.l.r.s.e.EurekaRPCServerLoader (42) [info] - Need a ServiceInstance(sparkEngine, test-host:36550), but cannot find in DiscoveryClient refresh list.
2020-04-26 10:32:59.547 [INFO ] [BDP-Default-Scheduler-Thread-7 ] c.w.w.l.r.s.e.EurekaRPCServerLoader (42) [info] - Need a ServiceInstance(sparkEngine, test-host:36550), but cannot find in DiscoveryClient refresh list.
2020-04-26 10:33:01.650 [INFO ] [BDP-Default-Scheduler-Thread-7 ] c.w.w.l.r.s.e.EurekaRPCServerLoader (42) [info] - Need a ServiceInstance(sparkEngine, test-host:36550), but cannot find in DiscoveryClient refresh list.
2020-04-26 10:33:03.753 [INFO ] [BDP-Default-Scheduler-Thread-7 ] c.w.w.l.r.s.e.EurekaRPCServerLoader (42) [info] - Need a ServiceInstance(sparkEngine, test-host:36550), but cannot find in DiscoveryClient refresh list.
2020-04-26 10:33:05.856 [INFO ] [BDP-Default-Scheduler-Thread-7 ] c.w.w.l.r.s.e.EurekaRPCServerLoader (42) [info] - Need a ServiceInstance(sparkEngine, test-host:36550), but cannot find in DiscoveryClient refresh list.
2020-04-26 10:33:07.959 [INFO ] [BDP-Default-Scheduler-Thread-7 ] c.w.w.l.r.s.e.EurekaRPCServerLoader (42) [info] - Need a ServiceInstance(sparkEngine, test-host:36550), but cannot find in DiscoveryClient refresh list.
2020-04-26 10:33:10.062 [INFO ] [BDP-Default-Scheduler-Thread-7 ] c.w.w.l.r.s.e.EurekaRPCServerLoader (42) [info] - Need a ServiceInstance(sparkEngine, test-host:36550), but cannot find in DiscoveryClient refresh list.
2020-04-26 10:33:12.165 [INFO ] [BDP-Default-Scheduler-Thread-7 ] c.w.w.l.r.s.e.EurekaRPCServerLoader (42) [info] - Need a ServiceInstance(sparkEngine, test-host:36550), but cannot find in DiscoveryClient refresh list.
2020-04-26 10:33:14.268 [INFO ] [BDP-Default-Scheduler-Thread-7 ] c.w.w.l.r.s.e.EurekaRPCServerLoader (42) [info] - Need a ServiceInstance(sparkEngine, test-host:36550), but cannot find in DiscoveryClient refresh list.
2020-04-26 10:33:16.372 [INFO ] [BDP-Default-Scheduler-Thread-7 ] c.w.w.l.r.s.e.EurekaRPCServerLoader (42) [info] - Need a ServiceInstance(sparkEngine, test-host:36550), but cannot find in DiscoveryClient refresh list.
2020-04-26 10:33:18.475 [INFO ] [BDP-Default-Scheduler-Thread-7 ] c.w.w.l.r.s.e.EurekaRPCServerLoader (42) [info] - Need a ServiceInstance(sparkEngine, test-host:36550), but cannot find in DiscoveryClient refresh list.
2020-04-26 10:33:20.578 [INFO ] [BDP-Default-Scheduler-Thread-7 ] c.w.w.l.r.s.e.EurekaRPCServerLoader (42) [info] - Need a ServiceInstance(sparkEngine, test-host:36550), but cannot find in DiscoveryClient refresh list.
2020-04-26 10:33:22.681 [INFO ] [BDP-Default-Scheduler-Thread-7 ] c.w.w.l.r.s.e.EurekaRPCServerLoader (42) [info] - Need a ServiceInstance(sparkEngine, test-host:36550), but cannot find in DiscoveryClient refresh list.
2020-04-26 10:33:22.682 [WARN ] [BDP-Default-Scheduler-Thread-7 ] c.w.w.l.e.e.i.EngineManagerImpl (84) [$anonfun$tryAndWarn$1] - java.lang.reflect.UndeclaredThrowableException: null
    at com.sun.proxy.$Proxy201.receiveAndReply(Unknown Source) ~[?:?]
    at com.webank.wedatasphere.linkis.rpc.BaseRPCSender.$anonfun$ask$1(BaseRPCSender.scala:86) ~[linkis-cloudRPC-0.9.3.jar:?]
    at com.webank.wedatasphere.linkis.rpc.interceptor.BaseRPCInterceptorExchange.invoke(RPCInterceptorExchange.scala:32) ~[linkis-cloudRPC-0.9.3.jar:?]
    at com.webank.wedatasphere.linkis.rpc.interceptor.common.CommonRPCInterceptor.intercept(CommonRPCInterceptor.scala:29) ~[linkis-cloudRPC-0.9.3.jar:?]
    at com.webank.wedatasphere.linkis.rpc.interceptor.BaseRPCInterceptorChain.handle(RPCInterceptorChain.scala:35) ~[linkis-cloudRPC-0.9.3.jar:?]
    at com.webank.wedatasphere.linkis.rpc.interceptor.common.RetryableRPCInterceptor.$anonfun$intercept$1(RetryableRPCInterceptor.scala:53) ~[linkis-cloudRPC-0.9.3.jar:?]
    at com.webank.wedatasphere.linkis.common.utils.Utils$.tryCatch(Utils.scala:48) ~[linkis-common-0.9.3.jar:?]
    at com.webank.wedatasphere.linkis.common.utils.RetryHandler.retry(RetryHandler.scala:56) ~[linkis-common-0.9.3.jar:?]
    at com.webank.wedatasphere.linkis.common.utils.RetryHandler.retry$(RetryHandler.scala:52) ~[linkis-common-0.9.3.jar:?]
    at com.webank.wedatasphere.linkis.rpc.interceptor.common.RetryableRPCInterceptor$RPCRetryHandler.retry(RetryableRPCInterceptor.scala:58) ~[linkis-cloudRPC-0.9.3.jar:?]
    at com.webank.wedatasphere.linkis.rpc.interceptor.common.RetryableRPCInterceptor.intercept(RetryableRPCInterceptor.scala:53) ~[linkis-cloudRPC-0.9.3.jar:?]
    at com.webank.wedatasphere.linkis.rpc.interceptor.BaseRPCInterceptorChain.handle(RPCInterceptorChain.scala:35) ~[linkis-cloudRPC-0.9.3.jar:?]
    at com.webank.wedatasphere.linkis.rpc.interceptor.common.CacheableRPCInterceptor.intercept(CacheableRPCInterceptor.scala:62) ~[linkis-cloudRPC-0.9.3.jar:?]
    at com.webank.wedatasphere.linkis.rpc.interceptor.BaseRPCInterceptorChain.handle(RPCInterceptorChain.scala:35) ~[linkis-cloudRPC-0.9.3.jar:?]
    at com.webank.wedatasphere.linkis.rpc.interceptor.common.BroadcastRPCInterceptor.intercept(BroadcastRPCInterceptor.scala:74) ~[linkis-cloudRPC-0.9.3.jar:?]
    at com.webank.wedatasphere.linkis.rpc.interceptor.BaseRPCInterceptorChain.handle(RPCInterceptorChain.scala:35) ~[linkis-cloudRPC-0.9.3.jar:?]
    at com.webank.wedatasphere.linkis.rpc.BaseRPCSender.execute(BaseRPCSender.scala:79) ~[linkis-cloudRPC-0.9.3.jar:?]
    at com.webank.wedatasphere.linkis.rpc.BaseRPCSender.ask(BaseRPCSender.scala:83) ~[linkis-cloudRPC-0.9.3.jar:?]
    at com.webank.wedatasphere.linkis.entrance.execute.EntranceEngine.refreshState(EntranceEngine.scala:136) ~[linkis-ujes-entrance-0.9.3.jar:?]
    at com.webank.wedatasphere.linkis.entrance.execute.impl.EngineManagerImpl$$anon$1.$anonfun$run$3(EngineManagerImpl.scala:53) ~[linkis-ujes-entrance-0.9.3.jar:?]
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) ~[scala-library-2.12.8.jar:?]
    at com.webank.wedatasphere.linkis.common.utils.Utils$.tryCatch(Utils.scala:48) ~[linkis-common-0.9.3.jar:?]
    at com.webank.wedatasphere.linkis.common.utils.Utils$.tryAndWarn(Utils.scala:74) ~[linkis-common-0.9.3.jar:?]
    at com.webank.wedatasphere.linkis.entrance.execute.impl.EngineManagerImpl$$anon$1.$anonfun$run$2(EngineManagerImpl.scala:52) ~[linkis-ujes-entrance-0.9.3.jar:?]
    at com.webank.wedatasphere.linkis.entrance.execute.impl.EngineManagerImpl$$anon$1.$anonfun$run$2$adapted(EngineManagerImpl.scala:50) ~[linkis-ujes-entrance-0.9.3.jar:?]
    at scala.collection.immutable.List.foreach(List.scala:392) ~[scala-library-2.12.8.jar:?]
    at com.webank.wedatasphere.linkis.entrance.execute.impl.EngineManagerImpl$$anon$1.$anonfun$run$1(EngineManagerImpl.scala:50) ~[linkis-ujes-entrance-0.9.3.jar:?]
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) [scala-library-2.12.8.jar:?]
    at com.webank.wedatasphere.linkis.common.utils.Utils$.tryCatch(Utils.scala:48) [linkis-common-0.9.3.jar:?]
    at com.webank.wedatasphere.linkis.common.utils.Utils$.tryAndWarn(Utils.scala:74) [linkis-common-0.9.3.jar:?]
    at com.webank.wedatasphere.linkis.entrance.execute.impl.EngineManagerImpl$$anon$1.run(EngineManagerImpl.scala:50) [linkis-ujes-entrance-0.9.3.jar:?]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_181]
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [?:1.8.0_181]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_181]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [?:1.8.0_181]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_181]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_181]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
Caused by: com.webank.wedatasphere.linkis.rpc.exception.NoInstanceExistsException: errCode: 10051 ,desc: The instance test-host:36550 of application sparkEngine is not exists. ,ip: test-host ,port: 9106 ,serviceKind: sparkEntrance
    at com.webank.wedatasphere.linkis.rpc.interceptor.AbstractRPCServerLoader.getOrRefresh(RPCServerLoader.scala:60) ~[linkis-cloudRPC-0.9.3.jar:?]
    at com.webank.wedatasphere.linkis.rpc.interceptor.AbstractRPCServerLoader.getServer(RPCServerLoader.scala:77) ~[linkis-cloudRPC-0.9.3.jar:?]
    at com.webank.wedatasphere.linkis.rpc.sender.SpringMVCRPCSender$$anon$1$$anon$2.$anonfun$customizeLoadBalancerCommandBuilder$7(SpringMVCRPCSender.scala:80) ~[linkis-cloudRPC-0.9.3.jar:?]
    at scala.Option.foreach(Option.scala:274) ~[scala-library-2.12.8.jar:?]
    at com.webank.wedatasphere.linkis.rpc.sender.SpringMVCRPCSender$$anon$1$$anon$2.customizeLoadBalancerCommandBuilder(SpringMVCRPCSender.scala:79) ~[linkis-cloudRPC-0.9.3.jar:?]
    at com.webank.wedatasphere.linkis.rpc.sender.SpringMVCRPCSender$$anon$1$$anon$2.customizeLoadBalancerCommandBuilder(SpringMVCRPCSender.scala:58) ~[linkis-cloudRPC-0.9.3.jar:?]
    at com.netflix.client.AbstractLoadBalancerAwareClient.buildLoadBalancerCommand(AbstractLoadBalancerAwareClient.java:132) ~[ribbon-loadbalancer-2.2.5.jar:2.2.5]
    at com.netflix.client.AbstractLoadBalancerAwareClient.executeWithLoadBalancer(AbstractLoadBalancerAwareClient.java:94) ~[ribbon-loadbalancer-2.2.5.jar:2.2.5]
    at org.springframework.cloud.openfeign.ribbon.LoadBalancerFeignClient.execute(LoadBalancerFeignClient.java:63) ~[spring-cloud-openfeign-core-2.0.0.RELEASE.jar:2.0.0.RELEASE]
    at feign.SynchronousMethodHandler.executeAndDecode(SynchronousMethodHandler.java:97) ~[feign-core-9.5.1.jar:?]
    at feign.SynchronousMethodHandler.invoke(SynchronousMethodHandler.java:76) ~[feign-core-9.5.1.jar:?]
    at feign.ReflectiveFeign$FeignInvocationHandler.invoke(ReflectiveFeign.java:103) ~[feign-core-9.5.1.jar:?]
    ... 38 more
Caused by: java.util.concurrent.TimeoutException
    at com.webank.wedatasphere.linkis.common.utils.Utils$.aux$1(Utils.scala:199) ~[linkis-common-0.9.3.jar:?]
    at com.webank.wedatasphere.linkis.common.utils.Utils$.waitUntil(Utils.scala:204) ~[linkis-common-0.9.3.jar:?]
    at com.webank.wedatasphere.linkis.rpc.interceptor.AbstractRPCServerLoader.$anonfun$getOrRefresh$1(RPCServerLoader.scala:67) ~[linkis-cloudRPC-0.9.3.jar:?]
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) ~[scala-library-2.12.8.jar:?]
    at com.webank.wedatasphere.linkis.common.utils.Utils$.tryCatch(Utils.scala:48) ~[linkis-common-0.9.3.jar:?]
    at com.webank.wedatasphere.linkis.common.utils.Utils$.tryThrow(Utils.scala:59) ~[linkis-common-0.9.3.jar:?]
    at com.webank.wedatasphere.linkis.rpc.interceptor.AbstractRPCServerLoader.getOrRefresh(RPCServerLoader.scala:67) ~[linkis-cloudRPC-0.9.3.jar:?]
    at com.webank.wedatasphere.linkis.rpc.interceptor.AbstractRPCServerLoader.getServer(RPCServerLoader.scala:77) ~[linkis-cloudRPC-0.9.3.jar:?]
    at com.webank.wedatasphere.linkis.rpc.sender.SpringMVCRPCSender$$anon$1$$anon$2.$anonfun$customizeLoadBalancerCommandBuilder$7(SpringMVCRPCSender.scala:80) ~[linkis-cloudRPC-0.9.3.jar:?]
    at scala.Option.foreach(Option.scala:274) ~[scala-library-2.12.8.jar:?]
    at com.webank.wedatasphere.linkis.rpc.sender.SpringMVCRPCSender$$anon$1$$anon$2.customizeLoadBalancerCommandBuilder(SpringMVCRPCSender.scala:79) ~[linkis-cloudRPC-0.9.3.jar:?]
    at com.webank.wedatasphere.linkis.rpc.sender.SpringMVCRPCSender$$anon$1$$anon$2.customizeLoadBalancerCommandBuilder(SpringMVCRPCSender.scala:58) ~[linkis-cloudRPC-0.9.3.jar:?]
    at com.netflix.client.AbstractLoadBalancerAwareClient.buildLoadBalancerCommand(AbstractLoadBalancerAwareClient.java:132) ~[ribbon-loadbalancer-2.2.5.jar:2.2.5]
    at com.netflix.client.AbstractLoadBalancerAwareClient.executeWithLoadBalancer(AbstractLoadBalancerAwareClient.java:94) ~[ribbon-loadbalancer-2.2.5.jar:2.2.5]
    at org.springframework.cloud.openfeign.ribbon.LoadBalancerFeignClient.execute(LoadBalancerFeignClient.java:63) ~[spring-cloud-openfeign-core-2.0.0.RELEASE.jar:2.0.0.RELEASE]
    at feign.SynchronousMethodHandler.executeAndDecode(SynchronousMethodHandler.java:97) ~[feign-core-9.5.1.jar:?]
    at feign.SynchronousMethodHandler.invoke(SynchronousMethodHandler.java:76) ~[feign-core-9.5.1.jar:?]
    at feign.ReflectiveFeign$FeignInvocationHandler.invoke(ReflectiveFeign.java:103) ~[feign-core-9.5.1.jar:?]
    ... 38 more

Additional context: None

peacewong commented 2 years ago

This issue has been fixed in Linkis 1.0.