FederatedAI / KubeFATE

Manage federated learning workload using cloud native technologies.
Apache License 2.0

the server could not find the requested resource #501

Closed liuyanntes closed 2 years ago

liuyanntes commented 2 years ago

root@crms-10-10-178-147[/k8s/fate]# kubefate cluster ls
UUID                                  NAME       NAMESPACE  REVISION  STATUS   CHART  ChartVERSION  AGE
4f14f1ca-687c-49cf-a1b7-f9a48e7b37bd  fate-9999  fate-9999  1         Running  fate   v1.7.0        19m

root@crms-10-10-178-147[/k8s/fate]# kubefate cluster describe 4f14f1ca-687c-49cf-a1b7-f9a48e7b37bd
the server could not find the requested resource
root@crms-10-10-178-147[/k8s/fate]# the server could not find the requested resource

Why does `kubefate cluster describe` return no description for this cluster?
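When reproducing this, it can help to pull the UUID and chart version out of the `kubefate cluster ls` output programmatically, so that the exact values passed to `kubefate cluster describe` are not mistyped. The following is a minimal sketch, assuming the output is a whitespace-separated table with one header row and no spaces inside any column value; `parse_cluster_ls` and `SAMPLE` are hypothetical names, not part of KubeFATE.

```python
def parse_cluster_ls(text):
    """Parse whitespace-separated `kubefate cluster ls` output into dicts.

    Assumption: the first non-empty line is the header and no column
    value contains whitespace.
    """
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    header = lines[0].split()
    return [dict(zip(header, ln.split())) for ln in lines[1:]]


# Sample copied from the output reported in this issue.
SAMPLE = """\
UUID NAME NAMESPACE REVISION STATUS CHART ChartVERSION AGE
4f14f1ca-687c-49cf-a1b7-f9a48e7b37bd fate-9999 fate-9999 1 Running fate v1.7.0 19m
"""

clusters = parse_cluster_ls(SAMPLE)
for c in clusters:
    print(c["UUID"], c["STATUS"], c["CHART"], c["ChartVERSION"])
```

Here the cluster is listed as Running with chart version v1.7.0, so the listing itself succeeds and only the `describe` call fails.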

liuyanntes commented 2 years ago

Cluster logs:

root@crms-10-10-178-147[/k8s/fate]# kubefate cluster logs 4f14f1ca-687c-49cf-a1b7-f9a48e7b37bd
[python-84d4dbd6df-2nbcj fateboard]
[python-84d4dbd6df-2nbcj fateboard]   .   ____          _            __ _ _
[python-84d4dbd6df-2nbcj fateboard]  /\\ / ___'_ __ _ _(_)_ __  __ _ \ \ \ \
[python-84d4dbd6df-2nbcj fateboard] ( ( )\___ | '_ | '_| | '_ \/ _` | \ \ \ \
[python-84d4dbd6df-2nbcj fateboard]  \\/  ___)| |_)| | | | | || (_| |  ) ) ) )
[python-84d4dbd6df-2nbcj fateboard]   '  |____| .__|_| |_|_| |_\__, | / / / /
[python-84d4dbd6df-2nbcj fateboard]  =========|_|==============|___/=/_/_/_/
[python-84d4dbd6df-2nbcj fateboard]  :: Spring Boot ::        (v2.2.0.RELEASE)
[python-84d4dbd6df-2nbcj fateboard]
[python-84d4dbd6df-2nbcj fateboard] 2021-12-09 06:04:13 INFO [main] (StartupInfoLogger.java:55) - Starting Bootstrap on python-84d4dbd6df-2nbcj with PID 1 (/data/projects/fate/fateboard/fateboard-1.7.0.jar started by root in /data/projects/fate/fateboard)
[python-84d4dbd6df-2nbcj fateboard] 2021-12-09 06:04:13 INFO [main] (SpringApplication.java:651) - No active profile set, falling back to default profiles: default
[python-84d4dbd6df-2nbcj fateboard] 2021-12-09 06:04:16 WARN [main] (ClassPathMapperScanner.java:239) - Skipping MapperFactoryBean with name 'jobMapper' and 'com.webank.ai.fate.board.dao.JobMapper' mapperInterface. Bean already defined with the same name!
[python-84d4dbd6df-2nbcj fateboard] 2021-12-09 06:04:16 WARN [main] (ClassPathMapperScanner.java:239) - Skipping MapperFactoryBean with name 'taskMapper' and 'com.webank.ai.fate.board.dao.TaskMapper' mapperInterface. Bean already defined with the same name!
[python-84d4dbd6df-2nbcj fateboard] 2021-12-09 06:04:16 WARN [main] (ClassPathMapperScanner.java:166) - No MyBatis mapper was found in '[com/webank/ai/fate/board/dao]' package. Please check your configuration.
[python-84d4dbd6df-2nbcj fateboard] 2021-12-09 06:04:17 INFO [main] (PostProcessorRegistrationDelegate.java:330) - Bean 'org.springframework.transaction.annotation.ProxyTransactionManagementConfiguration' of type [org.springframework.transaction.annotation.ProxyTransactionManagementConfiguration] is not eligible for getting processed by all BeanPostProcessors (for example: not eligible for auto-proxying)
[python-84d4dbd6df-2nbcj fateboard] 2021-12-09 06:04:18 INFO [main] (TomcatWebServer.java:92) - Tomcat initialized with port(s): 8080 (http)
[python-84d4dbd6df-2nbcj fateboard] 2021-12-09 06:04:18 INFO [main] (DirectJDKLog.java:173) - Initializing ProtocolHandler ["http-nio-8080"]
[python-84d4dbd6df-2nbcj fateboard] 2021-12-09 06:04:18 INFO [main] (DirectJDKLog.java:173) - Starting service [Tomcat]
[python-84d4dbd6df-2nbcj fateboard] 2021-12-09 06:04:18 INFO [main] (DirectJDKLog.java:173) - Starting Servlet engine: [Apache Tomcat/9.0.27]
[python-84d4dbd6df-2nbcj fateboard] 2021-12-09 06:04:18 INFO [main] (DirectJDKLog.java:173) - Initializing Spring embedded WebApplicationContext
[python-84d4dbd6df-2nbcj fateboard] 2021-12-09 06:04:18 INFO [main] (ServletWebServerApplicationContext.java:284) - Root WebApplicationContext: initialization completed in 4221 ms
[python-84d4dbd6df-2nbcj fateboard] 2021-12-09 06:04:18 INFO [main] (HikariDataSource.java:110) - HikariPool-1 - Starting...
[python-84d4dbd6df-2nbcj fateboard] 2021-12-09 06:04:19 INFO [main] (HikariDataSource.java:123) - HikariPool-1 - Start completed.
[python-84d4dbd6df-2nbcj fateboard] 2021-12-09 06:04:20 ERROR [main] (SshService.java:143) - load ssh config file error
[python-84d4dbd6df-2nbcj fateboard] java.lang.IllegalArgumentException: null
[python-84d4dbd6df-2nbcj fateboard]     at com.google.common.base.Preconditions.checkArgument(Preconditions.java:128)
[python-84d4dbd6df-2nbcj fateboard]     at com.webank.ai.fate.board.ssh.SshService.afterPropertiesSet(SshService.java:138)
[python-84d4dbd6df-2nbcj fateboard]     at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.invokeInitMethods(AbstractAutowireCapableBeanFactory.java:1862)
[python-84d4dbd6df-2nbcj fateboard]     at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1799)
[python-84d4dbd6df-2nbcj fateboard]     at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:595)
[python-84d4dbd6df-2nbcj fateboard]     at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:517)
[python-84d4dbd6df-2nbcj fateboard]     at org.springframework.beans.factory.support.AbstractBeanFactory.lambda$doGetBean$0(AbstractBeanFactory.java:323)
[python-84d4dbd6df-2nbcj fateboard]     at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:222)
[python-84d4dbd6df-2nbcj fateboard]     at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:321)
[python-84d4dbd6df-2nbcj fateboard]     at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:202)
[python-84d4dbd6df-2nbcj fateboard]     at org.springframework.beans.factory.config.DependencyDescriptor.resolveCandidate(DependencyDescriptor.java:276)
[python-84d4dbd6df-2nbcj fateboard]     at org.springframework.beans.factory.support.DefaultListableBeanFactory.doResolveDependency(DefaultListableBeanFactory.java:1287)
[python-84d4dbd6df-2nbcj fateboard]     at org.springframework.beans.factory.support.DefaultListableBeanFactory.resolveDependency(DefaultListableBeanFactory.java:1207)
[python-84d4dbd6df-2nbcj fateboard]     at org.springframework.beans.factory.annotation.AutowiredAnnotationBeanPostProcessor$AutowiredFieldElement.inject(AutowiredAnnotationBeanPostProcessor.java:636)
[python-84d4dbd6df-2nbcj fateboard]     at org.springframework.beans.factory.annotation.InjectionMetadata.inject(InjectionMetadata.java:116)
[python-84d4dbd6df-2nbcj fateboard]     at org.springframework.beans.factory.annotation.AutowiredAnnotationBeanPostProcessor.postProcessProperties(AutowiredAnnotationBeanPostProcessor.java:397)
[python-84d4dbd6df-2nbcj fateboard]     at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.populateBean(AbstractAutowireCapableBeanFactory.java:1429)
[python-84d4dbd6df-2nbcj fateboard]     at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:594)
[python-84d4dbd6df-2nbcj fateboard]     at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:517)
[python-84d4dbd6df-2nbcj fateboard]     at org.springframework.beans.factory.support.AbstractBeanFactory.lambda$doGetBean$0(AbstractBeanFactory.java:323)
[python-84d4dbd6df-2nbcj fateboard]     at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:222)
[python-84d4dbd6df-2nbcj fateboard]     at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:321)
[python-84d4dbd6df-2nbcj fateboard]     at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:202)
[python-84d4dbd6df-2nbcj fateboard]     at org.springframework.beans.factory.config.DependencyDescriptor.resolveCandidate(DependencyDescriptor.java:276)
[python-84d4dbd6df-2nbcj fateboard]     at org.springframework.beans.factory.support.DefaultListableBeanFactory.doResolveDependency(DefaultListableBeanFactory.java:1287)
[python-84d4dbd6df-2nbcj fateboard]     at org.springframework.beans.factory.support.DefaultListableBeanFactory.resolveDependency(DefaultListableBeanFactory.java:1207)
[python-84d4dbd6df-2nbcj fateboard]     at org.springframework.beans.factory.annotation.AutowiredAnnotationBeanPostProcessor$AutowiredFieldElement.inject(AutowiredAnnotationBeanPostProcessor.java:636)
[python-84d4dbd6df-2nbcj fateboard]     at org.springframework.beans.factory.annotation.InjectionMetadata.inject(InjectionMetadata.java:116)
[python-84d4dbd6df-2nbcj fateboard]     at org.springframework.beans.factory.annotation.AutowiredAnnotationBeanPostProcessor.postProcessProperties(AutowiredAnnotationBeanPostProcessor.java:397)
[python-84d4dbd6df-2nbcj fateboard]     at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.populateBean(AbstractAutowireCapableBeanFactory.java:1429)
[python-84d4dbd6df-2nbcj fateboard]     at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:594)
[python-84d4dbd6df-2nbcj fateboard]     at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:517)
[python-84d4dbd6df-2nbcj fateboard]     at org.springframework.beans.factory.support.AbstractBeanFactory.lambda$doGetBean$0(AbstractBeanFactory.java:323)
[python-84d4dbd6df-2nbcj fateboard]     at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:222)
[python-84d4dbd6df-2nbcj fateboard]     at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:321)
[python-84d4dbd6df-2nbcj fateboard]     at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:202)
[python-84d4dbd6df-2nbcj fateboard]     at org.springframework.beans.factory.support.DefaultListableBeanFactory.preInstantiateSingletons(DefaultListableBeanFactory.java:879)
[python-84d4dbd6df-2nbcj fateboard]     at org.springframework.context.support.AbstractApplicationContext.finishBeanFactoryInitialization(AbstractApplicationContext.java:878)
[python-84d4dbd6df-2nbcj fateboard]     at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:550)
[python-84d4dbd6df-2nbcj fateboard]     at org.springframework.boot.web.servlet.context.ServletWebServerApplicationContext.refresh(ServletWebServerApplicationContext.java:141)
[python-84d4dbd6df-2nbcj fateboard]     at org.springframework.boot.SpringApplication.refresh(SpringApplication.java:747)
[python-84d4dbd6df-2nbcj fateboard]     at org.springframework.boot.SpringApplication.refreshContext(SpringApplication.java:397)
[python-84d4dbd6df-2nbcj fateboard]     at org.springframework.boot.SpringApplication.run(SpringApplication.java:315)
[python-84d4dbd6df-2nbcj fateboard]     at org.springframework.boot.SpringApplication.run(SpringApplication.java:1226)
[python-84d4dbd6df-2nbcj fateboard]     at org.springframework.boot.SpringApplication.run(SpringApplication.java:1215)
[python-84d4dbd6df-2nbcj fateboard]     at com.webank.ai.fate.board.bootstrap.Bootstrap.main(Bootstrap.java:49)
[python-84d4dbd6df-2nbcj fateboard]     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[python-84d4dbd6df-2nbcj fateboard]     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[python-84d4dbd6df-2nbcj fateboard]     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[python-84d4dbd6df-2nbcj fateboard]     at java.lang.reflect.Method.invoke(Method.java:498)
[python-84d4dbd6df-2nbcj fateboard]     at org.springframework.boot.loader.MainMethodRunner.run(MainMethodRunner.java:48)
[python-84d4dbd6df-2nbcj fateboard]     at org.springframework.boot.loader.Launcher.launch(Launcher.java:87)
[python-84d4dbd6df-2nbcj fateboard]     at org.springframework.boot.loader.Launcher.launch(Launcher.java:50)
[python-84d4dbd6df-2nbcj fateboard]     at org.springframework.boot.loader.JarLauncher.main(JarLauncher.java:51)
[python-84d4dbd6df-2nbcj fateboard] 2021-12-09 06:04:20 INFO [main] (ExecutorConfigurationSupport.java:171) - Initializing ExecutorService
[python-84d4dbd6df-2nbcj fateboard] 2021-12-09 06:04:20 INFO [main] (ExecutorConfigurationSupport.java:171) - Initializing ExecutorService 'asyncServiceExecutor'
[python-84d4dbd6df-2nbcj fateboard] 2021-12-09 06:04:21 INFO [main] (SshConfigFileWatcher.java:165) - use system path /data/projects/fate/fateboard/conf
[python-84d4dbd6df-2nbcj fateboard] 2021-12-09 06:04:21 INFO [main] (SshConfigFileWatcher.java:171) - Scanning /data/projects/fate/fateboard/conf ...
[python-84d4dbd6df-2nbcj fateboard] 2021-12-09 06:04:21 INFO [main] (Version.java:21) - HV000001: Hibernate Validator 6.0.17.Final [python-84d4dbd6df-2nbcj fateboard] 2021-12-09 06:04:22 INFO [main] (WelcomePageHandlerMapping.java:54) - Adding welcome page: class path resource [static/index.html] [python-84d4dbd6df-2nbcj fateboard] 2021-12-09 06:04:23 INFO [main] (EndpointLinksResolver.java:58) - Exposing 0 endpoint(s) beneath base path '/actuator' [python-84d4dbd6df-2nbcj fateboard] 2021-12-09 06:04:23 INFO [main] (ScheduledAnnotationBeanPostProcessor.java:297) - No TaskScheduler/ScheduledExecutorService bean found for scheduled processing [python-84d4dbd6df-2nbcj fateboard] 2021-12-09 06:04:23 INFO [main] (DirectJDKLog.java:173) - Starting ProtocolHandler ["http-nio-8080"] [python-84d4dbd6df-2nbcj fateboard] 2021-12-09 06:04:23 INFO [main] (TomcatWebServer.java:204) - Tomcat started on port(s): 8080 (http) with context path '' [python-84d4dbd6df-2nbcj fateboard] 2021-12-09 06:04:23 INFO [main] (StartupInfoLogger.java:61) - Started Bootstrap in 10.617 seconds (JVM running for 12.234) [rollsite-cfd4d6c88-bg9jm rollsite] + mkdir -p /data/projects/fate/eggroll/logs/eggroll/ [rollsite-cfd4d6c88-bg9jm rollsite] + touch /data/projects/fate/eggroll/logs/eggroll/eggroll-audit.log [rollsite-cfd4d6c88-bg9jm rollsite] + ln -sf /dev/stdout /data/projects/fate/eggroll/logs/eggroll/eggroll-audit.log [rollsite-cfd4d6c88-bg9jm rollsite] + touch /data/projects/fate/eggroll/logs/eggroll/eggroll.jvm.log [rollsite-cfd4d6c88-bg9jm rollsite] + ln -sf /dev/stdout /data/projects/fate/eggroll/logs/eggroll/eggroll.jvm.log [rollsite-cfd4d6c88-bg9jm rollsite] + touch /data/projects/fate/eggroll/logs/eggroll/eggroll.jvm.err.log [rollsite-cfd4d6c88-bg9jm rollsite] + ln -sf /dev/stderr /data/projects/fate/eggroll/logs/eggroll/eggroll.jvm.err.log [rollsite-cfd4d6c88-bg9jm rollsite] + java -Dlog4j.configurationFile=/data/projects/fate/eggroll//conf/log4j2.properties -cp 
'/data/projects/fate/eggroll//lib/:/data/projects/fate/eggroll//conf/' com.webank.eggroll.rollsite.EggSiteBootstrap -c /data/projects/fate/eggroll//conf/eggroll.properties [rollsite-cfd4d6c88-bg9jm rollsite] current dir: /data/projects/fate/eggroll/. [rollsite-cfd4d6c88-bg9jm rollsite] [INFO ][1876][2021-12-09 06:04:03,221][main,pid:1,tid:1][c.w.e.r.EggSiteBootstrap:107] - conf file: /data/projects/fate/eggroll/conf/eggroll.properties [rollsite-cfd4d6c88-bg9jm rollsite] [INFO ][1904][2021-12-09 06:04:03,249][main,pid:1,tid:1][c.w.e.r.EggSiteBootstrap:107] - initing router at path=conf/route_table/route_table.json [rollsite-cfd4d6c88-bg9jm rollsite] [INFO ][1958][2021-12-09 06:04:03,303][main,pid:1,tid:1][c.w.e.r.EggSiteBootstrap:107] - start refreshing route table per min [rollsite-cfd4d6c88-bg9jm rollsite] [INFO ][2612][2021-12-09 06:04:03,957][main,pid:1,tid:1][c.w.e.c.t.GrpcServerUtils:107] - gRPC server at 9370 starting in insecure mode [rollsite-cfd4d6c88-bg9jm rollsite] [INFO ][2903][2021-12-09 06:04:04,248][main,pid:1,tid:1][c.w.e.r.EggSiteBootstrap:107] - server started at 9370 [client-5697c66d7b-mlx9g client] { [client-5697c66d7b-mlx9g client] "retcode": 0, [client-5697c66d7b-mlx9g client] "retmsg": "Fate Flow CLI has been initialized successfully." [client-5697c66d7b-mlx9g client] } [client-5697c66d7b-mlx9g client] [client-5697c66d7b-mlx9g client] Pipeline configuration succeeded. 
[client-5697c66d7b-mlx9g client] [D 06:04:04.005 NotebookApp] Searching ['/data/projects/fate', '/root/.jupyter', '/root/.local/etc/jupyter', '/usr/local/etc/jupyter', '/etc/jupyter'] for config files [client-5697c66d7b-mlx9g client] [D 06:04:04.006 NotebookApp] Looking for jupyter_config in /etc/jupyter [client-5697c66d7b-mlx9g client] [D 06:04:04.006 NotebookApp] Looking for jupyter_config in /usr/local/etc/jupyter [client-5697c66d7b-mlx9g client] [D 06:04:04.007 NotebookApp] Looking for jupyter_config in /root/.local/etc/jupyter [client-5697c66d7b-mlx9g client] [D 06:04:04.008 NotebookApp] Looking for jupyter_config in /root/.jupyter [client-5697c66d7b-mlx9g client] [D 06:04:04.008 NotebookApp] Looking for jupyter_config in /data/projects/fate [client-5697c66d7b-mlx9g client] [D 06:04:04.011 NotebookApp] Looking for jupyter_notebook_config in /etc/jupyter [client-5697c66d7b-mlx9g client] [D 06:04:04.014 NotebookApp] Looking for jupyter_notebook_config in /usr/local/etc/jupyter [client-5697c66d7b-mlx9g client] [D 06:04:04.015 NotebookApp] Looking for jupyter_notebook_config in /root/.local/etc/jupyter [client-5697c66d7b-mlx9g client] [D 06:04:04.015 NotebookApp] Looking for jupyter_notebook_config in /root/.jupyter [client-5697c66d7b-mlx9g client] [D 06:04:04.015 NotebookApp] Looking for jupyter_notebook_config in /data/projects/fate [client-5697c66d7b-mlx9g client] [D 06:04:04.041 NotebookApp] Paths used for configuration of jupyter_notebook_config: [client-5697c66d7b-mlx9g client] /etc/jupyter/jupyter_notebook_config.json [client-5697c66d7b-mlx9g client] [D 06:04:04.042 NotebookApp] Paths used for configuration of jupyter_notebook_config: [client-5697c66d7b-mlx9g client] /usr/local/etc/jupyter/jupyter_notebook_config.json [client-5697c66d7b-mlx9g client] [D 06:04:04.045 NotebookApp] Paths used for configuration of jupyter_notebook_config: [client-5697c66d7b-mlx9g client] /root/.local/etc/jupyter/jupyter_notebook_config.json [client-5697c66d7b-mlx9g client] [D 
06:04:04.046 NotebookApp] Paths used for configuration of jupyter_notebook_config: [client-5697c66d7b-mlx9g client] /root/.jupyter/jupyter_notebook_config.json [client-5697c66d7b-mlx9g client] [I 06:04:04.053 NotebookApp] Writing notebook server cookie secret to /root/.local/share/jupyter/runtime/notebook_cookie_secret [client-5697c66d7b-mlx9g client] [I 06:04:04.054 NotebookApp] Authentication of /metrics is OFF, since other authentication is disabled. [client-5697c66d7b-mlx9g client] [W 06:04:04.939 NotebookApp] All authentication is disabled. Anyone who can connect to this server will be able to run code. [client-5697c66d7b-mlx9g client] [I 06:04:04.951 NotebookApp] Serving notebooks from local directory: /data/projects/fate [client-5697c66d7b-mlx9g client] [I 06:04:04.951 NotebookApp] Jupyter Notebook 6.4.6 is running at: [client-5697c66d7b-mlx9g client] [I 06:04:04.951 NotebookApp] http://client-5697c66d7b-mlx9g:20000/ [client-5697c66d7b-mlx9g client] [I 06:04:04.952 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation). 
[clustermanager-fd479cb86-9fxgz clustermanager] + mkdir -p /data/projects/fate/eggroll/logs/eggroll/ [clustermanager-fd479cb86-9fxgz clustermanager] + touch /data/projects/fate/eggroll/logs/eggroll/eggroll-audit.log [clustermanager-fd479cb86-9fxgz clustermanager] + ln -sf /dev/stdout /data/projects/fate/eggroll/logs/eggroll/eggroll-audit.log [clustermanager-fd479cb86-9fxgz clustermanager] + touch /data/projects/fate/eggroll/logs/eggroll/eggroll.jvm.log [clustermanager-fd479cb86-9fxgz clustermanager] + ln -sf /dev/stdout /data/projects/fate/eggroll/logs/eggroll/eggroll.jvm.log [clustermanager-fd479cb86-9fxgz clustermanager] + touch /data/projects/fate/eggroll/logs/eggroll/eggroll.jvm.err.log [clustermanager-fd479cb86-9fxgz clustermanager] + ln -sf /dev/stderr /data/projects/fate/eggroll/logs/eggroll/eggroll.jvm.err.log [clustermanager-fd479cb86-9fxgz clustermanager] + java -Dlog4j.configurationFile=/data/projects/fate/eggroll//conf/log4j2.properties -cp '/data/projects/fate/eggroll//lib/:' com.webank.eggroll.core.Bootstrap --bootstraps com.webank.eggroll.core.resourcemanager.ClusterManagerBootstrap -c /data/projects/fate/eggroll//conf/eggroll.properties -p 4670 -s EGGROLL_DEAMON [clustermanager-fd479cb86-9fxgz clustermanager] [INFO ][2267][2021-12-09 06:04:03,987][main,pid:1,tid:1][c.w.e.c.Bootstrap:107] - main started [clustermanager-fd479cb86-9fxgz clustermanager] [INFO ][2441][2021-12-09 06:04:04,161][main,pid:1,tid:1][c.w.e.c.c.CommandRouter:107] - [COMMAND] registered v1/cluster-manager/metadata/getServerNode [clustermanager-fd479cb86-9fxgz clustermanager] [INFO ][2444][2021-12-09 06:04:04,164][main,pid:1,tid:1][c.w.e.c.c.CommandRouter:107] - [COMMAND] registered v1/cluster-manager/metadata/getServerNodes [clustermanager-fd479cb86-9fxgz clustermanager] [INFO ][2446][2021-12-09 06:04:04,166][main,pid:1,tid:1][c.w.e.c.c.CommandRouter:107] - [COMMAND] registered v1/cluster-manager/metadata/getOrCreateServerNode [clustermanager-fd479cb86-9fxgz clustermanager] [INFO 
][2449][2021-12-09 06:04:04,169][main,pid:1,tid:1][c.w.e.c.c.CommandRouter:107] - [COMMAND] registered v1/cluster-manager/metadata/createOrUpdateServerNode [clustermanager-fd479cb86-9fxgz clustermanager] [INFO ][2461][2021-12-09 06:04:04,181][main,pid:1,tid:1][c.w.e.c.c.CommandRouter:107] - [COMMAND] registered v1/cluster-manager/metadata/getStore [clustermanager-fd479cb86-9fxgz clustermanager] [INFO ][2462][2021-12-09 06:04:04,182][main,pid:1,tid:1][c.w.e.c.c.CommandRouter:107] - [COMMAND] registered v1/cluster-manager/metadata/getOrCreateStore [clustermanager-fd479cb86-9fxgz clustermanager] [INFO ][2463][2021-12-09 06:04:04,183][main,pid:1,tid:1][c.w.e.c.c.CommandRouter:107] - [COMMAND] registered v1/cluster-manager/metadata/deleteStore [clustermanager-fd479cb86-9fxgz clustermanager] [INFO ][2464][2021-12-09 06:04:04,184][main,pid:1,tid:1][c.w.e.c.c.CommandRouter:107] - [COMMAND] registered v1/cluster-manager/metadata/getStoreFromNamespace [clustermanager-fd479cb86-9fxgz clustermanager] [INFO ][2478][2021-12-09 06:04:04,198][main,pid:1,tid:1][c.w.e.c.c.CommandRouter:107] - [COMMAND] registered v1/cluster-manager/session/getSession [clustermanager-fd479cb86-9fxgz clustermanager] [INFO ][2480][2021-12-09 06:04:04,200][main,pid:1,tid:1][c.w.e.c.c.CommandRouter:107] - [COMMAND] registered v1/cluster-manager/session/getOrCreateSession [clustermanager-fd479cb86-9fxgz clustermanager] [INFO ][2481][2021-12-09 06:04:04,201][main,pid:1,tid:1][c.w.e.c.c.CommandRouter:107] - [COMMAND] registered v1/cluster-manager/session/stopSession [clustermanager-fd479cb86-9fxgz clustermanager] [INFO ][2481][2021-12-09 06:04:04,201][main,pid:1,tid:1][c.w.e.c.c.CommandRouter:107] - [COMMAND] registered v1/cluster-manager/session/killSession [clustermanager-fd479cb86-9fxgz clustermanager] [INFO ][2482][2021-12-09 06:04:04,202][main,pid:1,tid:1][c.w.e.c.c.CommandRouter:107] - [COMMAND] registered v1/cluster-manager/session/killAllSessions [clustermanager-fd479cb86-9fxgz clustermanager] [INFO 
][2483][2021-12-09 06:04:04,203][main,pid:1,tid:1][c.w.e.c.c.CommandRouter:107] - [COMMAND] registered v1/cluster-manager/session/registerSession [clustermanager-fd479cb86-9fxgz clustermanager] [INFO ][2494][2021-12-09 06:04:04,214][main,pid:1,tid:1][c.w.e.c.c.CommandRouter:107] - [COMMAND] registered v1/cluster-manager/session/heartbeat [clustermanager-fd479cb86-9fxgz clustermanager] current dir: /data/projects/fate/eggroll/. [clustermanager-fd479cb86-9fxgz clustermanager] [INFO ][2542][2021-12-09 06:04:04,262][main,pid:1,tid:1][c.w.e.c.r.ClusterManagerBootstrap:107] - conf file: /data/projects/fate/eggroll/conf/eggroll.properties [clustermanager-fd479cb86-9fxgz clustermanager] [INFO ][2937][2021-12-09 06:04:04,657][main,pid:1,tid:1][c.w.e.c.t.GrpcServerUtils:107] - gRPC server at 4670 starting in insecure mode [clustermanager-fd479cb86-9fxgz clustermanager] [INFO ][3247][2021-12-09 06:04:04,967][main,pid:1,tid:1][c.w.e.c.r.ClusterManagerBootstrap:107] - server started at port 4670 [clustermanager-fd479cb86-9fxgz clustermanager] server started at port 4670 [mysql-85d85f56b9-6cs2c mysql] 2021-12-09 06:04:04+00:00 [Note] [Entrypoint]: Entrypoint script for MySQL Server 8.0.21-1debian10 started. [mysql-85d85f56b9-6cs2c mysql] 2021-12-09 06:04:04+00:00 [Note] [Entrypoint]: Switching to dedicated user 'mysql' [mysql-85d85f56b9-6cs2c mysql] 2021-12-09 06:04:04+00:00 [Note] [Entrypoint]: Entrypoint script for MySQL Server 8.0.21-1debian10 started. [mysql-85d85f56b9-6cs2c mysql] 2021-12-09T06:04:05.229030Z 0 [System] [MY-010116] [Server] /usr/sbin/mysqld (mysqld 8.0.21) starting as process 1 [mysql-85d85f56b9-6cs2c mysql] 2021-12-09T06:04:05.261348Z 1 [System] [MY-013576] [InnoDB] InnoDB initialization has started. [mysql-85d85f56b9-6cs2c mysql] 2021-12-09T06:04:06.208357Z 1 [System] [MY-013577] [InnoDB] InnoDB initialization has ended. [mysql-85d85f56b9-6cs2c mysql] 2021-12-09T06:04:06.500870Z 0 [System] [MY-011323] [Server] X Plugin ready for connections. 
Bind-address: '::' port: 33060, socket: /var/run/mysqld/mysqlx.sock [mysql-85d85f56b9-6cs2c mysql] 2021-12-09T06:04:06.656002Z 0 [Warning] [MY-010068] [Server] CA certificate ca.pem is self signed. [mysql-85d85f56b9-6cs2c mysql] 2021-12-09T06:04:06.656309Z 0 [System] [MY-013602] [Server] Channel mysql_main configured to support TLS. Encrypted connections are now supported for this channel. [mysql-85d85f56b9-6cs2c mysql] 2021-12-09T06:04:06.661414Z 0 [Warning] [MY-011810] [Server] Insecure configuration for --pid-file: Location '/var/run/mysqld' in the path is accessible to all OS users. Consider choosing a different directory. [mysql-85d85f56b9-6cs2c mysql] 2021-12-09T06:04:06.729856Z 0 [System] [MY-010931] [Server] /usr/sbin/mysqld: ready for connections. Version: '8.0.21' socket: '/var/run/mysqld/mysqld.sock' port: 3306 MySQL Community Server - GPL. [nodemanager-0-77f55fc97d-rvc2m nodemanager-0-eggrollpair] 2021-12-09 06:04:13 +0000 [info]: parsing config file is succeeded path="/fluentd/etc/fluent.conf"

[nodemanager-0-77f55fc97d-rvc2m nodemanager-0-eggrollpair] 2021-12-09 06:04:13 +0000 [warn]: define <match fluent.> to capture fluentd logs in top level is deprecated. Use <label @FLUENT_LOG> instead [nodemanager-0-77f55fc97d-rvc2m nodemanager-0-eggrollpair] 2021-12-09 06:04:13 +0000 [info]: using configuration file: [nodemanager-0-77f55fc97d-rvc2m nodemanager-0-eggrollpair] [nodemanager-0-77f55fc97d-rvc2m nodemanager-0-eggrollpair] @type tail [nodemanager-0-77f55fc97d-rvc2m nodemanager-0-eggrollpair] path "/data/projects/fate/eggroll/logs//." [nodemanager-0-77f55fc97d-rvc2m nodemanager-0-eggrollpair] exclude_path ["/data/projects/fate/eggroll/logs/eggroll/","/data/projects/fate/eggroll/logs/log.pos"] [nodemanager-0-77f55fc97d-rvc2m nodemanager-0-eggrollpair] pos_file "/data/projects/fate/eggroll/logs/log.pos" [nodemanager-0-77f55fc97d-rvc2m nodemanager-0-eggrollpair] tag "eggroll" [nodemanager-0-77f55fc97d-rvc2m nodemanager-0-eggrollpair] multiline_flush_interval 2s [nodemanager-0-77f55fc97d-rvc2m nodemanager-0-eggrollpair] refresh_interval 5s [nodemanager-0-77f55fc97d-rvc2m nodemanager-0-eggrollpair] [nodemanager-0-77f55fc97d-rvc2m nodemanager-0-eggrollpair] @type "none" [nodemanager-0-77f55fc97d-rvc2m nodemanager-0-eggrollpair] unmatched_lines [nodemanager-0-77f55fc97d-rvc2m nodemanager-0-eggrollpair] [nodemanager-0-77f55fc97d-rvc2m nodemanager-0-eggrollpair] [nodemanager-0-77f55fc97d-rvc2m nodemanager-0-eggrollpair] <match > [nodemanager-0-77f55fc97d-rvc2m nodemanager-0-eggrollpair] @type stdout [nodemanager-0-77f55fc97d-rvc2m nodemanager-0-eggrollpair] [nodemanager-0-77f55fc97d-rvc2m nodemanager-0-eggrollpair] [nodemanager-0-77f55fc97d-rvc2m nodemanager-0-eggrollpair] 2021-12-09 06:04:13 +0000 [info]: starting fluentd-1.12.2 pid=9 ruby="2.7.2" [nodemanager-0-77f55fc97d-rvc2m nodemanager-0-eggrollpair] 2021-12-09 06:04:13 +0000 [info]: spawn command to main: cmdline=["/usr/bin/ruby", "-Eascii-8bit:ascii-8bit", "/usr/bin/fluentd", "-c", 
"/fluentd/etc/fluent.conf", "-p", "/fluentd/plugins", "--under-supervisor"] [nodemanager-0-77f55fc97d-rvc2m nodemanager-0-eggrollpair] 2021-12-09 06:04:15 +0000 [info]: adding match pattern="" type="stdout" [nodemanager-0-77f55fc97d-rvc2m nodemanager-0-eggrollpair] 2021-12-09 06:04:16 +0000 [info]: adding source type="tail" [nodemanager-0-77f55fc97d-rvc2m nodemanager-0-eggrollpair] 2021-12-09 06:04:16 +0000 [warn]: #0 define <match fluent.> to capture fluentd logs in top level is deprecated. Use <label @FLUENT_LOG> instead [nodemanager-0-77f55fc97d-rvc2m nodemanager-0-eggrollpair] 2021-12-09 06:04:16 +0000 [info]: #0 starting fluentd worker pid=18 ppid=9 worker=0 [nodemanager-0-77f55fc97d-rvc2m nodemanager-0-eggrollpair] 2021-12-09 06:04:16 +0000 [info]: #0 fluentd worker is now running worker=0 [nodemanager-0-77f55fc97d-rvc2m nodemanager-0-eggrollpair] 2021-12-09 06:04:16.140620415 +0000 fluent.info: {"pid":18,"ppid":9,"worker":0,"message":"starting fluentd worker pid=18 ppid=9 worker=0"} [nodemanager-0-77f55fc97d-rvc2m nodemanager-0-eggrollpair] 2021-12-09 06:04:16.142334327 +0000 fluent.info: {"worker":0,"message":"fluentd worker is now running worker=0"} [nodemanager-0-77f55fc97d-rvc2m nodemanager-0] + mkdir -p /data/projects/fate/eggroll/logs/eggroll/ [nodemanager-0-77f55fc97d-rvc2m nodemanager-0] + touch /data/projects/fate/eggroll/logs/eggroll/eggroll-audit.log [nodemanager-0-77f55fc97d-rvc2m nodemanager-0] + ln -sf /dev/stdout /data/projects/fate/eggroll/logs/eggroll/eggroll-audit.log [nodemanager-0-77f55fc97d-rvc2m nodemanager-0] + touch /data/projects/fate/eggroll/logs/eggroll/eggroll.jvm.log [nodemanager-0-77f55fc97d-rvc2m nodemanager-0] + ln -sf /dev/stdout /data/projects/fate/eggroll/logs/eggroll/eggroll.jvm.log [nodemanager-0-77f55fc97d-rvc2m nodemanager-0] + touch /data/projects/fate/eggroll/logs/eggroll/eggroll.jvm.err.log [nodemanager-0-77f55fc97d-rvc2m nodemanager-0] + ln -sf /dev/stderr /data/projects/fate/eggroll/logs/eggroll/eggroll.jvm.err.log 
[nodemanager-0-77f55fc97d-rvc2m nodemanager-0] + /tini -- java -Dlog4j.configurationFile=/data/projects/fate/eggroll//conf/log4j2.properties -cp '/data/projects/fate/eggroll//lib/*:' com.webank.eggroll.core.Bootstrap --bootstraps com.webank.eggroll.core.resourcemanager.NodeManagerBootstrap -c /data/projects/fate/eggroll//conf/eggroll.properties -p 4671 -s EGGROLL_DEAMON [nodemanager-0-77f55fc97d-rvc2m nodemanager-0] [INFO ][1623][2021-12-09 06:04:13,205][main,pid:28,tid:1][c.w.e.c.Bootstrap:107] - main started [nodemanager-0-77f55fc97d-rvc2m nodemanager-0] [INFO ][1757][2021-12-09 06:04:13,339][main,pid:28,tid:1][c.w.e.c.c.CommandRouter:107] - [COMMAND] registered v1/node-manager/processor/startContainers [nodemanager-0-77f55fc97d-rvc2m nodemanager-0] [INFO ][1758][2021-12-09 06:04:13,340][main,pid:28,tid:1][c.w.e.c.c.CommandRouter:107] - [COMMAND] registered v1/node-manager/processor/stopContainers [nodemanager-0-77f55fc97d-rvc2m nodemanager-0] [INFO ][1759][2021-12-09 06:04:13,341][main,pid:28,tid:1][c.w.e.c.c.CommandRouter:107] - [COMMAND] registered v1/node-manager/processor/killContainers [nodemanager-0-77f55fc97d-rvc2m nodemanager-0] [INFO ][1763][2021-12-09 06:04:13,345][main,pid:28,tid:1][c.w.e.c.c.CommandRouter:107] - [COMMAND] registered v1/node-manager/processor/heartbeat [nodemanager-0-77f55fc97d-rvc2m nodemanager-0] current dir: /data/projects/fate/eggroll/. 
[nodemanager-0-77f55fc97d-rvc2m nodemanager-0] [INFO ][1777][2021-12-09 06:04:13,359][main,pid:28,tid:1][c.w.e.c.r.NodeManagerBootstrap:107] - conf file: /data/projects/fate/eggroll/conf/eggroll.properties [nodemanager-0-77f55fc97d-rvc2m nodemanager-0] [INFO ][2097][2021-12-09 06:04:13,679][main,pid:28,tid:1][c.w.e.c.t.GrpcServerUtils:107] - gRPC server at 4671 starting in insecure mode [nodemanager-0-77f55fc97d-rvc2m nodemanager-0] server started at 0 [nodemanager-0-77f55fc97d-rvc2m nodemanager-0] [INFO ][2312][2021-12-09 06:04:13,894][main,pid:28,tid:1][c.w.e.c.r.NodeManagerBootstrap:107] - server started at 0 [python-84d4dbd6df-2nbcj python] + mkdir -p /data/projects/fate/conf/ [python-84d4dbd6df-2nbcj python] + cp /data/projects/fate/conf-tmp/transfer_conf.yaml /data/projects/fate/conf/transfer_conf.yaml [python-84d4dbd6df-2nbcj python] + cp /data/projects/fate/conf-tmp/service_conf.yaml /data/projects/fate/conf/service_conf.yaml [python-84d4dbd6df-2nbcj python] + cp /data/projects/fate/conf-tmp/component_registry.json /data/projects/fate/conf/component_registry.json [python-84d4dbd6df-2nbcj python] + cp /data/projects/fate/conf-tmp/job_default_config.yaml /data/projects/fate/conf/job_default_config.yaml [python-84d4dbd6df-2nbcj python] + sed -i 's/host: fateflow/host: 10.244.0.72/g' /data/projects/fate/conf/service_conf.yaml [python-84d4dbd6df-2nbcj python] + cp /data/projects/spark-2.4.1-bin-hadoop2.7/conf/spark-defaults-template.conf /data/projects/spark-2.4.1-bin-hadoop2.7/conf/spark-defaults.conf [python-84d4dbd6df-2nbcj python] + sed -i s/fateflow/10.244.0.72/g /data/projects/spark-2.4.1-bin-hadoop2.7/conf/spark-defaults.conf [python-84d4dbd6df-2nbcj python] + sleep 5 [python-84d4dbd6df-2nbcj python] + python fateflow/python/fate_flow/fate_flow_server.py
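Because `kubefate cluster logs` interleaves output from every pod and container, it can be easier to read once it is grouped by the `[pod container]` prefix and filtered for errors. The following is a minimal sketch, assuming every entry starts with a prefix of two lowercase, hyphenated names in square brackets, as in the log above; `split_by_container`, `error_entries`, and `SAMPLE` are hypothetical names, and the ` ERROR ` substring match is only a rough filter for the fateboard log format.

```python
import re
from collections import defaultdict

# Matches prefixes such as "[python-84d4dbd6df-2nbcj fateboard]".
# Assumption: pod and container names use only lowercase letters, digits, "-".
PREFIX = re.compile(r"\[([a-z0-9-]+) ([a-z0-9-]+)\]")


def split_by_container(raw):
    """Group flattened log text by its (pod, container) prefix."""
    groups = defaultdict(list)
    matches = list(PREFIX.finditer(raw))
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(raw)
        entry = raw[m.end():end].strip()
        if entry:  # skip prefixes that carry no text
            groups[(m.group(1), m.group(2))].append(entry)
    return dict(groups)


def error_entries(groups):
    """Keep only entries that contain an ERROR level marker."""
    found = {}
    for key, entries in groups.items():
        errors = [e for e in entries if " ERROR " in e]
        if errors:
            found[key] = errors
    return found


# Short excerpt copied from the log in this issue.
SAMPLE = (
    "[python-84d4dbd6df-2nbcj fateboard] 2021-12-09 06:04:19 INFO [main] "
    "(HikariDataSource.java:123) - HikariPool-1 - Start completed. "
    "[python-84d4dbd6df-2nbcj fateboard] 2021-12-09 06:04:20 ERROR [main] "
    "(SshService.java:143) - load ssh config file error "
    "[rollsite-cfd4d6c88-bg9jm rollsite] current dir: /data/projects/fate/eggroll/."
)

groups = split_by_container(SAMPLE)
errors = error_entries(groups)
for (pod, container), entries in errors.items():
    for entry in entries:
        print(pod, container, entry)
```

Applied to the full log above, a filter like this would be expected to surface the fateboard `SshService` entry. fateboard still reports `Started Bootstrap in 10.617 seconds` after that entry, so it may be unrelated to the failing `describe` call. The `python` container output ends at the launch of `fate_flow_server.py`, so the pasted log shows nothing about the state of fateflow itself.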

tinywell commented 2 years ago

I'm hitting the same problem, but in my logs the main error is fateboard failing to reach the fateflow service on port 9380:

[python-749dc7f58f-6xt2k fateboard] 2022-01-07 09:03:43 INFO  [http-nio-8080-exec-8] (HttpClientPool.java:171) - httpclient sent url http://fateflow:9380/v1/job/stop request {"job_id":"0"} result: 
[python-749dc7f58f-6xt2k fateboard] 2022-01-07 09:03:43 ERROR [http-nio-8080-exec-8] (GlobalExceptionHandler.java:41) - error 
[python-749dc7f58f-6xt2k fateboard] java.lang.NullPointerException: null
[python-749dc7f58f-6xt2k fateboard]     at com.webank.ai.fate.board.controller.JobManagerController.checkAppKey(JobManagerController.java:292)
[python-749dc7f58f-6xt2k fateboard]     at com.webank.ai.fate.board.controller.JobManagerController.queryJobStatus(JobManagerController.java:73)
[python-749dc7f58f-6xt2k fateboard]     at sun.reflect.GeneratedMethodAccessor58.invoke(Unknown Source)
[python-749dc7f58f-6xt2k fateboard]     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[python-749dc7f58f-6xt2k fateboard]     at java.lang.reflect.Method.invoke(Method.java:498)
[python-749dc7f58f-6xt2k fateboard]     at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:190)
[python-749dc7f58f-6xt2k fateboard]     at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:138)
[python-749dc7f58f-6xt2k fateboard]     at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:106)
[python-749dc7f58f-6xt2k fateboard]     at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:888)
[python-749dc7f58f-6xt2k fateboard]     at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:793)
[python-749dc7f58f-6xt2k fateboard]     at org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:87)
[python-749dc7f58f-6xt2k fateboard]     at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:1040)
[python-749dc7f58f-6xt2k fateboard]     at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:943)
[python-749dc7f58f-6xt2k fateboard]     at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:1006)
[python-749dc7f58f-6xt2k fateboard]     at org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:898)
[python-749dc7f58f-6xt2k fateboard]     at javax.servlet.http.HttpServlet.service(HttpServlet.java:634)
[python-749dc7f58f-6xt2k fateboard]     at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:883)
[python-749dc7f58f-6xt2k fateboard]     at javax.servlet.http.HttpServlet.service(HttpServlet.java:741)
[python-749dc7f58f-6xt2k fateboard]     at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:231)
[python-749dc7f58f-6xt2k fateboard]     at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
[python-749dc7f58f-6xt2k fateboard]     at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:53)
[python-749dc7f58f-6xt2k fateboard]     at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
[python-749dc7f58f-6xt2k fateboard]     at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
[python-749dc7f58f-6xt2k fateboard]     at com.webank.ai.fate.board.conf.SecurityFilter.doFilter(SecurityFilter.java:39)
[python-749dc7f58f-6xt2k fateboard]     at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
[python-749dc7f58f-6xt2k fateboard]     at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
[python-749dc7f58f-6xt2k fateboard]     at org.springframework.web.filter.RequestContextFilter.doFilterInternal(RequestContextFilter.java:100)
[python-749dc7f58f-6xt2k fateboard]     at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:119)
[python-749dc7f58f-6xt2k fateboard]     at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
[python-749dc7f58f-6xt2k fateboard]     at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
[python-749dc7f58f-6xt2k fateboard]     at org.springframework.web.filter.FormContentFilter.doFilterInternal(FormContentFilter.java:93)
[python-749dc7f58f-6xt2k fateboard]     at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:119)
[python-749dc7f58f-6xt2k fateboard]     at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
[python-749dc7f58f-6xt2k fateboard]     at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
[python-749dc7f58f-6xt2k fateboard]     at org.springframework.session.web.http.SessionRepositoryFilter.doFilterInternal(SessionRepositoryFilter.java:141)
[python-749dc7f58f-6xt2k fateboard]     at org.springframework.session.web.http.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:82)
[python-749dc7f58f-6xt2k fateboard]     at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
[python-749dc7f58f-6xt2k fateboard]     at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
[python-749dc7f58f-6xt2k fateboard]     at org.springframework.boot.actuate.metrics.web.servlet.WebMvcMetricsFilter.doFilterInternal(WebMvcMetricsFilter.java:108)
[python-749dc7f58f-6xt2k fateboard]     at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:119)
[python-749dc7f58f-6xt2k fateboard]     at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
[python-749dc7f58f-6xt2k fateboard]     at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
[python-749dc7f58f-6xt2k fateboard]     at org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(CharacterEncodingFilter.java:201)
[python-749dc7f58f-6xt2k fateboard]     at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:119)
[python-749dc7f58f-6xt2k fateboard]     at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
[python-749dc7f58f-6xt2k fateboard]     at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
[python-749dc7f58f-6xt2k fateboard]     at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:202)
[python-749dc7f58f-6xt2k fateboard]     at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:96)
[python-749dc7f58f-6xt2k fateboard]     at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:526)
[python-749dc7f58f-6xt2k fateboard]     at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:139)
[python-749dc7f58f-6xt2k fateboard]     at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:92)
[python-749dc7f58f-6xt2k fateboard]     at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:74)
[python-749dc7f58f-6xt2k fateboard]     at org.apache.catalina.valves.RemoteIpValve.invoke(RemoteIpValve.java:747)
[python-749dc7f58f-6xt2k fateboard]     at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:343)
[python-749dc7f58f-6xt2k fateboard]     at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:408)
[python-749dc7f58f-6xt2k fateboard]     at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:66)
[python-749dc7f58f-6xt2k fateboard]     at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:861)
[python-749dc7f58f-6xt2k fateboard]     at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1579)
[python-749dc7f58f-6xt2k fateboard]     at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
[python-749dc7f58f-6xt2k fateboard]     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[python-749dc7f58f-6xt2k fateboard]     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[python-749dc7f58f-6xt2k fateboard]     at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
[python-749dc7f58f-6xt2k fateboard]     at java.lang.Thread.run(Thread.java:748)
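
Most of the frames in a trace like this belong to Spring and Tomcat. To see where the application itself is involved, one option is to keep only the frames under the project's own package; in the trace above those are `JobManagerController.checkAppKey`, `JobManagerController.queryJobStatus`, and `SecurityFilter.doFilter`. The sketch below illustrates that filtering. The helper name `app_frames` is hypothetical, and the default prefix `com.webank.` is taken from the frames shown in this thread.

```python
import re

# Matches one Java stack frame: "at <class>.<method>(<location>)".
FRAME = re.compile(r"\bat ([\w.$]+)\.([\w$<>]+)\(([^()]*)\)")


def app_frames(trace, package_prefix="com.webank."):
    """Return the frames whose class name starts with package_prefix, in order."""
    frames = []
    for cls, method, location in FRAME.findall(trace):
        if cls.startswith(package_prefix):
            frames.append(f"{cls}.{method} ({location})")
    return frames
```

The first element of the returned list is the application frame closest to where the exception was thrown, which is usually the best place to start reading the source.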

In my tests, other pods (clustermanager, for example) can reach the fateflow service on port 9380 without problems; only fateboard cannot.
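
A comparison like this can be repeated from each container with a small TCP probe. The following is a rough sketch, assuming a Python interpreter is available in the container being tested (which may not hold for every image); the function name `can_connect` is hypothetical. It only checks that a TCP connection can be opened and says nothing about whether the HTTP API behind the port answers correctly.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened within timeout."""
    try:
        # create_connection resolves the name and connects; the socket is
        # closed again as soon as the with-block exits.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS resolution failures, refused connections and timeouts.
        return False
```

For the case described here, the call would be `can_connect("fateflow", 9380)`, run once inside a pod that works (such as clustermanager) and once inside the fateboard container, so the two results can be compared.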

panii commented 2 years ago

I'm hitting this error too.

[root demo]# kubefate cluster ls
UUID                                  NAME        NAMESPACE   REVISION  STATUS   CHART  ChartVERSION  AGE
b93ac837-aaca-4040-9066-c33d6ca7f5f8  fate-9999   fate-9999   2         Running  fate   v1.7.1        4h7m
bc62b003-dc79-48b1-9738-e586b62c04ad  fate-10000  fate-10000  1         Running  fate   v1.7.1        3h27m

[root demo]# kubefate cluster describe b93ac837-aaca-4040-9066-c33d6ca7f5f8
the server could not find the requested resource

sunnycomes commented 2 years ago

I've run into this problem as well. Has it still not been resolved? Here is my log:

[mysql-c88469467-j57b7 mysql] 2022-03-02 07:05:01+00:00 [Note] [Entrypoint]: Entrypoint script for MySQL Server 8.0.28-1debian10 started.
[mysql-c88469467-j57b7 mysql] 2022-03-02 07:05:02+00:00 [Note] [Entrypoint]: Switching to dedicated user 'mysql'
[mysql-c88469467-j57b7 mysql] 2022-03-02 07:05:02+00:00 [Note] [Entrypoint]: Entrypoint script for MySQL Server 8.0.28-1debian10 started.
[mysql-c88469467-j57b7 mysql] 2022-03-02T07:05:02.713588Z 0 [System] [MY-010116] [Server] /usr/sbin/mysqld (mysqld 8.0.28) starting as process 1
[mysql-c88469467-j57b7 mysql] 2022-03-02T07:05:02.849294Z 1 [System] [MY-013576] [InnoDB] InnoDB initialization has started.
[mysql-c88469467-j57b7 mysql] 2022-03-02T07:05:06.559371Z 1 [System] [MY-013577] [InnoDB] InnoDB initialization has ended.
[mysql-c88469467-j57b7 mysql] 2022-03-02T07:05:08.316051Z 0 [Warning] [MY-010068] [Server] CA certificate ca.pem is self signed.
[mysql-c88469467-j57b7 mysql] 2022-03-02T07:05:08.316096Z 0 [System] [MY-013602] [Server] Channel mysql_main configured to support TLS. Encrypted connections are now supported for this channel.
[mysql-c88469467-j57b7 mysql] 2022-03-02T07:05:08.477725Z 0 [Warning] [MY-011810] [Server] Insecure configuration for --pid-file: Location '/var/run/mysqld' in the path is accessible to all OS users. Consider choosing a different directory.
[mysql-c88469467-j57b7 mysql] 2022-03-02T07:05:08.719651Z 0 [System] [MY-011323] [Server] X Plugin ready for connections. Bind-address: '::' port: 33060, socket: /var/run/mysqld/mysqlx.sock
[mysql-c88469467-j57b7 mysql] 2022-03-02T07:05:08.719852Z 0 [System] [MY-010931] [Server] /usr/sbin/mysqld: ready for connections. Version: '8.0.28' socket: '/var/run/mysqld/mysqld.sock' port: 3306 MySQL Community Server - GPL.
[nodemanager-1-567b7bbdbd-mb8lv nodemanager-1-eggrollpair] 2022-03-02 07:05:03 +0000 [info]: parsing config file is succeeded path="/fluentd/etc/fluent.conf"

[nodemanager-1-567b7bbdbd-mb8lv nodemanager-1-eggrollpair] 2022-03-02 07:05:03 +0000 [warn]: define <match fluent.> to capture fluentd logs in top level is deprecated. Use <label @FLUENT_LOG> instead [nodemanager-1-567b7bbdbd-mb8lv nodemanager-1-eggrollpair] 2022-03-02 07:05:03 +0000 [info]: using configuration file: [nodemanager-1-567b7bbdbd-mb8lv nodemanager-1-eggrollpair] [nodemanager-1-567b7bbdbd-mb8lv nodemanager-1-eggrollpair] @type tail [nodemanager-1-567b7bbdbd-mb8lv nodemanager-1-eggrollpair] path "/data/projects/fate/eggroll/logs//." [nodemanager-1-567b7bbdbd-mb8lv nodemanager-1-eggrollpair] exclude_path ["/data/projects/fate/eggroll/logs/eggroll/","/data/projects/fate/eggroll/logs/log.pos"] [nodemanager-1-567b7bbdbd-mb8lv nodemanager-1-eggrollpair] pos_file "/data/projects/fate/eggroll/logs/log.pos" [nodemanager-1-567b7bbdbd-mb8lv nodemanager-1-eggrollpair] tag "eggroll" [nodemanager-1-567b7bbdbd-mb8lv nodemanager-1-eggrollpair] multiline_flush_interval 2s [nodemanager-1-567b7bbdbd-mb8lv nodemanager-1-eggrollpair] refresh_interval 5s [nodemanager-1-567b7bbdbd-mb8lv nodemanager-1-eggrollpair] [nodemanager-1-567b7bbdbd-mb8lv nodemanager-1-eggrollpair] @type "none" [nodemanager-1-567b7bbdbd-mb8lv nodemanager-1-eggrollpair] unmatched_lines [nodemanager-1-567b7bbdbd-mb8lv nodemanager-1-eggrollpair] [nodemanager-1-567b7bbdbd-mb8lv nodemanager-1-eggrollpair] [nodemanager-1-567b7bbdbd-mb8lv nodemanager-1-eggrollpair] <match > [nodemanager-1-567b7bbdbd-mb8lv nodemanager-1-eggrollpair] @type stdout [nodemanager-1-567b7bbdbd-mb8lv nodemanager-1-eggrollpair] [nodemanager-1-567b7bbdbd-mb8lv nodemanager-1-eggrollpair] [nodemanager-1-567b7bbdbd-mb8lv nodemanager-1-eggrollpair] 2022-03-02 07:05:03 +0000 [info]: starting fluentd-1.12.2 pid=9 ruby="2.7.2" [nodemanager-1-567b7bbdbd-mb8lv nodemanager-1-eggrollpair] 2022-03-02 07:05:03 +0000 [info]: spawn command to main: cmdline=["/usr/bin/ruby", "-Eascii-8bit:ascii-8bit", "/usr/bin/fluentd", "-c", 
"/fluentd/etc/fluent.conf", "-p", "/fluentd/plugins", "--under-supervisor"] [nodemanager-1-567b7bbdbd-mb8lv nodemanager-1-eggrollpair] 2022-03-02 07:05:04 +0000 [info]: adding match pattern="" type="stdout" [nodemanager-1-567b7bbdbd-mb8lv nodemanager-1-eggrollpair] 2022-03-02 07:05:04 +0000 [info]: adding source type="tail" [nodemanager-1-567b7bbdbd-mb8lv nodemanager-1-eggrollpair] 2022-03-02 07:05:04 +0000 [warn]: #0 define <match fluent.> to capture fluentd logs in top level is deprecated. Use <label @FLUENT_LOG> instead [nodemanager-1-567b7bbdbd-mb8lv nodemanager-1-eggrollpair] 2022-03-02 07:05:04 +0000 [info]: #0 starting fluentd worker pid=18 ppid=9 worker=0 [nodemanager-1-567b7bbdbd-mb8lv nodemanager-1-eggrollpair] 2022-03-02 07:05:04 +0000 [info]: #0 fluentd worker is now running worker=0 [nodemanager-1-567b7bbdbd-mb8lv nodemanager-1-eggrollpair] 2022-03-02 07:05:04.842618793 +0000 fluent.info: {"pid":18,"ppid":9,"worker":0,"message":"starting fluentd worker pid=18 ppid=9 worker=0"} [nodemanager-1-567b7bbdbd-mb8lv nodemanager-1-eggrollpair] 2022-03-02 07:05:04.843645805 +0000 fluent.info: {"worker":0,"message":"fluentd worker is now running worker=0"} [rollsite-5cf745f5b4-lm4xn rollsite] + mkdir -p /data/projects/fate/eggroll/logs/eggroll/ [rollsite-5cf745f5b4-lm4xn rollsite] + touch /data/projects/fate/eggroll/logs/eggroll/eggroll-audit.log [rollsite-5cf745f5b4-lm4xn rollsite] + ln -sf /dev/stdout /data/projects/fate/eggroll/logs/eggroll/eggroll-audit.log [rollsite-5cf745f5b4-lm4xn rollsite] + touch /data/projects/fate/eggroll/logs/eggroll/eggroll.jvm.log [rollsite-5cf745f5b4-lm4xn rollsite] + ln -sf /dev/stdout /data/projects/fate/eggroll/logs/eggroll/eggroll.jvm.log [rollsite-5cf745f5b4-lm4xn rollsite] + touch /data/projects/fate/eggroll/logs/eggroll/eggroll.jvm.err.log [rollsite-5cf745f5b4-lm4xn rollsite] + ln -sf /dev/stderr /data/projects/fate/eggroll/logs/eggroll/eggroll.jvm.err.log [rollsite-5cf745f5b4-lm4xn rollsite] + java 
-Dlog4j.configurationFile=/data/projects/fate/eggroll//conf/log4j2.properties -cp '/data/projects/fate/eggroll//lib/:/data/projects/fate/eggroll//conf/' com.webank.eggroll.rollsite.EggSiteBootstrap -c /data/projects/fate/eggroll//conf/eggroll.properties [rollsite-5cf745f5b4-lm4xn rollsite] current dir: /data/projects/fate/eggroll/. [rollsite-5cf745f5b4-lm4xn rollsite] [INFO ][5590][2022-03-02 07:05:12,300][main,pid:1,tid:1][c.w.e.r.EggSiteBootstrap:107] - conf file: /data/projects/fate/eggroll/conf/eggroll.properties [rollsite-5cf745f5b4-lm4xn rollsite] [INFO ][5692][2022-03-02 07:05:12,402][main,pid:1,tid:1][c.w.e.r.EggSiteBootstrap:107] - initing router at path=conf/route_table/route_table.json [rollsite-5cf745f5b4-lm4xn rollsite] [INFO ][5724][2022-03-02 07:05:12,434][main,pid:1,tid:1][c.w.e.r.EggSiteBootstrap:107] - start refreshing route table per min [rollsite-5cf745f5b4-lm4xn rollsite] [INFO ][6413][2022-03-02 07:05:13,123][main,pid:1,tid:1][c.w.e.c.t.GrpcServerUtils:107] - gRPC server at 9370 starting in insecure mode [rollsite-5cf745f5b4-lm4xn rollsite] [INFO ][6649][2022-03-02 07:05:13,359][main,pid:1,tid:1][c.w.e.r.EggSiteBootstrap:107] - server started at 9370 [client-8594c544d7-shgj7 client] { [client-8594c544d7-shgj7 client] "retcode": 0, [client-8594c544d7-shgj7 client] "retmsg": "Fate Flow CLI has been initialized successfully." [client-8594c544d7-shgj7 client] } [client-8594c544d7-shgj7 client] [client-8594c544d7-shgj7 client] Pipeline configuration succeeded. 
[client-8594c544d7-shgj7 client] [D 07:05:09.461 NotebookApp] Searching ['/data/projects/fate', '/root/.jupyter', '/root/.local/etc/jupyter', '/usr/local/etc/jupyter', '/etc/jupyter'] for config files [client-8594c544d7-shgj7 client] [D 07:05:09.461 NotebookApp] Looking for jupyter_config in /etc/jupyter [client-8594c544d7-shgj7 client] [D 07:05:09.461 NotebookApp] Looking for jupyter_config in /usr/local/etc/jupyter [client-8594c544d7-shgj7 client] [D 07:05:09.461 NotebookApp] Looking for jupyter_config in /root/.local/etc/jupyter [client-8594c544d7-shgj7 client] [D 07:05:09.462 NotebookApp] Looking for jupyter_config in /root/.jupyter [client-8594c544d7-shgj7 client] [D 07:05:09.462 NotebookApp] Looking for jupyter_config in /data/projects/fate [client-8594c544d7-shgj7 client] [D 07:05:09.462 NotebookApp] Looking for jupyter_notebook_config in /etc/jupyter [client-8594c544d7-shgj7 client] [D 07:05:09.463 NotebookApp] Looking for jupyter_notebook_config in /usr/local/etc/jupyter [client-8594c544d7-shgj7 client] [D 07:05:09.463 NotebookApp] Looking for jupyter_notebook_config in /root/.local/etc/jupyter [client-8594c544d7-shgj7 client] [D 07:05:09.463 NotebookApp] Looking for jupyter_notebook_config in /root/.jupyter [client-8594c544d7-shgj7 client] [D 07:05:09.463 NotebookApp] Looking for jupyter_notebook_config in /data/projects/fate [client-8594c544d7-shgj7 client] [D 07:05:09.467 NotebookApp] Paths used for configuration of jupyter_notebook_config: [client-8594c544d7-shgj7 client] /etc/jupyter/jupyter_notebook_config.json [client-8594c544d7-shgj7 client] [D 07:05:09.467 NotebookApp] Paths used for configuration of jupyter_notebook_config: [client-8594c544d7-shgj7 client] /usr/local/etc/jupyter/jupyter_notebook_config.json [client-8594c544d7-shgj7 client] [D 07:05:09.468 NotebookApp] Paths used for configuration of jupyter_notebook_config: [client-8594c544d7-shgj7 client] /root/.local/etc/jupyter/jupyter_notebook_config.json [client-8594c544d7-shgj7 client] [D 
07:05:09.468 NotebookApp] Paths used for configuration of jupyter_notebook_config: [client-8594c544d7-shgj7 client] /root/.jupyter/jupyter_notebook_config.json [client-8594c544d7-shgj7 client] [I 07:05:09.470 NotebookApp] Writing notebook server cookie secret to /root/.local/share/jupyter/runtime/notebook_cookie_secret [client-8594c544d7-shgj7 client] [I 07:05:09.470 NotebookApp] Authentication of /metrics is OFF, since other authentication is disabled. [client-8594c544d7-shgj7 client] [W 07:05:09.689 NotebookApp] All authentication is disabled. Anyone who can connect to this server will be able to run code. [client-8594c544d7-shgj7 client] [I 07:05:09.691 NotebookApp] Serving notebooks from local directory: /data/projects/fate [client-8594c544d7-shgj7 client] [I 07:05:09.691 NotebookApp] Jupyter Notebook 6.4.6 is running at: [client-8594c544d7-shgj7 client] [I 07:05:09.691 NotebookApp] http://client-8594c544d7-shgj7:20000/ [client-8594c544d7-shgj7 client] [I 07:05:09.691 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation). 
[clustermanager-f648564b6-xstsw clustermanager] + mkdir -p /data/projects/fate/eggroll/logs/eggroll/ [clustermanager-f648564b6-xstsw clustermanager] + touch /data/projects/fate/eggroll/logs/eggroll/eggroll-audit.log [clustermanager-f648564b6-xstsw clustermanager] + ln -sf /dev/stdout /data/projects/fate/eggroll/logs/eggroll/eggroll-audit.log [clustermanager-f648564b6-xstsw clustermanager] + touch /data/projects/fate/eggroll/logs/eggroll/eggroll.jvm.log [clustermanager-f648564b6-xstsw clustermanager] + ln -sf /dev/stdout /data/projects/fate/eggroll/logs/eggroll/eggroll.jvm.log [clustermanager-f648564b6-xstsw clustermanager] + touch /data/projects/fate/eggroll/logs/eggroll/eggroll.jvm.err.log [clustermanager-f648564b6-xstsw clustermanager] + ln -sf /dev/stderr /data/projects/fate/eggroll/logs/eggroll/eggroll.jvm.err.log [clustermanager-f648564b6-xstsw clustermanager] + java -Dlog4j.configurationFile=/data/projects/fate/eggroll//conf/log4j2.properties -cp '/data/projects/fate/eggroll//lib/:' com.webank.eggroll.core.Bootstrap --bootstraps com.webank.eggroll.core.resourcemanager.ClusterManagerBootstrap -c /data/projects/fate/eggroll//conf/eggroll.properties -p 4670 -s EGGROLLDEAMON [clustermanager-f648564b6-xstsw clustermanager] [INFO ][4681][2022-03-02 07:05:12,281][main,pid:1,tid:1][c.w.e.c.Bootstrap:107] - main started [clustermanager-f648564b6-xstsw clustermanager] [INFO ][4926][2022-03-02 07:05:12,526][main,pid:1,tid:1][c.w.e.c.c.CommandRouter:107] - [COMMAND] registered v1/cluster-manager/metadata/getServerNode [clustermanager-f648564b6-xstsw clustermanager] [INFO ][4926][2022-03-02 07:05:12,526][main,pid:1,tid:1][c.w.e.c.c.CommandRouter:107] - [COMMAND] registered v1/cluster-manager/metadata/getServerNodes [clustermanager-f648564b6-xstsw clustermanager] [INFO ][4929][2022-03-02 07:05:12,529][main,pid:1,tid:1][c.w.e.c.c.CommandRouter:107] - [COMMAND] registered v1/cluster-manager/metadata/getOrCreateServerNode [clustermanager-f648564b6-xstsw clustermanager] [INFO 
][4929][2022-03-02 07:05:12,529][main,pid:1,tid:1][c.w.e.c.c.CommandRouter:107] - [COMMAND] registered v1/cluster-manager/metadata/createOrUpdateServerNode [clustermanager-f648564b6-xstsw clustermanager] [INFO ][4945][2022-03-02 07:05:12,545][main,pid:1,tid:1][c.w.e.c.c.CommandRouter:107] - [COMMAND] registered v1/cluster-manager/metadata/getStore [clustermanager-f648564b6-xstsw clustermanager] [INFO ][4945][2022-03-02 07:05:12,545][main,pid:1,tid:1][c.w.e.c.c.CommandRouter:107] - [COMMAND] registered v1/cluster-manager/metadata/getOrCreateStore [clustermanager-f648564b6-xstsw clustermanager] [INFO ][4946][2022-03-02 07:05:12,546][main,pid:1,tid:1][c.w.e.c.c.CommandRouter:107] - [COMMAND] registered v1/cluster-manager/metadata/deleteStore [clustermanager-f648564b6-xstsw clustermanager] [INFO ][4946][2022-03-02 07:05:12,546][main,pid:1,tid:1][c.w.e.c.c.CommandRouter:107] - [COMMAND] registered v1/cluster-manager/metadata/getStoreFromNamespace [clustermanager-f648564b6-xstsw clustermanager] [INFO ][4952][2022-03-02 07:05:12,552][main,pid:1,tid:1][c.w.e.c.c.CommandRouter:107] - [COMMAND] registered v1/cluster-manager/session/getSession [clustermanager-f648564b6-xstsw clustermanager] [INFO ][4952][2022-03-02 07:05:12,552][main,pid:1,tid:1][c.w.e.c.c.CommandRouter:107] - [COMMAND] registered v1/cluster-manager/session/getOrCreateSession [clustermanager-f648564b6-xstsw clustermanager] [INFO ][4953][2022-03-02 07:05:12,553][main,pid:1,tid:1][c.w.e.c.c.CommandRouter:107] - [COMMAND] registered v1/cluster-manager/session/stopSession [clustermanager-f648564b6-xstsw clustermanager] [INFO ][4954][2022-03-02 07:05:12,554][main,pid:1,tid:1][c.w.e.c.c.CommandRouter:107] - [COMMAND] registered v1/cluster-manager/session/killSession [clustermanager-f648564b6-xstsw clustermanager] [INFO ][4954][2022-03-02 07:05:12,554][main,pid:1,tid:1][c.w.e.c.c.CommandRouter:107] - [COMMAND] registered v1/cluster-manager/session/killAllSessions [clustermanager-f648564b6-xstsw clustermanager] [INFO 
][4955][2022-03-02 07:05:12,555][main,pid:1,tid:1][c.w.e.c.c.CommandRouter:107] - [COMMAND] registered v1/cluster-manager/session/registerSession [clustermanager-f648564b6-xstsw clustermanager] [INFO ][4957][2022-03-02 07:05:12,557][main,pid:1,tid:1][c.w.e.c.c.CommandRouter:107] - [COMMAND] registered v1/cluster-manager/session/heartbeat [clustermanager-f648564b6-xstsw clustermanager] current dir: /data/projects/fate/eggroll/. [clustermanager-f648564b6-xstsw clustermanager] [INFO ][4964][2022-03-02 07:05:12,564][main,pid:1,tid:1][c.w.e.c.r.ClusterManagerBootstrap:107] - conf file: /data/projects/fate/eggroll/conf/eggroll.properties [clustermanager-f648564b6-xstsw clustermanager] [INFO ][5596][2022-03-02 07:05:13,196][main,pid:1,tid:1][c.w.e.c.t.GrpcServerUtils:107] - gRPC server at 4670 starting in insecure mode [clustermanager-f648564b6-xstsw clustermanager] [INFO ][5771][2022-03-02 07:05:13,371][main,pid:1,tid:1][c.w.e.c.r.ClusterManagerBootstrap:107] - server started at port 4670 [clustermanager-f648564b6-xstsw clustermanager] server started at port 4670 [python-7c84997799-gqgqj fateboard] [python-7c84997799-gqgqj fateboard] . 
____ [python-7c84997799-gqgqj fateboard] /\ / __' () _ \ \ \ \ [python-7c84997799-gqgqj fateboard] ( ( )__ | ' | '| | ' \/ _` | \ \ \ \ [python-7c84997799-gqgqj fateboard] \/ _)| |)| | | | | || (| | ) ) ) ) [python-7c84997799-gqgqj fateboard] ' |__| ._|| ||| |_, | / / / / [python-7c84997799-gqgqj fateboard] =========|_|==============|__/=//// [python-7c84997799-gqgqj fateboard] :: Spring Boot :: (v2.2.0.RELEASE) [python-7c84997799-gqgqj fateboard] [python-7c84997799-gqgqj fateboard] 2022-03-02 07:05:17 INFO [main] (StartupInfoLogger.java:55) - Starting Bootstrap on python-7c84997799-gqgqj with PID 1 (/data/projects/fate/fateboard/fateboard-1.7.2.jar started by root in /data/projects/fate/fateboard) [python-7c84997799-gqgqj fateboard] 2022-03-02 07:05:17 INFO [main] (SpringApplication.java:651) - No active profile set, falling back to default profiles: default [python-7c84997799-gqgqj fateboard] 2022-03-02 07:05:18 WARN [main] (ClassPathMapperScanner.java:239) - Skipping MapperFactoryBean with name 'jobMapper' and 'com.webank.ai.fate.board.dao.JobMapper' mapperInterface. Bean already defined with the same name! [python-7c84997799-gqgqj fateboard] 2022-03-02 07:05:18 WARN [main] (ClassPathMapperScanner.java:239) - Skipping MapperFactoryBean with name 'taskMapper' and 'com.webank.ai.fate.board.dao.TaskMapper' mapperInterface. Bean already defined with the same name! [python-7c84997799-gqgqj fateboard] 2022-03-02 07:05:18 WARN [main] (ClassPathMapperScanner.java:166) - No MyBatis mapper was found in '[com/webank/ai/fate/board/dao]' package. Please check your configuration. 
[python-7c84997799-gqgqj fateboard] 2022-03-02 07:05:18 INFO [main] (PostProcessorRegistrationDelegate.java:330) - Bean 'org.springframework.transaction.annotation.ProxyTransactionManagementConfiguration' of type [org.springframework.transaction.annotation.ProxyTransactionManagementConfiguration] is not eligible for getting processed by all BeanPostProcessors (for example: not eligible for auto-proxying) [python-7c84997799-gqgqj fateboard] 2022-03-02 07:05:19 INFO [main] (TomcatWebServer.java:92) - Tomcat initialized with port(s): 8080 (http) [python-7c84997799-gqgqj fateboard] 2022-03-02 07:05:19 INFO [main] (DirectJDKLog.java:173) - Initializing ProtocolHandler ["http-nio-8080"] [python-7c84997799-gqgqj fateboard] 2022-03-02 07:05:19 INFO [main] (DirectJDKLog.java:173) - Starting service [Tomcat] [python-7c84997799-gqgqj fateboard] 2022-03-02 07:05:19 INFO [main] (DirectJDKLog.java:173) - Starting Servlet engine: [Apache Tomcat/9.0.27] [python-7c84997799-gqgqj fateboard] 2022-03-02 07:05:19 INFO [main] (DirectJDKLog.java:173) - Initializing Spring embedded WebApplicationContext [python-7c84997799-gqgqj fateboard] 2022-03-02 07:05:19 INFO [main] (ServletWebServerApplicationContext.java:284) - Root WebApplicationContext: initialization completed in 1969 ms [python-7c84997799-gqgqj fateboard] 2022-03-02 07:05:19 INFO [main] (HikariDataSource.java:110) - HikariPool-1 - Starting... [python-7c84997799-gqgqj fateboard] 2022-03-02 07:05:20 INFO [main] (HikariDataSource.java:123) - HikariPool-1 - Start completed. 
[python-7c84997799-gqgqj fateboard] 2022-03-02 07:05:21 INFO [main] (ExecutorConfigurationSupport.java:171) - Initializing ExecutorService [python-7c84997799-gqgqj fateboard] 2022-03-02 07:05:21 INFO [main] (ExecutorConfigurationSupport.java:171) - Initializing ExecutorService 'asyncServiceExecutor' [python-7c84997799-gqgqj fateboard] 2022-03-02 07:05:21 ERROR [main] (SshService.java:143) - load ssh config file error [python-7c84997799-gqgqj fateboard] java.lang.IllegalArgumentException: null [python-7c84997799-gqgqj fateboard] at com.google.common.base.Preconditions.checkArgument(Preconditions.java:128) [python-7c84997799-gqgqj fateboard] at com.webank.ai.fate.board.ssh.SshService.afterPropertiesSet(SshService.java:138) [python-7c84997799-gqgqj fateboard] at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.invokeInitMethods(AbstractAutowireCapableBeanFactory.java:1862) [python-7c84997799-gqgqj fateboard] at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1799) [python-7c84997799-gqgqj fateboard] at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:595) [python-7c84997799-gqgqj fateboard] at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:517) [python-7c84997799-gqgqj fateboard] at org.springframework.beans.factory.support.AbstractBeanFactory.lambda$doGetBean$0(AbstractBeanFactory.java:323) [python-7c84997799-gqgqj fateboard] at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:222) [python-7c84997799-gqgqj fateboard] at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:321) [python-7c84997799-gqgqj fateboard] at 
org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:202) [python-7c84997799-gqgqj fateboard] at org.springframework.beans.factory.config.DependencyDescriptor.resolveCandidate(DependencyDescriptor.java:276) [python-7c84997799-gqgqj fateboard] at org.springframework.beans.factory.support.DefaultListableBeanFactory.doResolveDependency(DefaultListableBeanFactory.java:1287) [python-7c84997799-gqgqj fateboard] at org.springframework.beans.factory.support.DefaultListableBeanFactory.resolveDependency(DefaultListableBeanFactory.java:1207) [python-7c84997799-gqgqj fateboard] at org.springframework.beans.factory.annotation.AutowiredAnnotationBeanPostProcessor$AutowiredFieldElement.inject(AutowiredAnnotationBeanPostProcessor.java:636) [python-7c84997799-gqgqj fateboard] at org.springframework.beans.factory.annotation.InjectionMetadata.inject(InjectionMetadata.java:116) [python-7c84997799-gqgqj fateboard] at org.springframework.beans.factory.annotation.AutowiredAnnotationBeanPostProcessor.postProcessProperties(AutowiredAnnotationBeanPostProcessor.java:397) [python-7c84997799-gqgqj fateboard] at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.populateBean(AbstractAutowireCapableBeanFactory.java:1429) [python-7c84997799-gqgqj fateboard] at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:594) [python-7c84997799-gqgqj fateboard] at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:517) [python-7c84997799-gqgqj fateboard] at org.springframework.beans.factory.support.AbstractBeanFactory.lambda$doGetBean$0(AbstractBeanFactory.java:323) [python-7c84997799-gqgqj fateboard] at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:222) [python-7c84997799-gqgqj fateboard] at 
org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:321) [python-7c84997799-gqgqj fateboard] at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:202) [python-7c84997799-gqgqj fateboard] at org.springframework.beans.factory.config.DependencyDescriptor.resolveCandidate(DependencyDescriptor.java:276) [python-7c84997799-gqgqj fateboard] at org.springframework.beans.factory.support.DefaultListableBeanFactory.doResolveDependency(DefaultListableBeanFactory.java:1287) [python-7c84997799-gqgqj fateboard] at org.springframework.beans.factory.support.DefaultListableBeanFactory.resolveDependency(DefaultListableBeanFactory.java:1207) [python-7c84997799-gqgqj fateboard] at org.springframework.beans.factory.annotation.AutowiredAnnotationBeanPostProcessor$AutowiredFieldElement.inject(AutowiredAnnotationBeanPostProcessor.java:636) [python-7c84997799-gqgqj fateboard] at org.springframework.beans.factory.annotation.InjectionMetadata.inject(InjectionMetadata.java:116) [python-7c84997799-gqgqj fateboard] at org.springframework.beans.factory.annotation.AutowiredAnnotationBeanPostProcessor.postProcessProperties(AutowiredAnnotationBeanPostProcessor.java:397) [python-7c84997799-gqgqj fateboard] at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.populateBean(AbstractAutowireCapableBeanFactory.java:1429) [python-7c84997799-gqgqj fateboard] at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:594) [python-7c84997799-gqgqj fateboard] at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:517) [python-7c84997799-gqgqj fateboard] at org.springframework.beans.factory.support.AbstractBeanFactory.lambda$doGetBean$0(AbstractBeanFactory.java:323) [python-7c84997799-gqgqj fateboard] at 
org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:222) [python-7c84997799-gqgqj fateboard] at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:321) [python-7c84997799-gqgqj fateboard] at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:202) [python-7c84997799-gqgqj fateboard] at org.springframework.beans.factory.support.DefaultListableBeanFactory.preInstantiateSingletons(DefaultListableBeanFactory.java:879) [python-7c84997799-gqgqj fateboard] at org.springframework.context.support.AbstractApplicationContext.finishBeanFactoryInitialization(AbstractApplicationContext.java:878) [python-7c84997799-gqgqj fateboard] at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:550) [python-7c84997799-gqgqj fateboard] at org.springframework.boot.web.servlet.context.ServletWebServerApplicationContext.refresh(ServletWebServerApplicationContext.java:141) [python-7c84997799-gqgqj fateboard] at org.springframework.boot.SpringApplication.refresh(SpringApplication.java:747) [python-7c84997799-gqgqj fateboard] at org.springframework.boot.SpringApplication.refreshContext(SpringApplication.java:397) [python-7c84997799-gqgqj fateboard] at org.springframework.boot.SpringApplication.run(SpringApplication.java:315) [python-7c84997799-gqgqj fateboard] at org.springframework.boot.SpringApplication.run(SpringApplication.java:1226) [python-7c84997799-gqgqj fateboard] at org.springframework.boot.SpringApplication.run(SpringApplication.java:1215) [python-7c84997799-gqgqj fateboard] at com.webank.ai.fate.board.bootstrap.Bootstrap.main(Bootstrap.java:49) [python-7c84997799-gqgqj fateboard] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [python-7c84997799-gqgqj fateboard] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) [python-7c84997799-gqgqj 
fateboard] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [python-7c84997799-gqgqj fateboard] at java.lang.reflect.Method.invoke(Method.java:498) [python-7c84997799-gqgqj fateboard] at org.springframework.boot.loader.MainMethodRunner.run(MainMethodRunner.java:48) [python-7c84997799-gqgqj fateboard] at org.springframework.boot.loader.Launcher.launch(Launcher.java:87) [python-7c84997799-gqgqj fateboard] at org.springframework.boot.loader.Launcher.launch(Launcher.java:50) [python-7c84997799-gqgqj fateboard] at org.springframework.boot.loader.JarLauncher.main(JarLauncher.java:51) [python-7c84997799-gqgqj fateboard] 2022-03-02 07:05:21 INFO [main] (SshConfigFileWatcher.java:165) - use system path /data/projects/fate/fateboard/conf [python-7c84997799-gqgqj fateboard] 2022-03-02 07:05:21 INFO [main] (SshConfigFileWatcher.java:171) - Scanning /data/projects/fate/fateboard/conf ... [python-7c84997799-gqgqj fateboard] 2022-03-02 07:05:22 INFO [main] (Version.java:21) - HV000001: Hibernate Validator 6.0.17.Final [python-7c84997799-gqgqj fateboard] 2022-03-02 07:05:22 INFO [main] (WelcomePageHandlerMapping.java:54) - Adding welcome page: class path resource [static/index.html] [python-7c84997799-gqgqj fateboard] 2022-03-02 07:05:23 INFO [main] (EndpointLinksResolver.java:58) - Exposing 0 endpoint(s) beneath base path '/actuator' [python-7c84997799-gqgqj fateboard] 2022-03-02 07:05:23 INFO [main] (ScheduledAnnotationBeanPostProcessor.java:297) - No TaskScheduler/ScheduledExecutorService bean found for scheduled processing [python-7c84997799-gqgqj fateboard] 2022-03-02 07:05:23 INFO [main] (DirectJDKLog.java:173) - Starting ProtocolHandler ["http-nio-8080"] [python-7c84997799-gqgqj fateboard] 2022-03-02 07:05:24 INFO [main] (TomcatWebServer.java:204) - Tomcat started on port(s): 8080 (http) with context path '' [python-7c84997799-gqgqj fateboard] 2022-03-02 07:05:24 INFO [main] (StartupInfoLogger.java:61) - Started Bootstrap in 7.338 
seconds (JVM running for 7.965) [nodemanager-0-6d88c65cb-64skg nodemanager-0-eggrollpair] 2022-03-02 07:05:02 +0000 [info]: parsing config file is succeeded path="/fluentd/etc/fluent.conf"

[nodemanager-0-6d88c65cb-64skg nodemanager-0-eggrollpair] 2022-03-02 07:05:03 +0000 [warn]: define <match fluent.> to capture fluentd logs in top level is deprecated. Use <label @FLUENT_LOG> instead [nodemanager-0-6d88c65cb-64skg nodemanager-0-eggrollpair] 2022-03-02 07:05:03 +0000 [info]: using configuration file: [nodemanager-0-6d88c65cb-64skg nodemanager-0-eggrollpair] [nodemanager-0-6d88c65cb-64skg nodemanager-0-eggrollpair] @type tail [nodemanager-0-6d88c65cb-64skg nodemanager-0-eggrollpair] path "/data/projects/fate/eggroll/logs//." [nodemanager-0-6d88c65cb-64skg nodemanager-0-eggrollpair] exclude_path ["/data/projects/fate/eggroll/logs/eggroll/","/data/projects/fate/eggroll/logs/log.pos"] [nodemanager-0-6d88c65cb-64skg nodemanager-0-eggrollpair] pos_file "/data/projects/fate/eggroll/logs/log.pos" [nodemanager-0-6d88c65cb-64skg nodemanager-0-eggrollpair] tag "eggroll" [nodemanager-0-6d88c65cb-64skg nodemanager-0-eggrollpair] multiline_flush_interval 2s [nodemanager-0-6d88c65cb-64skg nodemanager-0-eggrollpair] refresh_interval 5s [nodemanager-0-6d88c65cb-64skg nodemanager-0-eggrollpair] [nodemanager-0-6d88c65cb-64skg nodemanager-0-eggrollpair] @type "none" [nodemanager-0-6d88c65cb-64skg nodemanager-0-eggrollpair] unmatched_lines [nodemanager-0-6d88c65cb-64skg nodemanager-0-eggrollpair] [nodemanager-0-6d88c65cb-64skg nodemanager-0-eggrollpair] [nodemanager-0-6d88c65cb-64skg nodemanager-0-eggrollpair] <match > [nodemanager-0-6d88c65cb-64skg nodemanager-0-eggrollpair] @type stdout [nodemanager-0-6d88c65cb-64skg nodemanager-0-eggrollpair] [nodemanager-0-6d88c65cb-64skg nodemanager-0-eggrollpair] [nodemanager-0-6d88c65cb-64skg nodemanager-0-eggrollpair] 2022-03-02 07:05:03 +0000 [info]: starting fluentd-1.12.2 pid=7 ruby="2.7.2" [nodemanager-0-6d88c65cb-64skg nodemanager-0-eggrollpair] 2022-03-02 07:05:03 +0000 [info]: spawn command to main: cmdline=["/usr/bin/ruby", "-Eascii-8bit:ascii-8bit", "/usr/bin/fluentd", "-c", "/fluentd/etc/fluent.conf", "-p", 
"/fluentd/plugins", "--under-supervisor"] [nodemanager-0-6d88c65cb-64skg nodemanager-0-eggrollpair] 2022-03-02 07:05:04 +0000 [info]: adding match pattern="" type="stdout" [nodemanager-0-6d88c65cb-64skg nodemanager-0-eggrollpair] 2022-03-02 07:05:04 +0000 [info]: adding source type="tail" [nodemanager-0-6d88c65cb-64skg nodemanager-0-eggrollpair] 2022-03-02 07:05:04 +0000 [warn]: #0 define <match fluent.> to capture fluentd logs in top level is deprecated. Use <label @FLUENT_LOG> instead [nodemanager-0-6d88c65cb-64skg nodemanager-0-eggrollpair] 2022-03-02 07:05:04 +0000 [info]: #0 starting fluentd worker pid=16 ppid=7 worker=0 [nodemanager-0-6d88c65cb-64skg nodemanager-0-eggrollpair] 2022-03-02 07:05:04 +0000 [info]: #0 fluentd worker is now running worker=0 [nodemanager-0-6d88c65cb-64skg nodemanager-0-eggrollpair] 2022-03-02 07:05:04.831506714 +0000 fluent.info: {"pid":16,"ppid":7,"worker":0,"message":"starting fluentd worker pid=16 ppid=7 worker=0"} [nodemanager-0-6d88c65cb-64skg nodemanager-0-eggrollpair] 2022-03-02 07:05:04.832032820 +0000 fluent.info: {"worker":0,"message":"fluentd worker is now running worker=0"} [nodemanager-1-567b7bbdbd-mb8lv nodemanager-1] + mkdir -p /data/projects/fate/eggroll/logs/eggroll/ [nodemanager-1-567b7bbdbd-mb8lv nodemanager-1] + touch /data/projects/fate/eggroll/logs/eggroll/eggroll-audit.log [nodemanager-1-567b7bbdbd-mb8lv nodemanager-1] + ln -sf /dev/stdout /data/projects/fate/eggroll/logs/eggroll/eggroll-audit.log [nodemanager-1-567b7bbdbd-mb8lv nodemanager-1] + touch /data/projects/fate/eggroll/logs/eggroll/eggroll.jvm.log [nodemanager-1-567b7bbdbd-mb8lv nodemanager-1] + ln -sf /dev/stdout /data/projects/fate/eggroll/logs/eggroll/eggroll.jvm.log [nodemanager-1-567b7bbdbd-mb8lv nodemanager-1] + touch /data/projects/fate/eggroll/logs/eggroll/eggroll.jvm.err.log [nodemanager-1-567b7bbdbd-mb8lv nodemanager-1] + ln -sf /dev/stderr /data/projects/fate/eggroll/logs/eggroll/eggroll.jvm.err.log [nodemanager-1-567b7bbdbd-mb8lv 
nodemanager-1] + /tini -- java -Dlog4j.configurationFile=/data/projects/fate/eggroll//conf/log4j2.properties -cp '/data/projects/fate/eggroll//lib/:' com.webank.eggroll.core.Bootstrap --bootstraps com.webank.eggroll.core.resourcemanager.NodeManagerBootstrap -c /data/projects/fate/eggroll//conf/eggroll.properties -p 4671 -s EGGROLL_DEAMON [nodemanager-1-567b7bbdbd-mb8lv nodemanager-1] [INFO ][5427][2022-03-02 07:05:12,261][main,pid:27,tid:1][c.w.e.c.Bootstrap:107] - main started [nodemanager-1-567b7bbdbd-mb8lv nodemanager-1] [INFO ][5507][2022-03-02 07:05:12,341][main,pid:27,tid:1][c.w.e.c.c.CommandRouter:107] - [COMMAND] registered v1/node-manager/processor/startContainers [nodemanager-1-567b7bbdbd-mb8lv nodemanager-1] [INFO ][5507][2022-03-02 07:05:12,341][main,pid:27,tid:1][c.w.e.c.c.CommandRouter:107] - [COMMAND] registered v1/node-manager/processor/stopContainers [nodemanager-1-567b7bbdbd-mb8lv nodemanager-1] [INFO ][5510][2022-03-02 07:05:12,344][main,pid:27,tid:1][c.w.e.c.c.CommandRouter:107] - [COMMAND] registered v1/node-manager/processor/killContainers [nodemanager-1-567b7bbdbd-mb8lv nodemanager-1] [INFO ][5513][2022-03-02 07:05:12,347][main,pid:27,tid:1][c.w.e.c.c.CommandRouter:107] - [COMMAND] registered v1/node-manager/processor/heartbeat [nodemanager-1-567b7bbdbd-mb8lv nodemanager-1] current dir: /data/projects/fate/eggroll/. 
[nodemanager-1-567b7bbdbd-mb8lv nodemanager-1] [INFO ][5519][2022-03-02 07:05:12,353][main,pid:27,tid:1][c.w.e.c.r.NodeManagerBootstrap:107] - conf file: /data/projects/fate/eggroll/conf/eggroll.properties [nodemanager-1-567b7bbdbd-mb8lv nodemanager-1] [INFO ][6325][2022-03-02 07:05:13,159][main,pid:27,tid:1][c.w.e.c.t.GrpcServerUtils:107] - gRPC server at 4671 starting in insecure mode [nodemanager-1-567b7bbdbd-mb8lv nodemanager-1] server started at 0 [nodemanager-1-567b7bbdbd-mb8lv nodemanager-1] [INFO ][6534][2022-03-02 07:05:13,368][main,pid:27,tid:1][c.w.e.c.r.NodeManagerBootstrap:107] - server started at 0 [python-7c84997799-gqgqj python] + mkdir -p /data/projects/fate/conf/ [python-7c84997799-gqgqj python] + cp /data/projects/fate/conf-tmp/transfer_conf.yaml /data/projects/fate/conf/transfer_conf.yaml [python-7c84997799-gqgqj python] + cp /data/projects/fate/conf-tmp/service_conf.yaml /data/projects/fate/conf/service_conf.yaml [python-7c84997799-gqgqj python] + cp /data/projects/fate/conf-tmp/component_registry.json /data/projects/fate/conf/component_registry.json [python-7c84997799-gqgqj python] + cp /data/projects/fate/conf-tmp/job_default_config.yaml /data/projects/fate/conf/job_default_config.yaml [python-7c84997799-gqgqj python] + sed -i 's/host: fateflow/host: 172.17.0.10/g' /data/projects/fate/conf/service_conf.yaml [python-7c84997799-gqgqj python] + cp /data/projects/spark-2.4.1-bin-hadoop2.7/conf/spark-defaults-template.conf /data/projects/spark-2.4.1-bin-hadoop2.7/conf/spark-defaults.conf [python-7c84997799-gqgqj python] + sed -i s/fateflow/172.17.0.10/g /data/projects/spark-2.4.1-bin-hadoop2.7/conf/spark-defaults.conf [python-7c84997799-gqgqj python] + sleep 5 [python-7c84997799-gqgqj python] + python fateflow/python/fate_flow/fate_flow_server.py [python-7c84997799-gqgqj python] FATE Flow grpc server start successfully [python-7c84997799-gqgqj python] FATE Flow http server start... 
[nodemanager-0-6d88c65cb-64skg nodemanager-0] + mkdir -p /data/projects/fate/eggroll/logs/eggroll/ [nodemanager-0-6d88c65cb-64skg nodemanager-0] + touch /data/projects/fate/eggroll/logs/eggroll/eggroll-audit.log [nodemanager-0-6d88c65cb-64skg nodemanager-0] + ln -sf /dev/stdout /data/projects/fate/eggroll/logs/eggroll/eggroll-audit.log [nodemanager-0-6d88c65cb-64skg nodemanager-0] + touch /data/projects/fate/eggroll/logs/eggroll/eggroll.jvm.log [nodemanager-0-6d88c65cb-64skg nodemanager-0] + ln -sf /dev/stdout /data/projects/fate/eggroll/logs/eggroll/eggroll.jvm.log [nodemanager-0-6d88c65cb-64skg nodemanager-0] + touch /data/projects/fate/eggroll/logs/eggroll/eggroll.jvm.err.log [nodemanager-0-6d88c65cb-64skg nodemanager-0] + ln -sf /dev/stderr /data/projects/fate/eggroll/logs/eggroll/eggroll.jvm.err.log [nodemanager-0-6d88c65cb-64skg nodemanager-0] + /tini -- java -Dlog4j.configurationFile=/data/projects/fate/eggroll//conf/log4j2.properties -cp '/data/projects/fate/eggroll//lib/:' com.webank.eggroll.core.Bootstrap --bootstraps com.webank.eggroll.core.resourcemanager.NodeManagerBootstrap -c /data/projects/fate/eggroll//conf/eggroll.properties -p 4671 -s EGGROLL_DEAMON [nodemanager-0-6d88c65cb-64skg nodemanager-0] [INFO ][5434][2022-03-02 07:05:12,282][main,pid:26,tid:1][c.w.e.c.Bootstrap:107] - main started [nodemanager-0-6d88c65cb-64skg nodemanager-0] [INFO ][5513][2022-03-02 07:05:12,361][main,pid:26,tid:1][c.w.e.c.c.CommandRouter:107] - [COMMAND] registered v1/node-manager/processor/startContainers [nodemanager-0-6d88c65cb-64skg nodemanager-0] [INFO ][5514][2022-03-02 07:05:12,362][main,pid:26,tid:1][c.w.e.c.c.CommandRouter:107] - [COMMAND] registered v1/node-manager/processor/stopContainers [nodemanager-0-6d88c65cb-64skg nodemanager-0] [INFO ][5517][2022-03-02 07:05:12,365][main,pid:26,tid:1][c.w.e.c.c.CommandRouter:107] - [COMMAND] registered v1/node-manager/processor/killContainers [nodemanager-0-6d88c65cb-64skg nodemanager-0] [INFO ][5519][2022-03-02 
07:05:12,367][main,pid:26,tid:1][c.w.e.c.c.CommandRouter:107] - [COMMAND] registered v1/node-manager/processor/heartbeat [nodemanager-0-6d88c65cb-64skg nodemanager-0] current dir: /data/projects/fate/eggroll/. [nodemanager-0-6d88c65cb-64skg nodemanager-0] [INFO ][5530][2022-03-02 07:05:12,378][main,pid:26,tid:1][c.w.e.c.r.NodeManagerBootstrap:107] - conf file: /data/projects/fate/eggroll/conf/eggroll.properties [nodemanager-0-6d88c65cb-64skg nodemanager-0] [INFO ][6361][2022-03-02 07:05:13,209][main,pid:26,tid:1][c.w.e.c.t.GrpcServerUtils:107] - gRPC server at 4671 starting in insecure mode [nodemanager-0-6d88c65cb-64skg nodemanager-0] server started at 0 [nodemanager-0-6d88c65cb-64skg nodemanager-0] [INFO ][6529][2022-03-02 07:05:13,377][main,pid:26,tid:1][c.w.e.c.r.NodeManagerBootstrap:107] - server started at 0

owlet42 commented 2 years ago

Check the versions of the CLI and the service with kubefate version. You need to make sure the two are consistent.
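For anyone who wants to script this check, here is a minimal sketch. It assumes only that `kubefate version` prints text containing version tokens of the form `vX.Y.Z` for both the CLI and the service. The exact output layout is not shown in this thread, so the parsing is deliberately loose, and the helper names (`extract_versions`, `versions_consistent`, `check_installed_kubefate`) are made up for illustration.

```python
import re
import subprocess
from typing import List


def extract_versions(text: str) -> List[str]:
    """Return every 'vX.Y.Z'-style token found in the given text."""
    return re.findall(r"v\d+\.\d+\.\d+", text)


def versions_consistent(text: str) -> bool:
    """True when at least two version tokens are present and all are equal.

    Assumption: the CLI and the service each report one version token.
    """
    versions = extract_versions(text)
    return len(versions) >= 2 and len(set(versions)) == 1


def check_installed_kubefate() -> bool:
    """Run `kubefate version` and compare the versions it reports.

    Requires the kubefate binary on PATH and a reachable service.
    """
    result = subprocess.run(
        ["kubefate", "version"], capture_output=True, text=True, check=True
    )
    return versions_consistent(result.stdout)
```

Calling `check_installed_kubefate()` on the machine where the CLI is installed should return `True` only when both reported versions match. If the real output uses a different version format, the regular expression needs adjusting.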

szshary commented 2 years ago

Check the versions of the CLI and the service with kubefate version. You need to make sure the two are consistent.

I have confirmed that the versions are consistent, but the problem still persists.

owlet42 commented 2 years ago

This bug appears on k8s v1.22 and later.

owlet42 commented 2 years ago

A temporary workaround is to use k8s v1.21 or earlier.
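To tell quickly whether a given cluster falls in the affected range, a small helper like the one below may be useful. It is only a sketch of the rule stated in this thread (v1.22 and later affected, v1.21 and earlier not). The function names are hypothetical, and the version string is assumed to look like `v1.22.3` or `1.21`, for example as reported by the cluster's server version.

```python
import re
from typing import Tuple


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from strings such as 'v1.22.3' or '1.21'."""
    match = re.match(r"v?(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized Kubernetes version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def is_affected(version: str) -> bool:
    """True for Kubernetes v1.22 and later, the range reported in this issue."""
    return parse_major_minor(version) >= (1, 22)
```

Tuples compare element by element, so `(1, 21) >= (1, 22)` is false and `(1, 23) >= (1, 22)` is true, which avoids the pitfalls of comparing version strings directly.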

LaynePeng commented 2 years ago

Converted to a task: support k8s > v1.21.

jeanP-zhang commented 2 years ago

A temporary workaround is to use k8s v1.21 or earlier.

As of March 22, 2022, the problem still exists on a cluster set up with minikube.