Closed: liuyanntes closed this issue 2 years ago
Cluster logs:
root@crms-10-10-178-147[/k8s/fate]# kubefate cluster logs 4f14f1ca-687c-49cf-a1b7-f9a48e7b37bd
[python-84d4dbd6df-2nbcj fateboard]
[python-84d4dbd6df-2nbcj fateboard] . _ _
[python-84d4dbd6df-2nbcj fateboard] /\ / '_ () \ \ \ \
[python-84d4dbd6df-2nbcj fateboard] ( ( )\ | ' | '| | ' \/ ` | \ \ \ \
[python-84d4dbd6df-2nbcj fateboard] \/ _)| |)| | | | | || (| | ) ) ) )
[python-84d4dbd6df-2nbcj fateboard] ' |__| ._|| ||| |_, | / / / /
[python-84d4dbd6df-2nbcj fateboard] =========|_|==============|__/=////
[python-84d4dbd6df-2nbcj fateboard] :: Spring Boot :: (v2.2.0.RELEASE)
[python-84d4dbd6df-2nbcj fateboard]
[python-84d4dbd6df-2nbcj fateboard] 2021-12-09 06:04:13 INFO [main] (StartupInfoLogger.java:55) - Starting Bootstrap on python-84d4dbd6df-2nbcj with PID 1 (/data/projects/fate/fateboard/fateboard-1.7.0.jar started by root in /data/projects/fate/fateboard)
[python-84d4dbd6df-2nbcj fateboard] 2021-12-09 06:04:13 INFO [main] (SpringApplication.java:651) - No active profile set, falling back to default profiles: default
[python-84d4dbd6df-2nbcj fateboard] 2021-12-09 06:04:16 WARN [main] (ClassPathMapperScanner.java:239) - Skipping MapperFactoryBean with name 'jobMapper' and 'com.webank.ai.fate.board.dao.JobMapper' mapperInterface. Bean already defined with the same name!
[python-84d4dbd6df-2nbcj fateboard] 2021-12-09 06:04:16 WARN [main] (ClassPathMapperScanner.java:239) - Skipping MapperFactoryBean with name 'taskMapper' and 'com.webank.ai.fate.board.dao.TaskMapper' mapperInterface. Bean already defined with the same name!
[python-84d4dbd6df-2nbcj fateboard] 2021-12-09 06:04:16 WARN [main] (ClassPathMapperScanner.java:166) - No MyBatis mapper was found in '[com/webank/ai/fate/board/dao]' package. Please check your configuration.
[python-84d4dbd6df-2nbcj fateboard] 2021-12-09 06:04:17 INFO [main] (PostProcessorRegistrationDelegate.java:330) - Bean 'org.springframework.transaction.annotation.ProxyTransactionManagementConfiguration' of type [org.springframework.transaction.annotation.ProxyTransactionManagementConfiguration] is not eligible for getting processed by all BeanPostProcessors (for example: not eligible for auto-proxying) [python-84d4dbd6df-2nbcj fateboard] 2021-12-09 06:04:18 INFO [main] (TomcatWebServer.java:92) - Tomcat initialized with port(s): 8080 (http) [python-84d4dbd6df-2nbcj fateboard] 2021-12-09 06:04:18 INFO [main] (DirectJDKLog.java:173) - Initializing ProtocolHandler ["http-nio-8080"] [python-84d4dbd6df-2nbcj fateboard] 2021-12-09 06:04:18 INFO [main] (DirectJDKLog.java:173) - Starting service [Tomcat] [python-84d4dbd6df-2nbcj fateboard] 2021-12-09 06:04:18 INFO [main] (DirectJDKLog.java:173) - Starting Servlet engine: [Apache Tomcat/9.0.27] [python-84d4dbd6df-2nbcj fateboard] 2021-12-09 06:04:18 INFO [main] (DirectJDKLog.java:173) - Initializing Spring embedded WebApplicationContext [python-84d4dbd6df-2nbcj fateboard] 2021-12-09 06:04:18 INFO [main] (ServletWebServerApplicationContext.java:284) - Root WebApplicationContext: initialization completed in 4221 ms [python-84d4dbd6df-2nbcj fateboard] 2021-12-09 06:04:18 INFO [main] (HikariDataSource.java:110) - HikariPool-1 - Starting... [python-84d4dbd6df-2nbcj fateboard] 2021-12-09 06:04:19 INFO [main] (HikariDataSource.java:123) - HikariPool-1 - Start completed. 
[python-84d4dbd6df-2nbcj fateboard] 2021-12-09 06:04:20 ERROR [main] (SshService.java:143) - load ssh config file error [python-84d4dbd6df-2nbcj fateboard] java.lang.IllegalArgumentException: null [python-84d4dbd6df-2nbcj fateboard] at com.google.common.base.Preconditions.checkArgument(Preconditions.java:128) [python-84d4dbd6df-2nbcj fateboard] at com.webank.ai.fate.board.ssh.SshService.afterPropertiesSet(SshService.java:138) [python-84d4dbd6df-2nbcj fateboard] at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.invokeInitMethods(AbstractAutowireCapableBeanFactory.java:1862) [python-84d4dbd6df-2nbcj fateboard] at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1799) [python-84d4dbd6df-2nbcj fateboard] at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:595) [python-84d4dbd6df-2nbcj fateboard] at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:517) [python-84d4dbd6df-2nbcj fateboard] at org.springframework.beans.factory.support.AbstractBeanFactory.lambda$doGetBean$0(AbstractBeanFactory.java:323) [python-84d4dbd6df-2nbcj fateboard] at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:222) [python-84d4dbd6df-2nbcj fateboard] at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:321) [python-84d4dbd6df-2nbcj fateboard] at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:202) [python-84d4dbd6df-2nbcj fateboard] at org.springframework.beans.factory.config.DependencyDescriptor.resolveCandidate(DependencyDescriptor.java:276) [python-84d4dbd6df-2nbcj fateboard] at 
org.springframework.beans.factory.support.DefaultListableBeanFactory.doResolveDependency(DefaultListableBeanFactory.java:1287) [python-84d4dbd6df-2nbcj fateboard] at org.springframework.beans.factory.support.DefaultListableBeanFactory.resolveDependency(DefaultListableBeanFactory.java:1207) [python-84d4dbd6df-2nbcj fateboard] at org.springframework.beans.factory.annotation.AutowiredAnnotationBeanPostProcessor$AutowiredFieldElement.inject(AutowiredAnnotationBeanPostProcessor.java:636) [python-84d4dbd6df-2nbcj fateboard] at org.springframework.beans.factory.annotation.InjectionMetadata.inject(InjectionMetadata.java:116) [python-84d4dbd6df-2nbcj fateboard] at org.springframework.beans.factory.annotation.AutowiredAnnotationBeanPostProcessor.postProcessProperties(AutowiredAnnotationBeanPostProcessor.java:397) [python-84d4dbd6df-2nbcj fateboard] at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.populateBean(AbstractAutowireCapableBeanFactory.java:1429) [python-84d4dbd6df-2nbcj fateboard] at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:594) [python-84d4dbd6df-2nbcj fateboard] at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:517) [python-84d4dbd6df-2nbcj fateboard] at org.springframework.beans.factory.support.AbstractBeanFactory.lambda$doGetBean$0(AbstractBeanFactory.java:323) [python-84d4dbd6df-2nbcj fateboard] at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:222) [python-84d4dbd6df-2nbcj fateboard] at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:321) [python-84d4dbd6df-2nbcj fateboard] at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:202) [python-84d4dbd6df-2nbcj fateboard] at 
org.springframework.beans.factory.config.DependencyDescriptor.resolveCandidate(DependencyDescriptor.java:276) [python-84d4dbd6df-2nbcj fateboard] at org.springframework.beans.factory.support.DefaultListableBeanFactory.doResolveDependency(DefaultListableBeanFactory.java:1287) [python-84d4dbd6df-2nbcj fateboard] at org.springframework.beans.factory.support.DefaultListableBeanFactory.resolveDependency(DefaultListableBeanFactory.java:1207) [python-84d4dbd6df-2nbcj fateboard] at org.springframework.beans.factory.annotation.AutowiredAnnotationBeanPostProcessor$AutowiredFieldElement.inject(AutowiredAnnotationBeanPostProcessor.java:636) [python-84d4dbd6df-2nbcj fateboard] at org.springframework.beans.factory.annotation.InjectionMetadata.inject(InjectionMetadata.java:116) [python-84d4dbd6df-2nbcj fateboard] at org.springframework.beans.factory.annotation.AutowiredAnnotationBeanPostProcessor.postProcessProperties(AutowiredAnnotationBeanPostProcessor.java:397) [python-84d4dbd6df-2nbcj fateboard] at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.populateBean(AbstractAutowireCapableBeanFactory.java:1429) [python-84d4dbd6df-2nbcj fateboard] at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:594) [python-84d4dbd6df-2nbcj fateboard] at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:517) [python-84d4dbd6df-2nbcj fateboard] at org.springframework.beans.factory.support.AbstractBeanFactory.lambda$doGetBean$0(AbstractBeanFactory.java:323) [python-84d4dbd6df-2nbcj fateboard] at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:222) [python-84d4dbd6df-2nbcj fateboard] at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:321) [python-84d4dbd6df-2nbcj fateboard] at 
org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:202) [python-84d4dbd6df-2nbcj fateboard] at org.springframework.beans.factory.support.DefaultListableBeanFactory.preInstantiateSingletons(DefaultListableBeanFactory.java:879) [python-84d4dbd6df-2nbcj fateboard] at org.springframework.context.support.AbstractApplicationContext.finishBeanFactoryInitialization(AbstractApplicationContext.java:878) [python-84d4dbd6df-2nbcj fateboard] at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:550) [python-84d4dbd6df-2nbcj fateboard] at org.springframework.boot.web.servlet.context.ServletWebServerApplicationContext.refresh(ServletWebServerApplicationContext.java:141) [python-84d4dbd6df-2nbcj fateboard] at org.springframework.boot.SpringApplication.refresh(SpringApplication.java:747) [python-84d4dbd6df-2nbcj fateboard] at org.springframework.boot.SpringApplication.refreshContext(SpringApplication.java:397) [python-84d4dbd6df-2nbcj fateboard] at org.springframework.boot.SpringApplication.run(SpringApplication.java:315) [python-84d4dbd6df-2nbcj fateboard] at org.springframework.boot.SpringApplication.run(SpringApplication.java:1226) [python-84d4dbd6df-2nbcj fateboard] at org.springframework.boot.SpringApplication.run(SpringApplication.java:1215) [python-84d4dbd6df-2nbcj fateboard] at com.webank.ai.fate.board.bootstrap.Bootstrap.main(Bootstrap.java:49) [python-84d4dbd6df-2nbcj fateboard] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [python-84d4dbd6df-2nbcj fateboard] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) [python-84d4dbd6df-2nbcj fateboard] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [python-84d4dbd6df-2nbcj fateboard] at java.lang.reflect.Method.invoke(Method.java:498) [python-84d4dbd6df-2nbcj fateboard] at 
org.springframework.boot.loader.MainMethodRunner.run(MainMethodRunner.java:48) [python-84d4dbd6df-2nbcj fateboard] at org.springframework.boot.loader.Launcher.launch(Launcher.java:87) [python-84d4dbd6df-2nbcj fateboard] at org.springframework.boot.loader.Launcher.launch(Launcher.java:50) [python-84d4dbd6df-2nbcj fateboard] at org.springframework.boot.loader.JarLauncher.main(JarLauncher.java:51) [python-84d4dbd6df-2nbcj fateboard] 2021-12-09 06:04:20 INFO [main] (ExecutorConfigurationSupport.java:171) - Initializing ExecutorService [python-84d4dbd6df-2nbcj fateboard] 2021-12-09 06:04:20 INFO [main] (ExecutorConfigurationSupport.java:171) - Initializing ExecutorService 'asyncServiceExecutor' [python-84d4dbd6df-2nbcj fateboard] 2021-12-09 06:04:21 INFO [main] (SshConfigFileWatcher.java:165) - use system path /data/projects/fate/fateboard/conf [python-84d4dbd6df-2nbcj fateboard] 2021-12-09 06:04:21 INFO [main] (SshConfigFileWatcher.java:171) - Scanning /data/projects/fate/fateboard/conf ... 
[python-84d4dbd6df-2nbcj fateboard] 2021-12-09 06:04:21 INFO [main] (Version.java:21) - HV000001: Hibernate Validator 6.0.17.Final [python-84d4dbd6df-2nbcj fateboard] 2021-12-09 06:04:22 INFO [main] (WelcomePageHandlerMapping.java:54) - Adding welcome page: class path resource [static/index.html] [python-84d4dbd6df-2nbcj fateboard] 2021-12-09 06:04:23 INFO [main] (EndpointLinksResolver.java:58) - Exposing 0 endpoint(s) beneath base path '/actuator' [python-84d4dbd6df-2nbcj fateboard] 2021-12-09 06:04:23 INFO [main] (ScheduledAnnotationBeanPostProcessor.java:297) - No TaskScheduler/ScheduledExecutorService bean found for scheduled processing [python-84d4dbd6df-2nbcj fateboard] 2021-12-09 06:04:23 INFO [main] (DirectJDKLog.java:173) - Starting ProtocolHandler ["http-nio-8080"] [python-84d4dbd6df-2nbcj fateboard] 2021-12-09 06:04:23 INFO [main] (TomcatWebServer.java:204) - Tomcat started on port(s): 8080 (http) with context path '' [python-84d4dbd6df-2nbcj fateboard] 2021-12-09 06:04:23 INFO [main] (StartupInfoLogger.java:61) - Started Bootstrap in 10.617 seconds (JVM running for 12.234) [rollsite-cfd4d6c88-bg9jm rollsite] + mkdir -p /data/projects/fate/eggroll/logs/eggroll/ [rollsite-cfd4d6c88-bg9jm rollsite] + touch /data/projects/fate/eggroll/logs/eggroll/eggroll-audit.log [rollsite-cfd4d6c88-bg9jm rollsite] + ln -sf /dev/stdout /data/projects/fate/eggroll/logs/eggroll/eggroll-audit.log [rollsite-cfd4d6c88-bg9jm rollsite] + touch /data/projects/fate/eggroll/logs/eggroll/eggroll.jvm.log [rollsite-cfd4d6c88-bg9jm rollsite] + ln -sf /dev/stdout /data/projects/fate/eggroll/logs/eggroll/eggroll.jvm.log [rollsite-cfd4d6c88-bg9jm rollsite] + touch /data/projects/fate/eggroll/logs/eggroll/eggroll.jvm.err.log [rollsite-cfd4d6c88-bg9jm rollsite] + ln -sf /dev/stderr /data/projects/fate/eggroll/logs/eggroll/eggroll.jvm.err.log [rollsite-cfd4d6c88-bg9jm rollsite] + java -Dlog4j.configurationFile=/data/projects/fate/eggroll//conf/log4j2.properties -cp 
'/data/projects/fate/eggroll//lib/:/data/projects/fate/eggroll//conf/' com.webank.eggroll.rollsite.EggSiteBootstrap -c /data/projects/fate/eggroll//conf/eggroll.properties [rollsite-cfd4d6c88-bg9jm rollsite] current dir: /data/projects/fate/eggroll/. [rollsite-cfd4d6c88-bg9jm rollsite] [INFO ][1876][2021-12-09 06:04:03,221][main,pid:1,tid:1][c.w.e.r.EggSiteBootstrap:107] - conf file: /data/projects/fate/eggroll/conf/eggroll.properties [rollsite-cfd4d6c88-bg9jm rollsite] [INFO ][1904][2021-12-09 06:04:03,249][main,pid:1,tid:1][c.w.e.r.EggSiteBootstrap:107] - initing router at path=conf/route_table/route_table.json [rollsite-cfd4d6c88-bg9jm rollsite] [INFO ][1958][2021-12-09 06:04:03,303][main,pid:1,tid:1][c.w.e.r.EggSiteBootstrap:107] - start refreshing route table per min [rollsite-cfd4d6c88-bg9jm rollsite] [INFO ][2612][2021-12-09 06:04:03,957][main,pid:1,tid:1][c.w.e.c.t.GrpcServerUtils:107] - gRPC server at 9370 starting in insecure mode [rollsite-cfd4d6c88-bg9jm rollsite] [INFO ][2903][2021-12-09 06:04:04,248][main,pid:1,tid:1][c.w.e.r.EggSiteBootstrap:107] - server started at 9370 [client-5697c66d7b-mlx9g client] { [client-5697c66d7b-mlx9g client] "retcode": 0, [client-5697c66d7b-mlx9g client] "retmsg": "Fate Flow CLI has been initialized successfully." [client-5697c66d7b-mlx9g client] } [client-5697c66d7b-mlx9g client] [client-5697c66d7b-mlx9g client] Pipeline configuration succeeded. 
[client-5697c66d7b-mlx9g client] [D 06:04:04.005 NotebookApp] Searching ['/data/projects/fate', '/root/.jupyter', '/root/.local/etc/jupyter', '/usr/local/etc/jupyter', '/etc/jupyter'] for config files [client-5697c66d7b-mlx9g client] [D 06:04:04.006 NotebookApp] Looking for jupyter_config in /etc/jupyter [client-5697c66d7b-mlx9g client] [D 06:04:04.006 NotebookApp] Looking for jupyter_config in /usr/local/etc/jupyter [client-5697c66d7b-mlx9g client] [D 06:04:04.007 NotebookApp] Looking for jupyter_config in /root/.local/etc/jupyter [client-5697c66d7b-mlx9g client] [D 06:04:04.008 NotebookApp] Looking for jupyter_config in /root/.jupyter [client-5697c66d7b-mlx9g client] [D 06:04:04.008 NotebookApp] Looking for jupyter_config in /data/projects/fate [client-5697c66d7b-mlx9g client] [D 06:04:04.011 NotebookApp] Looking for jupyter_notebook_config in /etc/jupyter [client-5697c66d7b-mlx9g client] [D 06:04:04.014 NotebookApp] Looking for jupyter_notebook_config in /usr/local/etc/jupyter [client-5697c66d7b-mlx9g client] [D 06:04:04.015 NotebookApp] Looking for jupyter_notebook_config in /root/.local/etc/jupyter [client-5697c66d7b-mlx9g client] [D 06:04:04.015 NotebookApp] Looking for jupyter_notebook_config in /root/.jupyter [client-5697c66d7b-mlx9g client] [D 06:04:04.015 NotebookApp] Looking for jupyter_notebook_config in /data/projects/fate [client-5697c66d7b-mlx9g client] [D 06:04:04.041 NotebookApp] Paths used for configuration of jupyter_notebook_config: [client-5697c66d7b-mlx9g client] /etc/jupyter/jupyter_notebook_config.json [client-5697c66d7b-mlx9g client] [D 06:04:04.042 NotebookApp] Paths used for configuration of jupyter_notebook_config: [client-5697c66d7b-mlx9g client] /usr/local/etc/jupyter/jupyter_notebook_config.json [client-5697c66d7b-mlx9g client] [D 06:04:04.045 NotebookApp] Paths used for configuration of jupyter_notebook_config: [client-5697c66d7b-mlx9g client] /root/.local/etc/jupyter/jupyter_notebook_config.json [client-5697c66d7b-mlx9g client] [D 
06:04:04.046 NotebookApp] Paths used for configuration of jupyter_notebook_config: [client-5697c66d7b-mlx9g client] /root/.jupyter/jupyter_notebook_config.json [client-5697c66d7b-mlx9g client] [I 06:04:04.053 NotebookApp] Writing notebook server cookie secret to /root/.local/share/jupyter/runtime/notebook_cookie_secret [client-5697c66d7b-mlx9g client] [I 06:04:04.054 NotebookApp] Authentication of /metrics is OFF, since other authentication is disabled. [client-5697c66d7b-mlx9g client] [W 06:04:04.939 NotebookApp] All authentication is disabled. Anyone who can connect to this server will be able to run code. [client-5697c66d7b-mlx9g client] [I 06:04:04.951 NotebookApp] Serving notebooks from local directory: /data/projects/fate [client-5697c66d7b-mlx9g client] [I 06:04:04.951 NotebookApp] Jupyter Notebook 6.4.6 is running at: [client-5697c66d7b-mlx9g client] [I 06:04:04.951 NotebookApp] http://client-5697c66d7b-mlx9g:20000/ [client-5697c66d7b-mlx9g client] [I 06:04:04.952 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation). 
[clustermanager-fd479cb86-9fxgz clustermanager] + mkdir -p /data/projects/fate/eggroll/logs/eggroll/ [clustermanager-fd479cb86-9fxgz clustermanager] + touch /data/projects/fate/eggroll/logs/eggroll/eggroll-audit.log [clustermanager-fd479cb86-9fxgz clustermanager] + ln -sf /dev/stdout /data/projects/fate/eggroll/logs/eggroll/eggroll-audit.log [clustermanager-fd479cb86-9fxgz clustermanager] + touch /data/projects/fate/eggroll/logs/eggroll/eggroll.jvm.log [clustermanager-fd479cb86-9fxgz clustermanager] + ln -sf /dev/stdout /data/projects/fate/eggroll/logs/eggroll/eggroll.jvm.log [clustermanager-fd479cb86-9fxgz clustermanager] + touch /data/projects/fate/eggroll/logs/eggroll/eggroll.jvm.err.log [clustermanager-fd479cb86-9fxgz clustermanager] + ln -sf /dev/stderr /data/projects/fate/eggroll/logs/eggroll/eggroll.jvm.err.log [clustermanager-fd479cb86-9fxgz clustermanager] + java -Dlog4j.configurationFile=/data/projects/fate/eggroll//conf/log4j2.properties -cp '/data/projects/fate/eggroll//lib/:' com.webank.eggroll.core.Bootstrap --bootstraps com.webank.eggroll.core.resourcemanager.ClusterManagerBootstrap -c /data/projects/fate/eggroll//conf/eggroll.properties -p 4670 -s EGGROLL_DEAMON [clustermanager-fd479cb86-9fxgz clustermanager] [INFO ][2267][2021-12-09 06:04:03,987][main,pid:1,tid:1][c.w.e.c.Bootstrap:107] - main started [clustermanager-fd479cb86-9fxgz clustermanager] [INFO ][2441][2021-12-09 06:04:04,161][main,pid:1,tid:1][c.w.e.c.c.CommandRouter:107] - [COMMAND] registered v1/cluster-manager/metadata/getServerNode [clustermanager-fd479cb86-9fxgz clustermanager] [INFO ][2444][2021-12-09 06:04:04,164][main,pid:1,tid:1][c.w.e.c.c.CommandRouter:107] - [COMMAND] registered v1/cluster-manager/metadata/getServerNodes [clustermanager-fd479cb86-9fxgz clustermanager] [INFO ][2446][2021-12-09 06:04:04,166][main,pid:1,tid:1][c.w.e.c.c.CommandRouter:107] - [COMMAND] registered v1/cluster-manager/metadata/getOrCreateServerNode [clustermanager-fd479cb86-9fxgz clustermanager] [INFO 
][2449][2021-12-09 06:04:04,169][main,pid:1,tid:1][c.w.e.c.c.CommandRouter:107] - [COMMAND] registered v1/cluster-manager/metadata/createOrUpdateServerNode [clustermanager-fd479cb86-9fxgz clustermanager] [INFO ][2461][2021-12-09 06:04:04,181][main,pid:1,tid:1][c.w.e.c.c.CommandRouter:107] - [COMMAND] registered v1/cluster-manager/metadata/getStore [clustermanager-fd479cb86-9fxgz clustermanager] [INFO ][2462][2021-12-09 06:04:04,182][main,pid:1,tid:1][c.w.e.c.c.CommandRouter:107] - [COMMAND] registered v1/cluster-manager/metadata/getOrCreateStore [clustermanager-fd479cb86-9fxgz clustermanager] [INFO ][2463][2021-12-09 06:04:04,183][main,pid:1,tid:1][c.w.e.c.c.CommandRouter:107] - [COMMAND] registered v1/cluster-manager/metadata/deleteStore [clustermanager-fd479cb86-9fxgz clustermanager] [INFO ][2464][2021-12-09 06:04:04,184][main,pid:1,tid:1][c.w.e.c.c.CommandRouter:107] - [COMMAND] registered v1/cluster-manager/metadata/getStoreFromNamespace [clustermanager-fd479cb86-9fxgz clustermanager] [INFO ][2478][2021-12-09 06:04:04,198][main,pid:1,tid:1][c.w.e.c.c.CommandRouter:107] - [COMMAND] registered v1/cluster-manager/session/getSession [clustermanager-fd479cb86-9fxgz clustermanager] [INFO ][2480][2021-12-09 06:04:04,200][main,pid:1,tid:1][c.w.e.c.c.CommandRouter:107] - [COMMAND] registered v1/cluster-manager/session/getOrCreateSession [clustermanager-fd479cb86-9fxgz clustermanager] [INFO ][2481][2021-12-09 06:04:04,201][main,pid:1,tid:1][c.w.e.c.c.CommandRouter:107] - [COMMAND] registered v1/cluster-manager/session/stopSession [clustermanager-fd479cb86-9fxgz clustermanager] [INFO ][2481][2021-12-09 06:04:04,201][main,pid:1,tid:1][c.w.e.c.c.CommandRouter:107] - [COMMAND] registered v1/cluster-manager/session/killSession [clustermanager-fd479cb86-9fxgz clustermanager] [INFO ][2482][2021-12-09 06:04:04,202][main,pid:1,tid:1][c.w.e.c.c.CommandRouter:107] - [COMMAND] registered v1/cluster-manager/session/killAllSessions [clustermanager-fd479cb86-9fxgz clustermanager] [INFO 
][2483][2021-12-09 06:04:04,203][main,pid:1,tid:1][c.w.e.c.c.CommandRouter:107] - [COMMAND] registered v1/cluster-manager/session/registerSession [clustermanager-fd479cb86-9fxgz clustermanager] [INFO ][2494][2021-12-09 06:04:04,214][main,pid:1,tid:1][c.w.e.c.c.CommandRouter:107] - [COMMAND] registered v1/cluster-manager/session/heartbeat [clustermanager-fd479cb86-9fxgz clustermanager] current dir: /data/projects/fate/eggroll/. [clustermanager-fd479cb86-9fxgz clustermanager] [INFO ][2542][2021-12-09 06:04:04,262][main,pid:1,tid:1][c.w.e.c.r.ClusterManagerBootstrap:107] - conf file: /data/projects/fate/eggroll/conf/eggroll.properties [clustermanager-fd479cb86-9fxgz clustermanager] [INFO ][2937][2021-12-09 06:04:04,657][main,pid:1,tid:1][c.w.e.c.t.GrpcServerUtils:107] - gRPC server at 4670 starting in insecure mode [clustermanager-fd479cb86-9fxgz clustermanager] [INFO ][3247][2021-12-09 06:04:04,967][main,pid:1,tid:1][c.w.e.c.r.ClusterManagerBootstrap:107] - server started at port 4670 [clustermanager-fd479cb86-9fxgz clustermanager] server started at port 4670 [mysql-85d85f56b9-6cs2c mysql] 2021-12-09 06:04:04+00:00 [Note] [Entrypoint]: Entrypoint script for MySQL Server 8.0.21-1debian10 started. [mysql-85d85f56b9-6cs2c mysql] 2021-12-09 06:04:04+00:00 [Note] [Entrypoint]: Switching to dedicated user 'mysql' [mysql-85d85f56b9-6cs2c mysql] 2021-12-09 06:04:04+00:00 [Note] [Entrypoint]: Entrypoint script for MySQL Server 8.0.21-1debian10 started. [mysql-85d85f56b9-6cs2c mysql] 2021-12-09T06:04:05.229030Z 0 [System] [MY-010116] [Server] /usr/sbin/mysqld (mysqld 8.0.21) starting as process 1 [mysql-85d85f56b9-6cs2c mysql] 2021-12-09T06:04:05.261348Z 1 [System] [MY-013576] [InnoDB] InnoDB initialization has started. [mysql-85d85f56b9-6cs2c mysql] 2021-12-09T06:04:06.208357Z 1 [System] [MY-013577] [InnoDB] InnoDB initialization has ended. [mysql-85d85f56b9-6cs2c mysql] 2021-12-09T06:04:06.500870Z 0 [System] [MY-011323] [Server] X Plugin ready for connections. 
Bind-address: '::' port: 33060, socket: /var/run/mysqld/mysqlx.sock [mysql-85d85f56b9-6cs2c mysql] 2021-12-09T06:04:06.656002Z 0 [Warning] [MY-010068] [Server] CA certificate ca.pem is self signed. [mysql-85d85f56b9-6cs2c mysql] 2021-12-09T06:04:06.656309Z 0 [System] [MY-013602] [Server] Channel mysql_main configured to support TLS. Encrypted connections are now supported for this channel. [mysql-85d85f56b9-6cs2c mysql] 2021-12-09T06:04:06.661414Z 0 [Warning] [MY-011810] [Server] Insecure configuration for --pid-file: Location '/var/run/mysqld' in the path is accessible to all OS users. Consider choosing a different directory. [mysql-85d85f56b9-6cs2c mysql] 2021-12-09T06:04:06.729856Z 0 [System] [MY-010931] [Server] /usr/sbin/mysqld: ready for connections. Version: '8.0.21' socket: '/var/run/mysqld/mysqld.sock' port: 3306 MySQL Community Server - GPL. [nodemanager-0-77f55fc97d-rvc2m nodemanager-0-eggrollpair] 2021-12-09 06:04:13 +0000 [info]: parsing config file is succeeded path="/fluentd/etc/fluent.conf"
[nodemanager-0-77f55fc97d-rvc2m nodemanager-0-eggrollpair] 2021-12-09 06:04:13 +0000 [warn]: define <match fluent.> to capture fluentd logs in top level is deprecated. Use <label @FLUENT_LOG> instead
[nodemanager-0-77f55fc97d-rvc2m nodemanager-0-eggrollpair] 2021-12-09 06:04:13 +0000 [info]: using configuration file:
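For anyone reading a dump like the one above: the output of every pod is interleaved, so the only ERROR entry (the fateboard `SshService` one) is easy to miss. Below is a minimal Python sketch that splits the flattened text on the `[pod container]` prefix and keeps the entries marked ERROR. The prefix regex is a heuristic guessed from this log, not a documented kubefate format, and the helper names `split_entries` and `error_entries` are hypothetical. The sample text is copied from the log above.

```python
import re

# Heuristic (an assumption, not a documented format): each entry starts with
# "[<pod> <container>] ", where both names are lowercase letters, digits, hyphens.
PREFIX = re.compile(r"\[([a-z0-9-]+) ([a-z0-9-]+)\] ")


def split_entries(text):
    """Split flattened log text into (pod, container, message) tuples."""
    matches = list(PREFIX.finditer(text))
    entries = []
    for i, m in enumerate(matches):
        # The message runs until the next prefix, or to the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        entries.append((m.group(1), m.group(2), text[m.end():end].strip()))
    return entries


def error_entries(entries):
    """Keep only entries whose message contains the word ERROR."""
    return [e for e in entries if re.search(r"\bERROR\b", e[2])]


# Sample copied from the log above.
sample = (
    "[python-84d4dbd6df-2nbcj fateboard] 2021-12-09 06:04:19 INFO [main] "
    "(HikariDataSource.java:123) - HikariPool-1 - Start completed. "
    "[python-84d4dbd6df-2nbcj fateboard] 2021-12-09 06:04:20 ERROR [main] "
    "(SshService.java:143) - load ssh config file error "
    "[python-84d4dbd6df-2nbcj fateboard] java.lang.IllegalArgumentException: null"
)

for pod, container, message in error_entries(split_entries(sample)):
    print(f"{container}: {message}")
```

Stack-trace frames that follow an ERROR line come back as separate entries, so to see the full trace, print the entries after each match as well.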
I ran into the same problem, but in my logs the main error is that fateboard fails when calling the fateflow service on port 9380:
[python-749dc7f58f-6xt2k fateboard] 2022-01-07 09:03:43 INFO [http-nio-8080-exec-8] (HttpClientPool.java:171) - httpclient sent url http://fateflow:9380/v1/job/stop request {"job_id":"0"} result:
[python-749dc7f58f-6xt2k fateboard] 2022-01-07 09:03:43 ERROR [http-nio-8080-exec-8] (GlobalExceptionHandler.java:41) - error
[python-749dc7f58f-6xt2k fateboard] java.lang.NullPointerException: null
[python-749dc7f58f-6xt2k fateboard] at com.webank.ai.fate.board.controller.JobManagerController.checkAppKey(JobManagerController.java:292)
[python-749dc7f58f-6xt2k fateboard] at com.webank.ai.fate.board.controller.JobManagerController.queryJobStatus(JobManagerController.java:73)
[python-749dc7f58f-6xt2k fateboard] at sun.reflect.GeneratedMethodAccessor58.invoke(Unknown Source)
[python-749dc7f58f-6xt2k fateboard] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[python-749dc7f58f-6xt2k fateboard] at java.lang.reflect.Method.invoke(Method.java:498)
[python-749dc7f58f-6xt2k fateboard] at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:190)
[python-749dc7f58f-6xt2k fateboard] at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:138)
[python-749dc7f58f-6xt2k fateboard] at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:106)
[python-749dc7f58f-6xt2k fateboard] at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:888)
[python-749dc7f58f-6xt2k fateboard] at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:793)
[python-749dc7f58f-6xt2k fateboard] at org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:87)
[python-749dc7f58f-6xt2k fateboard] at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:1040)
[python-749dc7f58f-6xt2k fateboard] at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:943)
[python-749dc7f58f-6xt2k fateboard] at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:1006)
[python-749dc7f58f-6xt2k fateboard] at org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:898)
[python-749dc7f58f-6xt2k fateboard] at javax.servlet.http.HttpServlet.service(HttpServlet.java:634)
[python-749dc7f58f-6xt2k fateboard] at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:883)
[python-749dc7f58f-6xt2k fateboard] at javax.servlet.http.HttpServlet.service(HttpServlet.java:741)
[python-749dc7f58f-6xt2k fateboard] at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:231)
[python-749dc7f58f-6xt2k fateboard] at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
[python-749dc7f58f-6xt2k fateboard] at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:53)
[python-749dc7f58f-6xt2k fateboard] at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
[python-749dc7f58f-6xt2k fateboard] at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
[python-749dc7f58f-6xt2k fateboard] at com.webank.ai.fate.board.conf.SecurityFilter.doFilter(SecurityFilter.java:39)
[python-749dc7f58f-6xt2k fateboard] at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
[python-749dc7f58f-6xt2k fateboard] at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
[python-749dc7f58f-6xt2k fateboard] at org.springframework.web.filter.RequestContextFilter.doFilterInternal(RequestContextFilter.java:100)
[python-749dc7f58f-6xt2k fateboard] at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:119)
[python-749dc7f58f-6xt2k fateboard] at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
[python-749dc7f58f-6xt2k fateboard] at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
[python-749dc7f58f-6xt2k fateboard] at org.springframework.web.filter.FormContentFilter.doFilterInternal(FormContentFilter.java:93)
[python-749dc7f58f-6xt2k fateboard] at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:119)
[python-749dc7f58f-6xt2k fateboard] at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
[python-749dc7f58f-6xt2k fateboard] at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
[python-749dc7f58f-6xt2k fateboard] at org.springframework.session.web.http.SessionRepositoryFilter.doFilterInternal(SessionRepositoryFilter.java:141)
[python-749dc7f58f-6xt2k fateboard] at org.springframework.session.web.http.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:82)
[python-749dc7f58f-6xt2k fateboard] at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
[python-749dc7f58f-6xt2k fateboard] at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
[python-749dc7f58f-6xt2k fateboard] at org.springframework.boot.actuate.metrics.web.servlet.WebMvcMetricsFilter.doFilterInternal(WebMvcMetricsFilter.java:108)
[python-749dc7f58f-6xt2k fateboard] at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:119)
[python-749dc7f58f-6xt2k fateboard] at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
[python-749dc7f58f-6xt2k fateboard] at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
[python-749dc7f58f-6xt2k fateboard] at org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(CharacterEncodingFilter.java:201)
[python-749dc7f58f-6xt2k fateboard] at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:119)
[python-749dc7f58f-6xt2k fateboard] at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
[python-749dc7f58f-6xt2k fateboard] at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
[python-749dc7f58f-6xt2k fateboard] at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:202)
[python-749dc7f58f-6xt2k fateboard] at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:96)
[python-749dc7f58f-6xt2k fateboard] at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:526)
[python-749dc7f58f-6xt2k fateboard] at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:139)
[python-749dc7f58f-6xt2k fateboard] at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:92)
[python-749dc7f58f-6xt2k fateboard] at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:74)
[python-749dc7f58f-6xt2k fateboard] at org.apache.catalina.valves.RemoteIpValve.invoke(RemoteIpValve.java:747)
[python-749dc7f58f-6xt2k fateboard] at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:343)
[python-749dc7f58f-6xt2k fateboard] at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:408)
[python-749dc7f58f-6xt2k fateboard] at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:66)
[python-749dc7f58f-6xt2k fateboard] at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:861)
[python-749dc7f58f-6xt2k fateboard] at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1579)
[python-749dc7f58f-6xt2k fateboard] at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
[python-749dc7f58f-6xt2k fateboard] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[python-749dc7f58f-6xt2k fateboard] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[python-749dc7f58f-6xt2k fateboard] at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
[python-749dc7f58f-6xt2k fateboard] at java.lang.Thread.run(Thread.java:748)
Testing shows that other pods (for example, clustermanager) can reach the fateflow service on port 9380 normally; only fateboard cannot.
I am hitting this error too.
[root demo]# kubefate cluster ls
UUID NAME NAMESPACE REVISION STATUS CHART ChartVERSION AGE
b93ac837-aaca-4040-9066-c33d6ca7f5f8 fate-9999 fate-9999 2 Running fate v1.7.1 4h7m
bc62b003-dc79-48b1-9738-e586b62c04ad fate-10000 fate-10000 1 Running fate v1.7.1 3h27m
[root demo]# kubefate cluster describe b93ac837-aaca-4040-9066-c33d6ca7f5f8
the server could not find the requested resource
I have run into this problem as well. Has it still not been resolved? Here is my log:
[mysql-c88469467-j57b7 mysql] 2022-03-02 07:05:01+00:00 [Note] [Entrypoint]: Entrypoint script for MySQL Server 8.0.28-1debian10 started.
[mysql-c88469467-j57b7 mysql] 2022-03-02 07:05:02+00:00 [Note] [Entrypoint]: Switching to dedicated user 'mysql'
[mysql-c88469467-j57b7 mysql] 2022-03-02 07:05:02+00:00 [Note] [Entrypoint]: Entrypoint script for MySQL Server 8.0.28-1debian10 started.
[mysql-c88469467-j57b7 mysql] 2022-03-02T07:05:02.713588Z 0 [System] [MY-010116] [Server] /usr/sbin/mysqld (mysqld 8.0.28) starting as process 1
[mysql-c88469467-j57b7 mysql] 2022-03-02T07:05:02.849294Z 1 [System] [MY-013576] [InnoDB] InnoDB initialization has started.
[mysql-c88469467-j57b7 mysql] 2022-03-02T07:05:06.559371Z 1 [System] [MY-013577] [InnoDB] InnoDB initialization has ended.
[mysql-c88469467-j57b7 mysql] 2022-03-02T07:05:08.316051Z 0 [Warning] [MY-010068] [Server] CA certificate ca.pem is self signed.
[mysql-c88469467-j57b7 mysql] 2022-03-02T07:05:08.316096Z 0 [System] [MY-013602] [Server] Channel mysql_main configured to support TLS. Encrypted connections are now supported for this channel.
[mysql-c88469467-j57b7 mysql] 2022-03-02T07:05:08.477725Z 0 [Warning] [MY-011810] [Server] Insecure configuration for --pid-file: Location '/var/run/mysqld' in the path is accessible to all OS users. Consider choosing a different directory.
[mysql-c88469467-j57b7 mysql] 2022-03-02T07:05:08.719651Z 0 [System] [MY-011323] [Server] X Plugin ready for connections. Bind-address: '::' port: 33060, socket: /var/run/mysqld/mysqlx.sock
[mysql-c88469467-j57b7 mysql] 2022-03-02T07:05:08.719852Z 0 [System] [MY-010931] [Server] /usr/sbin/mysqld: ready for connections. Version: '8.0.28' socket: '/var/run/mysqld/mysqld.sock' port: 3306 MySQL Community Server - GPL.
[nodemanager-1-567b7bbdbd-mb8lv nodemanager-1-eggrollpair] 2022-03-02 07:05:03 +0000 [info]: parsing config file is succeeded path="/fluentd/etc/fluent.conf"
[nodemanager-1-567b7bbdbd-mb8lv nodemanager-1-eggrollpair] 2022-03-02 07:05:03 +0000 [warn]: define <match fluent.> to capture fluentd logs in top level is deprecated. Use <label @FLUENT_LOG> instead
[nodemanager-1-567b7bbdbd-mb8lv nodemanager-1-eggrollpair] 2022-03-02 07:05:03 +0000 [info]: using configuration file:
[nodemanager-0-6d88c65cb-64skg nodemanager-0-eggrollpair] 2022-03-02 07:05:03 +0000 [warn]: define <match fluent.> to capture fluentd logs in top level is deprecated. Use <label @FLUENT_LOG> instead
[nodemanager-0-6d88c65cb-64skg nodemanager-0-eggrollpair] 2022-03-02 07:05:03 +0000 [info]: using configuration file:
Check the versions of the CLI and the service with kubefate version; you need to ensure that the two are consistent.
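As a rough illustration of the consistency check described above, here is a minimal Python sketch. It assumes you have already copied the two version strings out of the `kubefate version` output by hand; the function name `versions_match` is hypothetical and is not part of KubeFATE.

```python
def versions_match(cli_version: str, service_version: str) -> bool:
    """Return True when the CLI and service report the same version.

    Hypothetical helper: both arguments are version strings copied by hand
    from the `kubefate version` output. Surrounding whitespace and an
    optional leading "v" are ignored, so "v1.4.3" and "1.4.3" compare equal.
    """
    def normalize(version: str) -> str:
        return version.strip().lower().removeprefix("v")

    return normalize(cli_version) == normalize(service_version)


if __name__ == "__main__":
    # Example values only; substitute the strings your own deployment prints.
    print(versions_match("v1.4.3", "v1.4.3"))
    print(versions_match("v1.4.3", "v1.4.2"))
```

A mismatch here would point to upgrading either the CLI or the service so that both report the same version.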
I have confirmed that the versions are consistent; the problem still persists.
This bug will appear in k8s v1.22+
A temporary solution is to use k8s <=v1.21
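To check whether a given cluster falls in the affected range, a minimal Python sketch follows. It assumes you pass in the server version string reported by your cluster (for example, the server `gitVersion` shown by `kubectl version`); the helper names `parse_major_minor` and `is_affected` are hypothetical, and the v1.22 threshold is taken from the comment above.

```python
import re


def parse_major_minor(version: str) -> tuple[int, int]:
    """Extract (major, minor) from a string such as "v1.22.3" or "1.21"."""
    match = re.match(r"v?(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized Kubernetes version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def is_affected(server_version: str) -> bool:
    """Return True for k8s v1.22 and newer, the range reported in this thread."""
    return parse_major_minor(server_version) >= (1, 22)


if __name__ == "__main__":
    # Example inputs only; use the version your own cluster reports.
    for version in ("v1.21.5", "v1.22.0", "v1.23.4"):
        print(version, is_affected(version))
```

If the check returns True, the temporary workaround from this thread is to run on k8s v1.21 or older.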
Convert to task: support k8s > v1.21
As of March 22, 2022, with a cluster set up using minikube, this problem still exists.
root@crms-10-10-178-147[/k8s/fate]# kubefate cluster ls
UUID NAME NAMESPACE REVISION STATUS CHART ChartVERSION AGE
4f14f1ca-687c-49cf-a1b7-f9a48e7b37bd fate-9999 fate-9999 1 Running fate v1.7.0 19m
root@crms-10-10-178-147[/k8s/fate]# kubefate cluster describe 4f14f1ca-687c-49cf-a1b7-f9a48e7b37bd
the server could not find the requested resource
root@crms-10-10-178-147[/k8s/fate]# the server could not find the requested resource
Why is there no description information?