eBay / myriad

Apache License 2.0

Not able to run Node managers using Myriad?? #3

Open vathanlal opened 7 years ago

vathanlal commented 7 years ago

Hello,

I started Myriad successfully and it registers with Mesos as a framework, but the NodeManagers always show as pending tasks. The Mesos log shows that Mesos is offering resources to Myriad, but the Myriad framework declines them immediately. I reduced the NodeManager resource sizes in myriad-config-default.yml, but the state is unchanged. I don't have many logs to look into to understand what is causing the issue. I am using Mesos 1.0.0, Hadoop 2.7.2, and Myriad executor 0.2.0. Is this a version compatibility issue between Mesos and Myriad? Any help with this issue is really appreciated.
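One way to reason about declined offers is to compare what a single offer contains with what one NodeManager task would need. The sketch below is only an illustration: it assumes an offer is usable when its CPU and memory cover the profile plus some fixed overhead, which may not match Myriad's actual matching logic. The `medium` values come from the profile logged later in this thread (CPU 1.0, memory 256.0); the overhead and offer numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    """CPU cores and memory in MB, for an offer or a requirement."""
    cpus: float
    mem_mb: float


def shortfall(offer: Resources, profile: Resources, overhead: Resources) -> dict:
    """Return how much of each resource the offer lacks (empty dict means it fits)."""
    need_cpus = profile.cpus + overhead.cpus
    need_mem = profile.mem_mb + overhead.mem_mb
    missing = {}
    if offer.cpus < need_cpus:
        missing["cpus"] = need_cpus - offer.cpus
    if offer.mem_mb < need_mem:
        missing["mem_mb"] = need_mem - offer.mem_mb
    return missing


# "medium" profile as logged in this thread.
medium = Resources(cpus=1.0, mem_mb=256.0)
# Hypothetical overhead for the NodeManager JVM and executor (assumption).
overhead = Resources(cpus=0.4, mem_mb=1280.0)
# Hypothetical offer from one Mesos agent (assumption).
offer = Resources(cpus=2.0, mem_mb=1024.0)

print(shortfall(offer, medium, overhead))  # {'mem_mb': 512.0}
```

If a calculation like this reports a shortfall for every agent, lowering only the profile size would not help, because the fixed overhead alone may exceed what a single offer provides.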

My yarn-root-resourcemanager-mesos.out is as below

Oct 17, 2016 4:11:57 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register INFO: Registering org.apache.myriad.api.ClustersResource as a root resource class
Oct 17, 2016 4:11:57 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register INFO: Registering org.apache.myriad.api.ConfigurationResource as a root resource class
Oct 17, 2016 4:11:57 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register INFO: Registering org.apache.myriad.api.SchedulerStateResource as a root resource class
Oct 17, 2016 4:11:57 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register INFO: Registering org.apache.myriad.api.ControllerResource as a root resource class
Oct 17, 2016 4:11:57 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register INFO: Registering org.apache.myriad.api.ArtifactsResource as a root resource class
Oct 17, 2016 4:11:57 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register INFO: Registering org.codehaus.jackson.jaxrs.JacksonJaxbJsonProvider as a provider class
Oct 17, 2016 4:11:57 PM com.sun.jersey.server.impl.application.WebApplicationImpl _initiate INFO: Initiating Jersey application, version 'Jersey: 1.9 09/02/2011 11:17 AM'
Oct 17, 2016 4:11:58 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider INFO: Binding org.codehaus.jackson.jaxrs.JacksonJaxbJsonProvider to GuiceManagedComponentProvider with the scope "Singleton"
Oct 17, 2016 4:11:59 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider INFO: Binding org.apache.myriad.api.ClustersResource to GuiceManagedComponentProvider with the scope "PerRequest"
Oct 17, 2016 4:11:59 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider INFO: Binding org.apache.myriad.api.ConfigurationResource to GuiceManagedComponentProvider with the scope "PerRequest"
Oct 17, 2016 4:11:59 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider INFO: Binding org.apache.myriad.api.SchedulerStateResource to GuiceManagedComponentProvider with the scope "PerRequest"
Oct 17, 2016 4:11:59 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider INFO: Binding org.apache.myriad.api.ControllerResource to GuiceManagedComponentProvider with the scope "PerRequest"
Oct 17, 2016 4:11:59 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider INFO: Binding org.apache.myriad.api.ArtifactsResource to GuiceManagedComponentProvider with the scope "PerRequest"
I1017 16:11:59.989440 9720 sched.cpp:226] Version: 1.0.0
I1017 16:11:59.995721 9753 sched.cpp:330] New master detected at master@10.0.2.19:5050
I1017 16:11:59.996100 9753 sched.cpp:341] No credentials provided. Attempting to register without authentication
I1017 16:11:59.998183 9748 sched.cpp:743] Framework registered with 6215a35e-749e-4f27-bb50-f7c01650da80-0006
Oct 17, 2016 4:12:01 PM com.google.inject.servlet.GuiceFilter setPipeline WARNING: Multiple Servlet injectors detected. This is a warning indicating that you have more than one GuiceFilter running in your web application. If this is deliberate, you may safely ignore this message. If this is NOT deliberate however, your application may not work as expected.
Oct 17, 2016 4:12:02 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register INFO: Registering org.apache.hadoop.yarn.server.resourcemanager.webapp.JAXBContextResolver as a provider class
Oct 17, 2016 4:12:02 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register INFO: Registering org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices as a root resource class
Oct 17, 2016 4:12:02 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register INFO: Registering org.apache.hadoop.yarn.webapp.GenericExceptionHandler as a provider class
Oct 17, 2016 4:12:02 PM com.sun.jersey.server.impl.application.WebApplicationImpl _initiate INFO: Initiating Jersey application, version 'Jersey: 1.9 09/02/2011 11:17 AM'
Oct 17, 2016 4:12:02 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider INFO: Binding org.apache.hadoop.yarn.server.resourcemanager.webapp.JAXBContextResolver to GuiceManagedComponentProvider with the scope "Singleton"
Oct 17, 2016 4:12:02 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider INFO: Binding org.apache.hadoop.yarn.webapp.GenericExceptionHandler to GuiceManagedComponentProvider with the scope "Singleton"
Oct 17, 2016 4:12:03 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider INFO: Binding org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices to GuiceManagedComponentProvider with the scope "Singleton"

yufeldman commented 7 years ago

Hello @vathanlal, Mesos 1.0.0 should work fine with Myriad 0.2 and Hadoop 2.7.0 or later. Could you paste the part of the log that is relevant to the resources being declined? I can only see the portion that relates to the web layer.

vathanlal commented 7 years ago

Hi @yufeldman,

Part of my Mesos log is shown below. As the log shows, after Mesos sends an offer it receives a decline for that offer from Myriad. There is nothing related to this error in my yarn-root-resourcemanager-mesos.log. Do you need that as well?

I1017 17:03:52.125422 1605 master.cpp:5709] Sending 2 offers to framework 6215a35e-749e-4f27-bb50-f7c01650da80-0006 (MyriadAlpha) at scheduler-f83a7daa-15ec-402e-b004-18e88a9dc3b7@10.0.2.19:51076
I1017 17:03:52.126741 1606 master.cpp:3951] Processing DECLINE call for offers: [ 6215a35e-749e-4f27-bb50-f7c01650da80-O19737 ] for framework 6215a35e-749e-4f27-bb50-f7c01650da80-0006 (MyriadAlpha) at scheduler-f83a7daa-15ec-402e-b004-18e88a9dc3b7@10.0.2.19:51076
I1017 17:03:52.127111 1603 master.cpp:3951] Processing DECLINE call for offers: [ 6215a35e-749e-4f27-bb50-f7c01650da80-O19738 ] for framework 6215a35e-749e-4f27-bb50-f7c01650da80-0006 (MyriadAlpha) at scheduler-f83a7daa-15ec-402e-b004-18e88a9dc3b7@10.0.2.19:51076
I1017 17:03:53.132866 1608 master.cpp:5709] Sending 1 offers to framework d0921eb7-2bbf-4cf8-8ffd-a4c0b0146289-0000 (chronos-2.4.0) at scheduler-2ab2d850-3c91-47d1-aa3d-dcfc7bc420fe@10.0.2.19:40076
I1017 17:03:53.134438 1602 master.cpp:3951] Processing DECLINE call for offers: [ 6215a35e-749e-4f27-bb50-f7c01650da80-O19739 ] for framework d0921eb7-2bbf-4cf8-8ffd-a4c0b0146289-0000 (chronos-2.4.0) at scheduler-2ab2d850-3c91-47d1-aa3d-dcfc7bc420fe@10.0.2.19:40076
I1017 17:03:55.144173 1604 master.cpp:5709] Sending 1 offers to framework d0921eb7-2bbf-4cf8-8ffd-a4c0b0146289-0000 (chronos-2.4.0) at scheduler-2ab2d850-3c91-47d1-aa3d-dcfc7bc420fe@10.0.2.19:40076
I1017 17:03:55.145689 1603 master.cpp:3951] Processing DECLINE call for offers: [ 6215a35e-749e-4f27-bb50-f7c01650da80-O19740 ] for framework d0921eb7-2bbf-4cf8-8ffd-a4c0b0146289-0000 (chronos-2.4.0) at scheduler-2ab2d850-3c91-47d1-aa3d-dcfc7bc420fe@10.0.2.19:40076
I1017 17:03:55.369249 1608 http.cpp:381] HTTP GET for /master/state from 10.0.2.19:55869 with User-Agent='Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:31.0) Gecko/20100101 Firefox/31.0'
I1017 17:03:57.146108 1603 master.cpp:5709] Sending 2 offers to framework 6215a35e-749e-4f27-bb50-f7c01650da80-0006 (MyriadAlpha) at scheduler-f83a7daa-15ec-402e-b004-18e88a9dc3b7@10.0.2.19:51076
I1017 17:03:57.147450 1609 master.cpp:3951] Processing DECLINE call for offers: [ 6215a35e-749e-4f27-bb50-f7c01650da80-O19741 ] for framework 6215a35e-749e-4f27-bb50-f7c01650da80-0006 (MyriadAlpha) at scheduler-f83a7daa-15ec-402e-b004-18e88a9dc3b7@10.0.2.19:51076
I1017 17:03:57.147749 1608 master.cpp:3951] Processing DECLINE call for offers: [ 6215a35e-749e-4f27-bb50-f7c01650da80-O19742 ] for framework 6215a35e-749e-4f27-bb50-f7c01650da80-0006 (MyriadAlpha) at scheduler-f83a7daa-15ec-402e-b004-18e88a9dc3b7@10.0.2.19:51076
I1017 17:03:58.146752 1603 master.cpp:5709] Sending 1 offers to framework d0921eb7-2bbf-4cf8-8ffd-a4c0b0146289-0000 (chronos-2.4.0) at scheduler-2ab2d850-3c91-47d1-aa3d-dcfc7bc420fe@10.0.2.19:40076
I1017 17:03:58.148146 1609 master.cpp:3951] Processing DECLINE call for offers: [ 6215a35e-749e-4f27-bb50-f7c01650da80-O19743 ] for framework d0921eb7-2bbf-4cf8-8ffd-a4c0b0146289-0000 (chronos-2.4.0) at scheduler-2ab2d850-3c91-47d1-aa3d-dcfc7bc420fe@10.0.2.19:40076
I1017 17:04:00.152983 1605 master.cpp:5709] Sending 1 offers to framework d0921eb7-2bbf-4cf8-8ffd-a4c0b0146289-0000 (chronos-2.4.0) at scheduler-2ab2d850-3c91-47d1-aa3d-dcfc7bc420fe@10.0.2.19:40076
I1017 17:04:00.154595 1608 master.cpp:3951] Processing DECLINE call for offers: [ 6215a35e-749e-4f27-bb50-f7c01650da80-O19744 ] for framework d0921eb7-2bbf-4cf8-8ffd-a4c0b0146289-0000 (chronos-2.4.0) at scheduler-2ab2d850-3c91-47d1-aa3d-dcfc7bc420fe@10.0.2.19:40076
I1017 17:04:02.157223 1605 master.cpp:5709] Sending 2 offers to framework 6215a35e-749e-4f27-bb50-f7c01650da80-0006 (MyriadAlpha) at scheduler-f83a7daa-15ec-402e-b004-18e88a9dc3b7@10.0.2.19:51076
I1017 17:04:02.159744 1608 master.cpp:3951] Processing DECLINE call for offers: [ 6215a35e-749e-4f27-bb50-f7c01650da80-O19745 ] for framework 6215a35e-749e-4f27-bb50-f7c01650da80-0006 (MyriadAlpha) at scheduler-f83a7daa-15ec-402e-b004-18e88a9dc3b7@10.0.2.19:51076
I1017 17:04:02.160132 1604 master.cpp:3951] Processing DECLINE call for offers: [ 6215a35e-749e-4f27-bb50-f7c01650da80-O19746 ] for framework 6215a35e-749e-4f27-bb50-f7c01650da80-0006 (MyriadAlpha) at scheduler-f83a7daa-15ec-402e-b004-18e88a9dc3b7@10.0.2.19:51076
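To see at a glance whether a framework declines everything it is offered, the Mesos master log above can be tallied per framework. This is a minimal sketch: the two regular expressions are modelled only on the `Sending N offers` and `Processing DECLINE call` messages quoted in this thread, and other Mesos versions may word them differently. The function name `tally` is illustrative.

```python
import re
from collections import Counter

# Patterns modelled on the two master.cpp messages quoted in this thread.
SENT = re.compile(r"Sending (\d+) offers? to framework (\S+) \(([^)]+)\)")
DECLINE = re.compile(
    r"Processing DECLINE call for offers: \[ (.*?) \] for framework (\S+) \(([^)]+)\)"
)


def tally(log_text: str) -> dict:
    """Map framework name to (offers sent, offers declined)."""
    sent, declined = Counter(), Counter()
    for m in SENT.finditer(log_text):
        sent[m.group(3)] += int(m.group(1))
    for m in DECLINE.finditer(log_text):
        # Offer IDs inside the brackets may be separated by commas and/or spaces.
        offer_ids = [x for x in re.split(r"[,\s]+", m.group(1)) if x]
        declined[m.group(3)] += len(offer_ids)
    return {name: (sent[name], declined[name]) for name in sent.keys() | declined.keys()}


sample = (
    "I1017 17:03:52.125422 1605 master.cpp:5709] Sending 2 offers to framework "
    "6215a35e-749e-4f27-bb50-f7c01650da80-0006 (MyriadAlpha) at "
    "scheduler-f83a7daa-15ec-402e-b004-18e88a9dc3b7@10.0.2.19:51076 "
    "I1017 17:03:52.126741 1606 master.cpp:3951] Processing DECLINE call for offers: "
    "[ 6215a35e-749e-4f27-bb50-f7c01650da80-O19737 ] for framework "
    "6215a35e-749e-4f27-bb50-f7c01650da80-0006 (MyriadAlpha) at "
    "scheduler-f83a7daa-15ec-402e-b004-18e88a9dc3b7@10.0.2.19:51076 "
    "I1017 17:03:52.127111 1603 master.cpp:3951] Processing DECLINE call for offers: "
    "[ 6215a35e-749e-4f27-bb50-f7c01650da80-O19738 ] for framework "
    "6215a35e-749e-4f27-bb50-f7c01650da80-0006 (MyriadAlpha) at "
    "scheduler-f83a7daa-15ec-402e-b004-18e88a9dc3b7@10.0.2.19:51076"
)

print(tally(sample))  # {'MyriadAlpha': (2, 2)}
```

Because the patterns are applied with `finditer` over the whole text, the tally also works when the log has been pasted as one long run without line breaks.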

yufeldman commented 7 years ago

@vathanlal

Do you see in the Mesos console that the NM tries to start and fails? The RM is usually very chatty about offers received. Did the RM even start properly? Can you see the RM UI (not just the Myriad UI)?
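Besides the Mesos console, the scheduler's own view of its tasks can help here. The sketch below assumes that Myriad serves a JSON scheduler-state document at `/api/state` with `pendingTasks`, `stagingTasks`, `activeTasks` and `killableTasks` lists; neither the path nor the field names are confirmed in this thread, so check them against the Myriad version in use. The sample payload is hypothetical and reuses a task ID from the RM log posted later in the thread.

```python
import json
from urllib.request import urlopen


def fetch_state(base_url: str) -> dict:
    """GET <base_url>/api/state. The path is an assumption, not confirmed here."""
    with urlopen(f"{base_url}/api/state", timeout=5) as resp:
        return json.load(resp)


def summarize(state: dict) -> dict:
    """Count task IDs per bucket; missing or null buckets count as zero."""
    keys = ("pendingTasks", "stagingTasks", "activeTasks", "killableTasks")
    return {k: len(state.get(k) or []) for k in keys}


# Hypothetical payload shaped like the assumed response.
sample_state = {
    "pendingTasks": ["nm.medium.36a17234-3818-4d8e-840e-304014eda3d2"],
    "stagingTasks": [],
    "activeTasks": [],
    "killableTasks": [],
}

print(summarize(sample_state))
```

A task that stays in the pending bucket and never moves to staging would be consistent with every offer being declined before a launch is attempted.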

vathanlal commented 7 years ago

@yufeldman

No, the NM is not showing in the Mesos console. When I started the ResourceManager with

./yarn-daemon.sh start resourcemanager

the Myriad framework showed up in the Mesos console, and jps lists ResourceManager on my command line. I can also reach the UI at http://10.0.2.19:8088, but no nodes are showing in the cluster. The cluster info in the UI looks like this:

`Cluster ID: 1476713515568

ResourceManager state: STARTED

ResourceManager HA state: active

ResourceManager HA zookeeper connection state: ResourceManager HA is not enabled.

ResourceManager RMStateStore: org.apache.hadoop.yarn.server.resourcemanager.recovery.NullRMStateStore

ResourceManager started on: Mon Oct 17 16:11:55 +0200 2016

ResourceManager version: 2.7.2 from b165c4fe8a74265c792ce23f546c64604acf0e41 by jenkins source checksum c63f7cc71b8f63249e35126f0f7492d on 2016-01-26T00:16Z

Hadoop version: 2.7.2 from b165c4fe8a74265c792ce23f546c64604acf0e41 by jenkins source checksum d0fda26633fa762bff87ec759ebe689c on 2016-01-26T00:08Z `
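The same facts shown in the UI can be read from the ResourceManager REST API, which is handy for scripting the "is the RM up, and has any NM registered" check. This is a sketch under assumptions: it uses the standard `/ws/v1/cluster/info` and `/ws/v1/cluster/nodes` endpoints, and it assumes an empty cluster comes back with a null or missing node list. The sample payloads are hypothetical, trimmed to the fields used.

```python
import json
from urllib.request import urlopen


def get_json(url: str) -> dict:
    """Fetch and decode a JSON document, e.g. http://<rm-host>:8088/ws/v1/cluster/info."""
    with urlopen(url, timeout=5) as resp:
        return json.load(resp)


def rm_state(info: dict) -> str:
    """Extract the RM state (for example STARTED) from a cluster info payload."""
    return info["clusterInfo"]["state"]


def node_states(nodes: dict) -> list:
    """Return (hostname, state) pairs; tolerate a null or missing node list."""
    node_list = (nodes.get("nodes") or {}).get("node") or []
    return [(n.get("nodeHostName"), n.get("state")) for n in node_list]


# Hypothetical payloads for illustration.
sample_info = {"clusterInfo": {"id": 1476713515568, "state": "STARTED", "haState": "ACTIVE"}}
sample_nodes_empty = {"nodes": None}

print(rm_state(sample_info))            # STARTED
print(node_states(sample_nodes_empty))  # []
```

An RM in state STARTED with an empty node list matches what the UI shows above: the RM itself is healthy, and the missing piece is a NodeManager registering with it.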

yufeldman commented 7 years ago

@vathanlal Since you are starting the RM manually here, I expect its logs to be in the standard YARN logs directory, both .log and .out. Can you look through them?
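When looking through a long RM .log file, it can help to pull out only the warnings, errors, and Myriad's own messages first. A minimal sketch, assuming the log4j line layout `timestamp LEVEL logger: message` that the RM log in this thread uses; the function name and the default filters are illustrative choices.

```python
import re

# Line shape assumed from the RM .log in this thread:
# "2016-10-17 18:11:23,202 WARN some.Logger: message"
LINE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) (\w+) (\S+): (.*)$")


def interesting(lines, levels=("WARN", "ERROR", "FATAL"), packages=("org.apache.myriad",)):
    """Yield (timestamp, level, logger, message) for lines worth reading first."""
    for line in lines:
        m = LINE.match(line.strip())
        if not m:
            continue  # skip banners, stack frames, and blank lines
        ts, level, logger, msg = m.groups()
        if level in levels or logger.startswith(packages):
            yield ts, level, logger, msg


sample_lines = [
    "2016-10-17 18:11:23,202 WARN org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService: fair-scheduler.xml not found on the classpath.",
    "2016-10-17 18:11:29,927 INFO org.apache.myriad.Main: Launching 1 NM(s) with profile medium",
    "2016-10-17 18:11:30,734 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 8031",
]

for ts, level, logger, msg in interesting(sample_lines):
    print(ts, level, logger.rsplit(".", 1)[-1], msg)
```

To run it against a real file, pass an open file object, for example `interesting(open(path))`, where `path` points at the RM .log in the YARN logs directory.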

vathanlal commented 7 years ago

@yufeldman

Yes, I have those two files in my YARN logs directory. I am getting the following exception in my yarn-root-resourcemanager-mesos.out file:

`INFO: Couldn't find JAX-B element for class org.apache.myriad.api.model.FlexDownClusterRequest
Oct 17, 2016 6:12:30 PM com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator$8 resolve SEVERE: null
java.lang.IllegalAccessException: Class com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator$8 can not access a member of class javax.ws.rs.core.Response with modifiers "protected"
    at sun.reflect.Reflection.ensureMemberAccess(Reflection.java:102)
    at java.lang.Class.newInstance(Class.java:436)
    at com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator$8.resolve(WadlGeneratorJAXBGrammarGenerator.java:467)
    at com.sun.jersey.server.wadl.WadlGenerator$ExternalGrammarDefinition.resolve(WadlGenerator.java:181)
    at com.sun.jersey.server.wadl.ApplicationDescription.resolve(ApplicationDescription.java:81)
    at com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator.attachTypes(WadlGeneratorJAXBGrammarGenerator.java:518)
    at com.sun.jersey.server.wadl.WadlBuilder.generate(WadlBuilder.java:124)
    at com.sun.jersey.server.impl.wadl.WadlApplicationContextImpl.getApplication(WadlApplicationContextImpl.java:104)
    at com.sun.jersey.server.impl.wadl.WadlResource.getWadl(WadlResource.java:89)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
    at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:205)
    at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
    at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288)
    at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
    at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
    at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
    at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1469)
    at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1400)
    at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349)
    at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339)
    at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
    at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537)
    at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:699)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
    at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263)
    at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178)
    at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
    at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62)
    at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
    at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
    at org.mortbay.jetty.Server.handle(Server.java:326)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
    at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
    at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
    at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)

Oct 17, 2016 6:12:30 PM com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator attachTypes INFO: Couldn't find JAX-B element for class javax.ws.rs.core.Response
Oct 17, 2016 6:12:30 PM com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator attachTypes INFO: Couldn't find JAX-B element for class org.apache.myriad.api.model.FlexDownServiceRequest
Oct 17, 2016 6:12:30 PM com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator$8 resolve SEVERE: null
java.lang.IllegalAccessException: Class com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator$8 can not access a member of class javax.ws.rs.core.Response with modifiers "protected"
    at sun.reflect.Reflection.ensureMemberAccess(Reflection.java:102)
    at java.lang.Class.newInstance(Class.java:436)
    at com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator$8.resolve(WadlGeneratorJAXBGrammarGenerator.java:467)
    at com.sun.jersey.server.wadl.WadlGenerator$ExternalGrammarDefinition.resolve(WadlGenerator.java:181)
    at com.sun.jersey.server.wadl.ApplicationDescription.resolve(ApplicationDescription.java:81)
    at com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator.attachTypes(WadlGeneratorJAXBGrammarGenerator.java:518)
    at com.sun.jersey.server.wadl.WadlBuilder.generate(WadlBuilder.java:124)
    at com.sun.jersey.server.impl.wadl.WadlApplicationContextImpl.getApplication(WadlApplicationContextImpl.java:104)
    at com.sun.jersey.server.impl.wadl.WadlResource.getWadl(WadlResource.java:89)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
    at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:205)
    at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
    at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288)
    at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
    at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
    at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
    at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1469)
    at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1400)
    at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349)
    at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339)
    at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
    at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537)
    at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:699)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
    at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263)
    at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178)
    at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
    at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62)
    at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
    at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
    at org.mortbay.jetty.Server.handle(Server.java:326)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
    at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
    at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
    at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)`

vathanlal commented 7 years ago

@yufeldman

In my .log file I am getting the warning "2016-10-17 18:11:23,202 WARN org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService: fair-scheduler.xml not found on the classpath."

yufeldman commented 7 years ago

@vathanlal It is very strange that you don't have anything in .log, not even INFO messages. Or do you think those are not relevant? Could you post the content of the .log file?

vathanlal commented 7 years ago

@yufeldman Sorry, I had only posted the WARNING here. My log file is as below:

`STARTUP_MSG: build = https://git-wip-us.apache.org/repos/asf/hadoop.git -r b165c4fe8a74265c792ce23f546c64604acf0e41; compiled by 'jenkins' on 2016-01-26T00:08Z

STARTUP_MSG: java = 1.8.0_72 ****/

2016-10-17 18:11:18,549 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: registered UNIX signal handlers for [TERM, HUP, INT]

2016-10-17 18:11:19,614 INFO org.apache.hadoop.conf.Configuration: found resource core-site.xml at file:/usr/local/hadoop/etc/hadoop/core-site.xml

2016-10-17 18:11:19,889 INFO org.apache.hadoop.security.Groups: clearing userToGroupsMap cache

2016-10-17 18:11:20,236 INFO org.apache.hadoop.conf.Configuration: found resource yarn-site.xml at file:/usr/local/hadoop/etc/hadoop/yarn-site.xml

2016-10-17 18:11:21,447 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.RMFatalEventType for class org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMFatalEventDispatcher

2016-10-17 18:11:22,312 INFO org.apache.hadoop.yarn.server.resourcemanager.security.NMTokenSecretManagerInRM: NMTokenKeyRollingInterval: 86400000ms and NMTokenKeyActivationDelay: 900000ms

2016-10-17 18:11:22,337 INFO org.apache.hadoop.yarn.server.resourcemanager.security.RMContainerTokenSecretManager: ContainerTokenKeyRollingInterval: 86400000ms and ContainerTokenKeyActivationDelay: 900000ms

2016-10-17 18:11:22,352 INFO org.apache.hadoop.yarn.server.resourcemanager.security.AMRMTokenSecretManager: AMRMTokenKeyRollingInterval: 86400000ms and AMRMTokenKeyActivationDelay: 900000 ms

2016-10-17 18:11:22,474 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStoreEventType for class org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler

2016-10-17 18:11:22,477 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.NodesListManagerEventType for class org.apache.hadoop.yarn.server.resourcemanager.NodesListManager

2016-10-17 18:11:22,492 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Using Scheduler: org.apache.myriad.scheduler.yarn.MyriadFairScheduler

2016-10-17 18:11:22,562 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeEventType for class org.apache.myriad.scheduler.yarn.RMNodeEventHandler

2016-10-17 18:11:22,565 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.SchedulerEventType for class org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher

2016-10-17 18:11:22,566 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppEventType for class org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher

2016-10-17 18:11:22,566 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptEventType for class org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher

2016-10-17 18:11:22,567 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeEventType for class org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$NodeEventDispatcher

2016-10-17 18:11:22,727 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties

2016-10-17 18:11:23,021 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).

2016-10-17 18:11:23,021 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ResourceManager metrics system started

2016-10-17 18:11:23,068 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.RMAppManagerEventType for class org.apache.hadoop.yarn.server.resourcemanager.RMAppManager

2016-10-17 18:11:23,081 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncherEventType for class org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher

2016-10-17 18:11:23,086 INFO org.apache.hadoop.yarn.server.resourcemanager.RMNMInfo: Registered RMNMInfo MBean

2016-10-17 18:11:23,097 INFO org.apache.hadoop.yarn.security.YarnAuthorizationProvider: org.apache.hadoop.yarn.security.ConfiguredYarnAuthorizer is instiantiated.

2016-10-17 18:11:23,099 INFO org.apache.hadoop.util.HostsFileReader: Refreshing hosts (include/exclude) list

2016-10-17 18:11:23,202 WARN org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService: fair-scheduler.xml not found on the classpath.

2016-10-17 18:11:23,244 INFO org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsPublisher: YARN system metrics publishing service is not enabled

2016-10-17 18:11:23,244 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Transitioning to active state

2016-10-17 18:11:23,296 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Updating AMRMToken

2016-10-17 18:11:23,297 INFO org.apache.hadoop.yarn.server.resourcemanager.security.RMContainerTokenSecretManager: Rolling master-key for container-tokens

2016-10-17 18:11:23,297 INFO org.apache.hadoop.yarn.server.resourcemanager.security.NMTokenSecretManagerInRM: Rolling master-key for nm-tokens

2016-10-17 18:11:23,297 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Updating the current master key for generating delegation tokens

2016-10-17 18:11:23,298 INFO org.apache.hadoop.yarn.server.resourcemanager.security.RMDelegationTokenSecretManager: storing master key with keyID 1

2016-10-17 18:11:23,299 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Storing RMDTMasterKey.

2016-10-17 18:11:23,318 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.nodelabels.event.NodeLabelsStoreEventType for class org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager$ForwardingEventHandler

2016-10-17 18:11:23,301 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Starting expired delegation token remover thread, tokenRemoverScanInterval=60 min(s)

2016-10-17 18:11:23,324 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Updating the current master key for generating delegation tokens

2016-10-17 18:11:23,324 INFO org.apache.hadoop.yarn.server.resourcemanager.security.RMDelegationTokenSecretManager: storing master key with keyID 2

2016-10-17 18:11:23,325 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Storing RMDTMasterKey.

2016-10-17 18:11:26,516 INFO org.apache.myriad.scheduler.yarn.interceptor.CompositeInterceptor: Registered org.apache.myriad.scheduler.fgs.YarnNodeCapacityManager into the registry.

2016-10-17 18:11:26,516 INFO org.apache.myriad.scheduler.yarn.interceptor.CompositeInterceptor: Registered org.apache.myriad.scheduler.fgs.NMHeartBeatHandler into the registry.

2016-10-17 18:11:26,564 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog

2016-10-17 18:11:26,650 INFO org.mortbay.log: jetty-6.1.26

2016-10-17 18:11:29,648 INFO org.mortbay.log: Started SelectChannelConnector@0.0.0.0:8192

2016-10-17 18:11:29,649 INFO org.apache.myriad.Main: Initializing HealthChecks

2016-10-17 18:11:29,703 INFO org.apache.myriad.Main: Initializing Profiles

2016-10-17 18:11:29,710 INFO org.apache.myriad.scheduler.ServiceProfileManager: Adding profile zero with CPU: 0.0 and Memory: 0.0

2016-10-17 18:11:29,710 INFO org.apache.myriad.scheduler.ServiceProfileManager: Adding profile small with CPU: 1.0 and Memory: 256.0

2016-10-17 18:11:29,710 INFO org.apache.myriad.scheduler.ServiceProfileManager: Adding profile medium with CPU: 1.0 and Memory: 256.0

2016-10-17 18:11:29,710 INFO org.apache.myriad.scheduler.ServiceProfileManager: Adding profile large with CPU: 10.0 and Memory: 12288.0

2016-10-17 18:11:29,710 INFO org.apache.myriad.Main: Validating nmInstances..

2016-10-17 18:11:29,710 INFO org.apache.myriad.Main: Initializing initServiceConfigurations

2016-10-17 18:11:29,710 INFO org.apache.myriad.Main: Initializing Disruptors

2016-10-17 18:11:29,886 INFO org.apache.myriad.Main: Rebalancer is not turned on

2016-10-17 18:11:29,887 INFO org.apache.myriad.Main: Initializing Terminator

2016-10-17 18:11:29,902 INFO org.apache.myriad.Main: starting mesosDriver..

2016-10-17 18:11:29,902 INFO org.apache.myriad.scheduler.MyriadDriverManager: Starting driver...

2016-10-17 18:11:29,902 INFO org.apache.myriad.scheduler.MyriadDriver: Starting driver

2016-10-17 18:11:29,908 INFO org.apache.myriad.scheduler.MyriadDriver: Driver started with status: DRIVER_RUNNING

2016-10-17 18:11:29,909 INFO org.apache.myriad.scheduler.MyriadDriverManager: Driver started with status: DRIVER_RUNNING

2016-10-17 18:11:29,909 INFO org.apache.myriad.Main: started mesosDriver..

2016-10-17 18:11:29,909 INFO org.apache.myriad.scheduler.yarn.interceptor.CompositeInterceptor: Registered org.apache.myriad.policy.LeastAMNodesFirstPolicy into the registry.

2016-10-17 18:11:29,927 INFO org.apache.myriad.Main: Launching 1 NM(s) with profile medium

2016-10-17 18:11:29,928 INFO org.apache.myriad.scheduler.MyriadOperations: Adding 1 NM instances to cluster

2016-10-17 18:11:30,499 INFO org.apache.myriad.scheduler.event.handlers.RegisteredEventHandler: Received event: org.apache.myriad.scheduler.event.RegisteredEvent@69aba99c with frameworkId: value: "6215a35e-749e-4f27-bb50-f7c01650da80-0007"

2016-10-17 18:11:30,500 INFO org.apache.myriad.state.SchedulerState: Marked taskId nm.medium.36a17234-3818-4d8e-840e-304014eda3d2 pending, size of pending queue for nm is: 0

2016-10-17 18:11:30,501 INFO org.apache.myriad.scheduler.yarn.interceptor.MyriadInitializationInterceptor: Initialized myriad.

2016-10-17 18:11:30,686 INFO org.apache.hadoop.ipc.CallQueueManager: Using callQueue class java.util.concurrent.LinkedBlockingQueue

2016-10-17 18:11:30,734 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 8031

2016-10-17 18:11:30,776 INFO org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl: Adding protocol org.apache.hadoop.yarn.server.api.ResourceTrackerPB to the server

2016-10-17 18:11:30,792 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting

2016-10-17 18:11:30,794 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 8031: starting

2016-10-17 18:11:30,999 INFO org.apache.hadoop.ipc.CallQueueManager: Using callQueue class java.util.concurrent.LinkedBlockingQueue

2016-10-17 18:11:31,018 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 8030

2016-10-17 18:11:31,064 INFO org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl: Adding protocol org.apache.hadoop.yarn.api.ApplicationMasterProtocolPB to the server

2016-10-17 18:11:31,064 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting

2016-10-17 18:11:31,064 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 8030: starting

2016-10-17 18:11:31,428 INFO org.apache.hadoop.ipc.CallQueueManager: Using callQueue class java.util.concurrent.LinkedBlockingQueue

2016-10-17 18:11:31,442 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 8032

2016-10-17 18:11:31,454 INFO org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl: Adding protocol org.apache.hadoop.yarn.api.ApplicationClientProtocolPB to the server

2016-10-17 18:11:31,471 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 8032: starting

2016-10-17 18:11:31,595 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Transitioned to active state

2016-10-17 18:11:31,595 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting

2016-10-17 18:11:31,774 INFO org.apache.hadoop.security.authentication.server.AuthenticationFilter: Unable to initialize FileSignerSecretProvider, falling back to use random secrets.

2016-10-17 18:11:31,792 INFO org.apache.hadoop.http.HttpRequestLog: Http request log for http.requests.resourcemanager is not defined

2016-10-17 18:11:31,795 INFO org.apache.hadoop.http.HttpServer2: Added global filter 'safety' (class=org.apache.hadoop.http.HttpServer2$QuotingInputFilter)

2016-10-17 18:11:31,827 INFO org.apache.hadoop.http.HttpServer2: Added filter RMAuthenticationFilter (class=org.apache.hadoop.yarn.server.security.http.RMAuthenticationFilter) to context cluster

2016-10-17 18:11:31,827 INFO org.apache.hadoop.http.HttpServer2: Added filter RMAuthenticationFilter (class=org.apache.hadoop.yarn.server.security.http.RMAuthenticationFilter) to context logs

2016-10-17 18:11:31,831 INFO org.apache.hadoop.http.HttpServer2: Added filter RMAuthenticationFilter (class=org.apache.hadoop.yarn.server.security.http.RMAuthenticationFilter) to context static

2016-10-17 18:11:31,832 INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context cluster

2016-10-17 18:11:31,832 INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context logs

2016-10-17 18:11:31,832 INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context static

2016-10-17 18:11:31,848 INFO org.apache.hadoop.http.HttpServer2: adding path spec: /cluster/*

2016-10-17 18:11:31,852 INFO org.apache.hadoop.http.HttpServer2: adding path spec: /ws/*

2016-10-17 18:11:32,053 INFO org.apache.hadoop.yarn.webapp.WebApps: Registered webapp guice modules

2016-10-17 18:11:32,055 INFO org.apache.hadoop.http.HttpServer2: Jetty bound to port 8088

2016-10-17 18:11:32,055 INFO org.mortbay.log: jetty-6.1.26

2016-10-17 18:11:32,098 INFO org.mortbay.log: Extract jar:file:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-common-2.7.2.jar!/webapps/cluster to /tmp/Jetty_0_0_0_0_8088_cluster____u0rgz3/webapp

2016-10-17 18:11:32,699 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Updating the current master key for generating delegation tokens

2016-10-17 18:11:32,714 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Starting expired delegation token remover thread, tokenRemoverScanInterval=60 min(s)

2016-10-17 18:11:32,715 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Updating the current master key for generating delegation tokens

2016-10-17 18:11:34,352 INFO org.mortbay.log: Started HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:8088

2016-10-17 18:11:34,352 INFO org.apache.hadoop.yarn.webapp.WebApps: Web app cluster started at 8088

2016-10-17 18:11:34,425 INFO org.apache.hadoop.ipc.CallQueueManager: Using callQueue class java.util.concurrent.LinkedBlockingQueue

2016-10-17 18:11:34,432 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 8033

2016-10-17 18:11:34,433 INFO org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl: Adding protocol org.apache.hadoop.yarn.server.api.ResourceManagerAdministrationProtocolPB to the server

2016-10-17 18:11:34,435 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting

2016-10-17 18:11:34,435 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 8033: starting

2016-10-17 18:21:23,205 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler: Release request cache is cleaned up

2016-10-17 18:40:30,231 ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: RECEIVED SIGNAL 15: SIGTERM

2016-10-17 18:40:30,349 ERROR org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted

2016-10-17 18:40:30,359 INFO org.mortbay.log: Stopped HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:8088

2016-10-17 18:40:30,360 INFO org.apache.hadoop.ipc.Server: Stopping server on 8032

2016-10-17 18:40:30,365 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 8032

2016-10-17 18:40:30,369 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder

2016-10-17 18:40:30,369 INFO org.apache.hadoop.ipc.Server: Stopping server on 8033

2016-10-17 18:40:30,370 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 8033

2016-10-17 18:40:30,371 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Transitioning to standby state

2016-10-17 18:40:30,371 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder

2016-10-17 18:40:30,372 WARN org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher: org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher$LauncherThread interrupted. Returning.

2016-10-17 18:40:30,372 INFO org.apache.hadoop.ipc.Server: Stopping server on 8030

2016-10-17 18:40:30,377 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 8030

2016-10-17 18:40:30,381 INFO org.apache.hadoop.ipc.Server: Stopping server on 8031

2016-10-17 18:40:30,382 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder

2016-10-17 18:40:30,413 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 8031

2016-10-17 18:40:30,414 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder

2016-10-17 18:40:30,414 INFO org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: NMLivelinessMonitor thread interrupted

2016-10-17 18:40:30,414 ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Returning, interrupted : java.lang.InterruptedException

2016-10-17 18:40:30,415 WARN org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Update thread interrupted. Exiting.

2016-10-17 18:40:30,415 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: AsyncDispatcher is draining to stop, igonring any new events.

2016-10-17 18:40:30,415 ERROR org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted

2016-10-17 18:40:30,415 INFO org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: AMLivelinessMonitor thread interrupted

2016-10-17 18:40:30,415 INFO org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: AMLivelinessMonitor thread interrupted

2016-10-17 18:40:30,415 INFO org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.ContainerAllocationExpirer thread interrupted

2016-10-17 18:40:30,416 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping ResourceManager metrics system...

2016-10-17 18:40:30,417 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ResourceManager metrics system stopped.

2016-10-17 18:40:30,417 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ResourceManager metrics system shutdown complete.

2016-10-17 18:40:30,417 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: AsyncDispatcher is draining to stop, igonring any new events.

2016-10-17 18:40:30,417 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Transitioned to standby state

2016-10-17 18:40:30,418 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: SHUTDOWN_MSG: /`

vathanlal commented 7 years ago

@yufeldman I tried changing the configuration in myriad-config-default.yml, but the NM is still stuck in the same pending state. Looking at my yarn-root-resourcemanager-mesos.log, I can't tell what is causing Myriad to suddenly decline the resources, as seen in the Mesos master log. The log contains only one warning, related to the FairScheduler, as mentioned above. Regarding the pending state of the NM, this is the only information I get in the log.

`2016-10-17 18:11:29,927 INFO org.apache.myriad.Main: Launching 1 NM(s) with profile medium

2016-10-17 18:11:29,928 INFO org.apache.myriad.scheduler.MyriadOperations: Adding 1 NM instances to cluster

2016-10-17 18:11:30,499 INFO org.apache.myriad.scheduler.event.handlers.RegisteredEventHandler: Received event: org.apache.myriad.scheduler.event.RegisteredEvent@69aba99 with frameworkId: value: "6215a35e-749e-4f27-bb50-f7c01650da80-0007"

2016-10-17 18:11:30,500 INFO org.apache.myriad.state.SchedulerState: Marked taskId nm.medium.36a17234-3818-4d8e-840e-304014eda3d2 pending, size of pending queue for nm is: 0

2016-10-17 18:11:30,501 INFO org.apache.myriad.scheduler.yarn.interceptor.MyriadInitializationInterceptor: Initialized myriad.

2016-10-17 18:11:30,686 INFO org.apache.hadoop.ipc.CallQueueManager: Using callQueue class java.util.concurrent.LinkedBlockingQueue`

My myriad-config-default.yml is as shown below

`

mesosMaster: 10.0.2.15:5050

mesosMaster: 10.0.2.19:5050

checkpoint: false

frameworkFailoverTimeout: 43200000

frameworkFailoverTimeout: 0

frameworkName: MyriadAlpha

frameworkRole: "*"

frameworkUser: root # User the Node Manager runs as, required if nodeManagerURI set, otherwise defaults to the user running the resource manager.

frameworkSuperUser: root # To be deprecated, currently permissions need set by a superuser due to Mesos-1790. Must be root or have passwordless sudo. Required if nodeManagerURI set, ignored otherwise.

nativeLibrary: /usr/local/lib/libmesos.so

zkServers: 10.0.2.19:2181

zkTimeout: 20000

restApiPort: 8192

servedConfigPath: dist/config.tgz

servedBinaryPath: dist/hadoop-2.6.0.tgz

profiles:
  zero: # NMs launched with this profile dynamically obtain cpu/mem from Mesos
    cpu: 0
    mem: 0
  small:
    cpu: 1
    mem: 256
  medium:
    cpu: 1
    mem: 256
  large:
    cpu: 10
    mem: 12288

nmInstances: # NMs to start with. Requires at least 1 NM with a non-zero profile.
  medium: 1 # rebalancer: false

haEnabled: false

nodemanager:
  jvmMaxMemoryMB: 1024
  cpus: 0.2
  cgroups: false

executor:
  jvmMaxMemoryMB: 256

path: file:///usr/local/libexec/mesos/myriad-executor-runnable-0.1.0.jar

path: file:///usr/local/hadoop/share/hadoop/yarn/lib/myriad-executor-0.2.0.jar

The following should be used for a remotely distributed URI, hdfs assumed but other URI types valid.

nodeManagerUri: hdfs://namenode:port/dist/hadoop-2.7.0.tar.gz

configUri: http://127.0.0.1/api/arifacts/config.tgz

jvmUri: https://downloads.mycompany.com/java/jre-7u76-linux-x64.tar.gz

yarnEnvironment:
  YARN_HOME: /usr/local/hadoop

HADOOP_CONF_DIR=config

HADOOP_TMP_DIR=$MESOS_SANDBOX

YARN_HOME: hadoop-2.7.0 #this should be relative if nodeManagerUri is set

JAVA_HOME: /usr/lib/jvm/java-default #System dependent, but sometimes necessary

JAVA_HOME: jre1.7.0_76 # Path to JRE distribution, relative to sandbox directory

JAVA_LIBRARY_PATH: /opt/mycompany/lib

mesosAuthenticationPrincipal:

mesosAuthenticationSecretFilename:`

yufeldman commented 7 years ago

Could you enable DEBUG at least for org.apache.myriad.scheduler package? Either offers don't go through or something else. You can add it to log4j.properties in etc/hadoop/
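For reference, in the log4j 1.x properties format that Hadoop 2.7 uses, a per-package level is normally a single logger entry. A minimal sketch, assuming the file in question is the log4j.properties under etc/hadoop/ of the Hadoop install that runs the ResourceManager:

```properties
# Raise only the Myriad scheduler package to DEBUG.
# All other loggers keep the level they already have.
log4j.logger.org.apache.myriad.scheduler=DEBUG
```

The ResourceManager usually has to be restarted before the new level takes effect.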

vathanlal commented 7 years ago

This is what I am getting after enabling DEBUG for org.apache.myriad.scheduler

`2016-10-19 10:45:45,521 INFO org.apache.myriad.api.ClustersResource: Received flexup request. Profile: zero, Instances: 1, Constraints: null

2016-10-19 10:45:45,525 INFO org.apache.myriad.scheduler.MyriadOperations: Adding 1 NM instances to cluster

2016-10-19 10:45:45,525 INFO org.apache.myriad.state.SchedulerState: Marked taskId nm.zero.063a54db-f00e-47dc-8551-159095e29872 pending, size of pending queue for nm is: 0

2016-10-19 10:45:49,642 DEBUG org.apache.myriad.scheduler.event.handlers.ResourceOffersEventHandler: Received offers 2

2016-10-19 10:45:49,642 DEBUG org.apache.myriad.scheduler.event.handlers.ResourceOffersEventHandler: Pending tasks: [value: "nm.zero.063a54db-f00e-47dc-8551-159095e29872" ]

2016-10-19 10:45:49,643 DEBUG org.apache.myriad.scheduler.SchedulerUtils: Offer's hostname hadoop1 is unique

2016-10-19 10:45:49,643 DEBUG org.apache.myriad.scheduler.SchedulerUtils: Offer's hostname mesos is unique

2016-10-19 10:45:49,643 DEBUG org.apache.myriad.scheduler.event.handlers.ResourceOffersEventHandler: Declining offer id { value: "ecdb076c-1cd6-4560-8a0c-1ec04a04ffef-O3235" } framework_id { value: "ecdb076c-1cd6-4560-8a0c-1ec04a04ffef-0002" } slaveid { value: "ecdb076c-1cd6-4560-8a0c-1ec04a04ffef-S0" } hostname: "hadoop1" resources { name: "cpus" type: SCALAR scalar { value: 1.0 } role: "" } resources { name: "mem" type: SCALAR scalar { value: 1000.0 } role: "" } resources { name: "disk" type: SCALAR scalar { value: 9091.0 } role: "" } resources { name: "ports" type: RANGES ranges { range { begin: 31000 end: 32000 } } role: "_" } url { scheme: "http" address { hostname: "hadoop1" ip: "10.0.2.24" port: 5051 } path: "/slave(1)" } from slave hadoop1.

2016-10-19 10:45:49,645 DEBUG org.apache.myriad.scheduler.event.handlers.ResourceOffersEventHandler: Declining offer id { value: "ecdb076c-1cd6-4560-8a0c-1ec04a04ffef-O3236" } framework_id { value: "ecdb076c-1cd6-4560-8a0c-1ec04a04ffef-0002" } slaveid { value: "ecdb076c-1cd6-4560-8a0c-1ec04a04ffef-S1" } hostname: "mesos" resources { name: "ports" type: RANGES ranges { range { begin: 31000 end: 31122 } range { begin: 31124 end: 32000 } } role: "" } resources { name: "mem" type: SCALAR scalar { value: 488.0 } role: "_" } resources { name: "disk" type: SCALAR scalar { value: 8491.0 } role: "*" } url { scheme: "http" address { hostname: "mesos" ip: "10.0.2.19" port: 5051 } path: "/slave(1)" } from slave mesos.`

yufeldman commented 7 years ago

I feel you are not on Myriad 0.2, but on master. In any case, I feel you may need to do remote debugging to see why offers are declined. I am testing on master now as well and I have hit a few issues.

vathanlal commented 7 years ago

I am also getting this in the DEBUG output.

2016-10-19 17:26:22,622 DEBUG org.apache.myriad.state.SchedulerState: Could not update state to state store as HA is disabled

I don't know whether this is causing the problem. There is nothing in the YARN logs, and nothing in the Mesos logs apart from the declined resources.

My two nodes are registered in Mesos, each with 1 core and 1000 MB of RAM. I don't know whether Myriad requires a minimum of 1024 MB. In myriad-config-default.yml I set the small and medium profiles accordingly, but the NM is still in the pending state.

yufeldman commented 7 years ago

You probably don't have enough resources. You can "cheat" on resources and set them manually for Mesos (--resources param for the Mesos agent, e.g. --resources=cpus:12;mem:15000). At least it may give you a chance to overcome the issue of NMs not being able to spin up. Just be aware that the Mesos agent will not restart easily after its resources are changed.
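To make the "not enough resources" guess concrete, here is a rough sketch that compares the two declined offers from the DEBUG output with what a NodeManager task might need. It assumes the scheduler adds the `nodemanager` and `executor` settings from the posted config on top of the chosen profile. That is an assumption about Myriad's behavior, not something confirmed in this thread. The helpers `total`, `shortfalls` and `report` are hypothetical names used only for illustration.

```python
# Hypothetical helpers, not part of Myriad. All numbers are copied from the thread.

def total(*parts):
    """Sum cpus and mem over several {resource: amount} dicts."""
    return {name: sum(p.get(name, 0.0) for p in parts) for name in ("cpus", "mem")}

def shortfalls(offer, need):
    """Return the resource names for which the offer is smaller than the need."""
    return [name for name, amount in need.items() if offer.get(name, 0.0) < amount]

# Profiles from the posted myriad-config-default.yml.
PROFILES = {
    "zero": {"cpus": 0.0, "mem": 0.0},
    "medium": {"cpus": 1.0, "mem": 256.0},
}

# ASSUMPTION: these overheads are added on top of the profile.
NODEMANAGER = {"cpus": 0.2, "mem": 1024.0}  # nodemanager.cpus, nodemanager.jvmMaxMemoryMB
EXECUTOR = {"mem": 256.0}  # executor.jvmMaxMemoryMB; no executor CPU value appears in the thread

# Resources listed in the two "Declining offer" DEBUG lines.
OFFERS = {
    "hadoop1": {"cpus": 1.0, "mem": 1000.0},
    "mesos": {"mem": 488.0},  # this offer listed no cpus at all
}

def report(profile_name):
    """Map each offering host to the resources it falls short on."""
    need = total(PROFILES[profile_name], NODEMANAGER, EXECUTOR)
    return {host: shortfalls(offer, need) for host, offer in OFFERS.items()}

if __name__ == "__main__":
    for name in PROFILES:
        print(name, report(name))
```

If the assumption holds, neither offer would cover even the zero profile, because the memory overhead alone exceeds the 1000 MB that each node advertises. The real scheduler may count things differently, so treat this only as a quick sanity check before resorting to remote debugging.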

vathanlal commented 7 years ago

@yufeldman I increased the resources in Mesos, but I am still in the same state: the NM is not starting. This time, though, I got a new error, shown below.

ERROR org.apache.myriad.scheduler.event.handlers.ResourceOffersEventHandler: Exception thrown while trying to create a task for nm java.lang.IllegalArgumentException: bound must be positive