Jobs complete successfully most of the time, but recently the Mist server failed some jobs with the error "executor was terminated". For a certain duration afterwards, the Mist server returned a 500 error for other jobs as well.
The recorded logs are provided below:
2019-03-27 02:24:36 WARN ReliableDeliverySupervisor:131 - Association with remote system [akka.tcp://mist-info-provider@127.0.0.1:38177] has failed, address is now gated for [5000] ms. Reason: [Association failed with [akka.tcp://mist-info-provider@127.0.0.1:38177]] Caused by: [Connection refused: /127.0.0.1:38177]
2019-03-27 02:24:37 WARN RemoteWatcher:131 - Detected unreachable: [akka.tcp://mist-worker-Big-Query-3-v1_66b0b2a4-624b-4d52-b947-de1445870c80-pool-1@x.x.x.x:46424]
2019-03-27 02:24:37 WARN RemoteWatcher:131 - Detected unreachable: [akka.tcp://mist-worker-Big-Query-1-v1_46079ac9-5e6b-44c0-a736-4eca7735d41d-pool-1@y.y.y.y:40868]
2019-03-27 02:24:37 WARN RemoteWatcher:131 - Detected unreachable: [akka.tcp://mist-info-provider@127.0.0.1:38177]
2019-03-27 02:24:37 INFO JobActor:107 - Job fa70b0a9-617b-4de1-b71a-8dcef2f25f55 completed with error
2019-03-27 02:24:37 INFO JobActor:107 - Job 6cd0a160-05ae-4ad5-bc4b-6abb0bac063d completed with error
2019-03-27 02:24:37 INFO SharedConnector:107 - Releasing connection: requested 0, pooled 0, in use 1, starting: 0
2019-03-27 02:24:37 INFO SharedConnector:107 - Releasing connection: requested 0, pooled 0, in use 0, starting: 0
2019-03-27 02:24:37 INFO SharedConnector:107 - Released unused connection
2019-03-27 02:24:37 INFO ContextFrontend:107 - Context Context-1 - move to inactive state
2019-03-27 02:24:37 INFO ContextFrontend:107 - Context Context-3 - move to inactive state
2019-03-27 02:24:37 ERROR RestartSupervisor:143 - Reference for FunctionInfoProvider was terminated. Restarting
I am also getting the following error continuously in the Mist logs:
2019-03-27 04:00:01 INFO ContextFrontend:107 - Context-1 - connected state(active connections: 0, max: 1)
2019-03-27 04:00:09 ERROR SharedConnector:159 - Could not start worker connection
java.lang.RuntimeException: Process terminated with error java.lang.RuntimeException: Process exited with status code 1 and out: Ivy Default Cache set to: /home/cassandra/.ivy2/cache;The jars for the packages stored in: /home/cassandra/.ivy2/jars;:: loading settings :: url = jar:file:/cassandra/spark2.2.1/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml;org.apache.hadoop#hadoop-aws added as a dependency;org.apache.hadoop#hadoop-client added as a dependency;com.typesafe#config added as a dependency;:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0; confs: [default]; found org.apache.hadoop#hadoop-aws;2.7.4 in spark-list; found org.apache.hadoop#hadoop-common;2.7.4 in spark-list; found org.apache.hadoop#hadoop-annotations;2.7.4 in spark-list; found com.google.guava#guava;11.0.2 in spark-list; found com.google.code.findbugs#jsr305;3.0.0 in spark-list; found commons-cli#commons-cli;1.2 in spark-list; found org.apache.commons#commons-math3;3.1.1 in spark-list; found xmlenc#xmlenc;0.52 in spark-list; found commons-httpclient#commons-httpclient;3.1 in spark-list; found commons-logging#commons-logging;1.1.3 in spark-list; found commons-codec#commons-codec;1.4 in spark-list; found commons-io#commons-io;2.4 in spark-list; found commons-net#commons-net;3.1 in spark-list; found commons-collections#commons-collections;3.2.2 in spark-list; found javax.servlet#servlet-api;2.5 in spark-list; found org.mortbay.jetty#jetty;6.1.26 in spark-list; found org.mortbay.jetty#jetty-util;6.1.26 in spark-list
at io.hydrosphere.mist.master.execution.workers.WorkerRunner$DefaultRunner$$anonfun$continueSetup$1$1.applyOrElse(WorkerRunner.scala:39)
at io.hydrosphere.mist.master.execution.workers.WorkerRunner$DefaultRunner$$anonfun$continueSetup$1$1.applyOrElse(WorkerRunner.scala:39)
at scala.concurrent.Future$$anonfun$onFailure$1.apply(Future.scala:138)
at scala.concurrent.Future$$anonfun$onFailure$1.apply(Future.scala:136)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
at scala.concurrent.impl.ExecutionContextImpl$AdaptedForkJoinTask.exec(ExecutionContextImpl.scala:121)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
2019-03-27 04:00:09 ERROR ContextFrontend:159 - Ask new worker connection for Context-2 failed
java.lang.RuntimeException: Process terminated with error java.lang.RuntimeException: Process exited with status code 1 and out: Ivy Default Cache set to: /home/cassandra/.ivy2/cache;The jars for the packages stored in: /home/cassandra/.ivy2/jars;:: loading settings :: url = jar:file:/cassandra/spark2.2.1/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml;org.apache.hadoop#hadoop-aws added as a dependency;org.apache.hadoop#hadoop-client added as a dependency;com.typesafe#config added as a dependency;:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0; confs: [default]; found org.apache.hadoop#hadoop-aws;2.7.4 in spark-list; found org.apache.hadoop#hadoop-common;2.7.4 in spark-list; found org.apache.hadoop#hadoop-annotations;2.7.4 in spark-list; found com.google.guava#guava;11.0.2 in spark-list; found com.google.code.findbugs#jsr305;3.0.0 in spark-list; found commons-cli#commons-cli;1.2 in spark-list; found org.apache.commons#commons-math3;3.1.1 in spark-list; found xmlenc#xmlenc;0.52 in spark-list; found commons-httpclient#commons-httpclient;3.1 in spark-list; found commons-logging#commons-logging;1.1.3 in spark-list; found commons-codec#commons-codec;1.4 in spark-list; found commons-io#commons-io;2.4 in spark-list; found commons-net#commons-net;3.1 in spark-list; found commons-collections#commons-collections;3.2.2 in spark-list; found javax.servlet#servlet-api;2.5 in spark-list; found org.mortbay.jetty#jetty;6.1.26 in spark-list; found org.mortbay.jetty#jetty-util;6.1.26 in spark-list
at io.hydrosphere.mist.master.execution.workers.WorkerRunner$DefaultRunner$$anonfun$continueSetup$1$1.applyOrElse(WorkerRunner.scala:39)
at io.hydrosphere.mist.master.execution.workers.WorkerRunner$DefaultRunner$$anonfun$continueSetup$1$1.applyOrElse(WorkerRunner.scala:39)
at scala.concurrent.Future$$anonfun$onFailure$1.apply(Future.scala:138)
at scala.concurrent.Future$$anonfun$onFailure$1.apply(Future.scala:136)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
at scala.concurrent.impl.ExecutionContextImpl$AdaptedForkJoinTask.exec(ExecutionContextImpl.scala:121)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
2019-03-27 04:00:09 INFO ContextFrontend:107 - Context-2 - connected state(active connections: 0, max: 1)
2019-03-27 04:00:09 INFO SharedConnector:107 - Pool is empty and we are able to start new one connection: inUse size :0
What is the possible cause? Is it related to some configuration issue? If so, why is it not happening for all jobs?
Probably there are some errors in the context configuration.
Could you check the additional process logs in the logs directory? There should be log files with names like local-worker-$context-name.
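For reference, the "Could not start worker connection" error above is raised while spark-submit resolves the packages requested for the context (hadoop-aws, hadoop-client, com.typesafe config), so the context definition is a reasonable place to start. Below is a rough sketch of what such a context section can look like in Mist's HOCON configuration; the context name is taken from the log above, but the keys and values shown are illustrative assumptions and may differ in your Mist version and setup:

mist.context.Context-2 {
  worker-mode = "shared"       # one shared worker process serves this context
  max-parallel-jobs = 1        # limit on parallel jobs for this context
  # options appended to spark-submit when the worker starts;
  # package coordinates here are resolved via Ivy, which is where the log shows the failure
  run-options = "--packages org.apache.hadoop:hadoop-aws:2.7.4"
  spark-conf {
    spark.executor.memory = "2g"
  }
}

If package resolution or any of these options fails, the worker process exits before it can register, which surfaces as the error above; the matching local-worker-$context-name log file should contain the full spark-submit output for that attempt.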