Hydrospheredata / mist

Serverless proxy for Spark cluster
http://hydrosphere.io/mist/
Apache License 2.0

Mist Spark cluster mode issue in DC/OS via Marathon #238

Closed sreeraaman closed 7 years ago

sreeraaman commented 7 years ago

Dear All,

We managed to deploy the Mist Docker image on DC/OS via Marathon using the following JSON configuration:

{ "volumes": null, "id": "/mist-job-server", "cmd": "/usr/share/mist/bin/mist-master start --config /config/docker.conf --router-config /config/router.conf --debug true", "args": null, "user": null, "env": null, "instances": 1, "cpus": 1, "mem": 2048, "disk": 500, "gpus": 0, "executor": null, "constraints": null, "fetch": null, "storeUrls": null, "backoffSeconds": 1, "backoffFactor": 1.15, "maxLaunchDelaySeconds": 3600, "container": { "docker": { "image": "hydrosphere/mist:0.12.3-2.1.1", "forcePullImage": true, "privileged": false, "portMappings": [ { "containerPort": 2004, "protocol": "tcp", "servicePort": 10106 } ], "network": "BRIDGE" }, "type": "DOCKER", "volumes": [ { "containerPath": "/config", "hostPath": "/nfs/mist/config", "mode": "RW" }, { "containerPath": "/jobs", "hostPath": "/nfs/mist/jobs", "mode": "RW" }, { "containerPath": "/var/run/docker.sock", "hostPath": "/var/run/docker.sock", "mode": "RW" } ] }, "healthChecks": null, "readinessChecks": null, "dependencies": null, "upgradeStrategy": { "minimumHealthCapacity": 1, "maximumOverCapacity": 1 }, "labels": { "HAPROXY_GROUP": "external" }, "acceptedResourceRoles": null, "residency": null, "secrets": null, "taskKillGracePeriodSeconds": null, "portDefinitions": [ { "port": 10106, "protocol": "tcp", "labels": {} } ], "requirePorts": false }

Now we want to switch Spark from local mode to cluster mode.

Our docker.conf file looks as follows:

```hocon
mist {
  context-defaults.spark-conf = {
    spark.master = "local[4]"
    spark.jars.packages = "com.datastax.spark:spark-cassandra-connector_2.11:2.0.3"
    spark.cassandra.connection.host = "node-0.cassandra.mesos"
  }

  context.test.spark-conf = {
    spark.cassandra.connection.host = "node-0.cassandra.mesos"
    spark.jars.packages = "com.datastax.spark:spark-cassandra-connector_2.11:2.0.3"
  }

  http {
    on = true
    host = "0.0.0.0"
    port = 2004
  }

  workers.runner = "local"
}
```

To make Spark run in cluster mode, we changed the configuration to the following:

```hocon
mist {
  context-defaults.spark-conf = {
    spark.master = "mesos://spark.marathon.mesos:31921"
    spark.submit.deployMode = "cluster"
    spark.mesos.executor.docker.image = "mesosphere/spark:1.1.0-2.1.1-hadoop-2.6"
    spark.mesos.executor.home = "/opt/spark/dist"
    spark.jars.packages = "com.datastax.spark:spark-cassandra-connector_2.11:2.0.3"
    spark.cassandra.connection.host = "node-0.cassandra.mesos"
  }

  context.test.spark-conf = {
    spark.cassandra.connection.host = "node-0.cassandra.mesos"
    spark.jars.packages = "com.datastax.spark:spark-cassandra-connector_2.11:2.0.3"
  }

  http {
    on = true
    host = "0.0.0.0"
    port = 2004
  }

  workers.runner = "local" // ????
}
```

Now we get the exception: `mesos native library libmesos.so not found`.

Does anybody know what we are missing?

Also, can anybody tell us what the valid values for workers.runner are? Do we need to change anything here?

best regards Sriraman.

dos65 commented 7 years ago

It doesn't work because our image doesn't contain libmesos.so. I think that if you build an image with that library, or mount it into the existing image, it should fix the problem.
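For example, mounting the library from the agent host could look roughly like the following addition to the Marathon app definition above. This is a sketch only, not something tested in this thread: the host path to libmesos.so and the MESOS_NATIVE_JAVA_LIBRARY hint are assumptions, so adjust them to wherever the library actually lives on your agents (on DC/OS it is often under /opt/mesosphere).

```json
{
  "env": {
    "MESOS_NATIVE_JAVA_LIBRARY": "/usr/lib/libmesos.so"
  },
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "hydrosphere/mist:0.12.3-2.1.1"
    },
    "volumes": [
      {
        "containerPath": "/usr/lib/libmesos.so",
        "hostPath": "/usr/lib/libmesos.so",
        "mode": "RO"
      }
    ]
  }
}
```

The alternative is to build a custom image FROM hydrosphere/mist with the Mesos native library installed, which avoids depending on the agent's host paths.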

About runners: I don't think you need to change the defaults, but here is a short overview.

There are three ways Mist can run a worker:

sreeraaman commented 7 years ago

Hi,

Thanks for the update. I will take this up a little later. My immediate concern is to have certain contexts / namespaces precreated. The docs describe setting this as mist.contextSetting.onstart=[ns1,ns2,ns3...].

However, the Master.scala class looks for a different property called "precreated".

```scala
config.contextsSettings.precreated.foreach(context => {
  val name = context.name
  logger.info(s"Precreate context for $name namespace")
  workerManager ! CreateContext(name)
})
```

The current MistConfig.scala does not even pick up custom namespace-specific properties. I would like to know if there is a specific way in which custom namespaces have to be configured in the router.conf file.

Currently, the router.conf looks as follows:

```hocon
mist {
  context {
    test {
      spark-conf = {
        spark.master = "local[2]"
        spark.cassandra.connection.host = "docker.for.mac.localhost"
        spark.jars.packages = "com.datastax.spark:spark-cassandra-connector_2.11:2.0.3"
      }
      precreated = "true"
    }
  }
}
```

Thanks in advance.

best regards Sriraman.

sreeraaman commented 7 years ago

Hi,

There was a typo in my previous comment: please read router.conf as docker.conf.

Apologies for the inconvenience.

best regards Sriraman.

dos65 commented 7 years ago

Hi, thanks for pointing out the documentation issue. The correct property is precreated. Unfortunately, I broke custom context parsing in the latest release. It is fixed now.
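With that fix, a per-context declaration along these lines should be picked up. This is only a sketch mirroring the config posted above; the context name and the Spark setting are placeholders.

```hocon
mist {
  context {
    test {
      spark-conf = {
        spark.master = "local[2]"
      }
      // per-context flag: this is the property Master.scala reads,
      // not the contextSetting.onstart form shown in the docs
      precreated = true
    }
  }
}
```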

sreeraaman commented 7 years ago

Hi,

Thanks for the update. I just tried building the Docker image with `sbt -DsparkVersion=2.1.1 mist/docker` after pulling the latest from the master branch, and got the following exception:

```
java.lang.NullPointerException
    at scala.collection.mutable.ArrayOps$ofRef$.length$extension(ArrayOps.scala:114)
    at scala.collection.mutable.ArrayOps$ofRef.length(ArrayOps.scala:114)
    at scala.collection.IndexedSeqOptimized$class.isEmpty(IndexedSeqOptimized.scala:27)
    at scala.collection.mutable.ArrayOps$ofRef.isEmpty(ArrayOps.scala:108)
    at $c4fd88d51a2a9a6b7ab1$$anonfun$mistMiscTasks$2.apply(mist.sbt:229)
    at $c4fd88d51a2a9a6b7ab1$$anonfun$mistMiscTasks$2.apply(mist.sbt:227)
    at scala.Function1$$anonfun$compose$1.apply(Function1.scala:47)
    at sbt.$tilde$greater$$anonfun$$u2219$1.apply(TypeFunctions.scala:40)
    at sbt.std.Transform$$anon$4.work(System.scala:63)
    at sbt.Execute$$anonfun$submit$1$$anonfun$apply$1.apply(Execute.scala:226)
    at sbt.Execute$$anonfun$submit$1$$anonfun$apply$1.apply(Execute.scala:226)
    at sbt.ErrorHandling$.wideConvert(ErrorHandling.scala:17)
    at sbt.Execute.work(Execute.scala:235)
    at sbt.Execute$$anonfun$submit$1.apply(Execute.scala:226)
    at sbt.Execute$$anonfun$submit$1.apply(Execute.scala:226)
    at sbt.ConcurrentRestrictions$$anon$4$$anonfun$1.apply(ConcurrentRestrictions.scala:159)
    at sbt.CompletionService$$anon$2.call(CompletionService.scala:28)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
[error] (mist/*:buildUi) java.lang.NullPointerException
[error] Total time: 49 s, completed 18 Jul, 2017 5:15:38 PM
```

dos65 commented 7 years ago

Do you have node and npm installed locally? If not, please install them; they are required to build the UI.

sreeraaman commented 7 years ago

Hi,

Is this a recent change? My previous Docker image builds used to complete without any issues.

sreeraaman commented 7 years ago

Hi, they are already installed, as shown below:

```
Last login: Tue Jul 18 11:54:52 on ttys006
Sriramans-MacBook-Pro:~ sriram$ node -v
v6.10.3
Sriramans-MacBook-Pro:~ sriram$ npm -v
3.10.10
Sriramans-MacBook-Pro:~ sriram$
```

dos65 commented 7 years ago

> Is this a recent change? My previous Docker image builds used to complete without any issues.

Yes, we are working on a new UI. Could you provide the full log from `sbt mist/docker`?

sreeraaman commented 7 years ago

Attaching the output from `sbt mist/docker`. It looks like there is an issue while cloning the ui submodule from git.

```
Please make sure you have the correct access rights and the repository exists.
fatal: clone of 'git@github.com:Hydrospheredata/mist-ui.git' into submodule path '/Users/sriram/mist/ui' failed
Failed to clone 'ui' a second time, aborting
java.io.IOException: Cannot run program "npm" (in directory "/Users/sriram/mist/ui"): error=2, No such file or directory
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
    at sbt.SimpleProcessBuilder.run(ProcessImpl.scala:349)
    at sbt.AbstractProcessBuilder.run(ProcessImpl.scala:126)
    at sbt.AbstractProcessBuilder.$bang(ProcessImpl.scala:154)
    at $c4fd88d51a2a9a6b7ab1$$anonfun$mistMiscTasks$2.apply(mist.sbt:233)
    at $c4fd88d51a2a9a6b7ab1$$anonfun$mistMiscTasks$2.apply(mist.sbt:227)
    at scala.Function1$$anonfun$compose$1.apply(Function1.scala:47)
    at sbt.$tilde$greater$$anonfun$$u2219$1.apply(TypeFunctions.scala:40)
    at sbt.std.Transform$$anon$4.work(System.scala:63)
    at sbt.Execute$$anonfun$submit$1$$anonfun$apply$1.apply(Execute.scala:226)
    at sbt.Execute$$anonfun$submit$1$$anonfun$apply$1.apply(Execute.scala:226)
    at sbt.ErrorHandling$.wideConvert(ErrorHandling.scala:17)
    at sbt.Execute.work(Execute.scala:235)
    at sbt.Execute$$anonfun$submit$1.apply(Execute.scala:226)
    at sbt.Execute$$anonfun$submit$1.apply(Execute.scala:226)
    at sbt.ConcurrentRestrictions$$anon$4$$anonfun$1.apply(ConcurrentRestrictions.scala:159)
    at sbt.CompletionService$$anon$2.call(CompletionService.scala:28)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: error=2, No such file or directory
    at java.lang.UNIXProcess.forkAndExec(Native Method)
    at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
    at java.lang.ProcessImpl.start(ProcessImpl.java:134)
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
    at sbt.SimpleProcessBuilder.run(ProcessImpl.scala:349)
    at sbt.AbstractProcessBuilder.run(ProcessImpl.scala:126)
    at sbt.AbstractProcessBuilder.$bang(ProcessImpl.scala:154)
    at $c4fd88d51a2a9a6b7ab1$$anonfun$mistMiscTasks$2.apply(mist.sbt:233)
    at $c4fd88d51a2a9a6b7ab1$$anonfun$mistMiscTasks$2.apply(mist.sbt:227)
    at scala.Function1$$anonfun$compose$1.apply(Function1.scala:47)
    at sbt.$tilde$greater$$anonfun$$u2219$1.apply(TypeFunctions.scala:40)
    at sbt.std.Transform$$anon$4.work(System.scala:63)
    at sbt.Execute$$anonfun$submit$1$$anonfun$apply$1.apply(Execute.scala:226)
    at sbt.Execute$$anonfun$submit$1$$anonfun$apply$1.apply(Execute.scala:226)
    at sbt.ErrorHandling$.wideConvert(ErrorHandling.scala:17)
    at sbt.Execute.work(Execute.scala:235)
    at sbt.Execute$$anonfun$submit$1.apply(Execute.scala:226)
    at sbt.Execute$$anonfun$submit$1.apply(Execute.scala:226)
    at sbt.ConcurrentRestrictions$$anon$4$$anonfun$1.apply(ConcurrentRestrictions.scala:159)
    at sbt.CompletionService$$anon$2.call(CompletionService.scala:28)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
[error] (mist/*:buildUi) java.io.IOException: Cannot run program "npm" (in directory "/Users/sriram/mist/ui"): error=2, No such file or directory
[error] Total time: 17 s, completed 18 Jul, 2017 7:01:34 PM
```

dos65 commented 7 years ago

Can you pull from master again and try building the image? There was a minor fix around git submodules. Also, I'm not sure, but I think that before running the sbt command you need to update the submodule settings using git submodule sync.
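In other words, a sequence along these lines should work. This is only a sketch: the submodule update step is an assumption based on standard git usage, and the sbt invocation is the one used earlier in this thread.

```bash
# refresh the sources and the ui submodule, then rebuild the Docker image
git checkout master && git pull
git submodule sync              # pick up the corrected submodule settings
git submodule update --init     # (re)clone the ui submodule
sbt -DsparkVersion=2.1.1 mist/docker
```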

sreeraaman commented 7 years ago

Hi,

Thanks, it worked. I will ping back if I get stuck elsewhere.

best regards Sriraman.