Closed luck02 closed 7 years ago
Is it possible it's a race condition where the pod gets created before the configmap and secret?
The pod is created @ "creationTimestamp": "2017-09-23T05:02:36Z" The config map is created @ "creationTimestamp": "2017-09-23T05:02:37Z" The secret is created @ "creationTimestamp": "2017-09-23T05:02:36Z"
I'm not sure what the initialization looks like for this, IE will the pod immediately error out? Will it try again etc.
I'm just digging around in the spark submit code, trying to figure out what the interplay between the staging server and the pod scheduling is. Looks like everyone is busy trying to merge upstream (Which is awesome!)
hrm. I tweaked my setup a bit.
I baked my jars into the images (which makes more sense anyways). Submission is now very fast, which is great but still the same error.
pod can't mount configmap or secret.
./infra/spark-2.2.0-k8s-0.4.0-SNAPSHOT-bin-2.7.3/bin/spark-submit \
--verbose \
--deploy-mode cluster \
--master k8s://http://127.0.0.1:8001 \
--kubernetes-namespace default \
--jars local:/opt/jars/hadoop-aws-2.7.3.jar,local:/opt/jars/aws-java-sdk-1.7.4.jar,local:/opt/jars/joda-time-2.9.3.jar,local:/opt/jars/mysql-connector-java-5.1.35.jar\
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-k8s-service-account \
--conf spark.kubernetes.docker.image.pullPolicy=Always \
--conf spark.driver.memory=14G \
--conf spark.driver.cores=8 \
--conf spark.executor.instances=10 \
--conf spark.executor.cores=6\
--conf spark.executor.memory=20G \
--conf spark.app.name=kmderived \
--conf spark.kubernetes.driver.docker.image=$SPARK_DRIVER \
--conf spark.kubernetes.executor.docker.image=$SPARK_EXECUTOR \
--conf spark.kubernetes.initcontainer.docker.image=$SPARK_INIT \
--conf spark.kubernetes.driver.labels=role=kissmetrics \
--conf spark.kubernetes.executor.labels=role=kissmetrics \
--conf spark.kubernetes.resourceStagingServer.uri=http://someaws-identifier.aws-region-1.elb.amazonaws.com:10000 \
--py-files ./jobs.zip \
./driver.py
And results are:
MountVolume.SetUp failed for volume "spark-init-properties" : configmaps "kmderived-1506310930218-init-config" not found
Manually confirmed that config map does exist. My working theory is some sort of race condition.
In addition the secrets and configmaps are not cleaned up. So their left lying around after a failure.
I got to the point where I am almost starting to actually debug the behavious but it's slow going as I'm just not familiar enough with the spark codebase, scala development in general and java development (background is primarily Golang / .NET / JS with a sprinkling of scala).
Hey @luck02 ! Not sure if I can help too much, but in my experience, this is oftentimes the by-product of a different issue than reported. Due to the lack of consistent logging and debugging tools, a smaller error might cause this ConfigMap failure to occur, while this error might not be the reason your job submission failed at all.
Regardless, can you confirm that you are prepending local://
, instead of just local
, in front of the pre-baked JARs that you provide to the --jars
param? I.e. --jars local://{path to jar 1}, ...
@sahilprasad it was local:/
I changed it to local://
and got the same general issue.
I've also now tried without the staging server by cutting out all files except what i needed for this one job in the zip. IE got it down to less than 10kb.
Similar problem but totally taking the staging service out of the picture:
spark submit:
./infra/spark-2.2.0-k8s-0.4.0-SNAPSHOT-bin-2.7.3/bin/spark-submit \
--verbose \
--deploy-mode cluster \
--master k8s://http://127.0.0.1:8001 \
--kubernetes-namespace default \
--jars local://opt/jars/hadoop-aws-2.7.3.jar,local://opt/jars/aws-java-sdk-1.7.4.jar,local://opt/jars/joda-time-2.9.3.jar,local://opt/jars/mysql-connector-java-5.1.35.jar\
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-k8s-service-account \
--conf spark.kubernetes.docker.image.pullPolicy=Always \
--conf spark.driver.memory=14G \
--conf spark.driver.cores=8 \
--conf spark.executor.instances=10 \
--conf spark.executor.cores=6\
--conf spark.executor.memory=20G \
--conf spark.app.name=kmderived \
--conf spark.kubernetes.driver.docker.image=$SPARK_DRIVER \
--conf spark.kubernetes.executor.docker.image=$SPARK_EXECUTOR \
--conf spark.kubernetes.initcontainer.docker.image=$SPARK_INIT \
--conf spark.kubernetes.driver.labels=role=kissmetrics \
--conf spark.kubernetes.executor.labels=role=kissmetrics \
--py-files file://$PATH_TO_JOB \
file://$PATH_TO_DRIVER
pod logs:
++ id -u
+ myuid=0
++ id -g
+ mygid=0
++ getent passwd 0
+ uidentry=root:x:0:0:root:/root:/bin/ash
+ '[' -z root:x:0:0:root:/root:/bin/ash ']'
+ /sbin/tini -s -- /bin/sh -c 'SPARK_CLASSPATH="${SPARK_HOME}/jars/*" && env | grep SPARK_JAVA_OPT_ | sed '\''s/[^=]*=\(.*\)/\1/g'\'' > /tmp/java_opts.txt && readarray -t SPARK_DRIVER_JAVA_OPTS < /tmp/java_opts.txt && if ! [ -z ${SPARK_MOUNTED_CLASSPATH+x} ]; then SPARK_CLASSPATH="$SPARK_MOUNTED_CLASSPATH:$SPARK_CLASSPATH"; fi && if ! [ -z ${SPARK_SUBMIT_EXTRA_CLASSPATH+x} ]; then SPARK_CLASSPATH="$SPARK_SUBMIT_EXTRA_CLASSPATH:$SPARK_CLASSPATH"; fi && if ! [ -z ${SPARK_EXTRA_CLASSPATH+x} ]; then SPARK_CLASSPATH="$SPARK_EXTRA_CLASSPATH:$SPARK_CLASSPATH"; fi && if ! [ -z ${SPARK_MOUNTED_FILES_DIR+x} ]; then cp -R "$SPARK_MOUNTED_FILES_DIR/." .; fi && if ! [ -z ${SPARK_MOUNTED_FILES_FROM_SECRET_DIR+x} ]; then cp -R "$SPARK_MOUNTED_FILES_FROM_SECRET_DIR/." .; fi && ${JAVA_HOME}/bin/java "${SPARK_DRIVER_JAVA_OPTS[@]}" -cp $SPARK_CLASSPATH -Xms$SPARK_DRIVER_MEMORY -Xmx$SPARK_DRIVER_MEMORY $SPARK_DRIVER_CLASS $PYSPARK_PRIMARY $PYSPARK_FILES $SPARK_DRIVER_ARGS'
python: can't open file '/var/spark-data/spark-files/driver.py': [Errno 2] No such file or directory
Exception in thread "main" org.apache.spark.SparkUserAppException: User application exited with 2
at org.apache.spark.deploy.PythonRunner$.main(PythonRunner.scala:97)
at org.apache.spark.deploy.PythonRunner.main(PythonRunner.scala)
and this is in the events:
MountVolume.SetUp failed for volume "submitted-files" : secrets "spark-9ed113390ce643e793baf9f9e83514cb-submitted-files" not found
That secret does exist, not sure where to go from here.
It's obvious the issue isn't in the staging server. I also just confirmed that the secret in question is correct. I see my driver.py source code and a blob of zipped data in the other one.
Can you try to submit again, with one more /
in each of your JARs? It should be local://
(the adapter) with the actual path appended to it. For example, local://
+ /opt/spark/jars/something.jar
. Just want to rule this out as a possible cause for the error.
Absolutely, thanks for helping @sahilprasad
I added the additional /
as suggested, no change. I flat out removed the --jars line from my spark-submit script... no change.
Here's my hole spark-submit output, not sure if it'll help.
garylucas@gary-lucas:~/dev/ub-data-derpmetrics|DAT-spark-on-k8s⚡
⇒ ./run_page_spark.bash
/Users/garylucas/dev/ub-data-derpmetrics/jobs.zip
/Users/garylucas/dev/ub-data-derpmetrics/driver.py
docker_ecr.dkr.ecr.us-east-1.amazonaws.com/spark-driver-py-derpmetrics:latest
docker_ecr.dkr.ecr.us-east-1.amazonaws.com/spark-executor-py-derpmetrics:latest
Using properties file: null
Parsed arguments:
master k8s://http://127.0.0.1:8001
deployMode cluster
executorMemory 20G
executorCores 6
totalExecutorCores null
propertiesFile null
driverMemory 14G
driverCores 8
driverExtraClassPath null
driverExtraLibraryPath null
driverExtraJavaOptions null
supervise false
queue null
numExecutors 10
files null
pyFiles file:///Users/garylucas/dev/ub-data-derpmetrics/jobs.zip
archives null
mainClass null
primaryResource file:///Users/garylucas/dev/ub-data-derpmetrics/driver.py
name kmderived
childArgs []
jars null
packages null
packagesExclusions null
repositories null
verbose true
Spark properties used, including those specified through
--conf and those from the properties file null:
(spark.kubernetes.executor.docker.image,docker_ecr.dkr.ecr.us-east-1.amazonaws.com/spark-executor-py-derpmetrics:latest)
(spark.kubernetes.docker.image.pullPolicy,Always)
(spark.executor.memory,20G)
(spark.driver.memory,14G)
(spark.executor.instances,10)
(spark.kubernetes.authenticate.driver.serviceAccountName,spark-k8s-service-account)
(spark.driver.cores,8)
(spark.app.name,kmderived)
(spark.kubernetes.driver.labels,role=derpmetrics)
(spark.kubernetes.driver.docker.image,docker_ecr.dkr.ecr.us-east-1.amazonaws.com/spark-driver-py-derpmetrics:latest)
(spark.kubernetes.initcontainer.docker.image,docker_ecr.dkr.ecr.us-east-1.amazonaws.com/spark-init-derpmetrics:latest)
(spark.kubernetes.executor.labels,role=derpmetrics)
(spark.executor.cores,6)
Main class:
org.apache.spark.deploy.kubernetes.submit.Client
Arguments:
--primary-py-file
file:///Users/garylucas/dev/ub-data-derpmetrics/driver.py
--main-class
org.apache.spark.deploy.PythonRunner
--other-py-files
file:///Users/garylucas/dev/ub-data-derpmetrics/jobs.zip
System properties:
(spark.kubernetes.executor.docker.image,docker_ecr.dkr.ecr.us-east-1.amazonaws.com/spark-executor-py-derpmetrics:latest)
(spark.kubernetes.docker.image.pullPolicy,Always)
(spark.executor.memory,20G)
(spark.driver.memory,14G)
(spark.executor.instances,10)
(SPARK_SUBMIT,true)
(spark.kubernetes.namespace,default)
(spark.kubernetes.authenticate.driver.serviceAccountName,spark-k8s-service-account)
(spark.driver.cores,8)
(spark.app.name,kmderived)
(spark.kubernetes.driver.docker.image,docker_ecr.dkr.ecr.us-east-1.amazonaws.com/spark-driver-py-derpmetrics:latest)
(spark.kubernetes.driver.labels,role=derpmetrics)
(spark.kubernetes.initcontainer.docker.image,docker_ecr.dkr.ecr.us-east-1.amazonaws.com/spark-init-derpmetrics:latest)
(spark.submit.deployMode,cluster)
(spark.master,k8s://http://127.0.0.1:8001)
(spark.kubernetes.executor.labels,role=derpmetrics)
(spark.executor.cores,6)
Classpath elements:
2017-09-24 21:49:36 WARN ConfigurationUtils:66 - Configuration with key spark.kubernetes.driver.labels is deprecated. Use configurations with prefix spark.kubernetes.driver.label.<key> instead.
2017-09-24 21:49:38 INFO LoggingPodStatusWatcherImpl:54 - State changed, new state:
pod name: kmderived-1506314976209-driver
namespace: default
labels: role -> derpmetrics, spark-app-selector -> spark-5f563affbb7547fa9562c78bfd66c763, spark-role -> driver
pod uid: ef5e9e46-a1ac-11e7-b97c-026e1e112030
creation time: 2017-09-25T04:49:38Z
service account name: spark-k8s-service-account
volumes: submitted-files, spark-k8s-service-account-token-fn4qg
node name: N/A
start time: N/A
container images: N/A
phase: Pending
status: []
2017-09-24 21:49:38 INFO LoggingPodStatusWatcherImpl:54 - State changed, new state:
pod name: kmderived-1506314976209-driver
namespace: default
labels: role -> derpmetrics, spark-app-selector -> spark-5f563affbb7547fa9562c78bfd66c763, spark-role -> driver
pod uid: ef5e9e46-a1ac-11e7-b97c-026e1e112030
creation time: 2017-09-25T04:49:38Z
service account name: spark-k8s-service-account
volumes: submitted-files, spark-k8s-service-account-token-fn4qg
node name: ip-10-253-50-77.ec2.internal
start time: N/A
container images: N/A
phase: Pending
status: []
2017-09-24 21:49:38 INFO LoggingPodStatusWatcherImpl:54 - State changed, new state:
pod name: kmderived-1506314976209-driver
namespace: default
labels: role -> derpmetrics, spark-app-selector -> spark-5f563affbb7547fa9562c78bfd66c763, spark-role -> driver
pod uid: ef5e9e46-a1ac-11e7-b97c-026e1e112030
creation time: 2017-09-25T04:49:38Z
service account name: spark-k8s-service-account
volumes: submitted-files, spark-k8s-service-account-token-fn4qg
node name: ip-10-253-50-77.ec2.internal
start time: 2017-09-25T04:49:38Z
container images: docker_ecr.dkr.ecr.us-east-1.amazonaws.com/spark-driver-py-derpmetrics:latest
phase: Pending
status: [ContainerStatus(containerID=null, image=docker_ecr.dkr.ecr.us-east-1.amazonaws.com/spark-driver-py-derpmetrics:latest, imageID=, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=false, restartCount=0, state=ContainerState(running=null, terminated=null, waiting=ContainerStateWaiting(message=null, reason=ContainerCreating, additionalProperties={}), additionalProperties={}), additionalProperties={})]
2017-09-24 21:49:38 INFO Client:54 - Waiting for application kmderived to finish...
2017-09-24 21:49:40 INFO LoggingPodStatusWatcherImpl:54 - State changed, new state:
pod name: kmderived-1506314976209-driver
namespace: default
labels: role -> derpmetrics, spark-app-selector -> spark-5f563affbb7547fa9562c78bfd66c763, spark-role -> driver
pod uid: ef5e9e46-a1ac-11e7-b97c-026e1e112030
creation time: 2017-09-25T04:49:38Z
service account name: spark-k8s-service-account
volumes: submitted-files, spark-k8s-service-account-token-fn4qg
node name: ip-10-253-50-77.ec2.internal
start time: 2017-09-25T04:49:38Z
container images: docker_ecr.dkr.ecr.us-east-1.amazonaws.com/spark-driver-py-derpmetrics:latest
phase: Failed
status: [ContainerStatus(containerID=docker://075a6b4aff0db13aa24a5acc30392227192aa7c5fd8a050156e490809fdfb930, image=docker_ecr.dkr.ecr.us-east-1.amazonaws.com/spark-driver-py-derpmetrics:latest, imageID=docker-pullable://docker_ecr.dkr.ecr.us-east-1.amazonaws.com/spark-driver-py-derpmetrics@sha256:ec68cf8707428bf83f2fb4e0683aa94bfd7acfd0927de29212a3ee9eb3b8ecdb, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=false, restartCount=0, state=ContainerState(running=null, terminated=ContainerStateTerminated(containerID=docker://075a6b4aff0db13aa24a5acc30392227192aa7c5fd8a050156e490809fdfb930, exitCode=1, finishedAt=2017-09-25T04:49:39Z, message=null, reason=Error, signal=null, startedAt=2017-09-25T04:49:39Z, additionalProperties={}), waiting=null, additionalProperties={}), additionalProperties={})]
2017-09-24 21:49:40 INFO LoggingPodStatusWatcherImpl:54 - Container final statuses:
Container name: spark-kubernetes-driver
Container image: docker_ecr.dkr.ecr.us-east-1.amazonaws.com/spark-driver-py-derpmetrics:latest
Container state: Terminated
Exit code: 1
2017-09-24 21:49:40 INFO Client:54 - Application kmderived finished.
I'm using a podpreset to provide the actual secrets.
Huh, I'm going to make that fail and see if anything changes (I doubt it) (Nope, had no effect).
Of course!
Let's try keeping the --jars
out of the equation, and adding the resource staging server specs back in. Not sure if the pod presets are contributing to this error, but as I mentioned, the configmap mounting is a rather cryptic error, so we can't be certain about the cause yet.
That gave me the following error:
MountVolume.SetUp failed for volume "spark-init-properties" : configmaps "kmderived-1506316079615-init-config" not found
I didn't see an attempt to load an actual secret, so that's different. But presumably that's just taking the --jars out of the picture.
I did however also scrape a previously failed pod creation yaml and was able to launch it. It did mount the secret / config maps. But then crashed with the following error which might be suspicious:
++ id -u
+ myuid=0
++ id -g
+ mygid=0
++ getent passwd 0
+ uidentry=root:x:0:0:root:/root:/bin/ash
+ '[' -z root:x:0:0:root:/root:/bin/ash ']'
+ /sbin/tini -s -- /bin/sh -c 'SPARK_CLASSPATH="${SPARK_HOME}/jars/*" && env | grep SPARK_JAVA_OPT_ | sed '\''s/[^=]*=\(.*\)/\1/g'\'' > /tmp/java_opts.txt && readarray -t SPARK_DRIVER_JAVA_OPTS < /tmp/java_opts.txt && if ! [ -z ${SPARK_MOUNTED_CLASSPATH+x} ]; then SPARK_CLASSPATH="$SPARK_MOUNTED_CLASSPATH:$SPARK_CLASSPATH"; fi && if ! [ -z ${SPARK_SUBMIT_EXTRA_CLASSPATH+x} ]; then SPARK_CLASSPATH="$SPARK_SUBMIT_EXTRA_CLASSPATH:$SPARK_CLASSPATH"; fi && if ! [ -z ${SPARK_EXTRA_CLASSPATH+x} ]; then SPARK_CLASSPATH="$SPARK_EXTRA_CLASSPATH:$SPARK_CLASSPATH"; fi && if ! [ -z ${SPARK_MOUNTED_FILES_DIR+x} ]; then cp -R "$SPARK_MOUNTED_FILES_DIR/." .; fi && if ! [ -z ${SPARK_MOUNTED_FILES_FROM_SECRET_DIR+x} ]; then cp -R "$SPARK_MOUNTED_FILES_FROM_SECRET_DIR/." .; fi && ${JAVA_HOME}/bin/java "${SPARK_DRIVER_JAVA_OPTS[@]}" -cp $SPARK_CLASSPATH -Xms$SPARK_DRIVER_MEMORY -Xmx$SPARK_DRIVER_MEMORY $SPARK_DRIVER_CLASS $PYSPARK_PRIMARY $PYSPARK_FILES $SPARK_DRIVER_ARGS'
Error: Could not find or load main class .opt.spark.jars.RoaringBitmap-0.5.11.jar
But presumably is just related to having no startup arguments, however the config-map was in fact present.
Thanks for all the assistance, needing to sign off for the evening (PDT timezone)
Interesting. I wonder if this is related to my PR #464 . Try adding just the spark-examples JAR with the --jars
param to the spec of your latest submission, and if it works, then it is related. If not, hopefully the resulting error is more revealing of the actual issue.
not much in the way of meaningfull change:
#!/usr/bin/env bash
ECR_URL="514588561872.dkr.ecr.us-east-1.amazonaws.com"
SPARK_DRIVER="$ECR_URL/spark-driver-py-derpmetrics:latest"
SPARK_EXECUTOR="$ECR_URL/spark-executor-py-derpmetrics:latest"
SPARK_INIT="$ECR_URL/spark-init-derpmetrics:latest"
PATH_TO_JOB="$PWD/jobs.zip"
PATH_TO_DRIVER="$PWD/jobs/derpmetrics_warehouse/Derived_NonAppPageViewed/Compute__Derived_NonAppPageViewed.py"
PATH_TO_SPARK_EXAMPLES="$PWD/infra/spark-2.2.0-k8s-0.4.0-SNAPSHOT-bin-2.7.3/examples/jars/spark-examples_2.11-2.2.0-k8s-0.4.0-SNAPSHOT.jar"
echo $PATH_TO_JOB
echo $PATH_TO_DRIVER
echo $SPARK_DRIVER
echo $SPARK_EXECUTOR
./infra/spark-2.2.0-k8s-0.4.0-SNAPSHOT-bin-2.7.3/bin/spark-submit \
--verbose \
--deploy-mode cluster \
--master k8s://http://127.0.0.1:8001 \
--kubernetes-namespace default \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-k8s-service-account \
--conf spark.kubernetes.docker.image.pullPolicy=Always \
--conf spark.driver.memory=4G \
--conf spark.driver.cores=2 \
--conf spark.executor.instances=1 \
--conf spark.executor.cores=2\
--conf spark.executor.memory=6G \
--conf spark.app.name=kmderived \
--conf spark.kubernetes.driver.docker.image=$SPARK_DRIVER \
--conf spark.kubernetes.executor.docker.image=$SPARK_EXECUTOR \
--conf spark.kubernetes.initcontainer.docker.image=$SPARK_INIT \
--conf spark.kubernetes.driver.labels=role=derpmetrics \
--conf spark.kubernetes.executor.labels=role=derpmetrics \
--conf spark.kubernetes.resourceStagingServer.uri=http://herpip-421161154.us-east-1.elb.amazonaws.com:10000 \
--jars file://$PATH_TO_SPARK_EXAMPLES\
file://$PATH_TO_DRIVER
Results in:
MountVolume.SetUp failed for volume "spark-init-properties" : configmaps "kmderived-1506351376470-init-config" not found
And pod failed to initialize.
Just reading your PR now
Your PR doesn't look overly related but who knows, I tried one more time with just my python file and still got the same error.
MountVolume.SetUp failed for volume "spark-init-properties" : configmaps "kmderived-1506353114000-init-config" not found
I'm still wondering about mount order etc. Correct me if I'm wrong but the problem needs to be happening in the context of 'spark-submit'
I got to the point of being able to test out a spark submit with spark running locally in debug mode, but hadn't gotten to the point of doing any significant debugging. Given that my submit steps are so simple I'll probably wire up my spark submit args to local-dev and start trying to troubleshoot there. It seems as if the secret / config map is being created after the pod spec is executed. But presumably that's just behaviour, not root cause. Will update if I find anything and if we get another spark 2.2 cut I'll try that too. I did run integration tests last night against my local build and all was 'good'... Also, when I bake all my code into the images it seems to work with the exception of: https://github.com/apache-spark-on-k8s/spark/issues/504
In fact, that blockage was why I tried to move to submitting local resources. If I wanted to debug that issue I really wanted to be able to iterate faster.
I'm wondering if perhaps my local snapshot build is broken in some subtle way:
garylucas@gary-lucas:~/dev/spark|branch-2.2-kubernetes⚡
⇒ git rev-parse HEAD
8a0f485d83f84a9b556f7634fed8cb1b65e12159
garylucas@gary-lucas:~/dev/spark|branch-2.2-kubernetes⚡
⇒ ./dev/make-distribution.sh --pip --tgz -Pmesos -Pyarn -Pkinesis-asl -Phive -Phive-thriftserver -Pkubernetes -Phadoop-2.7 -Dhadoop.version=2.7.3
As mentioned in #504 I'm building my own due to broken service account issues in the release. If a new build gets cut I'll try that.
The pod does get created before we create the config map and secret. This is because we need the Pod's UID to create the owner reference graph from the config map and secret back up to the pod. But if the pod is created first, it should eventually recover after the API server detects that the dependent objects are created even after the fact. @foxish - could this perhaps be a bug in Kubernetes? @luck02 what Kubernetes version is being used?
I see I've got some version skew, that could be doing me in too. I'll update and try again.
garylucas@gary-lucas:~/dev/ub-data-hub|dat-730⚡
⇒ kubectl version
Client Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.2", GitCommit:"922a86cfcd65915a9b2f69f3f193b8907d741d9c", GitTreeState:"clean", BuildDate:"2017-07-21T19:06:19Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.4+coreos.0", GitCommit:"4bb697e04f7c356347aee6ffaa91640b428976d5", GitTreeState:"clean", BuildDate:"2017-08-22T08:43:47Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
1.7.x should be new enough though so I don't see that being the root issue.
Upgraded my kubectl, no impact as predicted.
Are there any Kubernetes logs that can be read that might give more insight here?
Can we also remove any PodPresets? Read a bit earlier that those are being applied, and just want to reduce the number of variables at play here.
I tried that last evening, didn't seem to affect the situation. just reconfirmed MountVolume.SetUp failed for volume "spark-init-properties" : configmaps "kmderived-1506375022936-init-config" not found
what's interesting is that it shows pending
after the failure and then switches to failed
Here's the pod describe:
⇒ kubectl describe pod kmderived-1506375022936-driver
Name: kmderived-1506375022936-driver
Namespace: default
Node: ip-10-253-51-182.ec2.internal/10.253.51.182
Start Time: Mon, 25 Sep 2017 14:30:26 -0700
Labels: spark-app-selector=spark-ec57f4301e754e728468e6141ff0e013
spark-role=driver
Annotations: spark-app-name=kmderived
Status: Failed
IP: 10.2.75.15
Init Containers:
spark-init:
Container ID: docker://f96c6130a921c46b48813cf9127ed05cb4a87d158a608d215628788ec858c93d
Image: ecr_id.dkr.ecr.us-east-1.amazonaws.com/spark-init-derpmetrics:latest
Image ID: docker-pullable://ecr_id.dkr.ecr.us-east-1.amazonaws.com/spark-init-derpmetrics@sha256:0f244ccfaf6eeedf793c808616b2867ccb7c709c74d6fea1694faaf558684a50
Port: <none>
Args:
/etc/spark-init/spark-init.properties
State: Terminated
Reason: Error
Exit Code: 1
Started: Mon, 25 Sep 2017 14:30:37 -0700
Finished: Mon, 25 Sep 2017 14:31:10 -0700
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/etc/spark-init from spark-init-properties (rw)
/mnt/secrets/spark-init from spark-init-secret (rw)
/var/run/secrets/kubernetes.io/serviceaccount from spark-k8s-service-account-token-fn4qg (ro)
/var/spark-data/spark-files from download-files (rw)
/var/spark-data/spark-jars from download-jars-volume (rw)
Containers:
spark-kubernetes-driver:
Container ID:
Image: ecr_id.dkr.ecr.us-east-1.amazonaws.com/spark-driver-py-derpmetrics:latest
Image ID:
Port: <none>
State: Waiting
Reason: PodInitializing
Ready: False
Restart Count: 0
Limits:
memory: 4505Mi
Requests:
cpu: 2
memory: 4Gi
Environment:
SPARK_DRIVER_MEMORY: 4G
SPARK_DRIVER_CLASS: org.apache.spark.deploy.PythonRunner
SPARK_DRIVER_ARGS:
SPARK_MOUNTED_FILES_DIR: /var/spark-data/spark-files
PYSPARK_PRIMARY: /var/spark-data/spark-files/Compute__Derived_NonAppPageViewed.py
PYSPARK_FILES: null
SPARK_JAVA_OPT_0: -Dspark.kubernetes.docker.image.pullPolicy=Always
SPARK_JAVA_OPT_1: -Dspark.kubernetes.namespace=default
SPARK_JAVA_OPT_2: -Dspark.kubernetes.initcontainer.executor.configmapkey=download-submitted-files
SPARK_JAVA_OPT_3: -Dspark.kubernetes.driver.pod.name=kmderived-1506375022936-driver
SPARK_JAVA_OPT_4: -Dspark.files=/var/spark-data/spark-files/Compute__Derived_NonAppPageViewed.py
SPARK_JAVA_OPT_5: -Dspark.kubernetes.driver.docker.image=ecr_id.dkr.ecr.us-east-1.amazonaws.com/spark-driver-py-derpmetrics:latest
SPARK_JAVA_OPT_6: -Dspark.submit.deployMode=cluster
SPARK_JAVA_OPT_7: -Dspark.driver.blockManager.port=7079
SPARK_JAVA_OPT_8: -Dspark.app.name=kmderived
SPARK_JAVA_OPT_9: -Dspark.driver.cores=2
SPARK_JAVA_OPT_10: -Dspark.driver.host=kmderived-1506375022936-driver-svc.default.svc.cluster.local
SPARK_JAVA_OPT_11: -Dspark.kubernetes.authenticate.driver.serviceAccountName=spark-k8s-service-account
SPARK_JAVA_OPT_12: -Dspark.executor.cores=2
SPARK_JAVA_OPT_13: -Dspark.driver.port=7078
SPARK_JAVA_OPT_14: -Dspark.kubernetes.initcontainer.executor.stagingServerSecret.name=kmderived-1506375022936-init-secret
SPARK_JAVA_OPT_15: -Dspark.app.id=spark-ec57f4301e754e728468e6141ff0e013
SPARK_JAVA_OPT_16: -Dspark.kubernetes.initcontainer.executor.stagingServerSecret.mountDir=/mnt/secrets/spark-init
SPARK_JAVA_OPT_17: -Dspark.executor.memory=6G
SPARK_JAVA_OPT_18: -Dspark.kubernetes.resourceStagingServer.uri=http://ab5edaaa8a01911e7b97c026e1e11203-421161154.us-east-1.elb.amazonaws.com:10000
SPARK_JAVA_OPT_19: -Dspark.driver.bindAddress=kmderived-1506375022936-driver-svc.default.svc.cluster.local
SPARK_JAVA_OPT_20: -Dspark.executor.instances=1
SPARK_JAVA_OPT_21: -Dspark.kubernetes.executor.podNamePrefix=kmderived-1506375022936
SPARK_JAVA_OPT_22: -Dspark.master=k8s://http://127.0.0.1:8001
SPARK_JAVA_OPT_23: -Dspark.driver.memory=4G
SPARK_JAVA_OPT_24: -Dspark.kubernetes.initcontainer.docker.image=ecr_id.dkr.ecr.us-east-1.amazonaws.com/spark-init-derpmetrics:latest
SPARK_JAVA_OPT_25: -Dspark.kubernetes.executor.docker.image=ecr_id.dkr.ecr.us-east-1.amazonaws.com/spark-executor-py-derpmetrics:latest
SPARK_JAVA_OPT_26: -Dspark.kubernetes.initcontainer.executor.configmapname=kmderived-1506375022936-init-config
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from spark-k8s-service-account-token-fn4qg (ro)
/var/spark-data/spark-files from download-files (rw)
/var/spark-data/spark-jars from download-jars-volume (rw)
Conditions:
Type Status
Initialized False
Ready False
PodScheduled True
Volumes:
spark-init-properties:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: kmderived-1506375022936-init-config
Optional: false
download-jars-volume:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
download-files:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
spark-init-secret:
Type: Secret (a volume populated by a Secret)
SecretName: kmderived-1506375022936-init-secret
Optional: false
spark-k8s-service-account-token-fn4qg:
Type: Secret (a volume populated by a Secret)
SecretName: spark-k8s-service-account-token-fn4qg
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: <none>
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
2m 2m 1 default-scheduler Normal Scheduled Successfully assigned kmderived-1506375022936-driver to ip-10-253-51-182.ec2.internal
2m 2m 1 kubelet, ip-10-253-51-182.ec2.internal Normal SuccessfulMountVolume MountVolume.SetUp succeeded for volume "download-files"
2m 2m 1 kubelet, ip-10-253-51-182.ec2.internal Normal SuccessfulMountVolume MountVolume.SetUp succeeded for volume "download-jars-volume"
2m 2m 1 kubelet, ip-10-253-51-182.ec2.internal Warning FailedMount MountVolume.SetUp failed for volume "spark-init-properties" : configmaps "kmderived-1506375022936-init-config" not found
2m 2m 1 kubelet, ip-10-253-51-182.ec2.internal Normal SuccessfulMountVolume MountVolume.SetUp succeeded for volume "spark-k8s-service-account-token-fn4qg"
2m 2m 1 kubelet, ip-10-253-51-182.ec2.internal Warning FailedMount MountVolume.SetUp failed for volume "spark-init-secret" : secrets "kmderived-1506375022936-init-secret" not found
2m 2m 1 kubelet, ip-10-253-51-182.ec2.internal Normal SuccessfulMountVolume MountVolume.SetUp succeeded for volume "spark-init-secret"
2m 2m 1 kubelet, ip-10-253-51-182.ec2.internal Normal SuccessfulMountVolume MountVolume.SetUp succeeded for volume "spark-init-properties"
2m 2m 1 kubelet, ip-10-253-51-182.ec2.internal spec.initContainers{spark-init} Normal Pulling pulling image "ecr_id.dkr.ecr.us-east-1.amazonaws.com/spark-init-derpmetrics:latest"
2m 2m 1 kubelet, ip-10-253-51-182.ec2.internal spec.initContainers{spark-init} Normal Pulled Successfully pulled image "ecr_id.dkr.ecr.us-east-1.amazonaws.com/spark-init-derpmetrics:latest"
2m 2m 1 kubelet, ip-10-253-51-182.ec2.internal spec.initContainers{spark-init} Normal Created Created container
2m 2m 1 kubelet, ip-10-253-51-182.ec2.internal spec.initContainers{spark-init} Normal Started Started container
Is there anything maybe in the kubelet or the API server logs?
Ok, that helped. Appologies, I should have checked this earlier. Our log introspection tooling is pretty weak at the moment.
There is a pile of logs:
Sep 25 21:27:34 data-cluster-slow-test k8s.kmderived-1506400028781-driver_default_spark-init: ++ id -u
Sep 25 21:27:34 data-cluster-slow-test k8s.kmderived-1506400028781-driver_default_spark-init: + myuid=0
Sep 25 21:27:34 data-cluster-slow-test k8s.kmderived-1506400028781-driver_default_spark-init: ++ id -g
Sep 25 21:27:34 data-cluster-slow-test k8s.kmderived-1506400028781-driver_default_spark-init: + mygid=0
Sep 25 21:27:34 data-cluster-slow-test k8s.kmderived-1506400028781-driver_default_spark-init: ++ getent passwd 0
Sep 25 21:27:34 data-cluster-slow-test k8s.kmderived-1506400028781-driver_default_spark-init: + uidentry=root:x:0:0:root:/root:/bin/ash
Sep 25 21:27:34 data-cluster-slow-test k8s.kmderived-1506400028781-driver_default_spark-init: + '[' -z root:x:0:0:root:/root:/bin/ash ']'
Sep 25 21:27:34 data-cluster-slow-test k8s.kmderived-1506400028781-driver_default_spark-init: + /sbin/tini -s -- /opt/spark/bin/spark-class org.apache.spark.deploy.rest.kubernetes.KubernetesSparkDependencyDownloadInitContainer /etc/spark-init/spark-init.properties
Sep 25 21:27:34 data-cluster-slow-test k8s.kmderived-1506400028781-driver_default_spark-init: 2017-09-26 04:27:14 INFO KubernetesSparkDependencyDownloadInitContainer:54 - Starting init-container to download Spark application dependencies.
Sep 25 21:27:34 data-cluster-slow-test k8s.kmderived-1506400028781-driver_default_spark-init: 2017-09-26 04:27:15 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Sep 25 21:27:34 data-cluster-slow-test k8s.kmderived-1506400028781-driver_default_spark-init: 2017-09-26 04:27:15 INFO SecurityManager:54 - Changing view acls to: root
Sep 25 21:27:34 data-cluster-slow-test k8s.kmderived-1506400028781-driver_default_spark-init: 2017-09-26 04:27:15 INFO SecurityManager:54 - Changing modify acls to: root
Sep 25 21:27:34 data-cluster-slow-test k8s.kmderived-1506400028781-driver_default_spark-init: 2017-09-26 04:27:15 INFO SecurityManager:54 - Changing view acls groups to:
Sep 25 21:27:34 data-cluster-slow-test k8s.kmderived-1506400028781-driver_default_spark-init: 2017-09-26 04:27:15 INFO SecurityManager:54 - Changing modify acls groups to:
Sep 25 21:27:34 data-cluster-slow-test k8s.kmderived-1506400028781-driver_default_spark-init: 2017-09-26 04:27:15 INFO SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
Sep 25 21:27:34 data-cluster-slow-test k8s.kmderived-1506400028781-driver_default_spark-init: 2017-09-26 04:27:15 INFO SecurityManager:54 - Changing view acls to: root
Sep 25 21:27:34 data-cluster-slow-test k8s.kmderived-1506400028781-driver_default_spark-init: 2017-09-26 04:27:15 INFO SecurityManager:54 - Changing modify acls to: root
Sep 25 21:27:34 data-cluster-slow-test k8s.kmderived-1506400028781-driver_default_spark-init: 2017-09-26 04:27:15 INFO SecurityManager:54 - Changing view acls groups to:
Sep 25 21:27:34 data-cluster-slow-test k8s.kmderived-1506400028781-driver_default_spark-init: 2017-09-26 04:27:15 INFO SecurityManager:54 - Changing modify acls groups to:
Sep 25 21:27:34 data-cluster-slow-test k8s.kmderived-1506400028781-driver_default_spark-init: 2017-09-26 04:27:15 INFO SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
Sep 25 21:27:34 data-cluster-slow-test k8s.kmderived-1506400028781-driver_default_spark-init: 2017-09-26 04:27:15 INFO KubernetesSparkDependencyDownloadInitContainer:54 - Starting to download jars from resource staging server...
Sep 25 21:27:34 data-cluster-slow-test k8s.kmderived-1506400028781-driver_default_spark-init: 2017-09-26 04:27:15 INFO KubernetesSparkDependencyDownloadInitContainer:54 - Starting to download files from resource staging server...
Sep 25 21:27:46 data-cluster-slow-test k8s.kmderived-1506400028781-driver_default_spark-init: Exception in thread "main" org.apache.spark.SparkException: Exception thrown in awaitResult:
Sep 25 21:27:46 data-cluster-slow-test k8s.kmderived-1506400028781-driver_default_spark-init: at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)
Sep 25 21:27:46 data-cluster-slow-test k8s.kmderived-1506400028781-driver_default_spark-init: at org.apache.spark.deploy.rest.kubernetes.KubernetesSparkDependencyDownloadInitContainer$$anonfun$waitForFutures$1.apply(KubernetesSparkDependencyDownloadInitContainer.scala:187)
Sep 25 21:27:46 data-cluster-slow-test k8s.kmderived-1506400028781-driver_default_spark-init: at org.apache.spark.deploy.rest.kubernetes.KubernetesSparkDependencyDownloadInitContainer$$anonfun$waitForFutures$1.apply(KubernetesSparkDependencyDownloadInitContainer.scala:187)
Sep 25 21:27:46 data-cluster-slow-test k8s.kmderived-1506400028781-driver_default_spark-init: at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
Sep 25 21:27:46 data-cluster-slow-test k8s.kmderived-1506400028781-driver_default_spark-init: at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35)
Sep 25 21:27:46 data-cluster-slow-test k8s.kmderived-1506400028781-driver_default_spark-init: at org.apache.spark.deploy.rest.kubernetes.KubernetesSparkDependencyDownloadInitContainer.waitForFutures(KubernetesSparkDependencyDownloadInitContainer.scala:186)
Sep 25 21:27:46 data-cluster-slow-test k8s.kmderived-1506400028781-driver_default_spark-init: at org.apache.spark.deploy.rest.kubernetes.KubernetesSparkDependencyDownloadInitContainer.run(KubernetesSparkDependencyDownloadInitContainer.scala:140)
Sep 25 21:27:46 data-cluster-slow-test k8s.kmderived-1506400028781-driver_default_spark-init: at org.apache.spark.deploy.rest.kubernetes.KubernetesSparkDependencyDownloadInitContainer$.main(KubernetesSparkDependencyDownloadInitContainer.scala:222)
Sep 25 21:27:46 data-cluster-slow-test k8s.kmderived-1506400028781-driver_default_spark-init: at org.apache.spark.deploy.rest.kubernetes.KubernetesSparkDependencyDownloadInitContainer.main(KubernetesSparkDependencyDownloadInitContainer.scala)
Sep 25 21:27:46 data-cluster-slow-test k8s.kmderived-1506400028781-driver_default_spark-init: Caused by: java.util.concurrent.ExecutionException: java.net.SocketTimeoutException: connect timed out
Sep 25 21:27:46 data-cluster-slow-test k8s.kmderived-1506400028781-driver_default_spark-init: at org.spark_project.guava.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:306)
Sep 25 21:27:46 data-cluster-slow-test k8s.kmderived-1506400028781-driver_default_spark-init: at org.spark_project.guava.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:279)
Sep 25 21:27:46 data-cluster-slow-test k8s.kmderived-1506400028781-driver_default_spark-init: at org.spark_project.guava.util.concurrent.AbstractFuture.get(AbstractFuture.java:96)
Sep 25 21:27:46 data-cluster-slow-test k8s.kmderived-1506400028781-driver_default_spark-init: at org.apache.spark.deploy.rest.kubernetes.WaitableCallback$class.waitForCompletion(KubernetesSparkDependencyDownloadInitContainer.scala:53)
Sep 25 21:27:46 data-cluster-slow-test k8s.kmderived-1506400028781-driver_default_spark-init: at org.apache.spark.deploy.rest.kubernetes.DownloadTarGzCallback.waitForCompletion(KubernetesSparkDependencyDownloadInitContainer.scala:57)
Sep 25 21:27:46 data-cluster-slow-test k8s.kmderived-1506400028781-driver_default_spark-init: at org.apache.spark.deploy.rest.kubernetes.KubernetesSparkDependencyDownloadInitContainer$$anonfun$org$apache$spark$deploy$rest$kubernetes$KubernetesSparkDependencyDownloadInitContainer$$downloadResourcesFromStagingServer$1$$anonfun$apply$3.apply(KubernetesSparkDependencyDownloadInitContainer.scala:167)
Sep 25 21:27:46 data-cluster-slow-test k8s.kmderived-1506400028781-driver_default_spark-init: at org.apache.spark.deploy.rest.kubernetes.KubernetesSparkDependencyDownloadInitContainer$$anonfun$org$apache$spark$deploy$rest$kubernetes$KubernetesSparkDependencyDownloadInitContainer$$downloadResourcesFromStagingServer$1$$anonfun$apply$3.apply(KubernetesSparkDependencyDownloadInitContainer.scala:156)
Sep 25 21:27:46 data-cluster-slow-test k8s.kmderived-1506400028781-driver_default_spark-init: at scala.Option.foreach(Option.scala:257)
Sep 25 21:27:46 data-cluster-slow-test k8s.kmderived-1506400028781-driver_default_spark-init: at org.apache.spark.deploy.rest.kubernetes.KubernetesSparkDependencyDownloadInitContainer$$anonfun$org$apache$spark$deploy$rest$kubernetes$KubernetesSparkDependencyDownloadInitContainer$$downloadResourcesFromStagingServer$1.apply(KubernetesSparkDependencyDownloadInitContainer.scala:156)
Sep 25 21:27:46 data-cluster-slow-test k8s.kmderived-1506400028781-driver_default_spark-init: at org.apache.spark.deploy.rest.kubernetes.KubernetesSparkDependencyDownloadInitContainer$$anonfun$org$apache$spark$deploy$rest$kubernetes$KubernetesSparkDependencyDownloadInitContainer$$downloadResourcesFromStagingServer$1.apply(KubernetesSparkDependencyDownloadInitContainer.scala:155)
Sep 25 21:27:46 data-cluster-slow-test k8s.kmderived-1506400028781-driver_default_spark-init: at scala.Option.foreach(Option.scala:257)
Sep 25 21:27:46 data-cluster-slow-test k8s.kmderived-1506400028781-driver_default_spark-init: at org.apache.spark.deploy.rest.kubernetes.KubernetesSparkDependencyDownloadInitContainer.org$apache$spark$deploy$rest$kubernetes$KubernetesSparkDependencyDownloadInitContainer$$downloadResourcesFromStagingServer(KubernetesSparkDependencyDownloadInitContainer.scala:155)
Sep 25 21:27:46 data-cluster-slow-test k8s.kmderived-1506400028781-driver_default_spark-init: at org.apache.spark.deploy.rest.kubernetes.KubernetesSparkDependencyDownloadInitContainer$$anonfun$1.apply$mcV$sp(KubernetesSparkDependencyDownloadInitContainer.scala:104)
Sep 25 21:27:46 data-cluster-slow-test k8s.kmderived-1506400028781-driver_default_spark-init: at org.apache.spark.deploy.rest.kubernetes.KubernetesSparkDependencyDownloadInitContainer$$anonfun$1.apply(KubernetesSparkDependencyDownloadInitContainer.scala:104)
Sep 25 21:27:46 data-cluster-slow-test k8s.kmderived-1506400028781-driver_default_spark-init: at org.apache.spark.deploy.rest.kubernetes.KubernetesSparkDependencyDownloadInitContainer$$anonfun$1.apply(KubernetesSparkDependencyDownloadInitContainer.scala:104)
Sep 25 21:27:46 data-cluster-slow-test k8s.kmderived-1506400028781-driver_default_spark-init: at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
Sep 25 21:27:46 data-cluster-slow-test k8s.kmderived-1506400028781-driver_default_spark-init: at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
Sep 25 21:27:46 data-cluster-slow-test k8s.kmderived-1506400028781-driver_default_spark-init: at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
Sep 25 21:27:46 data-cluster-slow-test k8s.kmderived-1506400028781-driver_default_spark-init: at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
Sep 25 21:27:46 data-cluster-slow-test k8s.kmderived-1506400028781-driver_default_spark-init: at java.lang.Thread.run(Thread.java:748)
Sep 25 21:27:46 data-cluster-slow-test k8s.kmderived-1506400028781-driver_default_spark-init: Caused by: java.net.SocketTimeoutException: connect timed out
Sep 25 21:27:46 data-cluster-slow-test k8s.kmderived-1506400028781-driver_default_spark-init: at java.net.PlainSocketImpl.socketConnect(Native Method)
Sep 25 21:27:46 data-cluster-slow-test k8s.kmderived-1506400028781-driver_default_spark-init: at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
Sep 25 21:27:46 data-cluster-slow-test k8s.kmderived-1506400028781-driver_default_spark-init: at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
Sep 25 21:27:46 data-cluster-slow-test k8s.kmderived-1506400028781-driver_default_spark-init: at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
Sep 25 21:27:46 data-cluster-slow-test k8s.kmderived-1506400028781-driver_default_spark-init: at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
Sep 25 21:27:46 data-cluster-slow-test k8s.kmderived-1506400028781-driver_default_spark-init: at java.net.Socket.connect(Socket.java:589)
Sep 25 21:27:46 data-cluster-slow-test k8s.kmderived-1506400028781-driver_default_spark-init: at okhttp3.internal.platform.Platform.connectSocket(Platform.java:124)
Sep 25 21:27:46 data-cluster-slow-test k8s.kmderived-1506400028781-driver_default_spark-init: at okhttp3.internal.connection.RealConnection.connectSocket(RealConnection.java:220)
Sep 25 21:27:46 data-cluster-slow-test k8s.kmderived-1506400028781-driver_default_spark-init: at okhttp3.internal.connection.RealConnection.connect(RealConnection.java:146)
Sep 25 21:27:46 data-cluster-slow-test k8s.kmderived-1506400028781-driver_default_spark-init: at okhttp3.internal.connection.StreamAllocation.findConnection(StreamAllocation.java:186)
Sep 25 21:27:46 data-cluster-slow-test k8s.kmderived-1506400028781-driver_default_spark-init: at okhttp3.internal.connection.StreamAllocation.findHealthyConnection(StreamAllocation.java:121)
Sep 25 21:27:46 data-cluster-slow-test k8s.kmderived-1506400028781-driver_default_spark-init: at okhttp3.internal.connection.StreamAllocation.newStream(StreamAllocation.java:100)
Sep 25 21:27:46 data-cluster-slow-test k8s.kmderived-1506400028781-driver_default_spark-init: at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:42)
Sep 25 21:27:46 data-cluster-slow-test k8s.kmderived-1506400028781-driver_default_spark-init: at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
Sep 25 21:27:46 data-cluster-slow-test k8s.kmderived-1506400028781-driver_default_spark-init: at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
Sep 25 21:27:46 data-cluster-slow-test k8s.kmderived-1506400028781-driver_default_spark-init: at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
Sep 25 21:27:46 data-cluster-slow-test k8s.kmderived-1506400028781-driver_default_spark-init: at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
Sep 25 21:27:46 data-cluster-slow-test k8s.kmderived-1506400028781-driver_default_spark-init: at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
Sep 25 21:27:46 data-cluster-slow-test k8s.kmderived-1506400028781-driver_default_spark-init: at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
Sep 25 21:27:46 data-cluster-slow-test k8s.kmderived-1506400028781-driver_default_spark-init: at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
Sep 25 21:27:46 data-cluster-slow-test k8s.kmderived-1506400028781-driver_default_spark-init: at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:120)
Sep 25 21:27:46 data-cluster-slow-test k8s.kmderived-1506400028781-driver_default_spark-init: at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
Sep 25 21:27:46 data-cluster-slow-test k8s.kmderived-1506400028781-driver_default_spark-init: at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
Sep 25 21:27:46 data-cluster-slow-test k8s.kmderived-1506400028781-driver_default_spark-init: at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:179)
Sep 25 21:27:46 data-cluster-slow-test k8s.kmderived-1506400028781-driver_default_spark-init: at okhttp3.RealCall$AsyncCall.execute(RealCall.java:129)
Sep 25 21:27:46 data-cluster-slow-test k8s.kmderived-1506400028781-driver_default_spark-init: at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
Sep 25 21:27:46 data-cluster-slow-test k8s.kmderived-1506400028781-driver_default_spark-init: ... 3 more
Not familiar enough with the under-the-hood machinery to form an opinion.
Here's a fuller log including all the kubelet stuff... But it's big and crufty:
Sep 26 08:16:54 data-cluster-slow-test k8s.kube-controller-manager-ip-10-253-46-89.ec2.internal_kube-sy: I0926 15:16:54.037608 1 gc_controller.go:62] PodGC is force deleting Pod: default:kmderived-1506400028781-driver
Sep 26 08:16:54 data-cluster-slow-test k8s.kube-controller-manager-ip-10-253-46-89.ec2.internal_kube-sy: I0926 15:16:54.082009 1 gc_controller.go:166] Forced deletion of orphaned Pod kmderived-1506400028781-driver succeeded
Sep 26 08:16:54 data-cluster-slow-test k8s.kube-controller-manager-ip-10-253-46-89.ec2.internal_kube-sy: I0926 15:16:54.104348 1 servicecontroller.go:740] Service has been deleted default/kmderived-1506400028781-driver-svc
Sep 26 08:16:54 data-cluster-slow-test k8s.kube-controller-manager-ip-10-253-46-89.ec2.internal_kube-sy: I0926 15:16:54.267595 1 gc_controller.go:62] PodGC is force deleting Pod: default:kmderived-1506399639732-driver
Sep 26 08:16:54 data-cluster-slow-test k8s.kube-controller-manager-ip-10-253-46-89.ec2.internal_kube-sy: I0926 15:16:54.310623 1 gc_controller.go:166] Forced deletion of orphaned Pod kmderived-1506399639732-driver succeeded
Sep 26 08:16:54 data-cluster-slow-test k8s.kube-controller-manager-ip-10-253-46-89.ec2.internal_kube-sy: I0926 15:16:54.310824 1 gc_controller.go:62] PodGC is force deleting Pod: default:kmderived-1506399812445-driver
Sep 26 08:16:54 data-cluster-slow-test k8s.kube-controller-manager-ip-10-253-46-89.ec2.internal_kube-sy: I0926 15:16:54.350596 1 servicecontroller.go:740] Service has been deleted default/kmderived-1506399639732-driver-svc
Sep 26 08:16:54 data-cluster-slow-test k8s.kube-controller-manager-ip-10-253-46-89.ec2.internal_kube-sy: I0926 15:16:54.354022 1 gc_controller.go:166] Forced deletion of orphaned Pod kmderived-1506399812445-driver succeeded
Sep 26 08:16:54 data-cluster-slow-test k8s.kube-controller-manager-ip-10-253-46-89.ec2.internal_kube-sy: I0926 15:16:54.379049 1 servicecontroller.go:740] Service has been deleted default/kmderived-1506399812445-driver-svc
Sep 26 08:18:57 data-cluster-slow-test k8s.kube-controller-manager-ip-10-253-46-89.ec2.internal_kube-sy: I0926 15:18:56.792076 1 servicecontroller.go:740] Service has been deleted default/kmderived-1506400028781-driver-svc
Sep 26 08:24:22 data-cluster-slow-test k8s.kube-dns-3664836949-15b1l_kube-system_kubedns: I0926 15:24:21.595476 1 dns.go:552] Could not find endpoints for service "kmderived-1506400028781-driver-svc" in namespace "default". DNS records will be created once endpoints show up.
Sep 26 08:24:58 data-cluster-slow-test k8s.kube-controller-manager-ip-10-253-46-89.ec2.internal_kube-sy: I0926 15:24:57.793296 1 servicecontroller.go:740] Service has been deleted default/kmderived-1506400028781-driver-svc
Sep 26 08:31:04 data-cluster-slow-test k8s.kube-dns-3664836949-h9370_kube-system_kubedns: I0926 15:31:04.281200 1 dns.go:552] Could not find endpoints for service "kmderived-1506400028781-driver-svc" in namespace "default". DNS records will be created once endpoints show up.
Sep 26 08:33:06 data-cluster-slow-test k8s.kube-controller-manager-ip-10-253-46-89.ec2.internal_kube-sy: I0926 15:33:05.793954 1 servicecontroller.go:740] Service has been deleted default/kmderived-1506400028781-driver-svc
Sep 26 08:38:50 data-cluster-slow-test k8s.kube-dns-3664836949-15b1l_kube-system_kubedns: I0926 15:38:49.606999 1 dns.go:552] Could not find endpoints for service "kmderived-1506400028781-driver-svc" in namespace "default". DNS records will be created once endpoints show up.
Sep 26 08:43:09 data-cluster-slow-test k8s.kube-dns-3664836949-h9370_kube-system_kubedns: I0926 15:43:09.293169 1 dns.go:552] Could not find endpoints for service "kmderived-1506400028781-driver-svc" in namespace "default". DNS records will be created once endpoints show up.
Sep 26 08:50:07 data-cluster-slow-test k8s.kube-controller-manager-ip-10-253-46-89.ec2.internal_kube-sy: I0926 15:50:06.795441 1 servicecontroller.go:740] Service has been deleted default/kmderived-1506400028781-driver-svc
Sep 26 08:52:07 data-cluster-slow-test systemd: {"_BOOT_ID":"55547778ef634c26a15c3b46b6f2e020","PRIORITY":"6","SYSLOG_FACILITY":"3","_UID":"0","_GID":"0","_SYSTEMD_SLICE":"system.slice","_CAP_EFFECTIVE":"3fffffffff","_TRANSPORT":"stdout","_MACHINE_ID":"50e74f29f36b43eca88a7f72305e96b0","_SELINUX_CONTEXT":"system_u:system_r:kernel_t:s0","_HOSTNAME":"ip-10-253-50-31.ec2.internal","_SYSTEMD_UNIT":"kubelet.service","_SYSTEMD_INVOCATION_ID":"499393ff4e984b5a81b921137ec4d4bd","SYSLOG_IDENTIFIER":"kubelet-wrapper","_PID":"1727","_SYSTEMD_CGROUP":"/system.slice/kubelet.service","_COMM":"kubelet","_EXE":"/var/lib/rkt/pods/run/3e947ebc-2d9f-4100-a899-2bd270d4748e/stage1/rootfs/opt/stage2/hyperkube/rootfs/hyperkube","_CMDLINE":"/kubelet --cni-conf-dir=/etc/kubernetes/cni/net.d --cni-bin-dir=/opt/cni/bin --network-plugin=cni --container-runtime=docker --rkt-path=/usr/bin/rkt --rkt-stage1-image=coreos.com/rkt/stage1-coreos --node-labels data.unbounce.net/role=worker --register-node=true --allow-privileged=true --pod-manifest-path=/etc/kubernetes/manifests --cluster-dns=10.253.50.31 --cluster-domain=cluster.local --cloud-provider=aws --cert-dir=/etc/kubernetes/ssl --tls-cert-file=/etc/kubernetes/ssl/worker.pem --tls-private-key-file=/etc/kubernetes/ssl/worker-key.pem --kubeconfig=/etc/kubernetes/worker-kubeconfig.yaml --require-kubeconfig","MESSAGE":"I0926 15:52:07.010718 1727 reconciler.go:212] operationExecutor.VerifyControllerAttachedVolume started for volume \"spark-init-properties\" (UniqueName: \"kubernetes.io/configmap/a5e157ac-a2d2-11e7-a071-0e80cdd21cb6-spark-init-properties\") pod \"kmderived-1506441123636-driver\" (UID: \"a5e157ac-a2d2-11e7-a071-0e80cdd21cb6\")"}
Sep 26 08:52:07 data-cluster-slow-test systemd: {"_BOOT_ID":"55547778ef634c26a15c3b46b6f2e020","PRIORITY":"6","SYSLOG_FACILITY":"3","_UID":"0","_GID":"0","_SYSTEMD_SLICE":"system.slice","_CAP_EFFECTIVE":"3fffffffff","_TRANSPORT":"stdout","_MACHINE_ID":"50e74f29f36b43eca88a7f72305e96b0","_SELINUX_CONTEXT":"system_u:system_r:kernel_t:s0","_HOSTNAME":"ip-10-253-50-31.ec2.internal","_SYSTEMD_UNIT":"kubelet.service","_SYSTEMD_INVOCATION_ID":"499393ff4e984b5a81b921137ec4d4bd","SYSLOG_IDENTIFIER":"kubelet-wrapper","_PID":"1727","_SYSTEMD_CGROUP":"/system.slice/kubelet.service","_COMM":"kubelet","_EXE":"/var/lib/rkt/pods/run/3e947ebc-2d9f-4100-a899-2bd270d4748e/stage1/rootfs/opt/stage2/hyperkube/rootfs/hyperkube","_CMDLINE":"/kubelet --cni-conf-dir=/etc/kubernetes/cni/net.d --cni-bin-dir=/opt/cni/bin --network-plugin=cni --container-runtime=docker --rkt-path=/usr/bin/rkt --rkt-stage1-image=coreos.com/rkt/stage1-coreos --node-labels data.unbounce.net/role=worker --register-node=true --allow-privileged=true --pod-manifest-path=/etc/kubernetes/manifests --cluster-dns=10.253.50.31 --cluster-domain=cluster.local --cloud-provider=aws --cert-dir=/etc/kubernetes/ssl --tls-cert-file=/etc/kubernetes/ssl/worker.pem --tls-private-key-file=/etc/kubernetes/ssl/worker-key.pem --kubeconfig=/etc/kubernetes/worker-kubeconfig.yaml --require-kubeconfig","MESSAGE":"I0926 15:52:07.010758 1727 reconciler.go:212] operationExecutor.VerifyControllerAttachedVolume started for volume \"download-files\" (UniqueName: \"kubernetes.io/empty-dir/a5e157ac-a2d2-11e7-a071-0e80cdd21cb6-download-files\") pod \"kmderived-1506441123636-driver\" (UID: \"a5e157ac-a2d2-11e7-a071-0e80cdd21cb6\")"}
Sep 26 08:52:07 data-cluster-slow-test systemd: {"_BOOT_ID":"55547778ef634c26a15c3b46b6f2e020","PRIORITY":"6","SYSLOG_FACILITY":"3","_UID":"0","_GID":"0","_SYSTEMD_SLICE":"system.slice","_CAP_EFFECTIVE":"3fffffffff","_TRANSPORT":"stdout","_MACHINE_ID":"50e74f29f36b43eca88a7f72305e96b0","_SELINUX_CONTEXT":"system_u:system_r:kernel_t:s0","_HOSTNAME":"ip-10-253-50-31.ec2.internal","_SYSTEMD_UNIT":"kubelet.service","_SYSTEMD_INVOCATION_ID":"499393ff4e984b5a81b921137ec4d4bd","SYSLOG_IDENTIFIER":"kubelet-wrapper","_PID":"1727","_SYSTEMD_CGROUP":"/system.slice/kubelet.service","_COMM":"kubelet","_EXE":"/var/lib/rkt/pods/run/3e947ebc-2d9f-4100-a899-2bd270d4748e/stage1/rootfs/opt/stage2/hyperkube/rootfs/hyperkube","_CMDLINE":"/kubelet --cni-conf-dir=/etc/kubernetes/cni/net.d --cni-bin-dir=/opt/cni/bin --network-plugin=cni --container-runtime=docker --rkt-path=/usr/bin/rkt --rkt-stage1-image=coreos.com/rkt/stage1-coreos --node-labels data.unbounce.net/role=worker --register-node=true --allow-privileged=true --pod-manifest-path=/etc/kubernetes/manifests --cluster-dns=10.253.50.31 --cluster-domain=cluster.local --cloud-provider=aws --cert-dir=/etc/kubernetes/ssl --tls-cert-file=/etc/kubernetes/ssl/worker.pem --tls-private-key-file=/etc/kubernetes/ssl/worker-key.pem --kubeconfig=/etc/kubernetes/worker-kubeconfig.yaml --require-kubeconfig","MESSAGE":"I0926 15:52:07.010784 1727 reconciler.go:212] operationExecutor.VerifyControllerAttachedVolume started for volume \"download-jars-volume\" (UniqueName: \"kubernetes.io/empty-dir/a5e157ac-a2d2-11e7-a071-0e80cdd21cb6-download-jars-volume\") pod \"kmderived-1506441123636-driver\" (UID: \"a5e157ac-a2d2-11e7-a071-0e80cdd21cb6\")"}
Sep 26 08:52:07 data-cluster-slow-test systemd: {"_BOOT_ID":"55547778ef634c26a15c3b46b6f2e020","PRIORITY":"6","SYSLOG_FACILITY":"3","_UID":"0","_GID":"0","_SYSTEMD_SLICE":"system.slice","_CAP_EFFECTIVE":"3fffffffff","_TRANSPORT":"stdout","_MACHINE_ID":"50e74f29f36b43eca88a7f72305e96b0","_SELINUX_CONTEXT":"system_u:system_r:kernel_t:s0","_HOSTNAME":"ip-10-253-50-31.ec2.internal","_SYSTEMD_UNIT":"kubelet.service","_SYSTEMD_INVOCATION_ID":"499393ff4e984b5a81b921137ec4d4bd","SYSLOG_IDENTIFIER":"kubelet-wrapper","_PID":"1727","_SYSTEMD_CGROUP":"/system.slice/kubelet.service","_COMM":"kubelet","_EXE":"/var/lib/rkt/pods/run/3e947ebc-2d9f-4100-a899-2bd270d4748e/stage1/rootfs/opt/stage2/hyperkube/rootfs/hyperkube","_CMDLINE":"/kubelet --cni-conf-dir=/etc/kubernetes/cni/net.d --cni-bin-dir=/opt/cni/bin --network-plugin=cni --container-runtime=docker --rkt-path=/usr/bin/rkt --rkt-stage1-image=coreos.com/rkt/stage1-coreos --node-labels data.unbounce.net/role=worker --register-node=true --allow-privileged=true --pod-manifest-path=/etc/kubernetes/manifests --cluster-dns=10.253.50.31 --cluster-domain=cluster.local --cloud-provider=aws --cert-dir=/etc/kubernetes/ssl --tls-cert-file=/etc/kubernetes/ssl/worker.pem --tls-private-key-file=/etc/kubernetes/ssl/worker-key.pem --kubeconfig=/etc/kubernetes/worker-kubeconfig.yaml --require-kubeconfig","MESSAGE":"I0926 15:52:07.010806 1727 reconciler.go:212] operationExecutor.VerifyControllerAttachedVolume started for volume \"spark-k8s-service-account-token-fn4qg\" (UniqueName: \"kubernetes.io/secret/a5e157ac-a2d2-11e7-a071-0e80cdd21cb6-spark-k8s-service-account-token-fn4qg\") pod \"kmderived-1506441123636-driver\" (UID: \"a5e157ac-a2d2-11e7-a071-0e80cdd21cb6\")"}
Sep 26 08:52:07 data-cluster-slow-test systemd: {"_BOOT_ID":"55547778ef634c26a15c3b46b6f2e020","PRIORITY":"6","SYSLOG_FACILITY":"3","_UID":"0","_GID":"0","_SYSTEMD_SLICE":"system.slice","_CAP_EFFECTIVE":"3fffffffff","_TRANSPORT":"stdout","_MACHINE_ID":"50e74f29f36b43eca88a7f72305e96b0","_SELINUX_CONTEXT":"system_u:system_r:kernel_t:s0","_HOSTNAME":"ip-10-253-50-31.ec2.internal","_SYSTEMD_UNIT":"kubelet.service","_SYSTEMD_INVOCATION_ID":"499393ff4e984b5a81b921137ec4d4bd","SYSLOG_IDENTIFIER":"kubelet-wrapper","_PID":"1727","_SYSTEMD_CGROUP":"/system.slice/kubelet.service","_COMM":"kubelet","_EXE":"/var/lib/rkt/pods/run/3e947ebc-2d9f-4100-a899-2bd270d4748e/stage1/rootfs/opt/stage2/hyperkube/rootfs/hyperkube","_CMDLINE":"/kubelet --cni-conf-dir=/etc/kubernetes/cni/net.d --cni-bin-dir=/opt/cni/bin --network-plugin=cni --container-runtime=docker --rkt-path=/usr/bin/rkt --rkt-stage1-image=coreos.com/rkt/stage1-coreos --node-labels data.unbounce.net/role=worker --register-node=true --allow-privileged=true --pod-manifest-path=/etc/kubernetes/manifests --cluster-dns=10.253.50.31 --cluster-domain=cluster.local --cloud-provider=aws --cert-dir=/etc/kubernetes/ssl --tls-cert-file=/etc/kubernetes/ssl/worker.pem --tls-private-key-file=/etc/kubernetes/ssl/worker-key.pem --kubeconfig=/etc/kubernetes/worker-kubeconfig.yaml --require-kubeconfig","MESSAGE":"I0926 15:52:07.010943 1727 reconciler.go:212] operationExecutor.VerifyControllerAttachedVolume started for volume \"spark-init-secret\" (UniqueName: \"kubernetes.io/secret/a5e157ac-a2d2-11e7-a071-0e80cdd21cb6-spark-init-secret\") pod \"kmderived-1506441123636-driver\" (UID: \"a5e157ac-a2d2-11e7-a071-0e80cdd21cb6\")"}
Sep 26 08:52:07 data-cluster-slow-test systemd: {"_BOOT_ID":"55547778ef634c26a15c3b46b6f2e020","PRIORITY":"6","SYSLOG_FACILITY":"3","_UID":"0","_GID":"0","_SYSTEMD_SLICE":"system.slice","_CAP_EFFECTIVE":"3fffffffff","_TRANSPORT":"stdout","_MACHINE_ID":"50e74f29f36b43eca88a7f72305e96b0","_SELINUX_CONTEXT":"system_u:system_r:kernel_t:s0","_HOSTNAME":"ip-10-253-50-31.ec2.internal","_SYSTEMD_UNIT":"kubelet.service","_SYSTEMD_INVOCATION_ID":"499393ff4e984b5a81b921137ec4d4bd","SYSLOG_IDENTIFIER":"kubelet-wrapper","_PID":"1727","_SYSTEMD_CGROUP":"/system.slice/kubelet.service","_COMM":"kubelet","_EXE":"/var/lib/rkt/pods/run/3e947ebc-2d9f-4100-a899-2bd270d4748e/stage1/rootfs/opt/stage2/hyperkube/rootfs/hyperkube","_CMDLINE":"/kubelet --cni-conf-dir=/etc/kubernetes/cni/net.d --cni-bin-dir=/opt/cni/bin --network-plugin=cni --container-runtime=docker --rkt-path=/usr/bin/rkt --rkt-stage1-image=coreos.com/rkt/stage1-coreos --node-labels data.unbounce.net/role=worker --register-node=true --allow-privileged=true --pod-manifest-path=/etc/kubernetes/manifests --cluster-dns=10.253.50.31 --cluster-domain=cluster.local --cloud-provider=aws --cert-dir=/etc/kubernetes/ssl --tls-cert-file=/etc/kubernetes/ssl/worker.pem --tls-private-key-file=/etc/kubernetes/ssl/worker-key.pem --kubeconfig=/etc/kubernetes/worker-kubeconfig.yaml --require-kubeconfig","MESSAGE":"E0926 15:52:07.114849 1727 configmap.go:202] Couldn't get configMap default/kmderived-1506441123636-init-config: configmaps \"kmderived-1506441123636-init-config\" not found"}
Sep 26 08:52:07 data-cluster-slow-test systemd: {"_BOOT_ID":"55547778ef634c26a15c3b46b6f2e020","PRIORITY":"6","SYSLOG_FACILITY":"3","_UID":"0","_GID":"0","_SYSTEMD_SLICE":"system.slice","_CAP_EFFECTIVE":"3fffffffff","_TRANSPORT":"stdout","_MACHINE_ID":"50e74f29f36b43eca88a7f72305e96b0","_SELINUX_CONTEXT":"system_u:system_r:kernel_t:s0","_HOSTNAME":"ip-10-253-50-31.ec2.internal","_SYSTEMD_UNIT":"kubelet.service","_SYSTEMD_INVOCATION_ID":"499393ff4e984b5a81b921137ec4d4bd","SYSLOG_IDENTIFIER":"kubelet-wrapper","_PID":"1727","_SYSTEMD_CGROUP":"/system.slice/kubelet.service","_COMM":"kubelet","_EXE":"/var/lib/rkt/pods/run/3e947ebc-2d9f-4100-a899-2bd270d4748e/stage1/rootfs/opt/stage2/hyperkube/rootfs/hyperkube","_CMDLINE":"/kubelet --cni-conf-dir=/etc/kubernetes/cni/net.d --cni-bin-dir=/opt/cni/bin --network-plugin=cni --container-runtime=docker --rkt-path=/usr/bin/rkt --rkt-stage1-image=coreos.com/rkt/stage1-coreos --node-labels data.unbounce.net/role=worker --register-node=true --allow-privileged=true --pod-manifest-path=/etc/kubernetes/manifests --cluster-dns=10.253.50.31 --cluster-domain=cluster.local --cloud-provider=aws --cert-dir=/etc/kubernetes/ssl --tls-cert-file=/etc/kubernetes/ssl/worker.pem --tls-private-key-file=/etc/kubernetes/ssl/worker-key.pem --kubeconfig=/etc/kubernetes/worker-kubeconfig.yaml --require-kubeconfig","MESSAGE":"E0926 15:52:07.115008 1727 nestedpendingoperations.go:262] Operation for \"\\\"kubernetes.io/configmap/a5e157ac-a2d2-11e7-a071-0e80cdd21cb6-spark-init-properties\\\" (\\\"a5e157ac-a2d2-11e7-a071-0e80cdd21cb6\\\")\" failed. No retries permitted until 2017-09-26 15:52:07.614916423 +0000 UTC (durationBeforeRetry 500ms). Error: MountVolume.SetUp failed for volume \"spark-init-properties\" (UniqueName: \"kubernetes.io/configmap/a5e157ac-a2d2-11e7-a071-0e80cdd21cb6-spark-init-properties\") pod \"kmderived-1506441123636-driver\" (UID: \"a5e157ac-a2d2-11e7-a071-0e80cdd21cb6\") : configmaps \"kmderived-1506441123636-init-config\" not found"}
Sep 26 08:52:07 data-cluster-slow-test systemd: {"_BOOT_ID":"55547778ef634c26a15c3b46b6f2e020","PRIORITY":"6","SYSLOG_FACILITY":"3","_UID":"0","_GID":"0","_SYSTEMD_SLICE":"system.slice","_CAP_EFFECTIVE":"3fffffffff","_TRANSPORT":"stdout","_MACHINE_ID":"50e74f29f36b43eca88a7f72305e96b0","_SELINUX_CONTEXT":"system_u:system_r:kernel_t:s0","_HOSTNAME":"ip-10-253-50-31.ec2.internal","_SYSTEMD_UNIT":"kubelet.service","_SYSTEMD_INVOCATION_ID":"499393ff4e984b5a81b921137ec4d4bd","SYSLOG_IDENTIFIER":"kubelet-wrapper","_PID":"1727","_SYSTEMD_CGROUP":"/system.slice/kubelet.service","_COMM":"kubelet","_EXE":"/var/lib/rkt/pods/run/3e947ebc-2d9f-4100-a899-2bd270d4748e/stage1/rootfs/opt/stage2/hyperkube/rootfs/hyperkube","_CMDLINE":"/kubelet --cni-conf-dir=/etc/kubernetes/cni/net.d --cni-bin-dir=/opt/cni/bin --network-plugin=cni --container-runtime=docker --rkt-path=/usr/bin/rkt --rkt-stage1-image=coreos.com/rkt/stage1-coreos --node-labels data.unbounce.net/role=worker --register-node=true --allow-privileged=true --pod-manifest-path=/etc/kubernetes/manifests --cluster-dns=10.253.50.31 --cluster-domain=cluster.local --cloud-provider=aws --cert-dir=/etc/kubernetes/ssl --tls-cert-file=/etc/kubernetes/ssl/worker.pem --tls-private-key-file=/etc/kubernetes/ssl/worker-key.pem --kubeconfig=/etc/kubernetes/worker-kubeconfig.yaml --require-kubeconfig","MESSAGE":"E0926 15:52:07.117656 1727 secret.go:201] Couldn't get secret default/kmderived-1506441123636-init-secret: secrets \"kmderived-1506441123636-init-secret\" not found"}
Sep 26 08:52:07 data-cluster-slow-test systemd: {"_BOOT_ID":"55547778ef634c26a15c3b46b6f2e020","PRIORITY":"6","SYSLOG_FACILITY":"3","_UID":"0","_GID":"0","_SYSTEMD_SLICE":"system.slice","_CAP_EFFECTIVE":"3fffffffff","_TRANSPORT":"stdout","_MACHINE_ID":"50e74f29f36b43eca88a7f72305e96b0","_SELINUX_CONTEXT":"system_u:system_r:kernel_t:s0","_HOSTNAME":"ip-10-253-50-31.ec2.internal","_SYSTEMD_UNIT":"kubelet.service","_SYSTEMD_INVOCATION_ID":"499393ff4e984b5a81b921137ec4d4bd","SYSLOG_IDENTIFIER":"kubelet-wrapper","_PID":"1727","_SYSTEMD_CGROUP":"/system.slice/kubelet.service","_COMM":"kubelet","_EXE":"/var/lib/rkt/pods/run/3e947ebc-2d9f-4100-a899-2bd270d4748e/stage1/rootfs/opt/stage2/hyperkube/rootfs/hyperkube","_CMDLINE":"/kubelet --cni-conf-dir=/etc/kubernetes/cni/net.d --cni-bin-dir=/opt/cni/bin --network-plugin=cni --container-runtime=docker --rkt-path=/usr/bin/rkt --rkt-stage1-image=coreos.com/rkt/stage1-coreos --node-labels data.unbounce.net/role=worker --register-node=true --allow-privileged=true --pod-manifest-path=/etc/kubernetes/manifests --cluster-dns=10.253.50.31 --cluster-domain=cluster.local --cloud-provider=aws --cert-dir=/etc/kubernetes/ssl --tls-cert-file=/etc/kubernetes/ssl/worker.pem --tls-private-key-file=/etc/kubernetes/ssl/worker-key.pem --kubeconfig=/etc/kubernetes/worker-kubeconfig.yaml --require-kubeconfig","MESSAGE":"E0926 15:52:07.117749 1727 nestedpendingoperations.go:262] Operation for \"\\\"kubernetes.io/secret/a5e157ac-a2d2-11e7-a071-0e80cdd21cb6-spark-init-secret\\\" (\\\"a5e157ac-a2d2-11e7-a071-0e80cdd21cb6\\\")\" failed. No retries permitted until 2017-09-26 15:52:07.617719263 +0000 UTC (durationBeforeRetry 500ms). Error: MountVolume.SetUp failed for volume \"spark-init-secret\" (UniqueName: \"kubernetes.io/secret/a5e157ac-a2d2-11e7-a071-0e80cdd21cb6-spark-init-secret\") pod \"kmderived-1506441123636-driver\" (UID: \"a5e157ac-a2d2-11e7-a071-0e80cdd21cb6\") : secrets \"kmderived-1506441123636-init-secret\" not found"}
Sep 26 08:52:07 data-cluster-slow-test k8s.kube-scheduler-ip-10-253-48-222.ec2.internal_kube-system_kub: I0926 15:52:06.832447 1 event.go:218] Event(v1.ObjectReference{Kind:"Pod", Namespace:"default", Name:"kmderived-1506441123636-driver", UID:"a5e157ac-a2d2-11e7-a071-0e80cdd21cb6", APIVersion:"v1", ResourceVersion:"11214775", FieldPath:""}): type: 'Normal' reason: 'Scheduled' Successfully assigned kmderived-1506441123636-driver to ip-10-253-50-31.ec2.internal
Sep 26 08:52:07 data-cluster-slow-test k8s.kube-dns-3664836949-h9370_kube-system_kubedns: I0926 15:52:07.294174 1 dns.go:552] Could not find endpoints for service "kmderived-1506441123636-driver-svc" in namespace "default". DNS records will be created once endpoints show up.
Sep 26 08:52:08 data-cluster-slow-test k8s.kube-dns-3664836949-15b1l_kube-system_kubedns: I0926 15:52:07.282993 1 dns.go:552] Could not find endpoints for service "kmderived-1506441123636-driver-svc" in namespace "default". DNS records will be created once endpoints show up.
Sep 26 08:52:08 data-cluster-slow-test systemd: {"_BOOT_ID":"55547778ef634c26a15c3b46b6f2e020","PRIORITY":"6","SYSLOG_FACILITY":"3","_UID":"0","_GID":"0","_SYSTEMD_SLICE":"system.slice","_CAP_EFFECTIVE":"3fffffffff","_TRANSPORT":"stdout","_MACHINE_ID":"50e74f29f36b43eca88a7f72305e96b0","_SELINUX_CONTEXT":"system_u:system_r:kernel_t:s0","_HOSTNAME":"ip-10-253-50-31.ec2.internal","_SYSTEMD_UNIT":"kubelet.service","_SYSTEMD_INVOCATION_ID":"499393ff4e984b5a81b921137ec4d4bd","SYSLOG_IDENTIFIER":"kubelet-wrapper","_PID":"1727","_SYSTEMD_CGROUP":"/system.slice/kubelet.service","_COMM":"kubelet","_EXE":"/var/lib/rkt/pods/run/3e947ebc-2d9f-4100-a899-2bd270d4748e/stage1/rootfs/opt/stage2/hyperkube/rootfs/hyperkube","_CMDLINE":"/kubelet --cni-conf-dir=/etc/kubernetes/cni/net.d --cni-bin-dir=/opt/cni/bin --network-plugin=cni --container-runtime=docker --rkt-path=/usr/bin/rkt --rkt-stage1-image=coreos.com/rkt/stage1-coreos --node-labels data.unbounce.net/role=worker --register-node=true --allow-privileged=true --pod-manifest-path=/etc/kubernetes/manifests --cluster-dns=10.253.50.31 --cluster-domain=cluster.local --cloud-provider=aws --cert-dir=/etc/kubernetes/ssl --tls-cert-file=/etc/kubernetes/ssl/worker.pem --tls-private-key-file=/etc/kubernetes/ssl/worker-key.pem --kubeconfig=/etc/kubernetes/worker-kubeconfig.yaml --require-kubeconfig","MESSAGE":"I0926 15:52:07.831701 1727 kuberuntime_manager.go:457] Container {Name:spark-kubernetes-driver Image:514588561872.dkr.ecr.us-east-1.amazonaws.com/spark-driver-py-kissmetrics:latest Command:[] Args:[] WorkingDir: Ports:[] EnvFrom:[] Env:[{Name:SPARK_DRIVER_MEMORY Value:4G ValueFrom:nil} {Name:SPARK_DRIVER_CLASS Value:org.apache.spark.deploy.PythonRunner ValueFrom:nil} {Name:SPARK_DRIVER_ARGS Value: ValueFrom:nil} {Name:SPARK_MOUNTED_FILES_DIR Value:/var/spark-data/spark-files ValueFrom:nil} {Name:PYSPARK_PRIMARY Value:/var/spark-data/spark-files/Compute__Derived_NonAppPageViewed.py ValueFrom:nil} {Name:PYSPARK_FILES Value:null ValueFrom:nil} {Name:SPARK_JAVA_OPT_0 Value:-Dspark.kubernetes.docker.image.pullPolicy=Always ValueFrom:nil} {Name:SPARK_JAVA_OPT_1 Value:-Dspark.kubernetes.namespace=default ValueFrom:nil} {Name:SPARK_JAVA_OPT_2 Value:-Dspark.kubernetes.initcontainer.executor.configmapkey=download-submitted-files ValueFrom:nil} {Name:SPARK_JAVA_OPT_3 Value:-Dspark.kubernetes.driver.docker.image=514588561872.dkr.ecr.us-east-1.amazonaws.com/spark-driver-py-kissmetrics:latest ValueFrom:nil} {Name:SPARK_JAVA_OPT_4 Value:-Dspark.files=/var/spark-data/spark-files/Compute__Derived_NonAppPageViewed.py ValueFrom:nil} {Name:SPARK_JAVA_OPT_5 Value:-Dspark.submit.deployMode=cluster ValueFrom:nil} {Name:SPARK_JAVA_OPT_6 Value:-Dspark.driver.blockManager.port=7079 ValueFrom:nil} {Name:SPARK_JAVA_OPT_7 Value:-Dspark.app.name=kmderived ValueFrom:nil} {Name:SPARK_JAVA_OPT_8 Value:-Dspark.driver.bindAddress=kmderived-1506441123636-driver-svc.default.svc.cluster.local ValueFrom:nil} {Name:SPARK_JAVA_OPT_9 Value:-Dspark.driver.cores=2 ValueFrom:nil} {Name:SPARK_JAVA_OPT_10 Value:-Dspark.kubernetes.authenticate.driver.serviceAccountName=spark-k8s-service-account ValueFrom:nil} {Name:SPARK_JAVA_OPT_11 Value:-Dspark.app.id=spark-067e91b9e35e43d580fea426ac797f3a ValueFrom:nil} {Name:SPARK_JAVA_OPT_12 Value:-Dspark.executor.cores=2 ValueFrom:nil} {Name:SPARK_JAVA_OPT_13 Value:-Dspark.driver.port=7078 ValueFrom:nil} {Name:SPARK_J"}
Sep 26 08:52:08 data-cluster-slow-test systemd: {"_BOOT_ID":"55547778ef634c26a15c3b46b6f2e020","PRIORITY":"6","SYSLOG_FACILITY":"3","_UID":"0","_GID":"0","_SYSTEMD_SLICE":"system.slice","_CAP_EFFECTIVE":"3fffffffff","_TRANSPORT":"stdout","_MACHINE_ID":"50e74f29f36b43eca88a7f72305e96b0","_SELINUX_CONTEXT":"system_u:system_r:kernel_t:s0","_HOSTNAME":"ip-10-253-50-31.ec2.internal","_SYSTEMD_UNIT":"kubelet.service","_SYSTEMD_INVOCATION_ID":"499393ff4e984b5a81b921137ec4d4bd","SYSLOG_IDENTIFIER":"kubelet-wrapper","_PID":"1727","_SYSTEMD_CGROUP":"/system.slice/kubelet.service","_COMM":"kubelet","_EXE":"/var/lib/rkt/pods/run/3e947ebc-2d9f-4100-a899-2bd270d4748e/stage1/rootfs/opt/stage2/hyperkube/rootfs/hyperkube","_CMDLINE":"/kubelet --cni-conf-dir=/etc/kubernetes/cni/net.d --cni-bin-dir=/opt/cni/bin --network-plugin=cni --container-runtime=docker --rkt-path=/usr/bin/rkt --rkt-stage1-image=coreos.com/rkt/stage1-coreos --node-labels data.unbounce.net/role=worker --register-node=true --allow-privileged=true --pod-manifest-path=/etc/kubernetes/manifests --cluster-dns=10.253.50.31 --cluster-domain=cluster.local --cloud-provider=aws --cert-dir=/etc/kubernetes/ssl --tls-cert-file=/etc/kubernetes/ssl/worker.pem --tls-private-key-file=/etc/kubernetes/ssl/worker-key.pem --kubeconfig=/etc/kubernetes/worker-kubeconfig.yaml --require-kubeconfig","MESSAGE":"AVA_OPT_14 Value:-Dspark.driver.host=kmderived-1506441123636-driver-svc.default.svc.cluster.local ValueFrom:nil} {Name:SPARK_JAVA_OPT_15 Value:-Dspark.kubernetes.initcontainer.executor.stagingServerSecret.mountDir=/mnt/secrets/spark-init ValueFrom:nil} {Name:SPARK_JAVA_OPT_16 Value:-Dspark.executor.memory=6G ValueFrom:nil} {Name:SPARK_JAVA_OPT_17 Value:-Dspark.kubernetes.initcontainer.executor.stagingServerSecret.name=kmderived-1506441123636-init-secret ValueFrom:nil} {Name:SPARK_JAVA_OPT_18 Value:-Dspark.kubernetes.resourceStagingServer.uri=http://ab5edaaa8a01911e7b97c026e1e11203-421161154.us-east-1.elb.amazonaws.com:10000 ValueFrom:nil} {Name:SPARK_JAVA_OPT_19 Value:-Dspark.kubernetes.driver.pod.name=kmderived-1506441123636-driver ValueFrom:nil} {Name:SPARK_JAVA_OPT_20 Value:-Dspark.executor.instances=1 ValueFrom:nil} {Name:SPARK_JAVA_OPT_21 Value:-Dspark.master=k8s://http://127.0.0.1:8001 ValueFrom:nil} {Name:SPARK_JAVA_OPT_22 Value:-Dspark.kubernetes.initcontainer.executor.configmapname=kmderived-1506441123636-init-config ValueFrom:nil} {Name:SPARK_JAVA_OPT_23 Value:-Dspark.driver.memory=4G ValueFrom:nil} {Name:SPARK_JAVA_OPT_24 Value:-Dspark.kubernetes.executor.podNamePrefix=kmderived-1506441123636 ValueFrom:nil} {Name:SPARK_JAVA_OPT_25 Value:-Dspark.kubernetes.initcontainer.docker.image=514588561872.dkr.ecr.us-east-1.amazonaws.com/spark-init-kissmetrics:latest ValueFrom:nil} {Name:SPARK_JAVA_OPT_26 Value:-Dspark.kubernetes.executor.docker.image=514588561872.dkr.ecr.us-east-1.amazonaws.com/spark-executor-py-kissmetrics:latest ValueFrom:nil}] Resources:{Limits:map[memory:{i:{value:4723834880 scale:0} d:{Dec:<nil>} s:4505Mi Format:BinarySI}] Requests:map[cpu:{i:{value:2 scale:0} d:{Dec:<nil>} s:2 Format:DecimalSI} memory:{i:{value:4294967296 scale:0} d:{Dec:<nil>} s:4Gi Format:BinarySI}]} VolumeMounts:[{Name:download-jars-volume ReadOnly:false MountPath:/var/spark-data/spark-jars SubPath:} {Name:download-files ReadOnly:false MountPath:/var/spark-data/spark-files SubPath:} {Name:spark-k8s-service-account-token-"}
Sep 26 08:52:11 data-cluster-slow-test k8s.cluster-autoscaler-1682651915-z6j80_kube-system_cluster-auto: I0926 15:52:10.518471 1 cluster.go:89] Fast evaluation: node ip-10-253-50-31.ec2.internal cannot be removed: default/kmderived-1506441123636-driver is not replicated
Sep 26 08:52:20 data-cluster-slow-test systemd: {"_BOOT_ID":"55547778ef634c26a15c3b46b6f2e020","PRIORITY":"6","SYSLOG_FACILITY":"3","_UID":"0","_GID":"0","_SYSTEMD_SLICE":"system.slice","_CAP_EFFECTIVE":"3fffffffff","_TRANSPORT":"stdout","_MACHINE_ID":"50e74f29f36b43eca88a7f72305e96b0","_SELINUX_CONTEXT":"system_u:system_r:kernel_t:s0","_HOSTNAME":"ip-10-253-50-31.ec2.internal","_SYSTEMD_UNIT":"kubelet.service","_SYSTEMD_INVOCATION_ID":"499393ff4e984b5a81b921137ec4d4bd","SYSLOG_IDENTIFIER":"kubelet-wrapper","_PID":"1727","_SYSTEMD_CGROUP":"/system.slice/kubelet.service","_COMM":"kubelet","_EXE":"/var/lib/rkt/pods/run/3e947ebc-2d9f-4100-a899-2bd270d4748e/stage1/rootfs/opt/stage2/hyperkube/rootfs/hyperkube","_CMDLINE":"/kubelet --cni-conf-dir=/etc/kubernetes/cni/net.d --cni-bin-dir=/opt/cni/bin --network-plugin=cni --container-runtime=docker --rkt-path=/usr/bin/rkt --rkt-stage1-image=coreos.com/rkt/stage1-coreos --node-labels data.unbounce.net/role=worker --register-node=true --allow-privileged=true --pod-manifest-path=/etc/kubernetes/manifests --cluster-dns=10.253.50.31 --cluster-domain=cluster.local --cloud-provider=aws --cert-dir=/etc/kubernetes/ssl --tls-cert-file=/etc/kubernetes/ssl/worker.pem --tls-private-key-file=/etc/kubernetes/ssl/worker-key.pem --kubeconfig=/etc/kubernetes/worker-kubeconfig.yaml --require-kubeconfig","MESSAGE":"I0926 15:52:19.824063 1727 kuberuntime_manager.go:457] Container {Name:spark-kubernetes-driver Image:514588561872.dkr.ecr.us-east-1.amazonaws.com/spark-driver-py-kissmetrics:latest Command:[] Args:[] WorkingDir: Ports:[] EnvFrom:[] Env:[{Name:SPARK_DRIVER_MEMORY Value:4G ValueFrom:nil} {Name:SPARK_DRIVER_CLASS Value:org.apache.spark.deploy.PythonRunner ValueFrom:nil} {Name:SPARK_DRIVER_ARGS Value: ValueFrom:nil} {Name:SPARK_MOUNTED_FILES_DIR Value:/var/spark-data/spark-files ValueFrom:nil} {Name:PYSPARK_PRIMARY Value:/var/spark-data/spark-files/Compute__Derived_NonAppPageViewed.py ValueFrom:nil} {Name:PYSPARK_FILES Value:null ValueFrom:nil} {Name:SPARK_JAVA_OPT_0 Value:-Dspark.kubernetes.docker.image.pullPolicy=Always ValueFrom:nil} {Name:SPARK_JAVA_OPT_1 Value:-Dspark.kubernetes.namespace=default ValueFrom:nil} {Name:SPARK_JAVA_OPT_2 Value:-Dspark.kubernetes.initcontainer.executor.configmapkey=download-submitted-files ValueFrom:nil} {Name:SPARK_JAVA_OPT_3 Value:-Dspark.kubernetes.driver.docker.image=514588561872.dkr.ecr.us-east-1.amazonaws.com/spark-driver-py-kissmetrics:latest ValueFrom:nil} {Name:SPARK_JAVA_OPT_4 Value:-Dspark.files=/var/spark-data/spark-files/Compute__Derived_NonAppPageViewed.py ValueFrom:nil} {Name:SPARK_JAVA_OPT_5 Value:-Dspark.submit.deployMode=cluster ValueFrom:nil} {Name:SPARK_JAVA_OPT_6 Value:-Dspark.driver.blockManager.port=7079 ValueFrom:nil} {Name:SPARK_JAVA_OPT_7 Value:-Dspark.app.name=kmderived ValueFrom:nil} {Name:SPARK_JAVA_OPT_8 Value:-Dspark.driver.bindAddress=kmderived-1506441123636-driver-svc.default.svc.cluster.local ValueFrom:nil} {Name:SPARK_JAVA_OPT_9 Value:-Dspark.driver.cores=2 ValueFrom:nil} {Name:SPARK_JAVA_OPT_10 Value:-Dspark.kubernetes.authenticate.driver.serviceAccountName=spark-k8s-service-account ValueFrom:nil} {Name:SPARK_JAVA_OPT_11 Value:-Dspark.app.id=spark-067e91b9e35e43d580fea426ac797f3a ValueFrom:nil} {Name:SPARK_JAVA_OPT_12 Value:-Dspark.executor.cores=2 ValueFrom:nil} {Name:SPARK_JAVA_OPT_13 Value:-Dspark.driver.port=7078 ValueFrom:nil} {Name:SPARK_J"}
Sep 26 08:52:20 data-cluster-slow-test systemd: {"_BOOT_ID":"55547778ef634c26a15c3b46b6f2e020","PRIORITY":"6","SYSLOG_FACILITY":"3","_UID":"0","_GID":"0","_SYSTEMD_SLICE":"system.slice","_CAP_EFFECTIVE":"3fffffffff","_TRANSPORT":"stdout","_MACHINE_ID":"50e74f29f36b43eca88a7f72305e96b0","_SELINUX_CONTEXT":"system_u:system_r:kernel_t:s0","_HOSTNAME":"ip-10-253-50-31.ec2.internal","_SYSTEMD_UNIT":"kubelet.service","_SYSTEMD_INVOCATION_ID":"499393ff4e984b5a81b921137ec4d4bd","SYSLOG_IDENTIFIER":"kubelet-wrapper","_PID":"1727","_SYSTEMD_CGROUP":"/system.slice/kubelet.service","_COMM":"kubelet","_EXE":"/var/lib/rkt/pods/run/3e947ebc-2d9f-4100-a899-2bd270d4748e/stage1/rootfs/opt/stage2/hyperkube/rootfs/hyperkube","_CMDLINE":"/kubelet --cni-conf-dir=/etc/kubernetes/cni/net.d --cni-bin-dir=/opt/cni/bin --network-plugin=cni --container-runtime=docker --rkt-path=/usr/bin/rkt --rkt-stage1-image=coreos.com/rkt/stage1-coreos --node-labels data.unbounce.net/role=worker --register-node=true --allow-privileged=true --pod-manifest-path=/etc/kubernetes/manifests --cluster-dns=10.253.50.31 --cluster-domain=cluster.local --cloud-provider=aws --cert-dir=/etc/kubernetes/ssl --tls-cert-file=/etc/kubernetes/ssl/worker.pem --tls-private-key-file=/etc/kubernetes/ssl/worker-key.pem --kubeconfig=/etc/kubernetes/worker-kubeconfig.yaml --require-kubeconfig","MESSAGE":"AVA_OPT_14 Value:-Dspark.driver.host=kmderived-1506441123636-driver-svc.default.svc.cluster.local ValueFrom:nil} {Name:SPARK_JAVA_OPT_15 Value:-Dspark.kubernetes.initcontainer.executor.stagingServerSecret.mountDir=/mnt/secrets/spark-init ValueFrom:nil} {Name:SPARK_JAVA_OPT_16 Value:-Dspark.executor.memory=6G ValueFrom:nil} {Name:SPARK_JAVA_OPT_17 Value:-Dspark.kubernetes.initcontainer.executor.stagingServerSecret.name=kmderived-1506441123636-init-secret ValueFrom:nil} {Name:SPARK_JAVA_OPT_18 Value:-Dspark.kubernetes.resourceStagingServer.uri=http://ab5edaaa8a01911e7b97c026e1e11203-421161154.us-east-1.elb.amazonaws.com:10000 ValueFrom:nil} {Name:SPARK_JAVA_OPT_19 Value:-Dspark.kubernetes.driver.pod.name=kmderived-1506441123636-driver ValueFrom:nil} {Name:SPARK_JAVA_OPT_20 Value:-Dspark.executor.instances=1 ValueFrom:nil} {Name:SPARK_JAVA_OPT_21 Value:-Dspark.master=k8s://http://127.0.0.1:8001 ValueFrom:nil} {Name:SPARK_JAVA_OPT_22 Value:-Dspark.kubernetes.initcontainer.executor.configmapname=kmderived-1506441123636-init-config ValueFrom:nil} {Name:SPARK_JAVA_OPT_23 Value:-Dspark.driver.memory=4G ValueFrom:nil} {Name:SPARK_JAVA_OPT_24 Value:-Dspark.kubernetes.executor.podNamePrefix=kmderived-1506441123636 ValueFrom:nil} {Name:SPARK_JAVA_OPT_25 Value:-Dspark.kubernetes.initcontainer.docker.image=514588561872.dkr.ecr.us-east-1.amazonaws.com/spark-init-kissmetrics:latest ValueFrom:nil} {Name:SPARK_JAVA_OPT_26 Value:-Dspark.kubernetes.executor.docker.image=514588561872.dkr.ecr.us-east-1.amazonaws.com/spark-executor-py-kissmetrics:latest ValueFrom:nil}] Resources:{Limits:map[memory:{i:{value:4723834880 scale:0} d:{Dec:<nil>} s:4505Mi Format:BinarySI}] Requests:map[cpu:{i:{value:2 scale:0} d:{Dec:<nil>} s:2 Format:DecimalSI} memory:{i:{value:4294967296 scale:0} d:{Dec:<nil>} s:4Gi Format:BinarySI}]} VolumeMounts:[{Name:download-jars-volume ReadOnly:false MountPath:/var/spark-data/spark-jars SubPath:} {Name:download-files ReadOnly:false MountPath:/var/spark-data/spark-files SubPath:} {Name:spark-k8s-service-account-token-"}
Sep 26 08:52:21 data-cluster-slow-test k8s.cluster-autoscaler-1682651915-z6j80_kube-system_cluster-auto: I0926 15:52:21.177298 1 cluster.go:89] Fast evaluation: node ip-10-253-50-31.ec2.internal cannot be removed: default/kmderived-1506441123636-driver is not replicated
Sep 26 08:52:21 data-cluster-slow-test k8s.kmderived-1506441123636-driver_default_spark-init: ++ id -u
Sep 26 08:52:21 data-cluster-slow-test k8s.kmderived-1506441123636-driver_default_spark-init: + myuid=0
Sep 26 08:52:21 data-cluster-slow-test k8s.kmderived-1506441123636-driver_default_spark-init: ++ id -g
Sep 26 08:52:21 data-cluster-slow-test k8s.kmderived-1506441123636-driver_default_spark-init: + mygid=0
Sep 26 08:52:21 data-cluster-slow-test k8s.kmderived-1506441123636-driver_default_spark-init: ++ getent passwd 0
Sep 26 08:52:21 data-cluster-slow-test k8s.kmderived-1506441123636-driver_default_spark-init: + uidentry=root:x:0:0:root:/root:/bin/ash
Sep 26 08:52:21 data-cluster-slow-test k8s.kmderived-1506441123636-driver_default_spark-init: + '[' -z root:x:0:0:root:/root:/bin/ash ']'
Sep 26 08:52:21 data-cluster-slow-test k8s.kmderived-1506441123636-driver_default_spark-init: + /sbin/tini -s -- /opt/spark/bin/spark-class org.apache.spark.deploy.rest.k8s.KubernetesSparkDependencyDownloadInitContainer /etc/spark-init/spark-init.properties
Sep 26 08:52:21 data-cluster-slow-test k8s.kmderived-1506441123636-driver_default_spark-init: 2017-09-26 15:52:19 INFO KubernetesSparkDependencyDownloadInitContainer:54 - Starting init-container to download Spark application dependencies.
Sep 26 08:52:21 data-cluster-slow-test k8s.kmderived-1506441123636-driver_default_spark-init: 2017-09-26 15:52:20 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Sep 26 08:52:21 data-cluster-slow-test k8s.kmderived-1506441123636-driver_default_spark-init: 2017-09-26 15:52:20 INFO SecurityManager:54 - Changing view acls to: root
Sep 26 08:52:21 data-cluster-slow-test k8s.kmderived-1506441123636-driver_default_spark-init: 2017-09-26 15:52:20 INFO SecurityManager:54 - Changing modify acls to: root
Sep 26 08:52:21 data-cluster-slow-test k8s.kmderived-1506441123636-driver_default_spark-init: 2017-09-26 15:52:20 INFO SecurityManager:54 - Changing view acls groups to:
Sep 26 08:52:21 data-cluster-slow-test k8s.kmderived-1506441123636-driver_default_spark-init: 2017-09-26 15:52:20 INFO SecurityManager:54 - Changing modify acls groups to:
Sep 26 08:52:21 data-cluster-slow-test k8s.kmderived-1506441123636-driver_default_spark-init: 2017-09-26 15:52:20 INFO SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
Sep 26 08:52:21 data-cluster-slow-test k8s.kmderived-1506441123636-driver_default_spark-init: 2017-09-26 15:52:20 INFO SecurityManager:54 - Changing view acls to: root
Sep 26 08:52:21 data-cluster-slow-test k8s.kmderived-1506441123636-driver_default_spark-init: 2017-09-26 15:52:20 INFO SecurityManager:54 - Changing modify acls to: root
Sep 26 08:52:21 data-cluster-slow-test k8s.kmderived-1506441123636-driver_default_spark-init: 2017-09-26 15:52:20 INFO SecurityManager:54 - Changing view acls groups to:
Sep 26 08:52:21 data-cluster-slow-test k8s.kmderived-1506441123636-driver_default_spark-init: 2017-09-26 15:52:20 INFO SecurityManager:54 - Changing modify acls groups to:
Sep 26 08:52:21 data-cluster-slow-test k8s.kmderived-1506441123636-driver_default_spark-init: 2017-09-26 15:52:20 INFO SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
Sep 26 08:52:21 data-cluster-slow-test k8s.kmderived-1506441123636-driver_default_spark-init: 2017-09-26 15:52:20 INFO KubernetesSparkDependencyDownloadInitContainer:54 - Starting to download files from resource staging server...
Sep 26 08:52:21 data-cluster-slow-test k8s.kmderived-1506441123636-driver_default_spark-init: 2017-09-26 15:52:20 INFO KubernetesSparkDependencyDownloadInitContainer:54 - Starting to download jars from resource staging server...
Sep 26 08:52:21 data-cluster-slow-test systemd: {"_BOOT_ID":"55547778ef634c26a15c3b46b6f2e020","PRIORITY":"6","SYSLOG_FACILITY":"3","_UID":"0","_GID":"0","_SYSTEMD_SLICE":"system.slice","_CAP_EFFECTIVE":"3fffffffff","_TRANSPORT":"stdout","_MACHINE_ID":"50e74f29f36b43eca88a7f72305e96b0","_SELINUX_CONTEXT":"system_u:system_r:kernel_t:s0","_HOSTNAME":"ip-10-253-50-31.ec2.internal","_SYSTEMD_UNIT":"kubelet.service","_SYSTEMD_INVOCATION_ID":"499393ff4e984b5a81b921137ec4d4bd","SYSLOG_IDENTIFIER":"kubelet-wrapper","_PID":"1727","_SYSTEMD_CGROUP":"/system.slice/kubelet.service","_COMM":"kubelet","_EXE":"/var/lib/rkt/pods/run/3e947ebc-2d9f-4100-a899-2bd270d4748e/stage1/rootfs/opt/stage2/hyperkube/rootfs/hyperkube","_CMDLINE":"/kubelet --cni-conf-dir=/etc/kubernetes/cni/net.d --cni-bin-dir=/opt/cni/bin --network-plugin=cni --container-runtime=docker --rkt-path=/usr/bin/rkt --rkt-stage1-image=coreos.com/rkt/stage1-coreos --node-labels data.unbounce.net/role=worker --register-node=true --allow-privileged=true --pod-manifest-path=/etc/kubernetes/manifests --cluster-dns=10.253.50.31 --cluster-domain=cluster.local --cloud-provider=aws --cert-dir=/etc/kubernetes/ssl --tls-cert-file=/etc/kubernetes/ssl/worker.pem --tls-private-key-file=/etc/kubernetes/ssl/worker-key.pem --kubeconfig=/etc/kubernetes/worker-kubeconfig.yaml --require-kubeconfig","MESSAGE":"I0926 15:52:20.829040 1727 kuberuntime_manager.go:457] Container {Name:spark-kubernetes-driver Image:514588561872.dkr.ecr.us-east-1.amazonaws.com/spark-driver-py-kissmetrics:latest Command:[] Args:[] WorkingDir: Ports:[] EnvFrom:[] Env:[{Name:SPARK_DRIVER_MEMORY Value:4G ValueFrom:nil} {Name:SPARK_DRIVER_CLASS Value:org.apache.spark.deploy.PythonRunner ValueFrom:nil} {Name:SPARK_DRIVER_ARGS Value: ValueFrom:nil} {Name:SPARK_MOUNTED_FILES_DIR Value:/var/spark-data/spark-files ValueFrom:nil} {Name:PYSPARK_PRIMARY Value:/var/spark-data/spark-files/Compute__Derived_NonAppPageViewed.py ValueFrom:nil} {Name:PYSPARK_FILES Value:null ValueFrom:nil} {Name:SPARK_JAVA_OPT_0 Value:-Dspark.kubernetes.docker.image.pullPolicy=Always ValueFrom:nil} {Name:SPARK_JAVA_OPT_1 Value:-Dspark.kubernetes.namespace=default ValueFrom:nil} {Name:SPARK_JAVA_OPT_2 Value:-Dspark.kubernetes.initcontainer.executor.configmapkey=download-submitted-files ValueFrom:nil} {Name:SPARK_JAVA_OPT_3 Value:-Dspark.kubernetes.driver.docker.image=514588561872.dkr.ecr.us-east-1.amazonaws.com/spark-driver-py-kissmetrics:latest ValueFrom:nil} {Name:SPARK_JAVA_OPT_4 Value:-Dspark.files=/var/spark-data/spark-files/Compute__Derived_NonAppPageViewed.py ValueFrom:nil} {Name:SPARK_JAVA_OPT_5 Value:-Dspark.submit.deployMode=cluster ValueFrom:nil} {Name:SPARK_JAVA_OPT_6 Value:-Dspark.driver.blockManager.port=7079 ValueFrom:nil} {Name:SPARK_JAVA_OPT_7 Value:-Dspark.app.name=kmderived ValueFrom:nil} {Name:SPARK_JAVA_OPT_8 Value:-Dspark.driver.bindAddress=kmderived-1506441123636-driver-svc.default.svc.cluster.local ValueFrom:nil} {Name:SPARK_JAVA_OPT_9 Value:-Dspark.driver.cores=2 ValueFrom:nil} {Name:SPARK_JAVA_OPT_10 Value:-Dspark.kubernetes.authenticate.driver.serviceAccountName=spark-k8s-service-account ValueFrom:nil} {Name:SPARK_JAVA_OPT_11 Value:-Dspark.app.id=spark-067e91b9e35e43d580fea426ac797f3a ValueFrom:nil} {Name:SPARK_JAVA_OPT_12 Value:-Dspark.executor.cores=2 ValueFrom:nil} {Name:SPARK_JAVA_OPT_13 Value:-Dspark.driver.port=7078 ValueFrom:nil} {Name:SPARK_J"}
Sep 26 08:52:21 data-cluster-slow-test systemd: {"_BOOT_ID":"55547778ef634c26a15c3b46b6f2e020","PRIORITY":"6","SYSLOG_FACILITY":"3","_UID":"0","_GID":"0","_SYSTEMD_SLICE":"system.slice","_CAP_EFFECTIVE":"3fffffffff","_TRANSPORT":"stdout","_MACHINE_ID":"50e74f29f36b43eca88a7f72305e96b0","_SELINUX_CONTEXT":"system_u:system_r:kernel_t:s0","_HOSTNAME":"ip-10-253-50-31.ec2.internal","_SYSTEMD_UNIT":"kubelet.service","_SYSTEMD_INVOCATION_ID":"499393ff4e984b5a81b921137ec4d4bd","SYSLOG_IDENTIFIER":"kubelet-wrapper","_PID":"1727","_SYSTEMD_CGROUP":"/system.slice/kubelet.service","_COMM":"kubelet","_EXE":"/var/lib/rkt/pods/run/3e947ebc-2d9f-4100-a899-2bd270d4748e/stage1/rootfs/opt/stage2/hyperkube/rootfs/hyperkube","_CMDLINE":"/kubelet --cni-conf-dir=/etc/kubernetes/cni/net.d --cni-bin-dir=/opt/cni/bin --network-plugin=cni --container-runtime=docker --rkt-path=/usr/bin/rkt --rkt-stage1-image=coreos.com/rkt/stage1-coreos --node-labels data.unbounce.net/role=worker --register-node=true --allow-privileged=true --pod-manifest-path=/etc/kubernetes/manifests --cluster-dns=10.253.50.31 --cluster-domain=cluster.local --cloud-provider=aws --cert-dir=/etc/kubernetes/ssl --tls-cert-file=/etc/kubernetes/ssl/worker.pem --tls-private-key-file=/etc/kubernetes/ssl/worker-key.pem --kubeconfig=/etc/kubernetes/worker-kubeconfig.yaml --require-kubeconfig","MESSAGE":"AVA_OPT_14 Value:-Dspark.driver.host=kmderived-1506441123636-driver-svc.default.svc.cluster.local ValueFrom:nil} {Name:SPARK_JAVA_OPT_15 Value:-Dspark.kubernetes.initcontainer.executor.stagingServerSecret.mountDir=/mnt/secrets/spark-init ValueFrom:nil} {Name:SPARK_JAVA_OPT_16 Value:-Dspark.executor.memory=6G ValueFrom:nil} {Name:SPARK_JAVA_OPT_17 Value:-Dspark.kubernetes.initcontainer.executor.stagingServerSecret.name=kmderived-1506441123636-init-secret ValueFrom:nil} {Name:SPARK_JAVA_OPT_18 Value:-Dspark.kubernetes.resourceStagingServer.uri=http://ab5edaaa8a01911e7b97c026e1e11203-421161154.us-east-1.elb.amazonaws.com:10000 ValueFrom:nil} {Name:SPARK_JAVA_OPT_19 Value:-Dspark.kubernetes.driver.pod.name=kmderived-1506441123636-driver ValueFrom:nil} {Name:SPARK_JAVA_OPT_20 Value:-Dspark.executor.instances=1 ValueFrom:nil} {Name:SPARK_JAVA_OPT_21 Value:-Dspark.master=k8s://http://127.0.0.1:8001 ValueFrom:nil} {Name:SPARK_JAVA_OPT_22 Value:-Dspark.kubernetes.initcontainer.executor.configmapname=kmderived-1506441123636-init-config ValueFrom:nil} {Name:SPARK_JAVA_OPT_23 Value:-Dspark.driver.memory=4G ValueFrom:nil} {Name:SPARK_JAVA_OPT_24 Value:-Dspark.kubernetes.executor.podNamePrefix=kmderived-1506441123636 ValueFrom:nil} {Name:SPARK_JAVA_OPT_25 Value:-Dspark.kubernetes.initcontainer.docker.image=514588561872.dkr.ecr.us-east-1.amazonaws.com/spark-init-kissmetrics:latest ValueFrom:nil} {Name:SPARK_JAVA_OPT_26 Value:-Dspark.kubernetes.executor.docker.image=514588561872.dkr.ecr.us-east-1.amazonaws.com/spark-executor-py-kissmetrics:latest ValueFrom:nil}] Resources:{Limits:map[memory:{i:{value:4723834880 scale:0} d:{Dec:<nil>} s:4505Mi Format:BinarySI}] Requests:map[cpu:{i:{value:2 scale:0} d:{Dec:<nil>} s:2 Format:DecimalSI} memory:{i:{value:4294967296 scale:0} d:{Dec:<nil>} s:4Gi Format:BinarySI}]} VolumeMounts:[{Name:download-jars-volume ReadOnly:false MountPath:/var/spark-data/spark-jars SubPath:} {Name:download-files ReadOnly:false MountPath:/var/spark-data/spark-files SubPath:} {Name:spark-k8s-service-account-token-"}
Sep 26 08:52:28 data-cluster-slow-test k8s.kubernetes-dashboard-439418514-207jv_kube-system_kubernetes-: [2017-09-26T15:52:28Z] Incoming HTTP/1.1 GET /api/v1/pod/default/kmderived-1506441123636-driver request from 10.2.122.0:47336: {}
Sep 26 08:52:28 data-cluster-slow-test k8s.kubernetes-dashboard-439418514-207jv_kube-system_kubernetes-: Getting details of kmderived-1506441123636-driver pod in default namespace
Sep 26 08:52:28 data-cluster-slow-test k8s.kubernetes-dashboard-439418514-207jv_kube-system_kubernetes-: Found 10 events related to kmderived-1506441123636-driver pod in default namespace
Sep 26 08:52:31 data-cluster-slow-test k8s.cluster-autoscaler-1682651915-z6j80_kube-system_cluster-auto: I0926 15:52:31.367843 1 cluster.go:89] Fast evaluation: node ip-10-253-50-31.ec2.internal cannot be removed: default/kmderived-1506441123636-driver is not replicated
Sep 26 08:52:31 data-cluster-slow-test k8s.kubernetes-dashboard-439418514-207jv_kube-system_kubernetes-: [2017-09-26T15:52:31Z] Incoming HTTP/1.1 GET /api/v1/log/source/default/kmderived-1506441123636-driver/pod request from 10.2.122.0:47336: {}
Sep 26 08:52:31 data-cluster-slow-test k8s.kubernetes-dashboard-439418514-207jv_kube-system_kubernetes-: [2017-09-26T15:52:31Z] Incoming HTTP/1.1 GET /api/v1/log/default/kmderived-1506441123636-driver request from 10.2.122.0:47336: {}
Sep 26 08:52:42 data-cluster-slow-test k8s.cluster-autoscaler-1682651915-z6j80_kube-system_cluster-auto: I0926 15:52:41.564109 1 cluster.go:89] Fast evaluation: node ip-10-253-50-31.ec2.internal cannot be removed: default/kmderived-1506441123636-driver is not replicated
Sep 26 08:52:45 data-cluster-slow-test k8s.kubernetes-dashboard-439418514-207jv_kube-system_kubernetes-: [2017-09-26T15:52:45Z] Incoming HTTP/1.1 GET /api/v1/pod/default/kmderived-1506441123636-driver request from 10.2.122.0:47336: {}
Sep 26 08:52:45 data-cluster-slow-test k8s.kubernetes-dashboard-439418514-207jv_kube-system_kubernetes-: Getting details of kmderived-1506441123636-driver pod in default namespace
Sep 26 08:52:45 data-cluster-slow-test k8s.kubernetes-dashboard-439418514-207jv_kube-system_kubernetes-: Found 10 events related to kmderived-1506441123636-driver pod in default namespace
Sep 26 08:52:51 data-cluster-slow-test k8s.kmderived-1506441123636-driver_default_spark-init: Exception in thread "main" org.apache.spark.SparkException: Exception thrown in awaitResult:
Sep 26 08:52:51 data-cluster-slow-test k8s.kmderived-1506441123636-driver_default_spark-init: at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)
Sep 26 08:52:51 data-cluster-slow-test k8s.kmderived-1506441123636-driver_default_spark-init: at org.apache.spark.deploy.rest.k8s.KubernetesSparkDependencyDownloadInitContainer$$anonfun$waitForFutures$1.apply(KubernetesSparkDependencyDownloadInitContainer.scala:187)
Sep 26 08:52:51 data-cluster-slow-test k8s.kmderived-1506441123636-driver_default_spark-init: at org.apache.spark.deploy.rest.k8s.KubernetesSparkDependencyDownloadInitContainer$$anonfun$waitForFutures$1.apply(KubernetesSparkDependencyDownloadInitContainer.scala:187)
Sep 26 08:52:51 data-cluster-slow-test k8s.kmderived-1506441123636-driver_default_spark-init: at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
Sep 26 08:52:51 data-cluster-slow-test k8s.kmderived-1506441123636-driver_default_spark-init: at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35)
Sep 26 08:52:51 data-cluster-slow-test k8s.kmderived-1506441123636-driver_default_spark-init: at org.apache.spark.deploy.rest.k8s.KubernetesSparkDependencyDownloadInitContainer.waitForFutures(KubernetesSparkDependencyDownloadInitContainer.scala:186)
Sep 26 08:52:51 data-cluster-slow-test k8s.kmderived-1506441123636-driver_default_spark-init: at org.apache.spark.deploy.rest.k8s.KubernetesSparkDependencyDownloadInitContainer.run(KubernetesSparkDependencyDownloadInitContainer.scala:140)
Sep 26 08:52:51 data-cluster-slow-test k8s.kmderived-1506441123636-driver_default_spark-init: at org.apache.spark.deploy.rest.k8s.KubernetesSparkDependencyDownloadInitContainer$.main(KubernetesSparkDependencyDownloadInitContainer.scala:222)
Sep 26 08:52:51 data-cluster-slow-test k8s.kmderived-1506441123636-driver_default_spark-init: at org.apache.spark.deploy.rest.k8s.KubernetesSparkDependencyDownloadInitContainer.main(KubernetesSparkDependencyDownloadInitContainer.scala)
Sep 26 08:52:51 data-cluster-slow-test k8s.kmderived-1506441123636-driver_default_spark-init: Caused by: java.util.concurrent.ExecutionException: java.net.SocketTimeoutException: connect timed out
Sep 26 08:52:51 data-cluster-slow-test k8s.kmderived-1506441123636-driver_default_spark-init: at org.spark_project.guava.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:306)
Sep 26 08:52:51 data-cluster-slow-test k8s.kmderived-1506441123636-driver_default_spark-init: at org.spark_project.guava.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:279)
Sep 26 08:52:51 data-cluster-slow-test k8s.kmderived-1506441123636-driver_default_spark-init: at org.spark_project.guava.util.concurrent.AbstractFuture.get(AbstractFuture.java:96)
Sep 26 08:52:51 data-cluster-slow-test k8s.kmderived-1506441123636-driver_default_spark-init: at org.apache.spark.deploy.rest.k8s.WaitableCallback$class.waitForCompletion(KubernetesSparkDependencyDownloadInitContainer.scala:53)
Sep 26 08:52:51 data-cluster-slow-test k8s.kmderived-1506441123636-driver_default_spark-init: at org.apache.spark.deploy.rest.k8s.DownloadTarGzCallback.waitForCompletion(KubernetesSparkDependencyDownloadInitContainer.scala:57)
Sep 26 08:52:51 data-cluster-slow-test k8s.kmderived-1506441123636-driver_default_spark-init: at org.apache.spark.deploy.rest.k8s.KubernetesSparkDependencyDownloadInitContainer$$anonfun$org$apache$spark$deploy$rest$k8s$KubernetesSparkDependencyDownloadInitContainer$$downloadResourcesFromStagingServer$1$$anonfun$apply$3.apply(KubernetesSparkDependencyDownloadInitContainer.scala:167)
Sep 26 08:52:51 data-cluster-slow-test k8s.kmderived-1506441123636-driver_default_spark-init: at org.apache.spark.deploy.rest.k8s.KubernetesSparkDependencyDownloadInitContainer$$anonfun$org$apache$spark$deploy$rest$k8s$KubernetesSparkDependencyDownloadInitContainer$$downloadResourcesFromStagingServer$1$$anonfun$apply$3.apply(KubernetesSparkDependencyDownloadInitContainer.scala:156)
Sep 26 08:52:51 data-cluster-slow-test k8s.kmderived-1506441123636-driver_default_spark-init: at scala.Option.foreach(Option.scala:257)
Sep 26 08:52:51 data-cluster-slow-test k8s.kmderived-1506441123636-driver_default_spark-init: at org.apache.spark.deploy.rest.k8s.KubernetesSparkDependencyDownloadInitContainer$$anonfun$org$apache$spark$deploy$rest$k8s$KubernetesSparkDependencyDownloadInitContainer$$downloadResourcesFromStagingServer$1.apply(KubernetesSparkDependencyDownloadInitContainer.scala:156)
Sep 26 08:52:51 data-cluster-slow-test k8s.kmderived-1506441123636-driver_default_spark-init: at org.apache.spark.deploy.rest.k8s.KubernetesSparkDependencyDownloadInitContainer$$anonfun$org$apache$spark$deploy$rest$k8s$KubernetesSparkDependencyDownloadInitContainer$$downloadResourcesFromStagingServer$1.apply(KubernetesSparkDependencyDownloadInitContainer.scala:155)
Sep 26 08:52:51 data-cluster-slow-test k8s.kmderived-1506441123636-driver_default_spark-init: at scala.Option.foreach(Option.scala:257)
Sep 26 08:52:51 data-cluster-slow-test k8s.kmderived-1506441123636-driver_default_spark-init: at org.apache.spark.deploy.rest.k8s.KubernetesSparkDependencyDownloadInitContainer.org$apache$spark$deploy$rest$k8s$KubernetesSparkDependencyDownloadInitContainer$$downloadResourcesFromStagingServer(KubernetesSparkDependencyDownloadInitContainer.scala:155)
Sep 26 08:52:51 data-cluster-slow-test k8s.kmderived-1506441123636-driver_default_spark-init: at org.apache.spark.deploy.rest.k8s.KubernetesSparkDependencyDownloadInitContainer$$anonfun$1.apply$mcV$sp(KubernetesSparkDependencyDownloadInitContainer.scala:104)
Sep 26 08:52:51 data-cluster-slow-test k8s.kmderived-1506441123636-driver_default_spark-init: at org.apache.spark.deploy.rest.k8s.KubernetesSparkDependencyDownloadInitContainer$$anonfun$1.apply(KubernetesSparkDependencyDownloadInitContainer.scala:104)
Sep 26 08:52:51 data-cluster-slow-test k8s.kmderived-1506441123636-driver_default_spark-init: at org.apache.spark.deploy.rest.k8s.KubernetesSparkDependencyDownloadInitContainer$$anonfun$1.apply(KubernetesSparkDependencyDownloadInitContainer.scala:104)
Sep 26 08:52:51 data-cluster-slow-test k8s.kmderived-1506441123636-driver_default_spark-init: at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
Sep 26 08:52:51 data-cluster-slow-test k8s.kmderived-1506441123636-driver_default_spark-init: at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
Sep 26 08:52:51 data-cluster-slow-test k8s.kmderived-1506441123636-driver_default_spark-init: at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
Sep 26 08:52:51 data-cluster-slow-test k8s.kmderived-1506441123636-driver_default_spark-init: at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
Sep 26 08:52:51 data-cluster-slow-test k8s.kmderived-1506441123636-driver_default_spark-init: at java.lang.Thread.run(Thread.java:748)
Sep 26 08:52:51 data-cluster-slow-test k8s.kmderived-1506441123636-driver_default_spark-init: Caused by: java.net.SocketTimeoutException: connect timed out
Sep 26 08:52:51 data-cluster-slow-test k8s.kmderived-1506441123636-driver_default_spark-init: at java.net.PlainSocketImpl.socketConnect(Native Method)
Sep 26 08:52:51 data-cluster-slow-test k8s.kmderived-1506441123636-driver_default_spark-init: at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
Sep 26 08:52:51 data-cluster-slow-test k8s.kmderived-1506441123636-driver_default_spark-init: at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
Sep 26 08:52:51 data-cluster-slow-test k8s.kmderived-1506441123636-driver_default_spark-init: at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
Sep 26 08:52:51 data-cluster-slow-test k8s.kmderived-1506441123636-driver_default_spark-init: at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
Sep 26 08:52:51 data-cluster-slow-test k8s.kmderived-1506441123636-driver_default_spark-init: at java.net.Socket.connect(Socket.java:589)
Sep 26 08:52:51 data-cluster-slow-test k8s.kmderived-1506441123636-driver_default_spark-init: at okhttp3.internal.platform.Platform.connectSocket(Platform.java:124)
Sep 26 08:52:51 data-cluster-slow-test k8s.kmderived-1506441123636-driver_default_spark-init: at okhttp3.internal.connection.RealConnection.connectSocket(RealConnection.java:220)
Sep 26 08:52:51 data-cluster-slow-test k8s.kmderived-1506441123636-driver_default_spark-init: at okhttp3.internal.connection.RealConnection.connect(RealConnection.java:146)
Sep 26 08:52:51 data-cluster-slow-test k8s.kmderived-1506441123636-driver_default_spark-init: at okhttp3.internal.connection.StreamAllocation.findConnection(StreamAllocation.java:186)
Sep 26 08:52:51 data-cluster-slow-test k8s.kmderived-1506441123636-driver_default_spark-init: at okhttp3.internal.connection.StreamAllocation.findHealthyConnection(StreamAllocation.java:121)
Sep 26 08:52:51 data-cluster-slow-test k8s.kmderived-1506441123636-driver_default_spark-init: at okhttp3.internal.connection.StreamAllocation.newStream(StreamAllocation.java:100)
Sep 26 08:52:51 data-cluster-slow-test k8s.kmderived-1506441123636-driver_default_spark-init: at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:42)
Sep 26 08:52:51 data-cluster-slow-test k8s.kmderived-1506441123636-driver_default_spark-init: at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
Sep 26 08:52:51 data-cluster-slow-test k8s.kmderived-1506441123636-driver_default_spark-init: at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
Sep 26 08:52:51 data-cluster-slow-test k8s.kmderived-1506441123636-driver_default_spark-init: at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
Sep 26 08:52:51 data-cluster-slow-test k8s.kmderived-1506441123636-driver_default_spark-init: at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
Sep 26 08:52:51 data-cluster-slow-test k8s.kmderived-1506441123636-driver_default_spark-init: at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
Sep 26 08:52:51 data-cluster-slow-test k8s.kmderived-1506441123636-driver_default_spark-init: at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
Sep 26 08:52:51 data-cluster-slow-test k8s.kmderived-1506441123636-driver_default_spark-init: at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
Sep 26 08:52:51 data-cluster-slow-test k8s.kmderived-1506441123636-driver_default_spark-init: at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:120)
Sep 26 08:52:51 data-cluster-slow-test k8s.kmderived-1506441123636-driver_default_spark-init: at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
Sep 26 08:52:51 data-cluster-slow-test k8s.kmderived-1506441123636-driver_default_spark-init: at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
Sep 26 08:52:51 data-cluster-slow-test k8s.kmderived-1506441123636-driver_default_spark-init: at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:179)
Sep 26 08:52:51 data-cluster-slow-test k8s.kmderived-1506441123636-driver_default_spark-init: at okhttp3.RealCall$AsyncCall.execute(RealCall.java:129)
Sep 26 08:52:51 data-cluster-slow-test k8s.kmderived-1506441123636-driver_default_spark-init: at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
Sep 26 08:52:51 data-cluster-slow-test k8s.kmderived-1506441123636-driver_default_spark-init: ... 3 more
Sep 26 08:52:53 data-cluster-slow-test systemd: {"_BOOT_ID":"55547778ef634c26a15c3b46b6f2e020","PRIORITY":"6","SYSLOG_FACILITY":"3","_UID":"0","_GID":"0","_SYSTEMD_SLICE":"system.slice","_CAP_EFFECTIVE":"3fffffffff","_TRANSPORT":"stdout","_MACHINE_ID":"50e74f29f36b43eca88a7f72305e96b0","_SELINUX_CONTEXT":"system_u:system_r:kernel_t:s0","_HOSTNAME":"ip-10-253-50-31.ec2.internal","_SYSTEMD_UNIT":"kubelet.service","_SYSTEMD_INVOCATION_ID":"499393ff4e984b5a81b921137ec4d4bd","SYSLOG_IDENTIFIER":"kubelet-wrapper","_PID":"1727","_SYSTEMD_CGROUP":"/system.slice/kubelet.service","_COMM":"kubelet","_EXE":"/var/lib/rkt/pods/run/3e947ebc-2d9f-4100-a899-2bd270d4748e/stage1/rootfs/opt/stage2/hyperkube/rootfs/hyperkube","_CMDLINE":"/kubelet --cni-conf-dir=/etc/kubernetes/cni/net.d --cni-bin-dir=/opt/cni/bin --network-plugin=cni --container-runtime=docker --rkt-path=/usr/bin/rkt --rkt-stage1-image=coreos.com/rkt/stage1-coreos --node-labels data.unbounce.net/role=worker --register-node=true --allow-privileged=true --pod-manifest-path=/etc/kubernetes/manifests --cluster-dns=10.253.50.31 --cluster-domain=cluster.local --cloud-provider=aws --cert-dir=/etc/kubernetes/ssl --tls-cert-file=/etc/kubernetes/ssl/worker.pem --tls-private-key-file=/etc/kubernetes/ssl/worker-key.pem --kubeconfig=/etc/kubernetes/worker-kubeconfig.yaml --require-kubeconfig","MESSAGE":"E0926 15:52:52.925532 1727 kuberuntime_container.go:59] Can't make a ref to pod \"kmderived-1506441123636-driver_default(a5e157ac-a2d2-11e7-a071-0e80cdd21cb6)\", container spark-init: selfLink was empty, can't make reference"}
What's interesting is that I'm seeing app logs in there (socket timeout exception) but they don't show up in the pod logs in the UI. Not sure if that's expected.
Ahh, ok. so it's blowing up in k8s.kmderived-1506441123636-driver_default_spark-init:
IE before the primary container has done anything. which makes sense. But is ultra confusing when those logs aren't available in the UI.
One thing I was never sure on was exactly how the resource staging server communicates with running processes. IE how does a running init container find it's resource staging server?
Here's the spec on my resource staging server pod:
⇒ kubectl describe pod spark-resource-staging-server-4091971298-8ncwj
Name: spark-resource-staging-server-4091971298-8ncwj
Namespace: default
Node: ip-10-253-42-225.ec2.internal/10.253.42.225
Start Time: Mon, 25 Sep 2017 16:32:55 -0700
Labels: pod-template-hash=4091971298
resource-staging-server-instance=default
Annotations: kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"default","name":"spark-resource-staging-server-4091971298","uid":"b5c340fe-a019-1...
Status: Running
IP: 10.2.31.25
Created By: ReplicaSet/spark-resource-staging-server-4091971298
Controlled By: ReplicaSet/spark-resource-staging-server-4091971298
Containers:
spark-resource-staging-server:
Container ID: docker://b39929d218a61d4b88cb667810b7f84e11c479cbb5ad1a0ca5cd3a5fe0fd3c5e
Image: kubespark/spark-resource-staging-server:v2.2.0-kubernetes-0.3.0
Image ID: docker-pullable://kubespark/spark-resource-staging-server@sha256:151b58248a1a0bc9a1bbc82e186a64cfb36dc108bead07a22d1e69e5e1c2ead3
Port: <none>
Args:
/etc/spark-resource-staging-server/resource-staging-server.properties
State: Running
Started: Mon, 25 Sep 2017 16:33:06 -0700
Ready: True
Restart Count: 0
Limits:
cpu: 4
memory: 1Gi
Requests:
cpu: 2
memory: 768Mi
Environment: <none>
Mounts:
/etc/spark-resource-staging-server from resource-staging-server-properties (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-00b84 (ro)
Conditions:
Type Status
Initialized True
Ready True
PodScheduled True
Volumes:
resource-staging-server-properties:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: spark-resource-staging-server-config
Optional: false
default-token-00b84:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-00b84
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: <none>
Events: <none>
and service:
garylucas@gary-lucas:~/dev/ub-data-kissmetrics|DAT-spark-on-k8s⚡
⇒ kubectl describe services spark-resource-staging-service
Name: spark-resource-staging-service
Namespace: default
Labels: <none>
Annotations: <none>
Selector: resource-staging-server-instance=default
Type: LoadBalancer
IP: 10.3.0.161
LoadBalancer Ingress: lb_ingress.etc
Port: <unset> 10000/TCP
NodePort: <unset> 32496/TCP
Endpoints: 10.2.31.25:10000
Session Affinity: None
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
2d 19m 64 service-controller Normal UpdatedLoadBalancer Updated load balancer with new hosts
it's whitelisted as well, so it's not open to the world. We'll sort out the TLS / SSL stuff probably on our loadbalancer.
One thing I was never sure on was exactly how the resource staging server communicates with running processes. IE how does a running init container find it's resource staging server?
This should be derived from spark.kubernetes.resourceStagingServer.uri
, which must be a URI that is routable both from the submission client and also from the init-container.
Ahh, ok. so it's blowing up in k8s.kmderived-1506441123636-driver_default_spark-init: IE before the primary container has done anything. which makes sense. But is ultra confusing when those logs aren't available in the UI.
init-container logs in the UI might be a question for @foxish, but we actually had this information in the kubectl describe pod
from earlier and happened to miss this:
Init Containers:
spark-init:
Container ID: docker://f96c6130a921c46b48813cf9127ed05cb4a87d158a608d215628788ec858c93d
Image: ecr_id.dkr.ecr.us-east-1.amazonaws.com/spark-init-derpmetrics:latest
Image ID: docker-pullable://ecr_id.dkr.ecr.us-east-1.amazonaws.com/spark-init-derpmetrics@sha256:0f244ccfaf6eeedf793c808616b2867ccb7c709c74d6fea1694faaf558684a50
Port: <none>
Args:
/etc/spark-init/spark-init.properties
State: Terminated
Reason: Error
Exit Code: 1
Started: Mon, 25 Sep 2017 14:30:37 -0700
Finished: Mon, 25 Sep 2017 14:31:10 -0700
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/etc/spark-init from spark-init-properties (rw)
/mnt/secrets/spark-init from spark-init-secret (rw)
/var/run/secrets/kubernetes.io/serviceaccount from spark-k8s-service-account-token-fn4qg (ro)
/var/spark-data/spark-files from download-files (rw)
/var/spark-data/spark-jars from download-jars-volume (rw)
Ok, I think this was added in 2.2 spark.kubernetes.resourceStagingServer.internal.uri
This issue is that I had to white list that URL and the whitelist doesn't include the cluster.
I'll test that shortly and then close this.
Also, I moved to 2.2 release. Thanks!
Ok, so it was a combination of things: 1) The IP for the staging server was only visible to my whitelists. 2) The documentation (as far as I read, didn't obviously indicate that the URL needed to be accessible from internal and external... Might have been there). 3) The logs were appearing in the spark-init container which doesn't (for whatever reason) log to the primary console. IE the problem was really effectively obfuscated. 4) PEBKAC
Fix was spark 2.2 brought in an internal uri.
I get this result when trying to initialize the driver:
MountVolume.SetUp failed for volume "spark-init-properties" : configmaps "kmderived-1506142014799-init-config" not found
This config map does exist, same situation with the secret.
Also, the spark submit call hangs for quite awhile:
Note the nearly 2 minutes between the last set and the pod status updates? Is that normal?
I've got about 13mb of jar's that are uploading, so that may be normal... However it seems pretty slow
If it's that slow, that's fine. But I'm more wondering if something isn't timing out on the back end.