apache-spark-on-k8s / spark

Apache Spark enhanced with native Kubernetes scheduler back-end: NOTE this repository is being ARCHIVED as all new development for the kubernetes scheduler back-end is now on https://github.com/apache/spark/
https://spark.apache.org/
Apache License 2.0

direct_kafka_wordcount.py #513

Closed: hardymansen closed this issue 7 years ago

hardymansen commented 7 years ago

This may be more of a Python/Kafka issue, but I'm not sure. I am getting this error when trying to connect to my Kubernetes Kafka cluster to read a topic.

error.txt

I had to create my own driver image with spark-streaming-kafka-0-8-assembly_2.11-2.2.0.jar in jars/.

My spark-submit job looks as follows:

bin/spark-submit \
  --deploy-mode cluster \
  --master k8s://https://kubernetes.default:443 \
  --kubernetes-namespace default \
  --conf spark.app.name=myApp \
  --conf spark.kubernetes.driver.docker.image=docker.mycompany.com/mydriver-py:2 \
  --conf spark.kubernetes.executor.docker.image=kubespark/executor-py:v2.1.0-kubernetes-0.3.1 \
  --conf spark.kubernetes.initcontainer.docker.image=kubespark/spark-init:v2.1.0-kubernetes-0.3.1 \
  --jars local:///opt/spark/examples/jars/spark-examples_2.11-2.2.0-k8s-0.3.0.jar \
  local:///opt/spark/examples/src/main/python/streaming/direct_kafka_wordcount.py broker:9092 mytopic

Do I need my own executor image as well?

ifilonenko commented 7 years ago

There seems to be a mention of SparkHadoopUtil. Does access to the stream require Kerberos?

hardymansen commented 7 years ago

No authentication is required for Kafka. I am using another image with kafkacat, and creating and reading topics from it works fine.

hardymansen commented 7 years ago

I created my own executor-py image and it seems to work.
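
For anyone hitting the same error, the fix reported above might be sketched roughly as follows. This is only an illustration under stated assumptions: the thread does not say what went into the custom executor image, so adding the same Kafka assembly jar that was added to the driver image is a guess. It also assumes the jar sits in the current directory and that the image's Spark jars directory is /opt/spark/jars. The file name Dockerfile.executor and the image name docker.mycompany.com/myexecutor-py:1 are hypothetical.

```shell
#!/bin/sh
# Sketch: write a Dockerfile that layers the Kafka assembly jar on top of
# the stock PySpark executor image used in the spark-submit command above.
set -eu

# Jar name taken from the thread; assumed to be present in the build context.
KAFKA_JAR="spark-streaming-kafka-0-8-assembly_2.11-2.2.0.jar"
# Stock executor image from the original spark-submit command.
BASE_IMAGE="kubespark/executor-py:v2.1.0-kubernetes-0.3.1"

# /opt/spark/jars/ is an assumed location for Spark's jars in the image.
cat > Dockerfile.executor <<EOF
FROM ${BASE_IMAGE}
COPY ${KAFKA_JAR} /opt/spark/jars/
EOF

# Next steps, run by hand (image name is hypothetical):
#   docker build -f Dockerfile.executor -t docker.mycompany.com/myexecutor-py:1 .
#   docker push docker.mycompany.com/myexecutor-py:1
# Then point the job at the new image:
#   --conf spark.kubernetes.executor.docker.image=docker.mycompany.com/myexecutor-py:1
```

The script only generates the Dockerfile. Building, pushing, and resubmitting are left as manual steps because registry names and credentials differ per cluster.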