This guide provides a Helm chart for deploying Apache Livy on Kubernetes without relying on managed cloud services such as AWS, GCP, or Azure. Running the stack this way can save development time and cost, and it lets you debug Livy from an IDE. To debug Livy on Kubernetes as a standalone setup, both Apache Spark and Apache Livy must be deployed in the cluster.
Install Helm.
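Besides Helm, the steps below use kubectl, curl, and jq. A quick way to confirm they are all on your PATH is a small shell check like the following. This is a minimal sketch; require_tools is a hypothetical helper name, not something shipped with the chart.

```shell
# Hypothetical helper: report each named command that is not on PATH.
# Returns 0 when every tool is found and 1 otherwise.
require_tools() {
  local missing=0 tool
  for tool in "$@"; do
    # command -v succeeds only if the name resolves to something runnable
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, require_tools helm kubectl curl jq prints one line per missing tool on stderr and fails if any of them is absent.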
Add the required Helm chart repositories:
helm repo add cert-manager https://charts.jetstack.io
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update
Add an entry to the /etc/hosts file:
127.0.0.1 my-cluster.example.com
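Editing /etc/hosts by hand works, but if you recreate the environment often, an idempotent helper avoids piling up duplicate lines. The sketch below assumes a POSIX shell with awk; ensure_hosts_entry is a hypothetical name, and writing to /etc/hosts normally requires root.

```shell
# Hypothetical helper: add "<ip> <hostname>" to a hosts file unless the
# hostname already appears on a non-comment line.
# Usage: ensure_hosts_entry <ip> <hostname> [hosts-file]  (defaults to /etc/hosts)
ensure_hosts_entry() {
  local ip="$1" host="$2" file="${3:-/etc/hosts}"
  if awk -v h="$host" '
      /^[[:space:]]*#/ { next }              # skip comment lines
      {
        for (i = 2; i <= NF; i++) {          # field 1 is the IP; the rest are names
          if ($i ~ /^#/) break               # stop at an inline comment
          if ($i == h) { found = 1; exit }
        }
      }
      END { exit (found ? 0 : 1) }' "$file"; then
    echo "already present: $host"
  else
    printf '%s %s\n' "$ip" "$host" >> "$file"
    echo "added: $ip $host"
  fi
}
```

For example, run as root: ensure_hosts_entry 127.0.0.1 my-cluster.example.com. Passing a third argument points the helper at a different file, which is a safe way to try it out first.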
Install cert-manager, including its CustomResourceDefinition (CRD) resources:
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.15.0/cert-manager.yaml
From the chart directory, fetch the chart's dependencies into its charts/ directory:
helm dependency build
Create a Kubernetes namespace for the Livy deployment:
kubectl create namespace <namespace-name>
Install the Livy cluster using the Helm chart:
helm -n <namespace-name> install livycluster .
Create an interactive session:
curl -k -X POST -H "Content-Type: application/json" --data '{"kind": "spark"}' https://my-cluster.example.com/livy/sessions | jq
Note: You need the curl and jq utilities installed on your local machine for testing.
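Session creation is asynchronous: the POST above returns while the session is still starting, and statements can generally run only once the session reports idle. One way to wait for that, sketched below under the assumption that jq is installed, is to poll Livy's GET /sessions/{id}/state endpoint. The function names and the LIVY_URL and POLL_INTERVAL variables are illustrative, not part of the chart.

```shell
# Illustrative settings; override them in the environment as needed.
LIVY_URL="${LIVY_URL:-https://my-cluster.example.com/livy}"
POLL_INTERVAL="${POLL_INTERVAL:-5}"

# Print the current state of session $1, e.g. "starting" or "idle".
# -k skips TLS certificate verification, as in the curl examples above.
livy_session_state() {
  curl -sk "$LIVY_URL/sessions/$1/state" | jq -r '.state'
}

# Poll session $1 until it is idle, giving up after $2 attempts (default 60).
# Returns 0 once idle, 1 if the session failed or the wait timed out.
wait_for_idle() {
  local id="$1" tries="${2:-60}" state i=0
  while [ "$i" -lt "$tries" ]; do
    state="$(livy_session_state "$id")"
    case "$state" in
      idle) return 0 ;;
      error | dead | killed)
        echo "session $id ended in state: $state" >&2
        return 1 ;;
    esac
    sleep "$POLL_INTERVAL"
    i=$((i + 1))
  done
  echo "timed out waiting for session $id" >&2
  return 1
}
```

For example, wait_for_idle 0 && echo "session 0 is ready" blocks until session 0 can accept statements.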
Create a statement in the session (this example assumes the session created above has ID 0):
curl -k -X POST -d '{ "kind": "spark", "code": "sc.parallelize(1 to 10).count()" }' -H "Content-Type: application/json" \
https://my-cluster.example.com/livy/sessions/0/statements | jq
Create a batch job:
curl -s -k -H "Content-Type: application/json" \
-X POST \
-d '{
"name": "testbatch1",
"className": "org.apache.spark.examples.SparkPi",
"numExecutors": 2,
"file": "local:///opt/spark/examples/jars/spark-examples_2.12-3.2.3.jar",
"args": ["10000"]
}' "https://my-cluster.example.com/livy/batches" | jq
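Batch jobs are asynchronous too: the POST returns a JSON object whose id field identifies the batch, and the job then moves through states such as starting and running before it finishes. The sketch below polls GET /batches/{id}/state, assuming jq is installed; the function names and the LIVY_URL and POLL_INTERVAL variables are illustrative, and treating success, dead, killed, and error as the terminal states is an assumption based on Livy's documented session states.

```shell
# Illustrative settings; override them in the environment as needed.
LIVY_URL="${LIVY_URL:-https://my-cluster.example.com/livy}"
POLL_INTERVAL="${POLL_INTERVAL:-10}"

# Print the current state of batch $1.
livy_batch_state() {
  curl -sk "$LIVY_URL/batches/$1/state" | jq -r '.state'
}

# Poll batch $1 until it finishes or $2 attempts (default 180) are used up.
# Prints the final state ("timeout" if it never finished); returns 0 only on success.
wait_for_batch() {
  local id="$1" tries="${2:-180}" state i=0
  while [ "$i" -lt "$tries" ]; do
    state="$(livy_batch_state "$id")"
    case "$state" in
      success) echo "$state"; return 0 ;;
      dead | killed | error) echo "$state"; return 1 ;;
    esac
    sleep "$POLL_INTERVAL"
    i=$((i + 1))
  done
  echo "timeout"
  return 1
}
```

The ID to pass can be read from the creation response, for example by replacing the trailing jq with jq -r '.id'.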
The steps for building the Spark and Livy Docker images are documented in Docker.md.