c-w closed this issue 6 years ago
We only need two charts, so we might as well include them in the repo directly. This also makes customizing the charts easier.
commit 1bb87e9e74afb8ae9aad27d290ba8636e1d6bc9a
Author: Clemens Wolff <clewolff@microsoft.com>
Date: Thu Dec 7 09:42:54 2017 -0500
Import helm charts
Source: https://github.com/erikschlegel/charts/tree/spark-localssd
See: https://github.com/CatalystCode/project-fortis-pipeline/issues/244
diff --git a/project-fortis-pipeline/ops/charts/cassandra/.helmignore b/project-fortis-pipeline/ops/charts/cassandra/.helmignore
new file mode 100644
index 0000000..f0c1319
--- /dev/null
+++ b/project-fortis-pipeline/ops/charts/cassandra/.helmignore
@@ -0,0 +1,21 @@
+# Patterns to ignore when building packages.
+# This supports shell glob matching, relative path matching, and
+# negation (prefixed with !). Only one pattern per line.
+.DS_Store
+# Common VCS dirs
+.git/
+.gitignore
+.bzr/
+.bzrignore
+.hg/
+.hgignore
+.svn/
+# Common backup files
+*.swp
+*.bak
+*.tmp
+*~
+# Various IDEs
+.project
+.idea/
+*.tmproj
diff --git a/project-fortis-pipeline/ops/charts/cassandra/Chart.yaml b/project-fortis-pipeline/ops/charts/cassandra/Chart.yaml
new file mode 100644
index 0000000..9668411
--- /dev/null
+++ b/project-fortis-pipeline/ops/charts/cassandra/Chart.yaml
@@ -0,0 +1,11 @@
+name: cassandra
+home: http://cassandra.apache.org
+version: 0.1.0
+description: A highly scalable, high-performance distributed database designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure.
+icon: http://cassandra.apache.org/img/cassandra_logo.png
+sources:
+ - https://github.com/apache/cassandra
+keywords:
+ - cassandra
+ - nosql
+ - database
diff --git a/project-fortis-pipeline/ops/charts/cassandra/README.md b/project-fortis-pipeline/ops/charts/cassandra/README.md
new file mode 100644
index 0000000..3071891
--- /dev/null
+++ b/project-fortis-pipeline/ops/charts/cassandra/README.md
@@ -0,0 +1,81 @@
+# Multi-node Cassandra Cluster using StatefulSets
+
+[Apache Cassandra](http://cassandra.apache.org) is a free and open-source distributed database system designed to handle large amounts of data across multiple servers, providing high availability with no single point of failure.
+
+## Credit
+
+Credit to https://github.com/kubernetes/kubernetes/tree/master/examples/storage/cassandra. This is an implementation of that work as a Helm Chart.
+
+## Introduction
+
+This chart bootstraps a multi-node Cassandra deployment on a [Kubernetes](http://kubernetes.io) cluster using the [Helm](https://helm.sh) package manager. This work is largely based upon the StatefulSet example for deploying Cassandra documented in this [tutorial](https://github.com/kubernetes/kubernetes/tree/master/examples/storage/cassandra).
+
+## Prerequisites
+
+- Kubernetes 1.4+ with Beta APIs enabled (Kubernetes 1.5.3+ for Azure)
+- PV provisioner support in the underlying infrastructure
+
+## Installing the Chart
+
+To install the chart with the release name `my-release`:
+
+```bash
+$ helm install --name my-release incubator/cassandra
+```
+
+The command deploys a Cassandra cluster on the Kubernetes cluster using the default configuration. The [configuration](#configuration) section lists the parameters that can be configured during installation.
+
+### Uninstall
+
+To uninstall/delete the `my-release` deployment:
+
+```bash
+$ helm delete my-release
+```
+
+Note that installing the chart creates a persistent volume claim for each Cassandra node deployed. These volumes are not deleted by the `helm delete` command; they can be managed using the `kubectl delete` command on the Persistent Volume Claim resources.
+
+## Configuration
+
+The following table lists the configurable parameters of the Cassandra chart and their default values.
+
+| Parameter | Description | Default |
+| ----------------------- | ------------------------------------------ | ----------------------------------- |
+| `Image` | `cassandra` image. | gcr.io/google-samples/cassandra |
+| `ImageTag` | `cassandra` image tag. | v12 |
+| `replicaCount` | Number of `cassandra` instances to run | 3 |
+| `cassandra.MaxHeapSize` | Max heap for JVM running `cassandra`. | 512M |
+| `cassandra.HeapNewSize` | Min heap size for JVM running `cassandra`. | 100M |
+| `cassandra.ClusterName` | Name of the `cassandra` cluster. | K8Demo |
+| `cassandra.DC` | Name of the DC for `cassandra` cluster. | DC1-K8Demo |
+| `cassandra.Rack` | Name of the Rack for `cassandra` cluster. | Rack1-K8Demo |
+| `persistence.enabled` | Create a volume to store data | true |
+| `persistence.size` | Size of persistent volume claim | 10Gi |
+| `persistence.storageClass` | Type of persistent volume claim | default |
+| `persistence.accessMode` | ReadWriteOnce or ReadOnly | ReadWriteOnce |
+| `resources` | CPU/Memory resource requests/limits | Memory: `1Gi`, CPU: `500m` |
+
+
+Specify each parameter using the `--set key=value[,key=value]` argument to `helm install`.
+
+## Persistence
+
+The deployment of the Cassandra cluster relies on persistent storage. A PersistentVolumeClaim is created and mounted at `/cassandra_data` for each Cassandra instance.
+
+By default, the chart uses the default StorageClass for the provider where Kubernetes is running. If `default` isn't supported, or if one wants to use a specific StorageClass (for instance, premium storage on Azure), one would need to define the appropriate StorageClass and update the values.yaml file, or use the `--set persistence.storageClass=<value>` flag. To specify a Premium Storage disk (SSD) on Azure, the YAML for the StorageClass definition would resemble:
+
+```
+# https://kubernetes.io/docs/user-guide/persistent-volumes/#azure-disk
+apiVersion: storage.k8s.io/v1beta1
+kind: StorageClass
+metadata:
+ name: fast
+ annotations:
+ storageclass.beta.kubernetes.io/is-default-class: "true"
+provisioner: kubernetes.io/azure-disk
+parameters:
+ skuName: Premium_LRS
+ location: westus
+```
+
+To disable this functionality, change values.yaml to set `persistence.enabled` to `false`; an emptyDir volume is used instead.
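
As an aside on the `--set` flag mentioned in the Configuration section above, a typical invocation might look like the following. The release name and override values here are illustrative, not defaults from the chart:

```shell
# Hypothetical example: override the replica count and max heap size at install
# time, using the parameter names from the configuration table above.
helm install --name my-release \
  --set replicaCount=3,cassandra.MaxHeapSize=512M \
  incubator/cassandra
```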
diff --git a/project-fortis-pipeline/ops/charts/cassandra/templates/_helpers.tpl b/project-fortis-pipeline/ops/charts/cassandra/templates/_helpers.tpl
new file mode 100644
index 0000000..f7877c3
--- /dev/null
+++ b/project-fortis-pipeline/ops/charts/cassandra/templates/_helpers.tpl
@@ -0,0 +1,16 @@
+{{/* vim: set filetype=mustache: */}}
+{{/*
+Expand the name of the chart.
+*/}}
+{{- define "name" -}}
+{{- default .Chart.Name .Values.nameOverride | trunc 24 | trimSuffix "-" -}}
+{{- end -}}
+
+{{/*
+Create a default fully qualified app name.
+We truncate at 24 chars because some Kubernetes name fields are limited to this (by the DNS naming spec).
+*/}}
+{{- define "fullname" -}}
+{{- $name := default .Chart.Name .Values.nameOverride -}}
+{{- printf "%s-%s" .Release.Name $name | trunc 24 | trimSuffix "-" -}}
+{{- end -}}
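
The `trunc 24 | trimSuffix "-"` pipeline in the `fullname` helper above can be emulated in plain shell. This is a sketch to illustrate the naming behavior (24-character cap, trailing dash trimmed), not part of the chart:

```shell
# Emulate Helm's `printf "%s-%s" ... | trunc 24 | trimSuffix "-"` naming logic.
fullname() {
  local name
  name=$(printf '%s-%s' "$1" "$2" | cut -c1-24)  # trunc 24
  printf '%s\n' "${name%-}"                      # trimSuffix "-"
}

fullname my-release cassandra               # my-release-cassandra
fullname my-release-production-x cassandra  # my-release-production-x
```

The second call shows why `trimSuffix` matters: the 24th character after truncation happens to be a dash, which would be an invalid trailing character in a DNS label.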
diff --git a/project-fortis-pipeline/ops/charts/cassandra/templates/statefulset.yaml b/project-fortis-pipeline/ops/charts/cassandra/templates/statefulset.yaml
new file mode 100644
index 0000000..0162aca
--- /dev/null
+++ b/project-fortis-pipeline/ops/charts/cassandra/templates/statefulset.yaml
@@ -0,0 +1,104 @@
+apiVersion: "apps/v1beta1"
+kind: StatefulSet
+metadata:
+ name: {{ template "fullname" . }}
+ labels:
+ app: {{ template "fullname" . }}
+ chart: "{{ .Chart.Name }}-{{ .Chart.Version }}"
+ release: "{{ .Release.Name }}"
+ heritage: "{{ .Release.Service }}"
+spec:
+ serviceName: {{ template "fullname" . }}
+ replicas: {{ .Values.replicaCount }}
+ template:
+ metadata:
+ name: "{{ printf "%s-%s" .Release.Name .Values.Name | trunc 24 }}"
+ labels:
+ app: {{ template "fullname" . }}
+ heritage: {{.Release.Service | quote }}
+ release: {{.Release.Name | quote }}
+ chart: "{{.Chart.Name}}-{{.Chart.Version}}"
+ component: "{{.Release.Name}}-{{.Values.Component}}"
+ spec:
+ affinity:
+ nodeAffinity:
+ requiredDuringSchedulingIgnoredDuringExecution:
+ nodeSelectorTerms:
+ - matchExpressions:
+ - key: "beta.kubernetes.io/instance-type"
+ operator: In
+ values: ["{{.Values.VmInstanceType}}"]
+ podAntiAffinity:
+ requiredDuringSchedulingIgnoredDuringExecution:
+ - labelSelector:
+ matchExpressions:
+ - key: app
+ operator: In
+ values: [{{ template "fullname" . }}]
+ topologyKey: kubernetes.io/hostname
+ containers:
+ - name: "{{ printf "%s-%s" .Release.Name .Values.Name | trunc 24 }}"
+ image: "{{.Values.Image}}:{{.Values.ImageTag}}"
+ imagePullPolicy: "{{.Values.ImagePullPolicy}}"
+ ports:
+ - containerPort: 7000
+ name: intra-node
+ - containerPort: 7001
+ name: tls-intra-node
+ - containerPort: 7199
+ name: jmx
+ - containerPort: 9042
+ name: cql
+ resources:
+ limits:
+ cpu: "{{.Values.resources.limits.cpu}}"
+ memory: "{{.Values.resources.limits.memory}}"
+ requests:
+ cpu: "{{.Values.resources.requests.cpu}}"
+ memory: "{{.Values.resources.requests.memory}}"
+ securityContext:
+ capabilities:
+ add:
+ - IPC_LOCK
+ lifecycle:
+ preStop:
+ exec:
+ command: ["/bin/sh", "-c", "PID=$(pidof java) && kill $PID && while ps -p $PID > /dev/null; do sleep 1; done"]
+ env:
+ - name: MAX_HEAP_SIZE
+ value: "{{.Values.cassandra.MaxHeapSize}}"
+ - name: HEAP_NEWSIZE
+ value: "{{.Values.cassandra.HeapNewSize}}"
+ - name: CASSANDRA_SEEDS
+ value: "{{ printf "%s-%s" .Release.Name .Values.Name | trunc 24 }}-0.{{ printf "%s-%s" .Release.Name .Values.Name | trunc 24 }}.{{.Release.Namespace}}.svc.cluster.local"
+ - name: CASSANDRA_CLUSTER_NAME
+ value: "{{.Values.cassandra.ClusterName}}"
+ - name: CASSANDRA_DC
+ value: "{{.Values.cassandra.DC}}"
+ - name: CASSANDRA_RACK
+ value: "{{.Values.cassandra.Rack}}"
+ - name: CASSANDRA_AUTO_BOOTSTRAP
+ value: "{{.Values.cassandra.AutoBootstrap}}"
+ - name: POD_IP
+ valueFrom:
+ fieldRef:
+ fieldPath: status.podIP
+ - name: POD_NAMESPACE
+ valueFrom:
+ fieldRef:
+ fieldPath: metadata.namespace
+ readinessProbe:
+ exec:
+ command:
+ - /bin/bash
+ - -c
+ - /ready-probe.sh
+ initialDelaySeconds: 15
+ timeoutSeconds: 5
+ volumeMounts:
+ - mountPath: /cassandra_data
+ name: cassandra-data
+ volumes:
+ - name: cassandra-data
+ hostPath:
+ path: "/mnt/cassandra"
\ No newline at end of file
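
The `CASSANDRA_SEEDS` value assembled in the statefulset template above resolves to the DNS name of the first pod (`-0`) behind the headless service. A sketch of the resulting string, assuming an illustrative release name and namespace:

```shell
# Reconstruct the seed host name the template builds, assuming release
# "my-release", Values.Name "cassandra", and namespace "default".
base=$(printf '%s-%s' my-release cassandra | cut -c1-24)  # trunc 24
echo "${base}-0.${base}.default.svc.cluster.local"
# my-release-cassandra-0.my-release-cassandra.default.svc.cluster.local
```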
diff --git a/project-fortis-pipeline/ops/charts/cassandra/templates/svc.yaml b/project-fortis-pipeline/ops/charts/cassandra/templates/svc.yaml
new file mode 100644
index 0000000..7ff60e8
--- /dev/null
+++ b/project-fortis-pipeline/ops/charts/cassandra/templates/svc.yaml
@@ -0,0 +1,34 @@
+# Headless service for stable DNS entries of StatefulSet members.
+apiVersion: v1
+kind: Service
+metadata:
+ name: {{ template "fullname" . }}
+ labels:
+ app: {{ template "fullname" . }}
+ chart: "{{ .Chart.Name }}-{{ .Chart.Version }}"
+ release: "{{ .Release.Name }}"
+ heritage: "{{ .Release.Service }}"
+spec:
+ ports:
+ - name: {{ template "fullname" . }}
+ port: 9042
+ clusterIP: None
+ selector:
+ app: {{ template "fullname" . }}
+---
+apiVersion: v1
+kind: Service
+metadata:
+ name: "{{ template "fullname" . }}-ext"
+ labels:
+ app: {{ template "fullname" . }}
+ chart: "{{ .Chart.Name }}-{{ .Chart.Version }}"
+ release: "{{ .Release.Name }}"
+ heritage: "{{ .Release.Service }}"
+spec:
+ ports:
+ - name: {{ template "fullname" . }}
+ port: 9042
+ selector:
+ app: {{ template "fullname" . }}
+ type: "LoadBalancer"
\ No newline at end of file
diff --git a/project-fortis-pipeline/ops/charts/cassandra/values.yaml b/project-fortis-pipeline/ops/charts/cassandra/values.yaml
new file mode 100644
index 0000000..b01ac0d
--- /dev/null
+++ b/project-fortis-pipeline/ops/charts/cassandra/values.yaml
@@ -0,0 +1,39 @@
+# Default values for cassandra.
+# This is a YAML-formatted file.
+# Declare name/value pairs to be passed into your templates.
+# name: value
+
+Name: cassandra
+Component: "cassandra"
+replicaCount: 6
+Image: "erikschlegel/cassandra"
+VmInstanceType: "Standard_L4s"
+ImageTag: "v12"
+ImagePullPolicy: "Always"
+
+# Cassandra configuration options
+# For chart deployment, the value for sending to the Seed Provider is
+# constructed using a template in the statefulset.yaml template
+cassandra:
+ MaxHeapSize: "4000M"
+ HeapNewSize: "100M"
+ ClusterName: "cassandra"
+ DC: "dc-eastus2-cassandra"
+ Rack: "rack-eastus2-cassandra"
+ AutoBootstrap: "false"
+
+# Persistence information
+persistence:
+ enabled: true
+ storageClass: fast
+ accessMode: ReadWriteOnce
+ size: 512Gi
+
+# Instance resources
+resources:
+ requests:
+ cpu: "1000m"
+ memory: "4Gi"
+ limits:
+ cpu: "2000m"
+ memory: "8Gi"
diff --git a/project-fortis-pipeline/ops/charts/spark/.helmignore b/project-fortis-pipeline/ops/charts/spark/.helmignore
new file mode 100644
index 0000000..f0c1319
--- /dev/null
+++ b/project-fortis-pipeline/ops/charts/spark/.helmignore
@@ -0,0 +1,21 @@
+# Patterns to ignore when building packages.
+# This supports shell glob matching, relative path matching, and
+# negation (prefixed with !). Only one pattern per line.
+.DS_Store
+# Common VCS dirs
+.git/
+.gitignore
+.bzr/
+.bzrignore
+.hg/
+.hgignore
+.svn/
+# Common backup files
+*.swp
+*.bak
+*.tmp
+*~
+# Various IDEs
+.project
+.idea/
+*.tmproj
diff --git a/project-fortis-pipeline/ops/charts/spark/Chart.yaml b/project-fortis-pipeline/ops/charts/spark/Chart.yaml
new file mode 100644
index 0000000..8eabf75
--- /dev/null
+++ b/project-fortis-pipeline/ops/charts/spark/Chart.yaml
@@ -0,0 +1,12 @@
+name: spark
+home: http://spark.apache.org/
+version: 0.1.3
+description: Fast and general-purpose cluster computing system.
+icon: http://spark.apache.org/images/spark-logo-trademark.png
+sources:
+ - https://github.com/kubernetes/kubernetes/tree/master/examples/spark
+ - https://github.com/apache/spark
+maintainers:
+ - name: Erik Schlegel
+ email: erik.schlegel@gmail.com
diff --git a/project-fortis-pipeline/ops/charts/spark/README.md b/project-fortis-pipeline/ops/charts/spark/README.md
new file mode 100644
index 0000000..1620eb3
--- /dev/null
+++ b/project-fortis-pipeline/ops/charts/spark/README.md
@@ -0,0 +1,164 @@
+# Apache Spark Helm Chart
+
+Apache Spark is a fast and general-purpose cluster computing system. This chart also includes Apache Zeppelin.
+
+* http://spark.apache.org/
+* https://zeppelin.apache.org/
+
+Inspired by the Helm Classic chart at https://github.com/helm/charts.
+
+## Chart Details
+This chart will do the following:
+
+* 1 x Spark Master with port 8080 exposed on an external LoadBalancer
+* 3 x Spark Workers with a StatefulSet that scales to a maximum of 10 pods when CPU usage hits 50% of 100m
+* 1 x Zeppelin with port 8080 exposed on an external LoadBalancer
+* All using Kubernetes Deployments
+* Worker directory storage takes place on local SSD for each worker pod
+* Configurable mounted checkpoint directory for spark streaming-based applications
+
+## Prerequisites
+
+* Assumes that serviceAccount tokens are available under hostname metadata (works on GKE by default): http://metadata/computeMetadata/v1/instance/service-accounts/default/token
+
+## Installing the Chart
+
+To install the chart with the release name `my-release`:
+
+```bash
+$ helm install --name my-release stable/spark
+```
+
+## Configuration
+
+The following table lists the configurable parameters of the Spark chart and their default values.
+
+### Spark Master
+
+| Parameter | Description | Default |
+| ----------------------- | ---------------------------------- | ---------------------------------------------------------- |
+| `Master.Name` | Spark master name | `spark-master` |
+| `Master.Image` | Container image name | `gcr.io/google_containers/spark` |
+| `Master.ImageTag` | Container image tag | `1.5.1_v3` |
+| `Master.Replicas` | k8s deployment replicas | `1` |
+| `Master.Component` | k8s selector key | `spark-master` |
+| `Master.Cpu` | container requested cpu | `100m` |
+| `Master.Memory` | container requested memory | `512Mi` |
+| `Master.ServicePort` | k8s service port | `7077` |
+| `Master.ContainerPort` | Container listening port | `7077` |
+| `Master.DaemonMemory` | Master JVM Xms and Xmx option | `1g` |
+| `Master.SparkSubmitCommand` | Initial command to run on the master node (e.g. `spark-submit`) | `disabled` |
+| `Master.ConfigMapName` | ConfigMap reference for the Spark master environment | `disabled` |
+
+### Spark WebUi
+
+| Parameter | Description | Default |
+|-----------------------|----------------------------------|----------------------------------------------------------|
+| `WebUi.Name` | Spark webui name | `spark-webui` |
+| `WebUi.ServicePort` | k8s service port | `8080` |
+| `WebUi.ContainerPort` | Container listening port | `8080` |
+| `WebUi.Image` | The reverse proxy image to use for spark-ui-proxy | `elsonrodriguez/spark-ui-proxy:1.0`|
+| `WebUi.ProxyPort` | Reverse proxy port | `80` |
+
+### Persistence
+
+| Parameter | Description | Default |
+|--------------------------|----------------------------------|----------------------------------------------------------|
+| `PvcAcctName` | The storage account name secret used for managed disk | `undefined` |
+| `PvcPwd` | The storage account password secret used for managed disk | `undefined` |
+| `CheckpointDirectory` | The checkpoint directory used for spark streaming | `undefined` |
+| `CheckpointShare` | The checkpoint share used for Spark Streaming | `undefined` |
+
+### Spark Worker
+
+| Parameter | Description | Default |
+| ----------------------- | ---------------------------------- | ---------------------------------------------------------- |
+| `Worker.Name` | Spark worker name | `spark-worker` |
+| `Worker.Image` | Container image name | `gcr.io/google_containers/spark` |
+| `Worker.ImageTag` | Container image tag | `1.5.1_v3` |
+| `Worker.Replicas` | k8s hpa and deployment replicas | `3` |
+| `Worker.ReplicasMax` | k8s hpa max replicas | `10` |
+| `Worker.Component` | k8s selector key | `spark-worker` |
+| `Worker.ConfigMapName` | ConfigMap reference for the Spark worker environment | `disabled` |
+| `Worker.WorkingDirectory` | Directory to run applications in | `SPARK_HOME/work` |
+| `Worker.Cpu` | container requested cpu | `100m` |
+| `Worker.Memory` | container requested memory | `512Mi` |
+| `Worker.ContainerPort` | Container listening port | `7077` |
+| `Worker.CpuTargetPercentage` | k8s hpa cpu targetPercentage | `50` |
+| `Worker.DaemonMemory` | Worker JVM Xms and Xmx setting | `1g` |
+| `Worker.ExecutorMemory` | Worker memory available for executor | `1g` |
+| `Environment` | The worker environment configuration | `NA` |
+
+
+### Zeppelin
+
+| Parameter | Description | Default |
+|-------------------------|----------------------------------|----------------------------------------------------------|
+| `Zeppelin.Name` | Zeppelin name | `zeppelin-controller` |
+| `Zeppelin.Image` | Container image name | `gcr.io/google_containers/zeppelin` |
+| `Zeppelin.ImageTag` | Container image tag | `v0.5.5_v2` |
+| `Zeppelin.Replicas` | k8s deployment replicas | `1` |
+| `Zeppelin.Component` | k8s selector key | `zeppelin` |
+| `Zeppelin.Cpu` | container requested cpu | `100m` |
+| `Zeppelin.ServicePort` | k8s service port | `8080` |
+| `Zeppelin.ContainerPort`| Container listening port | `8080` |
+
+Specify each parameter using the `--set key=value[,key=value]` argument to `helm install`.
+
+Alternatively, a YAML file that specifies the values for the parameters can be provided while installing the chart. For example,
+
+```bash
+$ helm install --name my-release -f values.yaml stable/spark
+```
+
+> **Tip**: You can use the default [values.yaml](values.yaml)
+
+## Persistence
+
+The Spark image stores its work data under the `/opt/spark/work` path of the container. A PersistentVolumeClaim is used to keep the data across deployments.
+
+It is possible to mount several volumes using `Persistence.volumes` and `Persistence.mounts` parameters.
+
+## Do something with the cluster
+
+Use `kubectl exec` to connect to the Spark driver.
+
+```
+$ kubectl exec -it <your-spark-master-pod-name> bash
+root@your-spark-master:/#
+root@your-spark-master:/# pyspark
+Python 2.7.9 (default, Mar 1 2015, 12:57:24)
+[GCC 4.9.2] on linux2
+Type "help", "copyright", "credits" or "license" for more information.
+15/06/26 14:25:28 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
+Welcome to
+ ____ __
+ / __/__ ___ _____/ /__
+ _\ \/ _ \/ _ `/ __/ '_/
+ /__ / .__/\_,_/_/ /_/\_\ version 1.4.0
+ /_/
+Using Python version 2.7.9 (default, Mar 1 2015 12:57:24)
+SparkContext available as sc, HiveContext available as sqlContext.
+>>> import socket
+>>> sc.parallelize(range(1000)).map(lambda x:socket.gethostname()).distinct().collect()
+17/03/24 19:42:44 INFO DAGScheduler: Job 0 finished: collect at <stdin>:1, took 2.260357 s
+['spark15-worker-2', 'spark15-worker-1', 'spark15-worker-0']
+```
+
+## Open the Spark UI to view your cluster
+
+Use the `kubectl get svc` command to look up the external IP of your reverse proxy.
+
+```
+$ kubectl get svc
+NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE
+kubernetes 10.0.0.1 <none> 443/TCP 8d
+spark-master 10.0.46.27 52.168.36.95 7077:30399/TCP,8080:31312/TCP 2h
+spark-webui 10.0.102.137 **-->40.71.186.201<--** 80:31027/TCP 2h
+spark14-worker-0 10.0.234.55 52.179.10.132 8081:30408/TCP 3d
+spark15-zeppelin 10.0.229.183 40.71.190.17 8080:31840/TCP 2h
+```
+
+Open http://40.71.186.201 (the spark-webui external IP, served on port 80 by the proxy) in your browser.
diff --git a/project-fortis-pipeline/ops/charts/spark/templates/NOTES.txt b/project-fortis-pipeline/ops/charts/spark/templates/NOTES.txt
new file mode 100644
index 0000000..d388df7
--- /dev/null
+++ b/project-fortis-pipeline/ops/charts/spark/templates/NOTES.txt
@@ -0,0 +1,16 @@
+1. Get the Spark URL to visit by running these commands in the same shell:
+
+ NOTE: It may take a few minutes for the LoadBalancer IP to be available.
+   You can watch the status of it by running 'kubectl get svc --namespace {{ .Release.Namespace }} -w {{ template "webui-fullname" . }}'
+
+ export SPARK_SERVICE_IP=$(kubectl get svc --namespace {{ .Release.Namespace }} {{ template "webui-fullname" . }} -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
+ echo http://$SPARK_SERVICE_IP:{{ .Values.WebUi.ServicePort }}
+
+2. Get the Zeppelin URL to visit by running these commands in the same shell:
+
+ NOTE: It may take a few minutes for the LoadBalancer IP to be available.
+   You can watch the status of it by running 'kubectl get svc --namespace {{ .Release.Namespace }} -w {{ template "zeppelin-fullname" . }}'
+
+ export ZEPPELIN_SERVICE_IP=$(kubectl get svc --namespace {{ .Release.Namespace }} {{ template "zeppelin-fullname" . }} -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
+ echo http://$ZEPPELIN_SERVICE_IP:{{ .Values.Zeppelin.ServicePort }}
+
diff --git a/project-fortis-pipeline/ops/charts/spark/templates/_helpers.tpl b/project-fortis-pipeline/ops/charts/spark/templates/_helpers.tpl
new file mode 100644
index 0000000..4eccc22
--- /dev/null
+++ b/project-fortis-pipeline/ops/charts/spark/templates/_helpers.tpl
@@ -0,0 +1,31 @@
+{{/* vim: set filetype=mustache: */}}
+{{/*
+Expand the name of the chart.
+*/}}
+{{- define "name" -}}
+{{- default .Chart.Name .Values.nameOverride | trunc 24 -}}
+{{- end -}}
+
+{{/*
+Create fully qualified names.
+We truncate at 24 chars because some Kubernetes name fields are limited to this (by the DNS naming spec).
+*/}}
+{{- define "master-fullname" -}}
+{{- $name := default .Chart.Name .Values.Master.Name -}}
+{{- printf "%s-%s" .Release.Name $name | trunc 24 -}}
+{{- end -}}
+
+{{- define "webui-fullname" -}}
+{{- $name := default .Chart.Name .Values.WebUi.Name -}}
+{{- printf "%s-%s" .Release.Name $name | trunc 24 -}}
+{{- end -}}
+
+{{- define "worker-fullname" -}}
+{{- $name := default .Chart.Name .Values.Worker.Name -}}
+{{- printf "%s-%s" .Release.Name $name | trunc 24 -}}
+{{- end -}}
+
+{{- define "zeppelin-fullname" -}}
+{{- $name := default .Chart.Name .Values.Zeppelin.Name -}}
+{{- printf "%s-%s" .Release.Name $name | trunc 24 -}}
+{{- end -}}
\ No newline at end of file
diff --git a/project-fortis-pipeline/ops/charts/spark/templates/secret.yaml b/project-fortis-pipeline/ops/charts/spark/templates/secret.yaml
new file mode 100644
index 0000000..4edbe3f
--- /dev/null
+++ b/project-fortis-pipeline/ops/charts/spark/templates/secret.yaml
@@ -0,0 +1,16 @@
+apiVersion: v1
+kind: Secret
+metadata:
+ name: checkpointing-pvc-secret
+type: Opaque
+data:
+ {{ if .Values.Persistence.PvcAcctName }}
+ azurestorageaccountname: {{ .Values.Persistence.PvcAcctName | b64enc | quote }}
+ {{ else }}
+ azurestorageaccountname: {{ randAlphaNum 10 | b64enc | quote }}
+ {{ end }}
+ {{ if .Values.Persistence.PvcPwd }}
+ azurestorageaccountkey: {{ .Values.Persistence.PvcPwd | b64enc | quote }}
+ {{ else }}
+ azurestorageaccountkey: {{ randAlphaNum 10 | b64enc | quote }}
+ {{ end }}
\ No newline at end of file
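
The `b64enc` calls in the Secret template above produce standard base64, as Kubernetes Secret `data` fields require. The same encoding from the shell, with an illustrative account name (`printf '%s'` avoids the trailing newline that `echo` would sneak into the encoded value):

```shell
# Base64-encode a storage account name the way Helm's b64enc does.
printf '%s' mystorageacct | base64
# bXlzdG9yYWdlYWNjdA==
```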
diff --git a/project-fortis-pipeline/ops/charts/spark/templates/spark-master-deployment.yaml b/project-fortis-pipeline/ops/charts/spark/templates/spark-master-deployment.yaml
new file mode 100644
index 0000000..e22596f
--- /dev/null
+++ b/project-fortis-pipeline/ops/charts/spark/templates/spark-master-deployment.yaml
@@ -0,0 +1,61 @@
+apiVersion: extensions/v1beta1
+kind: Deployment
+metadata:
+ name: spark-master
+ labels:
+ heritage: {{.Release.Service | quote }}
+ release: {{.Release.Name | quote }}
+ chart: "{{.Chart.Name}}-{{.Chart.Version}}"
+ component: "{{.Release.Name}}-{{.Values.Master.Component}}"
+spec:
+ replicas: 1
+ strategy:
+ type: RollingUpdate
+ selector:
+ matchLabels:
+ component: {{.Values.Master.Component}}
+ template:
+ metadata:
+ labels:
+ heritage: {{.Release.Service | quote }}
+ release: {{.Release.Name | quote }}
+ chart: "{{.Chart.Name}}-{{.Chart.Version}}"
+ component: {{.Values.Master.Component}}
+ spec:
+ containers:
+ - name: {{.Values.Master.Component}}
+ imagePullPolicy: "{{.Values.Master.ImagePullPolicy}}"
+ image: "{{.Values.Master.Image}}:{{.Values.Master.ImageTag}}"
+ {{- if .Values.Master.SparkSubmitCommand }}
+ lifecycle:
+ postStart:
+ exec:
+ command:
+ - "/bin/sh"
+ - "-c"
+ - |
+ {{.Values.Master.SparkSubmitCommand}}
+ {{- end }}
+ ports:
+ - containerPort: {{.Values.Master.ContainerPort}}
+ - containerPort: {{.Values.WebUi.ContainerPort}}
+ resources:
+ requests:
+ cpu: "{{.Values.Master.Resources.Requests.Cpu}}"
+ memory: "{{.Values.Master.Resources.Requests.Memory}}"
+ limits:
+ cpu: "{{.Values.Master.Resources.Limits.Cpu}}"
+ memory: "{{.Values.Master.Resources.Limits.Memory}}"
+ {{- if .Values.Master.ConfigMapName }}
+ envFrom:
+ - configMapRef:
+ name: "{{.Values.Master.ConfigMapName}}"
+ {{ else }}
+ env:
+ - name: SPARK_DAEMON_MEMORY
+ value: {{ default "1g" .Values.Master.DaemonMemory | quote }}
+ {{- if .Values.Master.EnableHA }}
+ - name: SPARK_DAEMON_JAVA_OPTS
+ value: "-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=zookeeper.spark.svc.cluster.local:2181"
+ {{- end }}
+ {{- end }}
\ No newline at end of file
diff --git a/project-fortis-pipeline/ops/charts/spark/templates/spark-master-service.yaml b/project-fortis-pipeline/ops/charts/spark/templates/spark-master-service.yaml
new file mode 100644
index 0000000..11c48a4
--- /dev/null
+++ b/project-fortis-pipeline/ops/charts/spark/templates/spark-master-service.yaml
@@ -0,0 +1,20 @@
+apiVersion: v1
+kind: Service
+metadata:
+ name: {{.Values.Master.Component}}
+ labels:
+ heritage: {{.Release.Service | quote }}
+ release: {{.Release.Name | quote }}
+ chart: "{{.Chart.Name}}-{{.Chart.Version}}"
+ component: "{{.Values.Master.Component}}"
+spec:
+ ports:
+ - port: {{.Values.Master.ServicePort}}
+ targetPort: {{.Values.Master.ContainerPort}}
+ name: spark
+ - port: {{.Values.WebUi.ServicePort}}
+ targetPort: {{.Values.WebUi.ContainerPort}}
+ name: http
+ selector:
+ component: "{{.Values.Master.Component}}"
+ type: "LoadBalancer"
\ No newline at end of file
diff --git a/project-fortis-pipeline/ops/charts/spark/templates/spark-worker-statefulset.yaml b/project-fortis-pipeline/ops/charts/spark/templates/spark-worker-statefulset.yaml
new file mode 100644
index 0000000..ebc9715
--- /dev/null
+++ b/project-fortis-pipeline/ops/charts/spark/templates/spark-worker-statefulset.yaml
@@ -0,0 +1,92 @@
+apiVersion: "apps/v1beta1"
+kind: StatefulSet
+metadata:
+ name: {{ template "worker-fullname" . }}
+ labels:
+ heritage: {{.Release.Service | quote }}
+ release: {{.Release.Name | quote }}
+ chart: "{{.Chart.Name}}-{{.Chart.Version}}"
+ component: "{{.Release.Name}}-{{.Values.Worker.Component}}"
+spec:
+ serviceName: {{ template "worker-fullname" . }}
+ replicas: {{default 1 .Values.Worker.Replicas}}
+ selector:
+ matchLabels:
+ component: "{{.Release.Name}}-{{.Values.Worker.Component}}"
+ template:
+ metadata:
+ labels:
+ app: {{ template "worker-fullname" . }}
+ heritage: {{.Release.Service | quote }}
+ release: {{.Release.Name | quote }}
+ chart: "{{.Chart.Name}}-{{.Chart.Version}}"
+ component: "{{.Release.Name}}-{{.Values.Worker.Component}}"
+ spec:
+ affinity:
+ nodeAffinity:
+ requiredDuringSchedulingIgnoredDuringExecution:
+ nodeSelectorTerms:
+ - matchExpressions:
+ - key: "beta.kubernetes.io/instance-type"
+ operator: In
+ values: ["{{.Values.Worker.VmInstanceType}}"]
+ podAntiAffinity:
+ requiredDuringSchedulingIgnoredDuringExecution:
+ - labelSelector:
+ matchExpressions:
+ - key: app
+ operator: In
+ values: [{{ template "worker-fullname" . }}]
+ topologyKey: kubernetes.io/hostname
+ containers:
+ - name: {{ template "worker-fullname" . }}
+ image: "{{.Values.Worker.Image}}:{{.Values.Worker.ImageTag}}"
+ imagePullPolicy: "{{.Values.Worker.ImagePullPolicy}}"
+ command: ["/start-worker"]
+ {{- if .Values.Worker.ConfigMapName }}
+ envFrom:
+ - configMapRef:
+ name: "{{.Values.Worker.ConfigMapName}}"
+ {{ end }}
+ # command: ["/opt/spark/bin/spark-class", "org.apache.spark.deploy.worker.Worker", "spark://{{ template "master-fullname" . }}:{{.Values.Master.ServicePort}}", "--webui-port", "{{.Values.Worker.ContainerPort}}", "--work-dir", "{{.Values.Worker.WorkingDirectory}}"]
+ ports:
+ - containerPort: {{.Values.Worker.ContainerPort}}
+ resources:
+ requests:
+ cpu: "{{.Values.Worker.Resources.Requests.Cpu}}"
+ memory: "{{.Values.Worker.Resources.Requests.Memory}}"
+ limits:
+ cpu: "{{.Values.Worker.Resources.Limits.Cpu}}"
+ memory: "{{.Values.Worker.Resources.Limits.Memory}}"
+ {{- if .Values.Persistence.CheckpointShare }}
+ volumeMounts:
+ - name: worker-data
+ mountPath: {{ .Values.Worker.WorkingDirectory | quote }}
+ - name: checkpointfile
+ mountPath: {{ .Values.Persistence.CheckpointDirectory | quote }}
+ {{ else }}
+ volumeMounts:
+ - name: worker-data
+ mountPath: {{ .Values.Worker.WorkingDirectory | quote }}
+ {{- end }}
+ env:
+{{- if .Values.Worker.Environment }}
+{{ toYaml .Values.Worker.Environment | indent 12 }}
+{{- end }}
+ {{- if .Values.Worker.WorkingDirectory }}
+ - name: SPARK_WORKER_DIR
+ value: {{ .Values.Worker.WorkingDirectory | quote }}
+ {{- end -}}
+ # These are converted to volume claims by the controller
+ # and mounted at the paths mentioned above.
+ volumes:
+ - name: worker-data
+ hostPath:
+ path: "/mnt/workdir"
+ {{- if .Values.Persistence.CheckpointShare }}
+ - name: checkpointfile
+ azureFile:
+ secretName: checkpointing-pvc-secret
+ shareName: {{ .Values.Persistence.CheckpointShare | quote }}
+ readOnly: false
+ {{- end }}
\ No newline at end of file
diff --git a/project-fortis-pipeline/ops/charts/spark/templates/spark-zeppelin-deployment.yaml b/project-fortis-pipeline/ops/charts/spark/templates/spark-zeppelin-deployment.yaml
new file mode 100644
index 0000000..456d730
--- /dev/null
+++ b/project-fortis-pipeline/ops/charts/spark/templates/spark-zeppelin-deployment.yaml
@@ -0,0 +1,33 @@
+apiVersion: v1
+kind: Service
+metadata:
+ name: zeppelin
+spec:
+ ports:
+ - port: {{.Values.Zeppelin.ServicePort}}
+ targetPort: {{.Values.Zeppelin.ContainerPort}}
+ selector:
+ component: zeppelin
+ type: "LoadBalancer"
+---
+apiVersion: v1
+kind: ReplicationController
+metadata:
+ name: zeppelin-controller
+spec:
+ replicas: {{default 1 .Values.Zeppelin.Replicas}}
+ selector:
+ component: zeppelin
+ template:
+ metadata:
+ labels:
+ component: zeppelin
+ spec:
+ containers:
+ - name: zeppelin
+ image: "{{.Values.Zeppelin.Image}}:{{.Values.Zeppelin.ImageTag}}"
+ ports:
+ - containerPort: {{.Values.Zeppelin.ContainerPort}}
+ resources:
+ requests:
+ cpu: "{{.Values.Zeppelin.Cpu}}"
diff --git a/project-fortis-pipeline/ops/charts/spark/values.yaml b/project-fortis-pipeline/ops/charts/spark/values.yaml
new file mode 100644
index 0000000..dbfdeb7
--- /dev/null
+++ b/project-fortis-pipeline/ops/charts/spark/values.yaml
@@ -0,0 +1,71 @@
+# Default values for spark.
+# This is a YAML-formatted file.
+# Declare name/value pairs to be passed into your templates.
+# name: value
+
+Master:
+ Name: master
+ Image: "erikschlegel/spark-master"
+ ImageTag: "2.2"
+ Component: "spark-master"
+ ImagePullPolicy: "Always"
+ ServicePort: 7077
+ ContainerPort: 7077
+ #SparkSubmitCommand: ["spark-submit", "--master local[2]", "--driver-memory 4g", "enter-your-fat.jar"]
+ #ConfigMapName: spark-master-conf
+ Resources:
+ Requests:
+ Cpu: "700m"
+ Memory: "3Gi"
+ Limits:
+ Cpu: "700m"
+ Memory: "3Gi"
+ # Set Master JVM memory. Default 1g
+ DaemonMemory: 1g
+
+WebUi:
+ Name: webui
+ ServicePort: 8080
+ Component: "spark-webui"
+ ProxyPort: 80
+ ContainerPort: 8080
+ Image: "elsonrodriguez/spark-ui-proxy:1.0"
+
+Worker:
+ Name: worker
+ Image: "erikschlegel/spark-worker"
+ ImageTag: "2.2"
+ ImagePullPolicy: "Always"
+ VmInstanceType: "Standard_L4s"
+ Replicas: 6
+ Component: "spark-worker"
+ WorkingDirectory: "/opt/spark/work"
+ ContainerPort: 8081
+ #ConfigMapName: spark-master-conf
+ Resources:
+ Requests:
+ Cpu: "700m"
+ Memory: "3Gi"
+ Limits:
+ Cpu: "700m"
+ Memory: "3Gi"
+ Environment:
+ - name: SPARK_DAEMON_MEMORY
+ value: 1g
+ - name: SPARK_WORKER_MEMORY
+ value: 1g
+
+Zeppelin:
+ Name: zeppelin
+ Image: "srfrnk/zeppelin"
+ ImageTag: "0.7.0"
+ Component: "zeppelin"
+ Cpu: "100m"
+ ServicePort: 8080
+ ContainerPort: 8080
+
+Persistence:
+ # PvcAcctName: Secret
+ # PvcPwd: Secret
+ CheckpointDirectory: "/opt/checkpoint"
+ #CheckpointShare: "checkpoint"
\ No newline at end of file
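Since the charts now live in the repo, customizing them no longer requires forking: the defaults above can be overridden per install with a values file. A hypothetical override (file name and numbers are illustrative; the keys mirror the `values.yaml` above):

```yaml
# custom-values.yaml -- illustrative overrides for the vendored spark chart
Worker:
  Replicas: 10
  Resources:
    Requests:
      Cpu: "1"
      Memory: "6Gi"
```

passed along the lines of `helm install -f custom-values.yaml --namespace spark --name spark-cluster ./spark` (helm 2 syntax, matching the scripts below).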
diff --git a/project-fortis-pipeline/ops/install-cassandra.sh b/project-fortis-pipeline/ops/install-cassandra.sh
index 3c281a8..00fbcc7 100755
--- a/project-fortis-pipeline/ops/install-cassandra.sh
+++ b/project-fortis-pipeline/ops/install-cassandra.sh
@@ -4,7 +4,6 @@ readonly k8cassandra_node_count="$1"
readonly agent_vm_size="$2"
# setup
-if [ ! -d charts ]; then git clone --depth=1 https://github.com/erikschlegel/charts.git -b spark-localssd; fi
cd charts || exit -2
readonly cluster_name="FORTIS_CASSANDRA"
readonly storageClass="fast"
@@ -15,8 +14,9 @@ helm install \
--set VmInstanceType="${agent_vm_size}" \
--set cassandra.ClusterName="${cluster_name}" \
--set persistence.storageClass="${storageClass}" \
- --name cassandra-cluster ./incubator/cassandra \
- --namespace cassandra
+ --namespace cassandra \
+ --name cassandra-cluster \
+ ./cassandra
# cleanup
cd ..
diff --git a/project-fortis-pipeline/ops/install-spark.sh b/project-fortis-pipeline/ops/install-spark.sh
index 5c4e6b5..b46ec8e 100755
--- a/project-fortis-pipeline/ops/install-spark.sh
+++ b/project-fortis-pipeline/ops/install-spark.sh
@@ -22,7 +22,6 @@ readonly agent_vm_size="${19}"
# setup
if ! (command -v jq >/dev/null); then sudo apt-get -qq install -y jq; fi
-if [ ! -d charts ]; then git clone --depth=1 https://github.com/erikschlegel/charts.git -b spark-localssd; fi
cd charts || exit -2
readonly spark_daemon_memory="1g"
readonly default_language="en"
@@ -85,7 +84,7 @@ helm install \
--set Worker.Environment[0].name="SPARK_WORKER_MEMORY",Worker.Environment[0].value="20g" \
--namespace spark \
--name spark-cluster \
- ./stable/spark
+ ./spark
# cleanup
cd ..
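The `--set Worker.Environment[0].name=...` flags used by `install-spark.sh` rely on helm's dotted/indexed key syntax. As a rough illustration of the semantics (this is not helm's actual parser; helm also coerces types like ints and bools, which this sketch skips by keeping everything as strings):

```python
import re

def apply_set(values: dict, expr: str) -> dict:
    """Apply one --set expression like 'Worker.Environment[0].name=FOO'
    to a nested values structure, creating dicts/lists along the path."""
    path, _, value = expr.partition("=")
    # 'Worker.Environment[0].name' -> ['Worker', 'Environment', 0, 'name']
    keys = [int(t[1:-1]) if t.startswith("[") else t
            for t in re.findall(r"[^.\[\]]+|\[\d+\]", path)]
    node = values
    for key, nxt in zip(keys, keys[1:]):
        empty = [] if isinstance(nxt, int) else {}
        if isinstance(key, int):
            while len(node) <= key:   # grow the list up to the index
                node.append(None)
            if node[key] is None:
                node[key] = empty
            node = node[key]
        else:
            node = node.setdefault(key, empty)
    last = keys[-1]
    if isinstance(last, int):
        while len(node) <= last:
            node.append(None)
        node[last] = value
    else:
        node[last] = value
    return values
```

Two expressions with the same list index merge into one list entry, which is how the paired `Worker.Environment[0].name` / `Worker.Environment[0].value` flags in the script above build a single env-var object.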
Currently we're fetching the Helm templates from Erik's fork: https://github.com/CatalystCode/project-fortis-pipeline/blob/4c655daa042e6b784a14b857f7618b024b93082b/ops/create-cluster.sh#L19
We should update this to the upstream CatalystCode repository, which is more actively maintained.