boozallen / aissemble

Booz Allen's lean manufacturing approach for holistically designing, developing and fielding AI solutions across the engineering lifecycle from data processing to model building, tuning, and training to secure operational deployment

As a downstream consumer of aiSSEMBLE, I want my SparkApplications to interface with the v2 SparkHistory chart by default. #95

Closed peter-mcclonski closed 5 months ago

peter-mcclonski commented 5 months ago

Definition of Done

Test Steps

  1. Generate a new project with the following command:
    mvn archetype:generate -B -DarchetypeGroupId=com.boozallen.aissemble \
                          -DarchetypeArtifactId=foundation-archetype \
                          -DarchetypeVersion=1.7.0-SNAPSHOT \
                          -DartifactId=test-project \
                          -DgroupId=org.test \
                          -DprojectName='Test' \
                          -DprojectGitUrl=test.org/test-project \
    && cd test-project
  2. Add the following pipeline definitions to test-project-pipeline-models/src/main/resources/pipelines/, each as its own JSON file:
    {
        "name": "SimplePipeline",
        "package": "org.test",
        "type": {
            "name": "data-flow",
            "implementation": "data-delivery-spark"
        },
        "steps": [
            {
                "name": "Ingest",
                "type": "synchronous",
                "alerting": {
                    "enabled": false
                },
                "dataProfiling": {
                    "enabled": false
                },
                "provenance": {
                    "enabled": false
                }
            }
        ]
    }

    {
        "name": "SimplePipelinePython",
        "package": "org.test",
        "type": {
            "name": "data-flow",
            "implementation": "data-delivery-pyspark"
        },
        "steps": [
            {
                "name": "Ingest",
                "type": "synchronous",
                "alerting": {
                    "enabled": false
                },
                "dataProfiling": {
                    "enabled": false
                },
                "provenance": {
                    "enabled": false
                }
            }
        ]
    }
  3. Execute mvn clean install repeatedly, resolving all manual actions until none remain.
  4. Modify the contents of test-project-deploy/pom.xml, replacing <profile>aissemble-spark-infrastructure-deploy</profile> with <profile>aissemble-spark-infrastructure-deploy-v2</profile>
  5. Delete the directory test-project-deploy/src/main/resources/apps/spark-infrastructure along with its contents.
  6. Execute mvn clean install
  7. OTS ONLY: Replace the repository in the generated Chart.yaml with the absolute path to the spark-history chart in the local aissemble repository (see the Chart.yaml sketch after this list), and re-execute mvn clean install
  8. Use kubectl apply -f to apply the following yaml (see the example after this list):
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: spark-config
    data: {}
  9. Update the contents of test-project-pipelines/simple-pipeline/src/main/resources/apps/simple-pipeline-base-values.yaml and test-project-pipelines/simple-pipeline-python/src/simple_pipeline_python/resources/apps/simple-pipeline-python-base-values.yaml to remove references to hadoop-aws and aws-java-sdk-bundle
  10. Update the contents of test-project-pipelines/simple-pipeline/src/main/resources/apps/simple-pipeline-dev-values.yaml and test-project-pipelines/simple-pipeline-python/src/simple_pipeline_python/resources/apps/simple-pipeline-python-dev-values.yaml to remove their spark.eventLog configurations (the keys in question are sketched after this list).
  11. Save the following content at the root of test-project with the name values-migrate-dev.yaml:
    ########################################
    ## CONFIG | Spark Configs
    ########################################
    metadata:
      namespace: default
    sparkApp:
      spec:
        sparkConf:
          spark.eventLog.enabled: "true"
          spark.eventLog.dir: "/opt/spark/spark-events"
        type: "placeholder" # required for a dry run test to pass, this should always be overridden
        mode: cluster
        imagePullPolicy: IfNotPresent
        restartPolicy:
          type: Never
        sparkVersion: "3.4.0"
        sparkConfigMap: spark-config
        dynamicAllocation:
          enabled: true
          initialExecutors: 0
          minExecutors: 0
          maxExecutors: 4
        volumes:
          - name: ivy-cache
            persistentVolumeClaim:
              claimName: ivy-cache
          - name: spark-events
            persistentVolumeClaim:
              claimName: spark-events-claim
        driver:
          cores: 1
          coreLimit: "1200m"
          memory: "512m"
          serviceAccount: spark
          volumeMounts:
            - name: ivy-cache
              mountPath: "/opt/spark/.ivy2"
            - name: spark-events
              mountPath: "/opt/spark/spark-events"
        executor:
          cores: 1
          coreLimit: "1200m"
          memory: "512m"
          labels:
            version: 3.4.0
          volumeMounts:
            - name: ivy-cache
              mountPath: "/opt/spark/.ivy2"
            - name: spark-events
              mountPath: "/opt/spark/spark-events"
    service:
      enabled: false
      spec:
        ports:
          - name: "debug"
            port: 4747
            targetPort: 4747
  12. Search for spark-application in your Tiltfile and add --values values-migrate-dev.yaml after --version %s in both places where it appears.
  13. Execute mvn clean install -Dmaven.build.cache.skipCache=true, resolving any lingering manual actions
  14. Execute tilt up and wait for all resources to be ready
  15. Trigger execution of the simple-pipeline resource, ensuring it completes successfully
  16. Trigger execution of the simple-pipeline-python resource, ensuring it completes successfully
  17. Navigate to http://localhost:18080
  18. Ensure that Spark events are visible in the spark-history UI from both simple-pipeline and simple-pipeline-python (a command-line alternative for these checks is sketched after this list)
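For step 7, only the dependency's repository field in the generated Chart.yaml needs to change. The dependency name, version, and local path below are illustrative assumptions (keep whatever your generated Chart.yaml lists and use wherever your aissemble checkout lives); the point is switching the repository to a local file:// path, which Helm resolves against the filesystem so the locally built chart is used instead of a published one:

    # Chart.yaml (generated) -- minimal sketch of the dependency entry, names assumed
    dependencies:
      - name: aissemble-spark-history-chart   # assumed name; keep the generated value
        version: 1.7.0-SNAPSHOT                # assumed; keep the generated value
        repository: "file:///absolute/path/to/aissemble/extensions/extensions-helm/aissemble-spark-history-chart"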
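For step 8, a concrete way to apply and verify the ConfigMap (the manifest file name is arbitrary):

    # save the ConfigMap yaml from step 8 as spark-config.yaml, then:
    kubectl apply -f spark-config.yaml

    # confirm the ConfigMap exists in the default namespace
    kubectl get configmap spark-config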
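For step 10, the settings being removed from each pipeline's dev values file are the Spark event-log entries; step 11's values-migrate-dev.yaml supplies them again for both pipelines. The exact nesting in your generated files may differ slightly; the keys to look for are:

    sparkApp:
      spec:
        sparkConf:
          spark.eventLog.enabled: "true"
          spark.eventLog.dir: "/opt/spark/spark-events"   # the value in your generated file may differ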
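For steps 15 through 18, the Tilt UI buttons work fine, but the same checks can be run from a terminal. tilt trigger is the standard Tilt CLI command for kicking off a resource, and /api/v1/applications is the standard Spark History Server REST endpoint behind the UI on port 18080; the resource names below are the ones steps 15 and 16 refer to:

    # trigger each pipeline without clicking through the Tilt UI
    tilt trigger simple-pipeline
    tilt trigger simple-pipeline-python

    # after both runs finish, list the applications the history server has indexed;
    # entries for both pipelines should appear here and in the UI at http://localhost:18080
    curl http://localhost:18080/api/v1/applications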
csun-cpointe commented 5 months ago

Test passed !!

(Three screenshots dated 2024-05-24 attached.)