Booz Allen's lean manufacturing approach for holistically designing, developing and fielding AI solutions across the engineering lifecycle from data processing to model building, tuning, and training to secure operational deployment
As a Data Engineer, I want to use Spark 3.5 so I can leverage the latest enhancements and fixes. #55
We are currently on Spark 3.4.0. Security patches released in 3.4.3 need to be pulled in, but since the upgrade to 3.5 appears straightforward, we can jump directly to the latest version: 3.5.1.
Definition of Done
Replace spark.yarn.executor.failuresValidityInterval with spark.executor.failuresValidityInterval across all files
Replace spark.yarn.max.executor.failures with spark.executor.maxNumFailures across all files
Note spark operator's SparkApplication CRD does not have any specific YAML properties around these configs
Update release notes
Note the upgrade to Spark 3.5
Add migration(s) to table
Add patched CVEs
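Because the spark operator's SparkApplication CRD has no dedicated fields for these settings (per the note above), they would be passed through the generic `sparkConf` map. A hypothetical fragment, with the name and values illustrative only:

```yaml
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: example-pipeline        # illustrative name
spec:
  sparkConf:
    # New property names as of Spark 3.5 (YARN prefix dropped)
    spark.executor.failuresValidityInterval: "1h"
    spark.executor.maxNumFailures: "4"
```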
BDD Scenarios
Baton migration:
Scenario: My data pipelines are migrated to Spark 3.5.1
Given a file using the Spark config properties spark.yarn.executor.failuresValidityInterval and spark.yarn.max.executor.failures
When the aissemble 1.7 Spark migration is executed
Then the file references are updated to spark.executor.failuresValidityInterval and spark.executor.maxNumFailures respectively
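The scenario's rename rule can be sketched as a simple text substitution. This is only an illustration of the replacement logic, not the actual aiSSEMBLE Baton migration; the function names here are hypothetical.

```python
from pathlib import Path

# Deprecated Spark property names and their 3.5 replacements
# (from the scenario above).
RENAMES = {
    "spark.yarn.executor.failuresValidityInterval": "spark.executor.failuresValidityInterval",
    "spark.yarn.max.executor.failures": "spark.executor.maxNumFailures",
}

def migrate_text(text: str) -> str:
    """Return text with every deprecated Spark property name replaced."""
    for old, new in RENAMES.items():
        text = text.replace(old, new)
    return text

def migrate_file(path: Path) -> bool:
    """Rewrite a file in place; return True if anything changed."""
    original = path.read_text()
    migrated = migrate_text(original)
    if migrated != original:
        path.write_text(migrated)
        return True
    return False
```

A plain substring replace suffices here because neither old name is a prefix of the other, and neither replacement reintroduces a deprecated name.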
Test Steps
TBD - Stand up two simple Spark/PySpark data pipelines that read some data into Spark, apply a small transform, and write the data back out.
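One of the smoke-test pipelines described above might look like the sketch below. The paths, column name, and config values are illustrative assumptions; the PySpark import is deferred into the function so the module loads even where pyspark is not installed.

```python
# Renamed (non-YARN-prefixed) properties as of Spark 3.5; values are examples.
SPARK_35_CONFS = {
    "spark.executor.failuresValidityInterval": "1h",  # was spark.yarn.executor.failuresValidityInterval
    "spark.executor.maxNumFailures": "4",             # was spark.yarn.max.executor.failures
}

def run_pipeline(input_path: str, output_path: str) -> None:
    """Read CSV in, apply a small transform, and write the data back out."""
    # Deferred import: keeps this module importable without pyspark present.
    from pyspark.sql import SparkSession, functions as F

    builder = SparkSession.builder.appName("spark-3.5-smoke-test")
    for key, value in SPARK_35_CONFS.items():
        builder = builder.config(key, value)
    spark = builder.getOrCreate()

    df = spark.read.option("header", True).csv(input_path)
    # Small transform: add a doubled copy of a numeric column.
    df = df.withColumn("amount_doubled", F.col("amount").cast("double") * 2)
    df.write.mode("overwrite").parquet(output_path)
    spark.stop()
```

A PySpark counterpart and a Scala/Java Spark counterpart exercising the same renamed configs would cover both pipeline flavors the test step calls for.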