boozallen / aissemble

Booz Allen's lean manufacturing approach for holistically designing, developing and fielding AI solutions across the engineering lifecycle from data processing to model building, tuning, and training to secure operational deployment

As a Data Engineer, I want to use Spark 3.5 so I can leverage the latest enhancements and fixes. #55

Open ewilkins-csi opened 2 months ago

ewilkins-csi commented 2 months ago

Background

We are currently on Spark version 3.4.0. There were some security patches released in 3.4.3 that we want to pull in, but since the upgrade to 3.5 seems simple enough we can go ahead and make the jump to the very latest version: 3.5.1.

Definition of Done

BDD Scenarios

Baton migration:

Scenario: My data pipelines are migrated to Spark 3.5.1
  Given a file using the Spark config properties spark.yarn.executor.failuresValidityInterval and spark.yarn.max.executor.failures
  When the aissemble 1.7 Spark migration is executed
  Then the file references are updated to spark.executor.failuresValidityInterval and spark.executor.maxNumFailures respectively
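
The rename in this scenario can be sketched as a small text rewrite. This is only an illustration in Python, not the actual aiSSEMBLE Baton migration code: the function name `migrate_spark_properties` and the `RENAMED_PROPERTIES` table are hypothetical. Only the four property names come from the scenario.

```python
import re

# Deprecated YARN-specific keys mapped to their replacements,
# exactly as listed in the scenario. The table name is hypothetical.
RENAMED_PROPERTIES = {
    "spark.yarn.executor.failuresValidityInterval": "spark.executor.failuresValidityInterval",
    "spark.yarn.max.executor.failures": "spark.executor.maxNumFailures",
}

# Match a deprecated key only when it stands alone: not preceded by a word
# character or dot, and not followed by more key characters. This avoids
# rewriting longer keys that merely start with one of the old names.
_PATTERN = re.compile(
    r"(?<![\w.])("
    + "|".join(re.escape(key) for key in RENAMED_PROPERTIES)
    + r")(?!\w|\.\w)"
)


def migrate_spark_properties(text: str) -> str:
    """Return text with each deprecated property name replaced by its new name."""
    return _PATTERN.sub(lambda match: RENAMED_PROPERTIES[match.group(1)], text)


if __name__ == "__main__":
    before = (
        "spark.yarn.executor.failuresValidityInterval=1h\n"
        "spark.yarn.max.executor.failures=3\n"
    )
    print(migrate_spark_properties(before))
```

Because the new names never contain the old ones, running the rewrite on an already migrated file leaves it unchanged.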

Test Steps

TBD - Stand up two simple data pipelines, one Spark and one PySpark, that each read some data into Spark, apply a small transformation, and write the data back out.

ewilkins-csi commented 2 months ago

DoD completed with @Cho-William and @carter-cundiff