azkaban / azkaban-plugins

Plugins for Azkaban.
https://azkaban.github.io
Apache License 2.0

Decouple Spark version upgrade from Azkaban Spark job type config deployment #261

Closed jakhani closed 7 years ago

jakhani commented 7 years ago

There should be a way to add a new Spark binary without having to provide a Spark home property for it. This change introduces a single directory that holds all Spark binaries, plus a naming pattern for the directories inside it, so that a new binary can be added without adding a property for its path.

  1. If the user does not provide a Spark version as a job property, choose the default Spark home.
  2. If the user provides a Spark version as a job property, then:
     a) If spark.{sparkVersion}.home is set in commonprivate.properties/private.properties, that value is the Spark home.
     b) If spark.{sparkVersion}.home is not set and spark.home.dir is set, retrieve the Spark directory for sparkVersion inside spark.home.dir.
     c) If spark.{sparkVersion}.home is not set and spark.home.dir is set, but spark.home.dir contains no Spark home for the given sparkVersion, throw an exception whose message lists the available versions.
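
The resolution order above could be sketched roughly as follows. This is a minimal illustration, not the code in this change: the class name, the `spark.home` key for the default Spark home, and the `spark-` directory prefix are assumptions (the naming pattern is not spelled out in this description). The case where neither spark.{sparkVersion}.home nor spark.home.dir is set is not covered above; the sketch assumes it should also fail.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical helper; names and keys marked "assumed" are not from the change itself.
final class SparkHomeResolver {
  static final String DEFAULT_HOME_KEY = "spark.home";  // assumed key for the default Spark home
  static final String HOME_DIR_KEY = "spark.home.dir";  // directory holding all Spark binaries
  static final String DIR_PREFIX = "spark-";            // assumed naming pattern: spark-<version>

  // Returns the Spark home for the requested version, following rules 1 and 2a-2c.
  static String resolve(String sparkVersion, Map<String, String> props) {
    // Rule 1: no version requested, so use the default Spark home.
    if (sparkVersion == null || sparkVersion.trim().isEmpty()) {
      return props.get(DEFAULT_HOME_KEY);
    }
    String version = sparkVersion.trim();

    // Rule 2a: an explicit spark.{sparkVersion}.home property wins.
    String explicitHome = props.get("spark." + version + ".home");
    if (explicitHome != null) {
      return explicitHome;
    }

    String homeDir = props.get(HOME_DIR_KEY);
    if (homeDir == null) {
      // Not specified in the description; failing here is an assumption.
      throw new IllegalArgumentException(
          "No Spark home configured for version " + version);
    }

    // Rule 2b: look for a matching directory inside spark.home.dir.
    File candidate = new File(homeDir, DIR_PREFIX + version);
    if (candidate.isDirectory()) {
      return candidate.getPath();
    }

    // Rule 2c: fail and tell the user which versions are installed.
    throw new IllegalArgumentException(
        "Spark version " + version + " is not installed under " + homeDir
            + ". Available versions: " + availableVersions(homeDir));
  }

  // Infers version strings from the subdirectory names that match the prefix.
  static List<String> availableVersions(String homeDir) {
    List<String> versions = new ArrayList<>();
    File[] children = new File(homeDir).listFiles();
    if (children != null) {
      for (File child : children) {
        if (child.isDirectory() && child.getName().startsWith(DIR_PREFIX)) {
          versions.add(child.getName().substring(DIR_PREFIX.length()));
        }
      }
    }
    Collections.sort(versions);
    return versions;
  }
}
```

With this layout, installing a new version would only require dropping a directory such as `spark-2.1.0` into spark.home.dir; no new property is needed unless the binary lives elsewhere.
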
zhe-thoughts commented 7 years ago

How should we handle the case where the user-provided Spark version is not installed? Should we throw an exception?

Victsm commented 7 years ago

The existing mechanism falls back to the default version. Throwing an exception might be better here, to make sure expectations are met. It would be better still if the exception message also included the valid version strings, so users know which ones they can use.

jakhani commented 7 years ago

@zhe-thoughts & @Victsm Yes, throwing an exception would be better in this case. So if the provided version is not installed, it won't fall back to the default installation. Do we handle this the same way for other job types, e.g. Pig or Hive?

Victsm commented 7 years ago

Other job types currently do not use the version string approach; instead, they require users to specify the full path of the Pig/Hive installation.

jakhani commented 7 years ago

OK, then throwing an exception would work and won't cause any inconsistency. I will make that change.

Victsm commented 7 years ago

It would be even better if we could infer the valid version strings and include them in the exception message.