apache / kyuubi

Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
https://kyuubi.apache.org/
Apache License 2.0

[Umbrella] Application Management Service Layer for backend engine/application management #2444

Open yaooqinn opened 2 years ago

yaooqinn commented 2 years ago

Describe the proposal

Motivation

Add a layer for better management of backend applications: terminating them, getting their status, tagging them, etc.

The application manager itself tries its best to stay independent of any particular engine or cluster manager, such as YARN, Kubernetes, Spark, or Flink.

The application manager supports registering operations against engines or cluster managers through generalized APIs, such as kill and get-application-info. Each operation is specific to a cluster manager and can be loaded via SPI or Kyuubi config.
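
To make the registration idea concrete, here is a rough sketch of how a manager could dispatch to the first registered operation that supports a given cluster manager. This is not the actual Kyuubi implementation: `ApplicationManager`, the `KillResponse` alias, and the trimmed-down trait are assumptions so that the snippet compiles on its own. In a real setup the operations could be discovered with `java.util.ServiceLoader`, the standard SPI mechanism on the JVM, instead of being passed in by hand.

```scala
object AppManagerSketch {
  // Assumption: KillResponse modeled as (succeeded, message).
  type KillResponse = (Boolean, String)

  // Trimmed-down stand-in for the proposed trait (initialize/stop omitted).
  trait ApplicationOperation {
    def isSupported(clusterManager: Option[String]): Boolean
    def killApplicationByTag(tag: String): KillResponse
    def getApplicationInfoByTag(tag: String): Map[String, String]
  }

  // Hypothetical manager: keeps the registered operations and dispatches
  // each call to the first one that supports the cluster manager.
  final class ApplicationManager(operations: Seq[ApplicationOperation]) {
    private def find(clusterManager: Option[String]): Option[ApplicationOperation] =
      operations.find(_.isSupported(clusterManager))

    def kill(clusterManager: Option[String], tag: String): KillResponse =
      find(clusterManager)
        .map(_.killApplicationByTag(tag))
        .getOrElse((false, s"No operation registered for cluster manager $clusterManager"))

    def info(clusterManager: Option[String], tag: String): Option[Map[String, String]] =
      find(clusterManager).map(_.getApplicationInfoByTag(tag))
  }
}
```

The manager itself never mentions YARN or Kubernetes; everything cluster-manager-specific lives behind `isSupported`.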

Goals

Non-goals

Overall Architecture

Application Operation Interface

trait ApplicationOperation {
  /**
   * Step for initializing the instance.
   */
  def initialize(conf: KyuubiConf): Unit

  /**
   * Step to clean up the instance
   */
  def stop(): Unit

  /**
   * Called before other methods to do a quick skip
   *
   * @param clusterManager the underlying cluster manager or just local instance
   */
  def isSupported(clusterManager: Option[String]): Boolean

  /**
   * Kill the app/engine with the unique application tag
   *
   * @param tag the unique application tag for the engine instance.
   *            For example, if Hadoop YARN is used,
   *            the tag for Spark applications is preset via spark.yarn.tags
   * @return a response describing the outcome of the kill attempt
   *
   * @note Implementations should suppress exceptions and always return a KillResponse
   */
  def killApplicationByTag(tag: String): KillResponse

  /**
   * Get the engine/application status by the unique application tag
   *
   * @param tag the unique application tag for engine instance.
   * @return a map containing the application status
   */
  def getApplicationInfoByTag(tag: String): Map[String, String]
}

object ApplicationOperation {

  /**
   * identifier determined by cluster manager for the engine
   */
  val APP_ID_KEY = "id"
  val APP_NAME_KEY = "name"
  val APP_STATE_KEY = "state"
  val APP_URL_KEY = "url"
  val APP_ERROR_KEY = "error"

  val NOT_FOUND = "APPLICATION_NOT_FOUND"
}
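
The `@note` on `killApplicationByTag` asks implementations to swallow exceptions and always hand back a `KillResponse`. A minimal sketch of that contract follows. It assumes `KillResponse` is a `(Boolean, String)` pair, since the snippet above does not define it, and `safeKill` and `notFoundInfo` are hypothetical helper names, not part of the proposal.

```scala
import scala.util.{Failure, Success, Try}

object KillContractSketch {
  // Assumption: KillResponse modeled as (succeeded, message).
  type KillResponse = (Boolean, String)

  // Values copied from the ApplicationOperation companion object above.
  val APP_STATE_KEY = "state"
  val NOT_FOUND = "APPLICATION_NOT_FOUND"

  // Runs a cluster-manager-specific kill action and converts any
  // non-fatal exception into a failed response instead of rethrowing.
  def safeKill(tag: String)(doKill: String => String): KillResponse =
    Try(doKill(tag)) match {
      case Success(message) => (true, message)
      case Failure(e) =>
        (false, s"Failed to kill application with tag $tag: ${e.getMessage}")
    }

  // Hypothetical info map for a tag the cluster manager does not know.
  def notFoundInfo: Map[String, String] = Map(APP_STATE_KEY -> NOT_FOUND)
}
```

An implementation's `killApplicationByTag` could then be a one-liner around `safeKill`, with the client call that may throw passed in as `doKill`.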

Use cases

Task list

todo

Are you willing to submit PR?

yaooqinn commented 2 years ago

cc @turboFei

I also want to change the batch result set's schema to simple key-value pairs

something like

key      value
id       application_123455
name     my appname
state    ACCEPTED, RUNNING, FINISHED, KILLED, etc.
url      web URL link
error    some error, if one occurred

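
One possible way to produce such rows from the map returned by `getApplicationInfoByTag` is sketched below. The fixed key order and the `toRows` helper are assumptions for illustration; missing keys are rendered as empty strings here, which is just one option.

```scala
object BatchResultSketch {
  // Key order taken from the table above.
  val keyOrder: Seq[String] = Seq("id", "name", "state", "url", "error")

  // Turns an application info map into ordered (key, value) rows,
  // filling in an empty value for any key the map does not contain.
  def toRows(info: Map[String, String]): Seq[(String, String)] =
    keyOrder.map(key => key -> info.getOrElse(key, ""))
}
```
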
pan3793 commented 1 year ago

The Spark part of this umbrella has been done; postponing the rest to the next release.