MI-DPLA / combine

Combine /kämˌbīn/ - Metadata Aggregator Platform
MIT License

Improve Job status reporting #237

Closed ghukill closed 6 years ago

ghukill commented 6 years ago

Currently, users only know whether a Job is waiting, running, or complete. But the Spark status API exposes additional information that could be used to show users what is actually happening.

Sample output from the API for an entire Combine Job, which Spark represents as a jobGroup containing individual Spark jobs:

[
   [
      {
         "jobId":11,
         "name":"saveAsNewAPIHadoopFile at PythonRDD.scala:834",
         "description":"Job group for statement 1",
         "submissionTime":"2018-07-08T18:53:23.119GMT",
         "completionTime":"2018-07-08T18:55:00.146GMT",
         "stageIds":[
            16
         ],
         "jobGroup":"1",
         "status":"SUCCEEDED",
         "numTasks":8,
         "numActiveTasks":0,
         "numCompletedTasks":8,
         "numSkippedTasks":0,
         "numFailedTasks":0,
         "numActiveStages":0,
         "numCompletedStages":1,
         "numSkippedStages":0,
         "numFailedStages":0
      },
      {
         "jobId":10,
         "name":"take at SerDeUtil.scala:233",
         "description":"Job group for statement 1",
         "submissionTime":"2018-07-08T18:53:22.727GMT",
         "completionTime":"2018-07-08T18:53:23.069GMT",
         "stageIds":[
            15
         ],
         "jobGroup":"1",
         "status":"SUCCEEDED",
         "numTasks":1,
         "numActiveTasks":0,
         "numCompletedTasks":1,
         "numSkippedTasks":0,
         "numFailedTasks":0,
         "numActiveStages":0,
         "numCompletedStages":1,
         "numSkippedStages":0,
         "numFailedStages":0
      },
      {
         "jobId":9,
         "name":"runJob at PythonRDD.scala:441",
         "description":"Job group for statement 1",
         "submissionTime":"2018-07-08T18:53:01.977GMT",
         "completionTime":"2018-07-08T18:53:21.716GMT",
         "stageIds":[
            14
         ],
         "jobGroup":"1",
         "status":"SUCCEEDED",
         "numTasks":3,
         "numActiveTasks":0,
         "numCompletedTasks":3,
         "numSkippedTasks":0,
         "numFailedTasks":0,
         "numActiveStages":0,
         "numCompletedStages":1,
         "numSkippedStages":0,
         "numFailedStages":0
      },
      {
         "jobId":8,
         "name":"runJob at PythonRDD.scala:441",
         "description":"Job group for statement 1",
         "submissionTime":"2018-07-08T18:52:40.314GMT",
         "completionTime":"2018-07-08T18:53:01.964GMT",
         "stageIds":[
            13
         ],
         "jobGroup":"1",
         "status":"SUCCEEDED",
         "numTasks":4,
         "numActiveTasks":0,
         "numCompletedTasks":4,
         "numSkippedTasks":0,
         "numFailedTasks":0,
         "numActiveStages":0,
         "numCompletedStages":1,
         "numSkippedStages":0,
         "numFailedStages":0
      },
      {
         "jobId":7,
         "name":"runJob at PythonRDD.scala:441",
         "description":"Job group for statement 1",
         "submissionTime":"2018-07-08T18:52:20.329GMT",
         "completionTime":"2018-07-08T18:52:40.289GMT",
         "stageIds":[
            12
         ],
         "jobGroup":"1",
         "status":"SUCCEEDED",
         "numTasks":1,
         "numActiveTasks":0,
         "numCompletedTasks":1,
         "numSkippedTasks":0,
         "numFailedTasks":0,
         "numActiveStages":0,
         "numCompletedStages":1,
         "numSkippedStages":0,
         "numFailedStages":0
      },
      {
         "jobId":6,
         "name":"jdbc at NativeMethodAccessorImpl.java:0",
         "description":"Job group for statement 1",
         "submissionTime":"2018-07-08T18:50:15.220GMT",
         "completionTime":"2018-07-08T18:52:19.531GMT",
         "stageIds":[
            9,
            10,
            11
         ],
         "jobGroup":"1",
         "status":"SUCCEEDED",
         "numTasks":218,
         "numActiveTasks":0,
         "numCompletedTasks":218,
         "numSkippedTasks":0,
         "numFailedTasks":0,
         "numActiveStages":0,
         "numCompletedStages":3,
         "numSkippedStages":0,
         "numFailedStages":0
      }
   ]
]

Because Spark evaluates lazily, it's hard to estimate how long a Job will take, or even how many stages it will involve. But it is possible to name the various Spark jobs and report stages completed, so the user can at least see that the Job is progressing.
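As a rough sketch of what that reporting could look like, the helper below rolls up task counts for one jobGroup from a payload shaped like the sample above. The function name and return shape are illustrative assumptions, not existing Combine code:

```python
def job_group_progress(api_payload, job_group):
    """Summarize task progress for one Spark jobGroup.

    `api_payload` is assumed to be the parsed JSON shown above: a list of
    lists of per-job dicts from Spark's status API.
    """
    total = completed = failed = active = 0
    statuses = []
    for batch in api_payload:
        for job in batch:
            if job.get("jobGroup") != job_group:
                continue
            total += job["numTasks"]
            completed += job["numCompletedTasks"]
            failed += job["numFailedTasks"]
            active += job["numActiveTasks"]
            statuses.append(job["status"])
    return {
        "tasks_total": total,
        "tasks_completed": completed,
        "tasks_failed": failed,
        "tasks_active": active,
        # only meaningful once at least one job in the group was seen
        "all_succeeded": bool(statuses) and all(s == "SUCCEEDED" for s in statuses),
    }
```

Polled periodically while a Job runs, `tasks_completed / tasks_total` gives a coarse progress figure even though the total grows as new Spark jobs are submitted.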

ghukill commented 6 years ago

When viewing a Combine Job that is still running, perhaps show a tab with these stages in detail, while completed Jobs default to the Records tab?

ghukill commented 6 years ago

More on setting jobGroup information: https://spark.apache.org/docs/2.1.0/api/java/org/apache/spark/SparkContext.html#setJobGroup(java.lang.String,%20java.lang.String,%20boolean)
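A minimal sketch of how Combine might tag its Spark work with a jobGroup, so the status API output above can be filtered per Combine Job. `setJobGroup(groupId, description, interruptOnCancel)` is the documented SparkContext API linked above (mirrored in PySpark); the wrapper function and the choice of the Combine Job id as the groupId are assumptions for illustration:

```python
def tag_combine_job(sc, combine_job_id, description, interrupt_on_cancel=True):
    """Tag all Spark jobs launched on this thread with a Combine Job id.

    `sc` is assumed to be a live SparkContext. Every Spark job submitted
    on this thread afterwards carries jobGroup == str(combine_job_id),
    which is what the status API reports back.
    """
    sc.setJobGroup(str(combine_job_id), description,
                   interruptOnCancel=interrupt_on_cancel)
```

Setting `interruptOnCancel=True` would also let Combine cancel a whole Job's Spark work via `sc.cancelJobGroup(group_id)`.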

ghukill commented 6 years ago

Completed: implemented as a "Spark Details" tab in Job Details.