Problem
Depending on the configured parallelism, several joinPart jobs run simultaneously, and the bootstrap and the final join may also run in parallel with them. In the future [CHIP-4] we aim to support non-uniform step sizes for each joinPart and to parallelize the construction of the bootstrap and the final join. We are also planning to add caching to groupBys.
This makes it hard to make sense of a job's progress, especially once a join has more than 5 joinParts. We have seen joins with as many as 135 joinParts!
Goal
We aim to add a tab to the Spark UI, if feasible, that shows the progress of the current job.
It has to indicate the progress of several components:
Bootstrap
JoinParts
JoinPart Caches (in the future)
Final Join
Each of these components can have its own step size (in days), intricate dependencies on the others, and may run in parallel with them.
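To make "different step sizes" and "dependencies" concrete, here is a minimal, UI-independent sketch of the state such a tab might track. Everything in it is hypothetical: JoinProgressSketch, Component, the Blocked/Ready/Done statuses, and the choice to weight overall progress by the number of days each component covers are illustrative assumptions, not an existing Chronon or Spark API.

```scala
// Hypothetical model of the progress state a join-progress tab could track.
// None of these names exist in Chronon or Spark; they only illustrate the idea.
object JoinProgressSketch {

  sealed trait Status
  case object Blocked extends Status // at least one dependency has not finished (or is unknown)
  case object Ready extends Status   // all dependencies finished, own steps not yet complete
  case object Done extends Status    // every step of this component has completed

  /** A bootstrap, joinPart, joinPart cache or final join, each with its own step size. */
  final case class Component(
      name: String,
      stepDays: Int,         // days covered by each step
      totalDays: Int,        // days this component has to cover in total
      completedSteps: Int,   // steps finished so far
      dependsOn: Set[String] // names of components that must finish first
  ) {
    require(stepDays > 0 && totalDays >= 0 && completedSteps >= 0, s"invalid component $name")

    // Ceiling division: the last step may cover fewer than stepDays days
    val totalSteps: Int = (totalDays + stepDays - 1) / stepDays

    def isDone: Boolean = completedSteps >= totalSteps

    def fractionDone: Double =
      if (totalSteps == 0) 1.0
      else math.min(completedSteps, totalSteps).toDouble / totalSteps
  }

  /** Status of one component, given the latest state of every component keyed by name. */
  def status(c: Component, all: Map[String, Component]): Status =
    if (c.isDone) Done
    else if (c.dependsOn.forall(dep => all.get(dep).exists(_.isDone))) Ready
    else Blocked

  /** Overall progress, weighting each component by the number of days it covers. */
  def overallFraction(components: Seq[Component]): Double = {
    val totalDays = components.map(_.totalDays).sum
    if (totalDays == 0) 1.0
    else components.map(c => c.fractionDone * c.totalDays).sum / totalDays
  }
}
```

For instance, a bootstrap covering 90 days in 30-day steps, a joinPart covering the same 90 days in 7-day steps (13 steps, the last one partial), and a final join that depends on both would show the final join as Blocked until the joinPart finishes. Weighting by days rather than by step count is one possible design choice; it keeps a joinPart with many small steps from dominating the overall percentage.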
The proposed UI has a lot of detail, which is best described with a mock-up (see below).
[ChatGPT][Un-tested] Feasibility of adding a custom spark UI based
Adding a custom tab to the Apache Spark UI involves extending the Spark UI with your own Scala or Java code: you create classes that extend the SparkUITab and WebUIPage classes provided by Spark and then attach them to the SparkUI instance. Note that these classes (and SparkUI itself) are declared private[spark], so the code has to be compiled in a package under org.apache.spark, and because they are internal APIs they can change between Spark versions. Here's a step-by-step example in Scala, assuming you're familiar with Spark's programming model and have a basic setup ready:
Create a Custom Page Class: This class will represent the content of your custom tab.
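```scala
// SparkUITab and WebUIPage are private[spark], so this must be compiled under org.apache.spark
// ("custom" is an arbitrary sub-package name)
package org.apache.spark.ui.custom

import javax.servlet.http.HttpServletRequest // servlet API used by Spark 3.x
import scala.xml.Node

import org.apache.spark.ui.{SparkUITab, WebUIPage}

// Prefix "" makes this the tab's landing page, served at /<tab prefix>/
class MyCustomPage(parent: SparkUITab) extends WebUIPage("") {
  // Define how the page renders; Spark serves the returned nodes as the HTML response
  override def render(request: HttpServletRequest): Seq[Node] = {
    <html>
      <body>
        <div>Hello, this is my custom page!</div>
      </body>
    </html>
  }
}
```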
Create a Custom Tab Class: This class will add a new tab to the Spark UI, holding your custom page(s).
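```scala
package org.apache.spark.ui.custom

import org.apache.spark.ui.{SparkUI, SparkUITab}

// "myCustomTab" is the URL prefix (/myCustomTab/); the tab label is its capitalized form
class MyCustomTab(sparkUI: SparkUI) extends SparkUITab(sparkUI, "myCustomTab") {
  // Attach the custom page; `this` is the SparkUITab that MyCustomPage expects
  attachPage(new MyCustomPage(this))
  // More pages can be attached here
}
```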
Integrate Your Custom Tab with Spark UI: To integrate your custom tab, you need access to the SparkUI instance, which SparkContext exposes only as the private[spark] member ui. This can be done in your Spark application's driver program, in a custom Spark listener, or through other extension points provided by Spark.
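As a sketch of the driver-program option, assuming Spark 3.x and that the page and tab classes above are compiled into the same package, the tab can be attached once the SparkContext exists. MyCustomTabInstaller is a hypothetical name; the idea is that a public object living under org.apache.spark can reach the private[spark] SparkContext.ui while still being callable from ordinary application code.

```scala
// Sketch, assuming Spark 3.x and that MyCustomPage / MyCustomTab above are compiled in this package.
// It must live under org.apache.spark because SparkContext.ui and SparkUI are private[spark].
package org.apache.spark.ui.custom

import org.apache.spark.SparkContext

// Hypothetical public entry point that application code in any package can call
object MyCustomTabInstaller {
  def install(sc: SparkContext): Unit =
    // sc.ui is None when spark.ui.enabled=false; attaching to an already running UI
    // registers the tab's page handlers with the live web server
    sc.ui.foreach(ui => ui.attachTab(new MyCustomTab(ui)))
}
```

In the driver this would be a single call such as MyCustomTabInstaller.install(spark.sparkContext) right after the SparkSession is built.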
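Register Your Listener with Spark: If you attach the tab from a custom listener instead, register the listener by setting spark.extraListeners to its fully qualified class name, for example your.package.MyCustomSparkListener, either in your spark-defaults.conf file or as a --conf parameter to spark-submit. Replace your.package with the actual package name where your MyCustomSparkListener class resides. Spark instantiates the class through a constructor that takes a single SparkConf argument if one exists, and otherwise through a zero-argument constructor.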
Note: This example assumes you have a basic understanding of Scala, Spark, and how to compile and include your custom code with your Spark application. Integrating custom UI components requires your code to be compiled into a JAR that should be included in your Spark application's classpath. Depending on your Spark deployment mode (e.g., standalone, YARN, Mesos), the way you include your custom JAR might vary.