AbsaOSS / spot

Aggregate and analyze Spark history, export to elasticsearch, visualize and monitor with Kibana.
Apache License 2.0

Investigate missing runs #63

Closed DzMakatun closed 2 years ago

DzMakatun commented 3 years ago

Some Spark runs are missing from Spot Elasticsearch indexes. It can be seen as gaps on the timeline of completed jobs.

This likely happens when the History Server is catching up with previously completed runs (e.g. after a restart). In such a scenario, new runs may appear in the history in an order that differs from their completion order. However, Spot parses jobs in their completion order: it requests a list of jobs completed within a time interval, and in each iteration the lower limit of the interval is updated to match the maximum completion time of the previously processed jobs. As a result, if a job appears in the Spark history later than some jobs that completed after it, Spot will skip it.
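The suspected mechanism can be reproduced with a small simulation. This is only a sketch, not Spot's actual code: the `Run` class, `list_completed`, `crawl`, the integer time ticks, and the exclusive lower bound are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    run_id: str
    completed_at: int  # completion time, in arbitrary ticks
    visible_at: int    # time the run first shows up in the history listing


def list_completed(runs, now, min_end):
    """Hypothetical stand-in for a history query: runs that are already
    visible at `now` and whose completion time is after `min_end`."""
    return [r for r in runs if r.visible_at <= now and r.completed_at > min_end]


def crawl(runs, poll_times):
    """Poll at the given times, advancing the lower limit to the max
    completion time seen so far (the behaviour described above)."""
    processed = []
    min_end = 0
    for now in poll_times:
        batch = list_completed(runs, now, min_end)
        processed.extend(r.run_id for r in batch)
        if batch:
            min_end = max(r.completed_at for r in batch)
    return processed


runs = [
    Run("a", completed_at=10, visible_at=11),
    Run("b", completed_at=20, visible_at=35),  # listed late, e.g. server catching up
    Run("c", completed_at=30, visible_at=31),
]

processed = crawl(runs, poll_times=[15, 32, 40])
missing = [r.run_id for r in runs if r.run_id not in processed]
print("processed:", processed)
print("missing:", missing)
```

Here run `b` completes before `c` but becomes visible only after `c` has already been processed, so the lower limit has moved past its completion time and it is never returned by later queries.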

An investigation is required to confirm the issue and find possible solutions.

(Attached screenshot: Screen Shot 2021-06-07 at 12.06.59)