AbsaOSS / spot

Aggregate and analyze Spark history, export to elasticsearch, visualize and monitor with Kibana.
Apache License 2.0

Process Enceladus Checkpoints #36

Closed DzMakatun closed 3 years ago

DzMakatun commented 3 years ago

Currently, all checkpoints are dropped from the Enceladus run object. The checkpoints are stored as an array of JSON objects:

```json
"checkpoints": [
  {
    "name": "Source",
    "processStartTime": "9-10-2020 11:32:36",
    "processEndTime": "9-10-2020 11:32:36",
    "workflowName": "Source",
    "order": 1,
    "controls": [
      {
        "controlName": "recordcount",
        "controlType": "controlType.count",
        "controlCol": "",
        "controlValue": "14"
      }
    ]
  },
  {
    "name": "Raw",
    "processStartTime": "9-10-2020 11:32:36",
    "processEndTime": "9-10-2020 11:32:36",
    "workflowName": "Raw",
    "order": 2,
    "controls": [
      {
        "controlName": "recordcount",
        "controlType": "controlType.count",
        "controlCol": "",
        "controlValue": "14"
      }
    ]
  },
  {
    "name": "Standardization - End",
    "software": "Atum",
    "version": "0.2.6",
    "processStartTime": "09-10-2020 22:51:48 +0000",
    "processEndTime": "09-10-2020 22:51:57 +0000",
    "workflowName": "Standardization",
    "order": 3,
    "controls": [
      {
        "controlName": "recordcount",
        "controlType": "controlType.count",
        "controlCol": "",
        "controlValue": "14"
      }
    ]
  },
  {
    "name": "Conformance - Start",
    "software": "Atum",
    "version": "0.2.6",
    "processStartTime": "09-10-2020 22:53:01 +0000",
    "processEndTime": "09-10-2020 22:53:19 +0000",
    "workflowName": "Conformance",
    "order": 4,
    "controls": [
      {
        "controlName": "recordcount",
        "controlType": "controlType.count",
        "controlCol": "",
        "controlValue": "14"
      }
    ]
  },
  {
    "name": "Conformance - End",
    "software": "Atum",
    "version": "0.2.6",
    "processStartTime": "09-10-2020 22:53:19 +0000",
    "processEndTime": "09-10-2020 22:53:20 +0000",
    "workflowName": "Conformance",
    "order": 5,
    "controls": [
      {
        "controlName": "recordcount",
        "controlType": "controlType.count",
        "controlCol": "*",
        "controlValue": "14"
      }
    ]
  }
]
```

For better monitoring, it is suggested to process the raw checkpoints into aggregations in the following way:
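One way the checkpoint array could be flattened for aggregation is sketched below. This is only an illustration, not spot's actual processing: the `checkpoint.<name>.*` key naming scheme and the `flatten_checkpoints` helper are assumptions, and the two timestamp formats are taken from the raw JSON above (some checkpoints carry a timezone offset, some do not).

```python
import re
from datetime import datetime

# Two timestamp formats appear in the raw checkpoints above:
# "09-10-2020 22:51:48 +0000" (with offset) and "9-10-2020 11:32:36" (without).
_TS_FORMATS = ("%d-%m-%Y %H:%M:%S %z", "%d-%m-%Y %H:%M:%S")


def parse_ts(value):
    """Parse a checkpoint timestamp, trying each known format in turn."""
    for fmt in _TS_FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognized timestamp: {value!r}")


def flatten_checkpoints(checkpoints):
    """Turn the raw checkpoint array into flat, aggregation-friendly fields.

    The "checkpoint.<name>.duration_seconds" / "checkpoint.<name>.recordcount"
    key scheme is a hypothetical example, not spot's real schema.
    """
    flat = {}
    for cp in checkpoints:
        # "Standardization - End" -> "standardization_end"
        name = re.sub(r"[^a-z0-9]+", "_", cp["name"].lower()).strip("_")
        start = parse_ts(cp["processStartTime"])
        end = parse_ts(cp["processEndTime"])
        flat[f"checkpoint.{name}.duration_seconds"] = (end - start).total_seconds()
        for ctrl in cp.get("controls", []):
            if ctrl["controlName"] == "recordcount":
                flat[f"checkpoint.{name}.recordcount"] = int(ctrl["controlValue"])
    return flat
```

Flattening each checkpoint into a fixed set of scalar fields avoids nested arrays in the Elasticsearch document, which makes the values directly usable in Kibana aggregations.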

DzMakatun commented 3 years ago

Adding more aggregations may result in exceeding the default Elasticsearch limit on the number of fields per index:

```
Limit of total fields [1000] in index [spot_agg_2] has been exceeded
```

To solve this issue, the limit can be increased. Use the appropriate index name:

```
PUT /spot_agg_2/_settings
{
  "index.mapping.total_fields.limit": 2000
}
```
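The same settings update could be issued programmatically. Below is a minimal sketch that only builds the request triple (method, path, JSON body) for the `PUT /<index>/_settings` call shown above; the `field_limit_settings` helper name is made up for illustration, and sending the request with an HTTP client is left out.

```python
import json


def field_limit_settings(index, limit=2000):
    """Build the (method, path, body) for raising an index's
    index.mapping.total_fields.limit, as in the PUT request above."""
    path = f"/{index}/_settings"
    body = json.dumps({"index.mapping.total_fields.limit": limit})
    return "PUT", path, body
```

The returned triple can then be sent to the cluster with any HTTP client; raising the limit trades mapping safety for headroom, so it is worth keeping the flattened field set bounded rather than raising the limit repeatedly.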