Azure / azure-databricks-operator

Kubernetes Operator for Databricks
MIT License

Run objects in terminal states are still requeued for reconciliation #148

Closed stuartleeks closed 4 years ago

stuartleeks commented 4 years ago

During load testing we found that Run objects are still requeued for reconciliation even when they have reached a terminal state.

The API docs state that once a Run has reached a terminal state, its result will not change.

Reconciling these Runs again is therefore wasted work, and it has an impact on operator performance.

The following graph shows the load pattern that we used for the load test:

[image: graph of the load pattern used for the load test]

And the graph below shows the reconciliation rate for the same time period:

[image: graph of the reconciliation rate over the same time period]

Two things stand out. First, when the test load drops off, the reconciliation rate stays high. Second, the majority of reconciliations throughout the test return a requeue_after result, which fits the hypothesis that the load comes from requeues scheduled to refresh the Run status.
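To illustrate the kind of check that would avoid this, here is a minimal sketch in Go. It is not the operator's actual code: `Result`, `isTerminal` and `nextResult` are hypothetical names, `Result` only mimics the shape of controller-runtime's `reconcile.Result`, and the set of terminal states (TERMINATED, SKIPPED, INTERNAL_ERROR) is my reading of the Databricks Jobs API run life cycle states, so treat it as an assumption.

```go
package main

import (
	"fmt"
	"time"
)

// Result is a hypothetical stand-in that mimics the shape of
// controller-runtime's reconcile.Result, so the sketch has no dependencies.
type Result struct {
	Requeue      bool
	RequeueAfter time.Duration
}

// terminalStates lists the run life cycle states assumed to be final.
// Assumption: taken from the Databricks Jobs API docs, not from the operator.
var terminalStates = map[string]bool{
	"TERMINATED":     true,
	"SKIPPED":        true,
	"INTERNAL_ERROR": true,
}

// isTerminal reports whether a Run in this state can no longer change.
func isTerminal(lifeCycleState string) bool {
	return terminalStates[lifeCycleState]
}

// nextResult decides whether the Run should be requeued for a refresh.
// Terminal Runs get an empty Result (no requeue); all others are
// requeued after the given refresh interval.
func nextResult(lifeCycleState string, refresh time.Duration) Result {
	if isTerminal(lifeCycleState) {
		return Result{}
	}
	return Result{RequeueAfter: refresh}
}

func main() {
	for _, s := range []string{"PENDING", "RUNNING", "TERMINATED", "SKIPPED"} {
		r := nextResult(s, 5*time.Second)
		fmt.Printf("%-14s requeueAfter=%v\n", s, r.RequeueAfter)
	}
}
```

With a check like this at the end of the reconcile loop, Runs that have finished would drop out of the work queue instead of being refreshed indefinitely, which should remove the sustained reconciliation rate seen after the load drops off.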

stuartleeks commented 4 years ago

@Azadehkhojandi - fyi we should have a PR to address this today (UK time) :-)