Closed pdambrauskas closed 4 years ago
Currently, liveness and readiness are checked against the /batches endpoint: https://github.com/jahstreet/spark-on-kubernetes-helm/blob/a1fd2ac19580feb0d9469c1d7cadd8630710ac13/charts/livy/templates/statefulset.yaml#L33
When there is a large number of batches, these checks occasionally time out:
Events:
  Type     Reason     Age                 From            Message
  ----     ------     ----                ----            -------
  Warning  Unhealthy  54m (x56 over 10d)  kubelet, ip-XX  Readiness probe failed: Get http://XX:8998/batches: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
  Warning  Unhealthy  54m (x59 over 10d)  kubelet, ip-XX  Liveness probe failed: Get http://XX:8998/batches: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Would it be OK to add ?size=1 to limit the response size, or at least to have an option to disable these checks in the livy chart?
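For illustration, the first suggestion might look roughly like the probe fragment below. This is only a sketch: port 8998 is taken from the events above, the timeout value is a made-up placeholder rather than anything from the chart, and it assumes the kubelet passes the query string in `path` through to Livy unchanged.

```yaml
# Hypothetical probe fragment for the Livy container (not the chart's actual template).
livenessProbe:
  httpGet:
    path: /batches?size=1   # ask Livy for at most one batch to keep the response small
    port: 8998              # port seen in the probe failure events
  timeoutSeconds: 5         # assumed value for illustration only
readinessProbe:
  httpGet:
    path: /batches?size=1
    port: 8998
  timeoutSeconds: 5         # assumed value for illustration only
```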
Good point, thanks for mentioning it. Will update the chart.
Will be fixed in #39. The proposed solution is to call the /version endpoint instead.
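A rough sketch of what probing /version could look like is shown below; the actual change lives in #39 and may differ. Port 8998 comes from the events quoted earlier, while the delay and period values are placeholders chosen for illustration. The idea is that the /version response should not grow with the number of batches, so the probe cost stays roughly constant.

```yaml
# Hypothetical probe fragment; see #39 for the real change to the chart.
livenessProbe:
  httpGet:
    path: /version          # small response, independent of how many batches exist
    port: 8998              # port seen in the probe failure events
  initialDelaySeconds: 30   # placeholder value
  periodSeconds: 10         # placeholder value
readinessProbe:
  httpGet:
    path: /version
    port: 8998
  initialDelaySeconds: 30   # placeholder value
  periodSeconds: 10         # placeholder value
```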