JahstreetOrg / spark-on-kubernetes-helm

Spark on Kubernetes infrastructure Helm charts repo
Apache License 2.0

Liveness & readiness checks time out on Livy #34

Closed (pdambrauskas closed this 3 years ago)

pdambrauskas commented 4 years ago

Currently, liveness is checked against the /batches endpoint: https://github.com/jahstreet/spark-on-kubernetes-helm/blob/a1fd2ac19580feb0d9469c1d7cadd8630710ac13/charts/livy/templates/statefulset.yaml#L33

When there is a large number of batches, these checks occasionally time out:

Events:
  Type     Reason     Age                 From                                                    Message
  ----     ------     ----                ----                                                    -------
  Warning  Unhealthy  54m (x56 over 10d)  kubelet, ip-XX  Readiness probe failed: Get http://XX:8998/batches: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
  Warning  Unhealthy  54m (x59 over 10d)  kubelet, ip-XX  Liveness probe failed: Get http://XX:8998/batches: net/http: request canceled (Client.Timeout exceeded while awaiting headers)

Would it be OK to add ?size=1 to limit the response size, or at least to have an option to disable these checks in the Livy chart?
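
For illustration, a minimal sketch of the first option, assuming the checks are plain Kubernetes `httpGet` probes on port 8998 (the port shown in the events above). The `timeoutSeconds` values are illustrative and not taken from the chart:

```yaml
# Sketch only: limit the /batches response to a single item per probe request.
livenessProbe:
  httpGet:
    path: /batches?size=1
    port: 8998
  timeoutSeconds: 5   # illustrative; the Kubernetes default is 1 second
readinessProbe:
  httpGet:
    path: /batches?size=1
    port: 8998
  timeoutSeconds: 5   # illustrative
```

With `size=1`, the response size should no longer grow with the number of batches, though Livy still has to serve the batch listing on every probe.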

jahstreet commented 4 years ago

Good point, thanks for mentioning it. Will update the chart.

jahstreet commented 4 years ago

Will be fixed in #39. The proposed solution is to call the /version endpoint instead.
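
A rough sketch of what probing /version could look like, assuming `httpGet` probes on port 8998 as in the events above. This is an illustration of the idea, not the actual change made in #39:

```yaml
# Sketch only: probe a lightweight endpoint instead of listing batches.
livenessProbe:
  httpGet:
    path: /version
    port: 8998
readinessProbe:
  httpGet:
    path: /version
    port: 8998
```

Because /version does not enumerate batches, its response time should not depend on how many batches Livy is tracking.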