This is what I see from kubectl describe:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Unhealthy 42m (x28 over 6d3h) kubelet, aks-default-18083181-1 Readiness probe failed: OCI runtime exec failed: exec failed: container_linux.go:349: starting container process caused "argument list too long": unknown
Warning Unhealthy 25m (x2 over 3d) kubelet, aks-default-18083181-1 Liveness probe failed: /usr/bin/zkOk.sh: line 21: /bin/nc: Cannot allocate memory
Warning Unhealthy 25m (x199 over 6d3h) kubelet, aks-default-18083181-1 Readiness probe failed:
Warning Unhealthy 54s (x460 over 6d4h) kubelet, aks-default-18083181-1 Liveness probe failed:
Warning BackOff 52s (x7574 over 6d4h) kubelet, aks-default-18083181-1 Back-off restarting failed container
It seems like ZooKeeper has very little memory allocated.
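For reference, if the 200Mi figure mentioned below is what the ZooKeeper container actually gets, the hardcoded stanza would look roughly like this (a sketch only; whether 200Mi is set as the request, the limit, or both is my assumption):

resources:
  requests:
    memory: 200Mi   # assumed; taken from the 200Mi figure mentioned below
  limits:
    memory: 200Mi   # assumed; a limit this low would explain /bin/nc failing
                    # with "Cannot allocate memory" during the probes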
@akariv I see the resources for the SOLR pods were updated in this commit: https://github.com/datopian/ckan-cloud-operator/commit/22423feec2806ce6a0465c16e43ed5356cf32348#diff-167428a1f2e6a73b2e24fcc3c27fbd97L275
What do you think about making them configurable (keeping the current defaults), e.g. in interactive mode? Right now the resources for both ZooKeeper and SolrCloud are hardcoded to 200Mi and 8GB.
Something like:
diff --git a/interactive.yaml b/interactive.yaml
index abec8fc..381200e 100644
--- a/interactive.yaml
+++ b/interactive.yaml
@@ -12,9 +12,11 @@ default:
   self-hosted: y
   num-shards: "1"
   replication-factor: "1"
+  sc-resources: '{"limits":{"memory":"1Gi"}, "requests": {"memory":"1Gi"}}'
+  zk-resources: '{"limits":{"memory":"1Gi", "cpu":"1"}, "requests": {"memory":"1Gi", "cpu":"0.5"}}'
   ckan-storage-config:
     default-storage-bucket: ckan
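If the operator just parses these JSON strings and passes them through to the container specs (an assumption on my side about how ckan-cloud-operator would consume the new keys), the zk-resources value above would end up as a Kubernetes resources block like:

# Sketch of what zk-resources would translate to on the ZooKeeper container,
# assuming the JSON is passed through unchanged.
resources:
  limits:
    memory: 1Gi
    cpu: "1"
  requests:
    memory: 1Gi
    cpu: "0.5"

and sc-resources would do the same for the SolrCloud containers, with only the memory values set.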
I hope this is indeed the problem that's causing this, but we can try. I see no harm in making these configurable.
Solr pods keep restarting from time to time, leading to problems in deployment. E.g. a CKAN instance will fail to connect to Solr if one of the pods is in CrashLoopBackOff. Over 26 hours, some of them have restarted more than 200 times.