Open anil1994 opened 1 year ago
All k8s objects get deleted after a while, after I run `helm upgrade`.
@anil1994 I am not sure why it would just delete all objects after the upgrade command.
It is possible that you ran `helm repo update` and it now installs a newer version of the chart.
However, it should re-create all the resources properly. Did it not?
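If `helm repo update` did pull a newer chart, one way to rule that out is to pin the chart version explicitly on upgrade. A sketch; `0.x.y` is a placeholder, substitute the chart version you are actually running:

```shell
# See which chart versions are available after a repo update
helm search repo signoz/signoz --versions

# Upgrade while pinning the chart version, so a repo update
# cannot silently move you to a newer chart.
# (0.x.y is a placeholder; use your current chart version.)
helm upgrade signoz signoz/signoz \
  --namespace platform \
  --version 0.x.y \
  -f override-values.yaml
```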
Also, could you please share the `override-values.yaml` with proper indentation?
override-values.yaml

```yaml
global:
  storageClass: gp2
  cloud: aws
clickhouse:
  enable: true
  installCustomStorageClass: true
k8s-infra:
  enabled: false
```
Now, I tried `helm repo update` and then ran the `helm upgrade` command. I am waiting to see whether the helm objects get deleted or not.
I encountered the same issue again. I looked at the events by running the `kubectl get events` command. Any idea? @prashant-shahi
```
81s Warning Unhealthy pod/chi-signoz-clickhouse-cluster-0-0-0 Readiness probe failed: Get "http://10.1.27.206:8123/ping": dial tcp 10.1.27.206:8123: connect: connection refused
78s Warning FailedKillPod pod/chi-signoz-clickhouse-cluster-0-0-0 error killing pod: failed to "KillContainer" for "clickhouse" with KillContainerError: "rpc error: code = Unknown desc = Error response from daemon: No such container: 2ee872bb111ffb478c7de74cd5a98727a499e8445f266decfcb19013d2fd72a6"
85s Normal SuccessfulDelete statefulset/chi-signoz-clickhouse-cluster-0-0 delete Pod chi-signoz-clickhouse-cluster-0-0-0 in StatefulSet chi-signoz-clickhouse-cluster-0-0 successful
86s Info DeleteStarted clickhouseinstallation/signoz-clickhouse Delete CHI started
86s Info DeleteStarted clickhouseinstallation/signoz-clickhouse Delete shard platform/0 - started
85s Info DeleteCompleted clickhouseinstallation/signoz-clickhouse Deleted tables on host 0-0 replica 0 to shard 0 in cluster cluster
84s Normal Killing pod/signoz-alertmanager-0 Stopping container signoz-alertmanager
84s Warning FailedKillPod pod/signoz-alertmanager-0 error killing pod: failed to "KillContainer" for "signoz-alertmanager" with KillContainerError: "rpc error: code = Unknown desc = Error response from daemon: No such container: e17eea8d1992521e7591085a00581e40a837b3101326f183902192a345411044"
```
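When the event stream is noisy like this, sorting by timestamp makes the deletion sequence easier to follow. A sketch, assuming the `platform` namespace used elsewhere in this thread:

```shell
# Show events in the order they happened, oldest first
kubectl get events -n platform --sort-by=.metadata.creationTimestamp

# Or watch new events as they arrive
kubectl get events -n platform --watch
```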
@anil1994 can you share more details about the deployment and environment? For example: chart version, k8s version, CPU/memory in the cluster, and the vendor or tool used to create the K8s cluster.
Also, share logs of the affected containers, if any.
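For gathering logs from the affected containers, something along these lines usually suffices. The pod and container names below are taken from the events in this thread; `--previous` helps when the container has already been killed or restarted:

```shell
# Current and previous logs of an affected container
kubectl logs -n platform chi-signoz-clickhouse-cluster-0-0-0 -c clickhouse
kubectl logs -n platform chi-signoz-clickhouse-cluster-0-0-0 -c clickhouse --previous

# Full object state and recent events for the pod
kubectl describe pod -n platform chi-signoz-clickhouse-cluster-0-0-0
```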
```
9m9s Normal SuccessfullyReconciled ingress/signoz-frontend Successfully reconciled
9m51s Normal Created pod/signoz-otel-collector-7f9c7d6b9-qkglp Created container signoz-otel-collector-init
9m51s Normal SuccessfulCreate statefulset/signoz-query-service create Pod signoz-query-service-0 in StatefulSet signoz-query-service successful
9m50s Normal Started pod/signoz-alertmanager-0 Started container signoz-alertmanager-init
9m49s Normal Created pod/signoz-query-service-0 Created container signoz-query-service-init
9m49s Normal Started pod/signoz-query-service-0 Started container signoz-query-service-init
9m49s Normal Pulled pod/signoz-query-service-0 Container image "docker.io/busybox:1.35" already present on machine
9m42s Info DeleteStarted clickhouseinstallation/signoz-clickhouse Delete host cluster/0-0 - started
9m42s Info DeleteStarted clickhouseinstallation/signoz-clickhouse Delete CHI started
9m42s Info DeleteStarted clickhouseinstallation/signoz-clickhouse Delete cluster platform/cluster - started
9m42s Info DeleteCompleted clickhouseinstallation/signoz-clickhouse Delete host cluster/0-0 - completed StatefulSet not found - already deleted? err: statefulsets.apps "chi-signoz-clickhouse-cluster-0-0" not found
9m42s Info DeleteStarted clickhouseinstallation/signoz-clickhouse Delete shard platform/0 - started
9m42s Info DeleteCompleted clickhouseinstallation/signoz-clickhouse Delete shard platform/0 - completed
9m41s Info DeleteCompleted clickhouseinstallation/signoz-clickhouse Delete cluster platform/cluster - completed
9m40s Info DeleteCompleted clickhouseinstallation/signoz-clickhouse Delete CHI completed
9m19s Normal Killing pod/signoz-alertmanager-0 Stopping container signoz-alertmanager-init
9m19s Normal Killing pod/signoz-query-service-0 Stopping container signoz-query-service-init
9m16s Normal Killing pod/signoz-zookeeper-0 Stopping container zookeeper
9m16s Normal Killing pod/signoz-frontend-5c645d6545-tzmbq Stopping container signoz-frontend-init
9m14s Normal Killing pod/signoz-clickhouse-operator-666885dc95-bxqxm Stopping container signoz-clickhouse-metrics-exporter
9m14s Normal Killing pod/signoz-clickhouse-operator-666885dc95-bxqxm Stopping container signoz-clickhouse-operator
9m16s Warning BackendNotFound targetgroupbinding/k8s-platform-signozfr-9dfc9e9803 backend not found: Service "signoz-frontend" not found
9m16s Normal Killing pod/signoz-otel-collector-7f9c7d6b9-qkglp Stopping container signoz-otel-collector-init
9m16s Normal Killing pod/signoz-otel-collector-metrics-694fd48977-mvfbv Stopping container signoz-otel-collector-metrics-init
8m44s Normal Killing pod/signoz-frontend-5c645d6545-tzmbq Stopping container signoz-frontend-init
```
@prashant-shahi Vendor: AWS, environment: EKS, k8s version: 1.24.
I saw this event: `9m42s Info DeleteStarted clickhouseinstallation/signoz-clickhouse Delete host cluster/0-0 - started`
Does anything in this chart have the ability to delete the `clickhouseinstallation/signoz-clickhouse` cluster? Most probably the clickhouse operator did this.
```
$ kubectl get clickhouseinstallation -n platform -o wide
NAME                VERSION   CLUSTERS   SHARDS   HOSTS   TASKID                                 STATUS      UPDATED   ADDED   DELETED   DELETE   ENDPOINT                                       AGE
signoz-clickhouse   0.19.1    1          1        1       25c4733d-4128-4b8e-b057-ad6403ff9421   Completed   1         signoz-clickhouse.platform.svc.cluster.local   3m34s
```
This was deleted somehow.
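Since the `DeleteStarted` events are emitted by the clickhouse-operator, its logs should say why it decided to delete the CHI. A sketch, assuming the operator deployment and container names visible in the events above (`signoz-clickhouse-operator`):

```shell
# The operator logs record why it reconciled or deleted the CHI
kubectl logs -n platform deployment/signoz-clickhouse-operator \
  -c signoz-clickhouse-operator --tail=200
```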
The issue is not reproducible in an EKS cluster with the identical override-values.yaml.
As discussed over the call, the issue is likely with your Kubernetes cluster itself. It is recommended to install SigNoz in a separate K8s cluster.
Do let me know how it goes.
Bug description
I do not understand why all k8s objects are deleted after a while, after I run the command `helm upgrade --namespace platform signoz signoz/signoz -f override-values.yaml`.
What is the reason for that?
```
79s Info DeleteStarted clickhouseinstallation/signoz-clickhouse Delete cluster platform/cluster - started
```
Expected behavior
I have just deployed the SigNoz cluster; its objects should keep running and not be deleted.
Additional context
override-values.yaml

```yaml
global:
  storageClass: gp2
  cloud: aws
clickhouse:
  enable: true
  installCustomStorageClass: true
k8s-infra:
  # -- Whether to enable K8s infra monitoring
  enabled: false
```