SigNoz / signoz

SigNoz is an open-source, OpenTelemetry-native observability platform with logs, traces, and metrics in a single application; an open-source alternative to DataDog, New Relic, etc.
https://signoz.io

Helm all objects deleted after a while #2621

Open anil1994 opened 1 year ago

anil1994 commented 1 year ago

Bug description

I do not understand why all Kubernetes objects get deleted after a while once I run the command helm upgrade --namespace platform signoz signoz/signoz -f override-values.yaml

What is the reason for that?

79s Info DeleteStarted clickhouseinstallation/signoz-clickhouse Delete cluster platform/cluster - started

Expected behavior

The SigNoz cluster I have just deployed should stay up and running.

Additional context

override-values.yaml

global:
  storageClass: gp2
  cloud: aws

clickhouse:
  enable: true
  installCustomStorageClass: true

k8s-infra:
  # -- Whether to enable K8s infra monitoring
  enabled: false

welcome[bot] commented 1 year ago

Thanks for opening this issue. A team member should give feedback soon. In the meantime, feel free to check out the contributing guidelines.

prashant-shahi commented 1 year ago

> all k8s object deleted after a while, after I run helm upgrade

@anil1994 I am not sure why it would just delete all objects after the upgrade command.

It is possible that you ran helm repo update and it now installs a newer version of the chart. However, it should re-create all the resources properly. Did it not?

Also, could you please share the override-values.yaml with proper indentation?
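To check whether helm repo update actually pulled in a newer chart, one can compare the chart version of the deployed release against what the repo now serves. A minimal sketch, assuming helm is on the PATH and a release named signoz in namespace platform; the helper names are hypothetical, not part of SigNoz:

```python
import json
import shutil
import subprocess

def chart_version(chart_field: str) -> str:
    """Split helm's chart field, e.g. 'signoz-0.19.1' -> '0.19.1'."""
    return chart_field.rsplit("-", 1)[1]

def installed_chart(release: str, namespace: str) -> str:
    """Return the chart version of a deployed release via `helm list`."""
    out = subprocess.check_output(
        ["helm", "list", "-n", namespace, "-o", "json"], text=True
    )
    for item in json.loads(out):
        if item["name"] == release:
            return chart_version(item["chart"])
    raise KeyError(f"release {release!r} not found in {namespace!r}")

if shutil.which("helm"):  # only query helm when it is actually installed
    print("installed chart version:", installed_chart("signoz", "platform"))
```

If the installed version differs from what helm search repo signoz --versions lists, pinning the upgrade with helm upgrade --version avoids an unintended chart bump.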

anil1994 commented 1 year ago
override-values.yaml

global:
  storageClass: gp2
  cloud: aws

clickhouse:
  enable: true
  installCustomStorageClass: true

k8s-infra:
  enabled: false
anil1994 commented 1 year ago

Now I have tried helm repo update and then ran the helm upgrade command. I am waiting to see whether the Helm objects get deleted or not.

anil1994 commented 1 year ago

I encountered the same issue again. I looked at the events by running the kubectl get events command. Any idea? @prashant-shahi

81s Warning Unhealthy pod/chi-signoz-clickhouse-cluster-0-0-0 Readiness probe failed: Get "http://10.1.27.206:8123/ping": dial tcp 10.1.27.206:8123: connect: connection refused

78s Warning FailedKillPod pod/chi-signoz-clickhouse-cluster-0-0-0 error killing pod: failed to "KillContainer" for "clickhouse" with KillContainerError: "rpc error: code = Unknown desc = Error response from daemon: No such container: 2ee872bb111ffb478c7de74cd5a98727a499e8445f266decfcb19013d2fd72a6"

85s Normal SuccessfulDelete statefulset/chi-signoz-clickhouse-cluster-0-0 delete Pod chi-signoz-clickhouse-cluster-0-0-0 in StatefulSet chi-signoz-clickhouse-cluster-0-0 successful

86s Info DeleteStarted clickhouseinstallation/signoz-clickhouse Delete CHI started

86s Info DeleteStarted clickhouseinstallation/signoz-clickhouse Delete shard platform/0 - started

85s Info DeleteCompleted clickhouseinstallation/signoz-clickhouse Deleted tables on host 0-0 replica 0 to shard 0 in cluster cluster

84s Normal Killing pod/signoz-alertmanager-0 Stopping container signoz-alertmanager

84s Warning FailedKillPod pod/signoz-alertmanager-0 error killing pod: failed to "KillContainer" for "signoz-alertmanager" with KillContainerError: "rpc error: code = Unknown desc = Error response from daemon: No such container: e17eea8d1992521e7591085a00581e40a837b3101326f183902192a345411044"

prashant-shahi commented 1 year ago

@anil1994 can you share more details about the deployment and environment? For example: chart version, K8s version, CPU/memory in the cluster, and the vendor or tool used to create the K8s cluster.

Also, share logs of the affected containers if any.

anil1994 commented 1 year ago

9m9s Normal SuccessfullyReconciled ingress/signoz-frontend Successfully reconciled
9m51s Normal Created pod/signoz-otel-collector-7f9c7d6b9-qkglp Created container signoz-otel-collector-init
9m51s Normal SuccessfulCreate statefulset/signoz-query-service create Pod signoz-query-service-0 in StatefulSet signoz-query-service successful
9m50s Normal Started pod/signoz-alertmanager-0 Started container signoz-alertmanager-init
9m49s Normal Created pod/signoz-query-service-0 Created container signoz-query-service-init
9m49s Normal Started pod/signoz-query-service-0 Started container signoz-query-service-init
9m49s Normal Pulled pod/signoz-query-service-0 Container image "docker.io/busybox:1.35" already present on machine
9m42s Info DeleteStarted clickhouseinstallation/signoz-clickhouse Delete host cluster/0-0 - started
9m42s Info DeleteStarted clickhouseinstallation/signoz-clickhouse Delete CHI started
9m42s Info DeleteStarted clickhouseinstallation/signoz-clickhouse Delete cluster platform/cluster - started
9m42s Info DeleteCompleted clickhouseinstallation/signoz-clickhouse Delete host cluster/0-0 - completed StatefulSet not found - already deleted? err: statefulsets.apps "chi-signoz-clickhouse-cluster-0-0" not found
9m42s Info DeleteStarted clickhouseinstallation/signoz-clickhouse Delete shard platform/0 - started
9m42s Info DeleteCompleted clickhouseinstallation/signoz-clickhouse Delete shard platform/0 - completed
9m41s Info DeleteCompleted clickhouseinstallation/signoz-clickhouse Delete cluster platform/cluster - completed
9m40s Info DeleteCompleted clickhouseinstallation/signoz-clickhouse Delete CHI completed
9m19s Normal Killing pod/signoz-alertmanager-0 Stopping container signoz-alertmanager-init
9m19s Normal Killing pod/signoz-query-service-0 Stopping container signoz-query-service-init
9m16s Normal Killing pod/signoz-zookeeper-0 Stopping container zookeeper
9m16s Normal Killing pod/signoz-frontend-5c645d6545-tzmbq Stopping container signoz-frontend-init
9m16s Warning BackendNotFound targetgroupbinding/k8s-platform-signozfr-9dfc9e9803 backend not found: Service "signoz-frontend" not found
9m16s Normal Killing pod/signoz-otel-collector-7f9c7d6b9-qkglp Stopping container signoz-otel-collector-init
9m16s Normal Killing pod/signoz-otel-collector-metrics-694fd48977-mvfbv Stopping container signoz-otel-collector-metrics-init
9m14s Normal Killing pod/signoz-clickhouse-operator-666885dc95-bxqxm Stopping container signoz-clickhouse-metrics-exporter
9m14s Normal Killing pod/signoz-clickhouse-operator-666885dc95-bxqxm Stopping container signoz-clickhouse-operator
8m44s Normal Killing pod/signoz-frontend-5c645d6545-tzmbq Stopping container signoz-frontend-init
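A dump like the one above is easier to reason about once filtered and ordered. A minimal sketch (a hypothetical helper, not part of SigNoz or kubectl) that converts the AGE column of kubectl get events output (e.g. 9m42s) to seconds and keeps only the operator's Delete* events, oldest first:

```python
import re

def age_to_seconds(age: str) -> int:
    """Convert a kubectl AGE string such as '9m42s' or '81s' to seconds."""
    units = {"d": 86400, "h": 3600, "m": 60, "s": 1}
    return sum(int(n) * units[u] for n, u in re.findall(r"(\d+)([dhms])", age))

def delete_events(lines):
    """Keep only events whose REASON starts with 'Delete', oldest first."""
    rows = []
    for line in lines:
        parts = line.split(None, 3)  # AGE TYPE REASON OBJECT+MESSAGE
        if len(parts) == 4 and parts[2].startswith("Delete"):
            rows.append((age_to_seconds(parts[0]), line))
    # larger age = older event, so sort descending to get oldest first
    return [line for _, line in sorted(rows, reverse=True)]

# Sample lines taken from the event dump above.
sample = [
    "9m19s Normal Killing pod/signoz-alertmanager-0 Stopping container signoz-alertmanager-init",
    "9m42s Info DeleteStarted clickhouseinstallation/signoz-clickhouse Delete CHI started",
    "9m40s Info DeleteCompleted clickhouseinstallation/signoz-clickhouse Delete CHI completed",
]
for line in delete_events(sample):
    print(line)
```

Piping kubectl get events -n platform --sort-by=.metadata.creationTimestamp into such a filter makes the deletion cascade stand out from the surrounding pod churn.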

anil1994 commented 1 year ago

@prashant-shahi, vendor: AWS, environment: EKS, Kubernetes version: 1.24.
I saw this in the events: 9m42s Info DeleteStarted clickhouseinstallation/signoz-clickhouse Delete host cluster/0-0 - started

Does anything in this chart have the ability to delete the clickhouseinstallation/signoz-clickhouse cluster? Most probably the ClickHouse operator did this.

anil1994 commented 1 year ago
kubectl get clickhouseinstallation -n platform -o wide

> NAME                VERSION   CLUSTERS   SHARDS   HOSTS   TASKID                                 STATUS      UPDATED   ADDED   DELETED   DELETE   ENDPOINT                                       AGE
> signoz-clickhouse   0.19.1    1          1        1       25c4733d-4128-4b8e-b057-ad6403ff9421   Completed             1                          signoz-clickhouse.platform.svc.cluster.local   3m34s
anil1994 commented 1 year ago

This got deleted somehow.

prashant-shahi commented 1 year ago

The issue is not reproducible in an EKS cluster with the identical override-values.yaml.

As discussed over the call, the issue is likely with your Kubernetes cluster itself. It is recommended to install SigNoz in a separate K8s cluster.

Do let me know how it goes.