elastic / eck-diagnostics

Diagnostic tooling for ECK installations

ECK diagnostic tool errors on ECK operator created using Helm Chart #161

Open AkashPrakashShinde opened 1 year ago

AkashPrakashShinde commented 1 year ago

A customer is seeing issues while running the ECK diagnostics tool on OpenShift. The ECK version is 2.6 according to the case description. The problem, as reported by the customer, is as follows.

We are using the latest version of eck-diagnostics, 1.4.0.

We are running in the following namespace:
oc project -q
ocp-install-backupz

Note that the StatefulSet exists and the operator pod is elastic-operator-operator-0, as can be seen here:

oc get po | grep "elastic-operator"
elastic-operator-operator-0 1/1 Running 0 25h
elastic-operator-user-es-data-nodes-0 1/1 Running 0 25h
elastic-operator-user-es-master-nodes-0 1/1 Running 0 25h
elastic-operator-user-exporter-8448ccbbdd-mhqc9 1/1 Running 0 25h

The full verbose output is shown below.
./eck-diagnostics -o `oc project -q` -r `oc project -q` --verbose --diagnostic-image docker-registry-proxy.corp.amdocs.com/eck-dev/support-diagnostics:8.4.0
2023/03/28 12:51:35 ECK diagnostics with parameters: {DiagnosticImage:docker-registry-proxy.corp.amdocs.com/eck-dev/support-diagnostics:8.4.0 ECKVersion: Kubeconfig: OperatorNamespaces:[ocp-install-backupz] ResourcesNamespaces:[ocp-install-backupz] OutputDir: RunStackDiagnostics:true RunAgentDiagnostics:false Verbose:true StackDiagnosticsTimeout:5m0s}
I0328 12:51:42.559183 1037900 request.go:601] Waited for 1.060700737s due to client-side throttling, not priority and fairness, request: GET:https://api.ilocpplt406.ocpd.corp.amdocs.com:6443/apis/enterprisesearch.k8s.elastic.co/v1?timeout=32s
2023/03/28 12:51:48 Extracting Kubernetes diagnostics from ocp-install-backupz
2023/03/28 12:51:49 operator statefulset not found, checking for OLM deployment but failed: deployments.apps "elastic-operator" not found
2023/03/28 12:53:48 ECK version is unknown
2023/03/28 12:53:48 Extracting Kubernetes diagnostics from ocp-install-backupz
2023/03/28 12:54:04 Diagnostic pod ocp-install-backupz/elastic-operator-user-elasticsearch-diag added
2023/03/28 12:54:04 Diagnostic pod ocp-install-backupz/elastic-enterprise-user-elasticsearch-diag added
2023/03/28 12:59:03 Diagnostic job for elasticsearch ocp-install-backupz/elastic-enterprise-user timed out, terminating
2023/03/28 12:59:03 ocp-install-backupz/elastic-enterprise-user-elasticsearch-diag deleted
2023/03/28 12:59:03 Diagnostic job for elasticsearch ocp-install-backupz/elastic-operator-user timed out, terminating
2023/03/28 12:59:04 ocp-install-backupz/elastic-operator-user-elasticsearch-diag deleted
2023/03/28 12:59:04 ECK diagnostics written to eck-diagnostic-2023-03-28T12-51-40.zip

They are using the official chart: https://github.com/elastic/cloud-on-k8s/tree/2.6.1/deploy/eck-operator

These two custom values were changed in their environment: https://github.com/elastic/cloud-on-k8s/blob/2.6.1/deploy/eck-operator/values.yaml#L2-L5

Their StatefulSet has neither a control-plane=elastic-operator-operator label nor a control-plane=elastic-operator label.
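
As a rough illustration of the missing label described above, the following Python sketch checks a label set for the two control-plane values mentioned. The helper name has_control_plane_label and the idea of matching on exactly these two values are assumptions made for illustration; this is not the eck-diagnostics implementation. The sample labels are the ones reported further down in this issue for the customer's StatefulSet.

```python
# Hypothetical helper for illustration only; not taken from eck-diagnostics.
# The two accepted values are the ones named in this issue.
EXPECTED_CONTROL_PLANE_VALUES = ("elastic-operator", "elastic-operator-operator")


def has_control_plane_label(labels):
    """Return True if `labels` has a control-plane label with one of the expected values."""
    return labels.get("control-plane") in EXPECTED_CONTROL_PLANE_VALUES


# Labels as reported for the customer's StatefulSet in this issue.
customer_labels = {
    "app.kubernetes.io/instance": "elastic-operator-operator",
    "app.kubernetes.io/managed-by": "Helm",
    "app.kubernetes.io/name": "elastic-operator-operator",
    "app.kubernetes.io/version": "2.2.0",
    "helm.sh/chart": "elastic-operator-operator-1.12.0",
}

print(has_control_plane_label(customer_labels))  # False: no control-plane label at all
```

A label set that does carry control-plane=elastic-operator would return True from the same check.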

Failed bundle is uploaded at https://upload.elastic.co/d/86573dc0a42d5258e5cd569b4fcfdc1a70b363a4df53b9a7e1efadd705ddae9f Authorization Token: 7b1e09891adc5717

How is the helm.sh/chart label supposed to be set? I assume Helm uses the chart name along with the chart version, right? Is Helm (rather than the chart developer) responsible for computing and adding the helm.sh/chart label?

Looking at the customer statefulset:

Labels:             app.kubernetes.io/instance=elastic-operator-operator
                    app.kubernetes.io/managed-by=Helm
                    app.kubernetes.io/name=elastic-operator-operator
                    app.kubernetes.io/version=2.2.0
                    helm.sh/chart=elastic-operator-operator-1.12.0

How can we fix it?

pebrc commented 1 year ago

I doubt that they are using the official Helm chart, given that the chart name in the label is not the official one:

helm.sh/chart=elastic-operator-operator-1.12.0

when it should be

helm.sh/chart: eck-operator-2.6.1

I am not sure we can support user-provided Helm charts with this tool. What we can do is add a section to the README explaining the expected labelling for anyone rolling their own Helm chart.
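
On the helm.sh/chart question raised earlier in the thread: Helm itself does not add this label. Charts scaffolded with helm create compute it in a template helper as the chart name, a hyphen, and the chart version, with "+" replaced by "_" and the result truncated to 63 characters. The Python sketch below mirrors that convention and splits an observed label back into name and version. Both function names are hypothetical, and the split assumes the version is everything after the last hyphen, which does not hold for versions that themselves contain hyphens (for example pre-release tags).

```python
def chart_label(name, version):
    """Build a helm.sh/chart value following the `helm create` scaffold convention."""
    value = f"{name}-{version}".replace("+", "_")[:63]
    # The scaffold trims a single trailing hyphen left over after truncation.
    return value[:-1] if value.endswith("-") else value


def split_chart_label(value):
    """Split a helm.sh/chart value into (chart name, chart version).

    Assumption: the version contains no hyphen, so the last hyphen is the separator.
    """
    name, _, version = value.rpartition("-")
    return name, version


print(chart_label("eck-operator", "2.6.1"))
# eck-operator-2.6.1

print(split_chart_label("elastic-operator-operator-1.12.0"))
# ('elastic-operator-operator', '1.12.0')
```

Read this way, the customer's label points to a chart named elastic-operator-operator at version 1.12.0, which matches neither the name nor the version of the official eck-operator 2.6.1 chart.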

AkashPrakashShinde commented 1 year ago

Hello @pebrc, can you please share the README section where the labelling explanation was added?

pebrc commented 1 year ago

Sorry, that fell through the cracks. It is coming now in https://github.com/elastic/eck-diagnostics/pull/184/files