elastic / cloud-on-k8s

Elastic Cloud on Kubernetes

Fleet server pod not creating: ECK Operator #6979

Open apgapg opened 1 year ago

apgapg commented 1 year ago

Bug Report

I tried deploying the ECK stack using the following steps:

  1. helm repo add elastic https://helm.elastic.co && helm repo update
  2. helm upgrade --install elastic-operator elastic/eck-operator -n elastic-system --create-namespace (this deployed the ECK operator successfully)
  3. helm upgrade --install eck-stack elastic/eck-stack -n elastic-stack --create-namespace --values ./values.yml

Here's the values.yml

eck-elasticsearch:
  enabled: true
  annotations:
    eck.k8s.elastic.co/license: basic

eck-kibana:
  enabled: true
  annotations:
    eck.k8s.elastic.co/license: basic

eck-fleet-server:
  enabled: true
  annotations:
    eck.k8s.elastic.co/license: basic
  spec:
    elasticsearchRefs:
      - name: elasticsearch
    kibanaRef:
      name: eck-stack-eck-kibana

I had to change elasticsearchRefs and kibanaRef because the associations were not being established with the default values.

After the above, the Elasticsearch and Kibana pods are deployed successfully, but I can't see the Fleet Server pod.

I can see the Fleet Server Agent resource, but no pod is created for it.

Here is the Agent resource:

apiVersion: agent.k8s.elastic.co/v1alpha1
kind: Agent
metadata:
  annotations:
    association.k8s.elastic.co/es-conf-3780043617: '{"authSecretName":"eck-stack-eck-fleet-server-elastic-stack-elasticsearch-agent-user","authSecretKey":"token","isServiceAccount":true,"caCertProvided":true,"caSecretName":"eck-stack-eck-fleet-server-agent-es-elastic-stack-elasticsearch-ca","url":"https://elasticsearch-es-http.elastic-stack.svc:9200","version":"8.8.0"}'
    association.k8s.elastic.co/kb-conf: '{"authSecretName":"eck-stack-eck-fleet-server-agent-kb-user","authSecretKey":"elastic-stack-eck-stack-eck-fleet-server-agent-kb-user","isServiceAccount":false,"caCertProvided":true,"caSecretName":"eck-stack-eck-fleet-server-agent-kibana-ca","url":"https://eck-stack-eck-kibana-kb-http.elastic-stack.svc:5601","version":"8.8.0"}'
    eck.k8s.elastic.co/license: basic
    meta.helm.sh/release-name: eck-stack
    meta.helm.sh/release-namespace: elastic-stack
  creationTimestamp: "2023-07-05T03:26:52Z"
  generation: 3
  labels:
    app.kubernetes.io/instance: eck-stack
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: eck-fleet-server
    helm.sh/chart: eck-fleet-server-0.6.0
  name: eck-stack-eck-fleet-server
  namespace: elastic-stack
  resourceVersion: "197494623"
  uid: 4e75af00-e8a8-4cdc-a0be-048f7ac913c4
spec:
  deployment:
    podTemplate:
      metadata:
        creationTimestamp: null
      spec:
        automountServiceAccountToken: true
        containers: null
        securityContext:
          runAsUser: 0
        serviceAccountName: fleet-server
    replicas: 1
    strategy: {}
  elasticsearchRefs:
  - name: elasticsearch
  fleetServerEnabled: true
  fleetServerRef: {}
  http:
    service:
      metadata: {}
      spec: {}
    tls:
      certificate: {}
  kibanaRef:
    name: eck-stack-eck-kibana
  mode: fleet
  policyID: eck-fleet-server
  version: 8.8.0
status:
  elasticsearchAssociationsStatus:
    elastic-stack/elasticsearch: Established
  kibanaAssociationStatus: Established
  observedGeneration: 3

When I look at the ECK operator pod logs, I see the following:

{"log.level":"error","@timestamp":"2023-07-05T03:31:20.441Z","log.logger":"manager.eck-operator","message":"Reconciler error","service.version":"2.8.0+3940cf4d","service.type":"eck","ecs.version":"1.4.0","controller":"agent-controller","object":{"name":"eck-stack-eck-fleet-server","namespace":"elastic-stack"},"namespace":"elastic-stack","name":"eck-stack-eck-fleet-server","reconcileID":"2e7f7944-e56d-498c-9ab9-0eef0a410951","error":"failed to request https://eck-stack-eck-kibana-kb-http.elastic-stack.svc:5601/api/fleet/setup, status is 401)","errorCauses":[{"error":"failed to request https://eck-stack-eck-kibana-kb-http.elastic-stack.svc:5601/api/fleet/setup, status is 401)"}],"error.stack_trace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:329\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:274\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:235"}
{"log.level":"info","@timestamp":"2023-07-05T03:31:40.921Z","log.logger":"agent-controller","message":"Starting reconciliation run","service.version":"2.8.0+3940cf4d","service.type":"eck","ecs.version":"1.4.0","iteration":"26","namespace":"elastic-stack","agent_name":"eck-stack-eck-fleet-server"}
{"log.level":"info","@timestamp":"2023-07-05T03:31:40.969Z","log.logger":"agent-controller","message":"Ending reconciliation run","service.version":"2.8.0+3940cf4d","service.type":"eck","ecs.version":"1.4.0","iteration":"26","namespace":"elastic-stack","agent_name":"eck-stack-eck-fleet-server","took":0.047632012}
{"log.level":"error","@timestamp":"2023-07-05T03:31:40.969Z","log.logger":"manager.eck-operator","message":"Reconciler error","service.version":"2.8.0+3940cf4d","service.type":"eck","ecs.version":"1.4.0","controller":"agent-controller","object":{"name":"eck-stack-eck-fleet-server","namespace":"elastic-stack"},"namespace":"elastic-stack","name":"eck-stack-eck-fleet-server","reconcileID":"ba47d6f7-d2b8-406c-bef6-bb04710282d1","error":"failed to request https://eck-stack-eck-kibana-kb-http.elastic-stack.svc:5601/api/fleet/setup, status is 401)","errorCauses":[{"error":"failed to request https://eck-stack-eck-kibana-kb-http.elastic-stack.svc:5601/api/fleet/setup, status is 401)"}],"error.stack_trace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:329\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:274\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:235"}
{"log.level":"info","@timestamp":"2023-07-05T03:32:21.930Z","log.logger":"agent-controller","message":"Starting reconciliation run","service.version":"2.8.0+3940cf4d","service.type":"eck","ecs.version":"1.4.0","iteration":"27","namespace":"elastic-stack","agent_name":"eck-stack-eck-fleet-server"}
{"log.level":"info","@timestamp":"2023-07-05T03:32:22.481Z","log.logger":"agent-controller","message":"Could not find existing Fleet enrollment API keys, creating new one","service.version":"2.8.0+3940cf4d","service.type":"eck","ecs.version":"1.4.0","iteration":"27","namespace":"elastic-stack","agent_name":"eck-stack-eck-fleet-server","error":"no matching active enrollment token found"}
{"log.level":"info","@timestamp":"2023-07-05T03:32:22.500Z","log.logger":"agent-controller","message":"Ending reconciliation run","service.version":"2.8.0+3940cf4d","service.type":"eck","ecs.version":"1.4.0","iteration":"27","namespace":"elastic-stack","agent_name":"eck-stack-eck-fleet-server","took":0.570665192}
{"log.level":"error","@timestamp":"2023-07-05T03:32:22.500Z","log.logger":"manager.eck-operator","message":"Reconciler error","service.version":"2.8.0+3940cf4d","service.type":"eck","ecs.version":"1.4.0","controller":"agent-controller","object":{"name":"eck-stack-eck-fleet-server","namespace":"elastic-stack"},"namespace":"elastic-stack","name":"eck-stack-eck-fleet-server","reconcileID":"a0e9d3af-7b6d-4b1f-974a-b84288784bf6","error":"failed to request https://eck-stack-eck-kibana-kb-http.elastic-stack.svc:5601/api/fleet/enrollment_api_keys, status is 400)","errorCauses":[{"error":"failed to request https://eck-stack-eck-kibana-kb-http.elastic-stack.svc:5601/api/fleet/enrollment_api_keys, status is 400)"}],"error.stack_trace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:329\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:274\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:235"}
{"log.level":"info","@timestamp":"2023-07-05T03:33:44.421Z","log.logger":"agent-controller","message":"Starting reconciliation run","service.version":"2.8.0+3940cf4d","service.type":"eck","ecs.version":"1.4.0","iteration":"28","namespace":"elastic-stack","agent_name":"eck-stack-eck-fleet-server"}
{"log.level":"info","@timestamp":"2023-07-05T03:33:44.693Z","log.logger":"agent-controller","message":"Could not find existing Fleet enrollment API keys, creating new one","service.version":"2.8.0+3940cf4d","service.type":"eck","ecs.version":"1.4.0","iteration":"28","namespace":"elastic-stack","agent_name":"eck-stack-eck-fleet-server","error":"no matching active enrollment token found"}
{"log.level":"info","@timestamp":"2023-07-05T03:33:44.711Z","log.logger":"agent-controller","message":"Ending reconciliation run","service.version":"2.8.0+3940cf4d","service.type":"eck","ecs.version":"1.4.0","iteration":"28","namespace":"elastic-stack","agent_name":"eck-stack-eck-fleet-server","took":0.289479861}
{"log.level":"error","@timestamp":"2023-07-05T03:33:44.711Z","log.logger":"manager.eck-operator","message":"Reconciler error","service.version":"2.8.0+3940cf4d","service.type":"eck","ecs.version":"1.4.0","controller":"agent-controller","object":{"name":"eck-stack-eck-fleet-server","namespace":"elastic-stack"},"namespace":"elastic-stack","name":"eck-stack-eck-fleet-server","reconcileID":"2b92d899-98b1-4857-b594-1a9f17bb7b3c","error":"failed to request https://eck-stack-eck-kibana-kb-http.elastic-stack.svc:5601/api/fleet/enrollment_api_keys, status is 400)","errorCauses":[{"error":"failed to request https://eck-stack-eck-kibana-kb-http.elastic-stack.svc:5601/api/fleet/enrollment_api_keys, status is 400)"}],"error.stack_trace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:329\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:274\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:235"}
{"log.level":"info","@timestamp":"2023-07-05T03:36:28.552Z","log.logger":"agent-controller","message":"Starting reconciliation run","service.version":"2.8.0+3940cf4d","service.type":"eck","ecs.version":"1.4.0","iteration":"29","namespace":"elastic-stack","agent_name":"eck-stack-eck-fleet-server"}
{"log.level":"info","@timestamp":"2023-07-05T03:36:28.929Z","log.logger":"agent-controller","message":"Could not find existing Fleet enrollment API keys, creating new one","service.version":"2.8.0+3940cf4d","service.type":"eck","ecs.version":"1.4.0","iteration":"29","namespace":"elastic-stack","agent_name":"eck-stack-eck-fleet-server","error":"no matching active enrollment token found"}
{"log.level":"info","@timestamp":"2023-07-05T03:36:28.941Z","log.logger":"agent-controller","message":"Ending reconciliation run","service.version":"2.8.0+3940cf4d","service.type":"eck","ecs.version":"1.4.0","iteration":"29","namespace":"elastic-stack","agent_name":"eck-stack-eck-fleet-server","took":0.389191243}

I can see the following error: failed to request https://eck-stack-eck-kibana-kb-http.elastic-stack.svc:5601/api/fleet/enrollment_api_keys, status is 400)

I have not been able to find the actual cause.
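The operator log above is long, and the 400 on /api/fleet/enrollment_api_keys is preceded by 401 responses from /api/fleet/setup that are easy to overlook. As a rough aid for reading such logs, here is a minimal Python sketch that groups the reconciler errors by Kibana API path and HTTP status. The sample lines are abbreviated copies of the entries above (stack traces and most fields dropped), and the names SAMPLE_LOGS, ERROR_RE and summarize_errors are made up for this illustration; none of this is part of ECK.

```python
import json
import re
from collections import Counter

# Abbreviated copies of operator log lines from this issue
# (stack traces and most fields omitted for brevity).
SAMPLE_LOGS = """\
{"log.level":"error","@timestamp":"2023-07-05T03:31:20.441Z","controller":"agent-controller","error":"failed to request https://eck-stack-eck-kibana-kb-http.elastic-stack.svc:5601/api/fleet/setup, status is 401)"}
{"log.level":"info","@timestamp":"2023-07-05T03:31:40.921Z","log.logger":"agent-controller","message":"Starting reconciliation run"}
{"log.level":"error","@timestamp":"2023-07-05T03:32:22.500Z","controller":"agent-controller","error":"failed to request https://eck-stack-eck-kibana-kb-http.elastic-stack.svc:5601/api/fleet/enrollment_api_keys, status is 400)"}
{"log.level":"error","@timestamp":"2023-07-05T03:33:44.711Z","controller":"agent-controller","error":"failed to request https://eck-stack-eck-kibana-kb-http.elastic-stack.svc:5601/api/fleet/enrollment_api_keys, status is 400)"}
"""

# Matches the error text format seen in the log lines above.
ERROR_RE = re.compile(
    r"failed to request https?://[^/]+(?P<path>/\S*), status is (?P<status>\d{3})"
)


def summarize_errors(log_text):
    """Count error-level entries per (request path, HTTP status)."""
    counts = Counter()
    for line in log_text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON log line
        if not isinstance(entry, dict) or entry.get("log.level") != "error":
            continue
        match = ERROR_RE.search(entry.get("error", ""))
        if match:
            counts[(match.group("path"), int(match.group("status")))] += 1
    return counts


if __name__ == "__main__":
    for (path, status), count in sorted(summarize_errors(SAMPLE_LOGS).items()):
        print(f"{status} {path}: {count} occurrence(s)")
```

To use it on real output, the sample string could be replaced with text read from standard input or from a saved log file.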

What did you expect to see?

A fleet server POD running.

What did you see instead? Under which circumstances?

No fleet server pod

Environment

azhurbilo commented 1 year ago

I had a similar issue: /api/fleet/enrollment_api_keys, status is 400.

Then I found in the Kibana logs that the xpack.fleet.agentPolicies had not been created.

Example Kibana configuration (since you use elastic/eck-stack, you can override the whole spec of the Kibana resource: https://github.com/elastic/cloud-on-k8s/blob/main/deploy/eck-stack/charts/eck-kibana/templates/kibana.yaml#L15 ):

xpack.fleet.agentPolicies:
      - name: Fleet Server on ECK policy
        id: eck-fleet-server
        ...
      - name: Elastic Agent on ECK policy
        id: eck-agent
        ...

And in the Fleet Server Agent resource:

policyID: eck-agent

This resolved the issue, and the enrollment token for Fleet Server was generated.
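The fix described above comes down to one consistency rule: the policyID set on an Agent resource has to match the id of a policy listed under xpack.fleet.agentPolicies in the Kibana configuration. A minimal Python sketch of that check follows, using plain dictionaries in place of parsed manifests. The function name missing_policy_ids and the dictionary layout are assumptions made for illustration; the policy names and ids are the ones quoted in this thread.

```python
def missing_policy_ids(kibana_config, agents):
    """Return {agent name: policyID} for agents whose policyID is not
    among the policies preconfigured in the Kibana config."""
    policies = kibana_config.get("xpack.fleet.agentPolicies") or []
    known_ids = {policy.get("id") for policy in policies}
    return {
        agent["name"]: agent["policyID"]
        for agent in agents
        if agent.get("policyID") not in known_ids
    }


# Kibana config as in the original values.yml: no agent policies at all.
kibana_without_policies = {}

# Kibana config with the two policies from the example above
# (only the fields relevant to the check are shown).
kibana_with_policies = {
    "xpack.fleet.agentPolicies": [
        {"name": "Fleet Server on ECK policy", "id": "eck-fleet-server"},
        {"name": "Elastic Agent on ECK policy", "id": "eck-agent"},
    ]
}

# The Agent resource from the original report.
agents = [{"name": "eck-stack-eck-fleet-server", "policyID": "eck-fleet-server"}]

if __name__ == "__main__":
    print(missing_policy_ids(kibana_without_policies, agents))
    print(missing_policy_ids(kibana_with_policies, agents))
```

With the original values.yml the Fleet Server policy is reported as missing; once the policies are declared in Kibana, the result is empty.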

azhurbilo commented 1 year ago

@pebrc maybe you can help here?

After an Elasticsearch version upgrade, Fleet Server stopped working.

I tried deleting the agents, integrations, policies, and tokens (https://url-kibana.com/app/fleet/policies, https://url-kibana.com/app/fleet/enrollment-tokens),

but now, when I bring Fleet Server up, it returns:

Reconciliation error: failed to request https://elastic-search-kb-http.elasticsearch.svc:5601/api/fleet/enrollment_api_keys, status is 400)

If I check https://url-kibana.com/app/fleet/enrollment-tokens, there are no tokens.

In the Kibana logs:

[2023-10-26T11:40:10.674+00:00][INFO ][plugins.fleet] Beginning fleet setup
[2023-10-26T11:40:11.065+00:00][INFO ][plugins.fleet] Fleet setup completed
[2023-10-26T11:40:12.101+00:00][INFO ][plugins.fleet] Fleet Usage: {"agents_enabled":true,"agents":{"total_enrolled":0,"healthy":0,"unhealthy":0,"offline":0,"inactive":0,"unenrolled":390,"total_all_statuses":390,"updating":0},"fleet_server":{"total_all_statuses":0,"total_enrolled":0,"healthy":0,"unhealthy":0,"offline":0,"updating":0,"num_host_urls":0}}
[2023-10-26T11:42:59.324+00:00][ERROR][plugins.fleet] Agent policy "fleet-server-policy" not found
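The last Kibana log line above names the policy that Fleet could not find. When the log is longer, a small filter can list every policy reported this way. The Python sketch below assumes the message format shown in these log lines; KIBANA_LOGS, NOT_FOUND_RE and missing_policies are illustrative names, not part of Kibana or ECK.

```python
import re

# Kibana log lines copied from above (the long "Fleet Usage" line is omitted).
KIBANA_LOGS = """\
[2023-10-26T11:40:10.674+00:00][INFO ][plugins.fleet] Beginning fleet setup
[2023-10-26T11:40:11.065+00:00][INFO ][plugins.fleet] Fleet setup completed
[2023-10-26T11:42:59.324+00:00][ERROR][plugins.fleet] Agent policy "fleet-server-policy" not found
"""

# Assumes the error message format seen in the log above.
NOT_FOUND_RE = re.compile(
    r'\[ERROR\]\[plugins\.fleet\] Agent policy "([^"]+)" not found'
)


def missing_policies(log_text):
    """Return the sorted, de-duplicated policy ids reported as not found."""
    return sorted(set(NOT_FOUND_RE.findall(log_text)))


if __name__ == "__main__":
    print(missing_policies(KIBANA_LOGS))
```

Each id returned this way can then be compared with the policyID values set on the Agent resources.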

but the Agent custom resources were created:

apiVersion: agent.k8s.elastic.co/v1alpha1
kind: Agent
...
  mode: fleet
  fleetServerEnabled: true
  policyID: fleet-server-policy
---
apiVersion: agent.k8s.elastic.co/v1alpha1
kind: Agent
...
  mode: fleet
  policyID: agent-policy

naemono commented 6 months ago

@apgapg As noted, agentPolicies needs to be configured in Kibana for this to function properly.

Check our default fleet example, and our eck-stack example.

@azhurbilo can we get a full set of manifests such that we can easily replicate your setup (pre-upgrade, and post-upgrade preferably) and try and understand the issue? Thanks.