elastic / fleet-server

The Fleet server allows managing a fleet of Elastic Agents.

(FAILED) status code: 0, fleet-server returned an error: , message: An invalid response was received from the upstream server #3074

Open abdul90082 opened 10 months ago

abdul90082 commented 10 months ago

We are encountering errors in our current deployment involving Fleet Server and Fleet Agent components. The specific errors we are facing are as follows:

Fleet Server error message: "Non-zero metrics in the last 30s"

Fleet Agent error message: "Cannot check in with fleet-server, retrying"

elastic-agent status (on the Fleet Agent):

  ┌─ fleet
  │  └─ status: (FAILED) status code: 0, fleet-server returned an error: , message: An invalid response was received from the upstream server
  └─ elastic-agent
     └─ status: (HEALTHY) Running

fleet server status:

  ┌─ fleet
  │  └─ status: (HEALTHY) Connected
  └─ elastic-agent
     ├─ status: (HEALTHY) Running

We are also getting this error from elastic_agent in Kibana:

  [elastic_agent][error] Cannot checkin in with fleet-server, retrying

Environment:

Fleet Server is deployed within our “infrastructure” cluster. This cluster includes Elasticsearch and Kibana components, which are functioning correctly.

Fleet Agent is deployed in one of our Kubernetes “playground” clusters. The purpose of this agent is to collect Kubernetes logs and other observability-related data.

In Kibana, the agent is shown as unhealthy/offline (its status flaps from healthy to offline and sometimes back), while Fleet Server stays healthy and online the whole time. Interestingly, even though the agents are periodically marked offline, their metrics still appear to be collected.

Additional information: We need help identifying and resolving these errors so that the deployment works properly. Any guidance would be greatly appreciated. Thank you.

  access_api_key: Y1pHOHgtdw==
  agent:
    id: ac177c50-da37-490b-9ed8-a755be756174
  enabled: true
  host: localhost:5601
  hosts:
  - https://fleet-server.xyz.com:443
  protocol: http
  ssl:
    renegotiation: never
    verification_mode: full
  timeout: 10m0s
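To narrow down where the error originates, one option is to query Fleet Server directly from the playground cluster, bypassing the agent. The minimal sketch below assumes Fleet Server's `GET /api/status` endpoint answers with a small JSON document containing a `status` field (for example `"HEALTHY"`); the URL is taken from the `hosts` setting above, and the helper names `fetch_status` and `parse_status` are hypothetical. If this request fails the same way the agent's check-in does, that would suggest the problem lies in the network path rather than in the agent configuration.

```python
import json
import ssl
import urllib.request

# Taken from the agent's `hosts` setting above; adjust for your deployment.
FLEET_URL = "https://fleet-server.xyz.com:443"


def parse_status(body: str) -> str:
    """Return the "status" field of a status response, or "UNKNOWN" if absent."""
    data = json.loads(body)
    return data.get("status", "UNKNOWN")


def fetch_status(base_url: str, timeout: float = 10.0) -> str:
    """GET <base_url>/api/status and return the reported status.

    Assumption: the endpoint answers with JSON such as
    {"name": "fleet-server", "status": "HEALTHY"}.
    The default SSL context verifies the certificate chain and hostname,
    which is roughly what `verification_mode: full` asks of the agent.
    Raises urllib.error.HTTPError on non-2xx responses (e.g. a 502 from a
    proxy) and urllib.error.URLError or a timeout error on connection or
    TLS failures.
    """
    ctx = ssl.create_default_context()
    url = base_url.rstrip("/") + "/api/status"
    with urllib.request.urlopen(url, timeout=timeout, context=ctx) as resp:
        return parse_status(resp.read().decode("utf-8"))
```

For example, running print(fetch_status(FLEET_URL)) from a pod in the playground cluster prints the status Fleet Server reports, or raises an exception describing why the request failed.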
michel-laterman commented 10 months ago

The non-zero metrics message is not an error. There should be additional information with the "cannot check in" message (indicating a timeout or some other issue).

What version are you running, and can you provide diagnostics bundles from the agent running fleet-server as well as the other agent?

@abdul90082