elastic / kibana

Your window into the Elastic Stack
https://www.elastic.co/products/kibana

[Fleet] Improve agent observability #78188

Open mostlyjason opened 4 years ago

mostlyjason commented 4 years ago

**Summary of the problem**

We'd like to improve observability for agents so that operators have better insight into problems and enough information to troubleshoot and fix them in a timely manner. Additionally, the more insight we can share with users so they can fix issues on their own, the less often they will get stuck and need to file a support issue.

Potential scope, PM will need to better define it:

**User stories**

List known (technical) restrictions and requirements

Other: PM lead @mukeshelastic, Design lead @hbharding, Collaborators @mostlyjason

mostlyjason commented 4 years ago

@mukeshelastic I filed this design issue for planning purposes. Please review and update as desired.

katrin-freihofner commented 4 years ago

@mostlyjason it says here "...Potential scope, PM will need to better define it...". When do you think this issue will be ready to be picked up?

mostlyjason commented 4 years ago

@mukeshelastic is the PM lead for this issue so I'll defer to him.

I believe some parts are ready, such as including the logstream component on the agent details page: https://github.com/elastic/kibana/issues/77189

mukeshelastic commented 4 years ago

@hbharding and I discussed the two buckets in which we will need design support:

  1. Researching and validating problems in agent observability with a few user interviews.
  2. Exploring and designing experiences we want to build for the MVP prioritized problems.

ravikesarwani commented 4 years ago

https://github.com/elastic/kibana/issues/81872

hbharding commented 4 years ago

Small update: per @mukeshelastic + @ravikesarwani, we want to scope the initial work for this ticket in https://github.com/elastic/kibana/issues/81872 and treat this issue more as an ongoing epic that will extend beyond 7.11.

cc @mostlyjason @ph @katrin-freihofner

elasticmachine commented 3 years ago

Pinging @elastic/fleet (Team:Fleet)

mtojek commented 2 years ago

We had an offline conversation with @joshdover around improvements.

A noticeable number of SDH issues are coming in where the root cause, or one of the possible causes, turns out to be a proxy connectivity issue. The customer has to dive into the logs to figure out whether the proxy in use is operating properly (connections are established, no 503s, etc.).

I believe we could be more proactive and verify the connectivity between Agent and Elasticsearch, and between Agent and Fleet Server. I was thinking about a special technical policy first to verify all connections and settings, but maybe we can start by picking up the elastic-agent install feedback.

It would definitely help with researching customer problems ("Has your proxy ever worked?" vs "Is there a proxy outage now?").
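
To make the idea concrete, here is a minimal sketch of what a proactive connectivity check could look like. It is illustrative only: the function names, the labels, and the choice of status codes are assumptions, not anything that exists in Elastic Agent or Fleet Server, and it uses only the Python standard library.

```python
import urllib.error
import urllib.request
from typing import Optional

# Hypothetical labels for the outcome of a probe (not an Elastic API).
REACHABLE = "reachable"
PROXY_ERROR = "proxy_error"
UNREACHABLE = "unreachable"

# Status codes that typically point at a proxy or gateway in the path
# (an assumption made for this sketch).
PROXY_STATUSES = {407, 502, 503, 504}


def classify(status: Optional[int]) -> str:
    """Map an HTTP status (None when no response arrived) to a coarse label."""
    if status is None:
        return UNREACHABLE
    if status in PROXY_STATUSES:
        return PROXY_ERROR
    # Any other response, even 401, proves the endpoint answered.
    return REACHABLE


def probe(url: str, proxy: Optional[str] = None, timeout: float = 5.0) -> str:
    """Send one GET to url, optionally through proxy, and classify the result."""
    handlers = []
    if proxy:
        handlers.append(urllib.request.ProxyHandler({"http": proxy, "https": proxy}))
    opener = urllib.request.build_opener(*handlers)
    try:
        with opener.open(url, timeout=timeout) as resp:
            return classify(resp.status)
    except urllib.error.HTTPError as exc:
        # The server or proxy answered with an error status.
        return classify(exc.code)
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, failed tunnel, etc.
        return classify(None)
```

Running `probe()` once against the Elasticsearch URL and once against the Fleet Server URL at install time, and again on a schedule, would give a record that separates "never worked" from "stopped working".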