sebgl opened 4 years ago
What should "ready" indicate, though? If a Beat can start getting logs/metrics in, I'd consider it ready even if the output itself is not ready. I'd think that's what the outputs' own readiness (ES, for instance) is for.
For Filebeat, filebeat test output is at least what the Helm chart uses:
https://github.com/elastic/helm-charts/blob/master/filebeat/values.yaml#L72
From the Kubernetes documentation:
If you'd like to start sending traffic to a Pod only when a probe succeeds, specify a readiness probe. In this case, the readiness probe might be the same as the liveness probe, but the existence of the readiness probe in the spec means that the Pod will start without receiving any traffic and only start receiving traffic after the probe starts succeeding. If your Container needs to work on loading large data, configuration files, or migrations during startup, specify a readiness probe.
If you want your Container to be able to take itself down for maintenance, you can specify a readiness probe that checks an endpoint specific to readiness that is different from the liveness probe.
The main reason I can think of for defining a readiness probe is if you were using Beats to monitor your other Beats. In that case, I think you would want to know if the Beat was up but the output was down (and so it should be ready even if the output is down).
"Is the output responding?" seems more like a question of health in the Beats status. I'm not sure there's a good way for ECK to retrieve that, though. We currently define Beat health as:
```go
// BeatHealth is the health reported for a Beat resource.
type BeatHealth string

const (
	// BeatRedHealth means that the health is neither yellow nor green.
	BeatRedHealth BeatHealth = "red"
	// BeatYellowHealth means that:
	// 1) at least one Pod is Ready, and
	// 2) association is not configured, or configured and established
	BeatYellowHealth BeatHealth = "yellow"
	// BeatGreenHealth means that:
	// 1) all Pods are Ready, and
	// 2) association is not configured, or configured and established
	BeatGreenHealth BeatHealth = "green"
)
```
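Read as a decision function, the comments on those constants amount to the following minimal Go sketch. computeHealth and its parameters are hypothetical names, not ECK's actual code, and treating a Beat with zero expected Pods as red is an assumption. Note that the output's availability never appears as an input: it can only influence health indirectly through Pod readiness, which is exactly what a readiness probe would change.

```go
package main

import "fmt"

type BeatHealth string

const (
	BeatRedHealth    BeatHealth = "red"
	BeatYellowHealth BeatHealth = "yellow"
	BeatGreenHealth  BeatHealth = "green"
)

// computeHealth is a hypothetical helper applying the documented rules.
// Yellow and green both require the association to be absent or established,
// so a configured but unestablished association can only yield red.
func computeHealth(readyPods, expectedPods int, associationConfigured, associationEstablished bool) BeatHealth {
	if associationConfigured && !associationEstablished {
		return BeatRedHealth
	}
	switch {
	// Assumption: zero expected Pods is not considered green.
	case expectedPods > 0 && readyPods == expectedPods:
		return BeatGreenHealth
	case readyPods >= 1:
		return BeatYellowHealth
	default:
		return BeatRedHealth
	}
}

func main() {
	fmt.Println(computeHealth(3, 3, false, false)) // green
	fmt.Println(computeHealth(1, 3, true, true))   // yellow
	fmt.Println(computeHealth(3, 3, true, false))  // red
}
```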
> In that case I think you would want to know if the beat was up but the output was down (and so it should be ready even if the output is down).
I'm not sure I understand what you mean here. If we have:
ES <---- Metricbeat --(monitoring)--> Filebeat --(shipping logs for)--> Pod
Then we can have the following (main) failure cases:
As for "is the output responding?", I agree it's difficult; I think we would only know from the logs that there is an issue.
> I'm not sure I understand what you mean here.
Because I did a poor job of explaining it :D What I meant was that I think we want to leave it as is, for the reasons you described in your comment. If we want to do anything, it would be exposing the output status in the Beats CR, but I'm not sure we can do that simply (maybe the Beats state/status endpoint exposes the info?).
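One way to probe that question is a minimal Go sketch that reads a Beat's HTTP stats endpoint, assuming the Beat runs with http.enabled: true on the default port 5066. The JSON field layout (libbeat.output.events.acked/failed, libbeat.output.write.errors) is an assumption based on libbeat's metrics, and outputLooksHealthy is a hypothetical heuristic, not something ECK or Beats provides. Because the counters are cumulative, the sketch compares two snapshots rather than judging a single one.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

// outputStats models only the fields this sketch reads from a Beat's /stats
// response. The field layout is an assumption based on libbeat's metrics.
type outputStats struct {
	Libbeat struct {
		Output struct {
			Events struct {
				Acked  int64 `json:"acked"`
				Failed int64 `json:"failed"`
			} `json:"events"`
			Write struct {
				Errors int64 `json:"errors"`
			} `json:"write"`
		} `json:"output"`
	} `json:"libbeat"`
}

func parseStats(body []byte) (outputStats, error) {
	var s outputStats
	err := json.Unmarshal(body, &s)
	return s, err
}

// fetchStats reads stats over HTTP, e.g. from "http://localhost:5066/stats"
// (assumes the Beat's HTTP endpoint is enabled).
func fetchStats(url string) (outputStats, error) {
	resp, err := http.Get(url)
	if err != nil {
		return outputStats{}, err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return outputStats{}, err
	}
	return parseStats(body)
}

// outputLooksHealthy is a hypothetical heuristic: between two snapshots,
// neither failed events nor write errors increased.
func outputLooksHealthy(prev, cur outputStats) bool {
	p, c := prev.Libbeat.Output, cur.Libbeat.Output
	return c.Events.Failed <= p.Events.Failed && c.Write.Errors <= p.Write.Errors
}

func main() {
	// Two hand-written snapshots (illustrative values, not real measurements).
	before, _ := parseStats([]byte(`{"libbeat":{"output":{"events":{"acked":120,"failed":0},"write":{"errors":0}}}}`))
	after, _ := parseStats([]byte(`{"libbeat":{"output":{"events":{"acked":120,"failed":40},"write":{"errors":3}}}}`))
	fmt.Println(outputLooksHealthy(before, after)) // false
}
```

An idle Beat with no failures would also pass this check, so on its own it says "no new errors" rather than "the output is reachable".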
We should probably close this in favour of another issue that will update the status of the Beats resource with some information about the output status.
Just as an aside, since filebeat test output was mentioned: it returns an error despite a working configuration, due to a DNS check it performs:
```
[root@gke-pebrc-dev-cluster-default-pool-0ce0f2c1-nl52 filebeat]# filebeat test output
elasticsearch: http://elasticsearch:9200...
  parse url... OK
  connection...
    parse host... OK
    dns lookup... ERROR lookup elasticsearch on 10.73.16.10:53: no such host
```
We probably want to introduce a readiness probe for Beats. It's a bit surprising right now to see Filebeat "ready" while Elasticsearch is unavailable.
It looks like we could execute a filebeat test output command. To investigate.