Hard to say what is happening here without more complete logs and stats. Please provide them.
There are many other services with similar configurations, so I have only pasted two of the route configurations.
Config:
```yaml
static_resources:
  listeners:
  - address:
      socket_address: {address: 0.0.0.0, port_value: 9191}
    name: rest_listener
    filter_chains:
    - filters:
      - name: envoy.http_connection_manager
        config:
          http_filters:
          - {name: envoy.router}
          stat_prefix: ingress_http
          route_config:
            virtual_hosts:
            - routes:
              - route:
                  cluster: configserver_rest
                  prefix_rewrite: /
                  retry_policy: {retry_on: 5xx, num_retries: 3}
                match: {prefix: /configserver/}
              - route:
                  cluster: uidgenerator_rest
                  prefix_rewrite: /
                  retry_policy: {retry_on: 5xx, num_retries: 3}
                match: {prefix: /uidgenerator/}
              name: rest_host
              domains: ['*']
            name: rest_route
          codec_type: AUTO
  - address:
      socket_address: {address: 0.0.0.0, port_value: 7676}
    name: grpc_listener
    filter_chains:
    - filters:
      - name: envoy.http_connection_manager
        config:
          http_filters:
          - {name: envoy.router}
          stat_prefix: ingress_http
          route_config:
            virtual_hosts:
            - routes:
              - route:
                  cluster: configserver_grpc
                  prefix_rewrite: /
                  retry_policy: {retry_on: 5xx, num_retries: 3}
                match:
                  prefix: /configserver/
                  grpc: {}
              - route:
                  cluster: uidgenerator_grpc
                  prefix_rewrite: /
                  retry_policy: {retry_on: 5xx, num_retries: 3}
                match:
                  prefix: /uidgenerator/
                  grpc: {}
              name: grpc_host
              domains: ['*']
            name: grpc_route
          codec_type: AUTO
  clusters:
  - connect_timeout: 1s
    lb_policy: LEAST_REQUEST
    hosts:
    - socket_address: {address: configserver, port_value: 9191}
    health_checks:
    #- unhealthy_interval: 180s
    - healthy_threshold: 3
      unhealthy_threshold: 3
      #interval_jitter: 1s
      interval: 60s
      http_health_check: {path: "/health"}
      timeout: 1s
      reuse_connection: true
      event_log_path: /work/healthcheck.log
      always_log_health_check_failures: true
    name: configserver_rest
    type: STRICT_DNS
  - http2_protocol_options: {}
    connect_timeout: 1s
    lb_policy: LEAST_REQUEST
    hosts:
    - socket_address: {address: configserver, port_value: 7676}
    name: configserver_grpc
    type: STRICT_DNS
  - connect_timeout: 1s
    lb_policy: LEAST_REQUEST
    hosts:
    - socket_address: {address: uidgenerator, port_value: 9191}
    health_checks:
    #- unhealthy_interval: 180s
    - healthy_threshold: 3
      unhealthy_threshold: 3
      #interval_jitter: 1s
      interval: 60s
      http_health_check: {path: "/health"}
      timeout: 1s
      reuse_connection: true
      event_log_path: /work/healthcheck.log
      always_log_health_check_failures: true
    name: uidgenerator_rest
    type: STRICT_DNS
  - http2_protocol_options: {}
    connect_timeout: 1s
    lb_policy: LEAST_REQUEST
    hosts:
    - socket_address: {address: uidgenerator, port_value: 7676}
    name: uidgenerator_grpc
    type: STRICT_DNS
admin:
  address:
    socket_address: {address: 0.0.0.0, port_value: 9002}
  access_log_path: /dev/null
```
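For reference, the live health status can also be pulled from the admin interface (listening on 0.0.0.0:9002 in this config), for example:

```sh
# Per-host health flags as Envoy sees them (admin listener is on port 9002):
curl -s http://localhost:9002/clusters | grep health_flags

# Active health-check counters for one of the REST clusters:
curl -s http://localhost:9002/stats | grep 'cluster.configserver_rest.health_check'
```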
The healthcheck error log entries are basically all the same. Envoy healthcheck error logs:
{"health_checker_type":"HTTP","host":{"socket_address":{"protocol":"TCP","address":"10.0.18.11","resolver_name":"","ipv4_compat":false,"port_value":9191}},"cluster_name":"configserver_rest","health_check_failure_event":{"failure_type":"ACTIVE","first_check":false},"timestamp":"2019-11-02T02:30:00.777Z"}
{"health_checker_type":"HTTP","host":{"socket_address":{"protocol":"TCP","address":"10.0.18.19","resolver_name":"","ipv4_compat":false,"port_value":9191}},"cluster_name":"uidgenerator_rest","health_check_failure_event":{"failure_type":"ACTIVE","first_check":false},"timestamp":"2019-11-02T02:30:19.857Z"}
Envoy logs:
```
[2019-11-01 09:48:54.819][000006][info][main] [source/server/server.cc:206] initializing epoch 0 (hot restart version=10.200.16384.127.options=capacity=16384, num_slots=8209 hash=228984379728933363 size=2654312)
[2019-11-01 09:48:54.819][000006][info][main] [source/server/server.cc:208] statically linked extensions:
[2019-11-01 09:48:54.819][000006][info][main] [source/server/server.cc:210] access_loggers: envoy.file_access_log,envoy.http_grpc_access_log
[2019-11-01 09:48:54.819][000006][info][main] [source/server/server.cc:213] filters.http: envoy.buffer,envoy.cors,envoy.ext_authz,envoy.fault,envoy.filters.http.header_to_metadata,envoy.filters.http.jwt_authn,envoy.filters.http.rbac,envoy.grpc_http1_bridge,envoy.grpc_json_transcoder,envoy.grpc_web,envoy.gzip,envoy.health_check,envoy.http_dynamo_filter,envoy.ip_tagging,envoy.lua,envoy.rate_limit,envoy.router,envoy.squash
[2019-11-01 09:48:54.819][000006][info][main] [source/server/server.cc:216] filters.listener: envoy.listener.original_dst,envoy.listener.proxy_protocol,envoy.listener.tls_inspector
[2019-11-01 09:48:54.819][000006][info][main] [source/server/server.cc:219] filters.network: envoy.client_ssl_auth,envoy.echo,envoy.ext_authz,envoy.filters.network.dubbo_proxy,envoy.filters.network.rbac,envoy.filters.network.sni_cluster,envoy.filters.network.thrift_proxy,envoy.http_connection_manager,envoy.mongo_proxy,envoy.ratelimit,envoy.redis_proxy,envoy.tcp_proxy
[2019-11-01 09:48:54.819][000006][info][main] [source/server/server.cc:221] stat_sinks: envoy.dog_statsd,envoy.metrics_service,envoy.stat_sinks.hystrix,envoy.statsd
[2019-11-01 09:48:54.819][000006][info][main] [source/server/server.cc:223] tracers: envoy.dynamic.ot,envoy.lightstep,envoy.tracers.datadog,envoy.zipkin
[2019-11-01 09:48:54.819][000006][info][main] [source/server/server.cc:226] transport_sockets.downstream: envoy.transport_sockets.alts,envoy.transport_sockets.capture,raw_buffer,tls
[2019-11-01 09:48:54.819][000006][info][main] [source/server/server.cc:229] transport_sockets.upstream: envoy.transport_sockets.alts,envoy.transport_sockets.capture,raw_buffer,tls
[2019-11-01 09:48:54.831][000006][info][main] [source/server/server.cc:271] admin address: 0.0.0.0:9002
[2019-11-01 09:48:54.840][000006][info][config] [source/server/configuration_impl.cc:50] loading 0 static secret(s)
[2019-11-01 09:48:54.840][000006][info][config] [source/server/configuration_impl.cc:56] loading 42 cluster(s)
[2019-11-01 09:48:54.853][000006][info][config] [source/server/configuration_impl.cc:67] loading 2 listener(s)
[2019-11-01 09:48:54.857][000006][info][config] [source/server/configuration_impl.cc:92] loading tracing configuration
[2019-11-01 09:48:54.857][000006][info][config] [source/server/configuration_impl.cc:112] loading stats sink configuration
[2019-11-01 09:48:54.857][000006][info][main] [source/server/server.cc:463] starting main dispatch loop
[2019-11-01 09:49:04.856][000006][info][upstream] [source/common/upstream/cluster_manager_impl.cc:136] cm init: all clusters initialized
[2019-11-01 09:49:04.856][000006][info][main] [source/server/server.cc:435] all clusters initialized. initializing init manager
[2019-11-01 09:49:04.856][000006][info][config] [source/server/listener_manager_impl.cc:961] all dependencies initialized. starting workers
[2019-11-01 10:04:04.857][000006][info][main] [source/server/drain_manager_impl.cc:63] shutting down parent after drain
```
Should I provide any other helpful information? Thank you! ^_^
This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or other activity occurs. Thank you for your contributions.
Although Envoy's health check never worked, URL access to the services was still available. So I came up with another approach: Prometheus can use gauge-type metrics that I wrote myself to capture the health of each service. Therefore, the metrics interface provided by Envoy is no longer important to me. Thank you very much!
@Firewall-Tomohisa what about trying a newer version of Envoy? It seems you tried envoy-alpine:v1.9.0.
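If you do try a newer version, note that the `hosts` field used in this config is deprecated in later releases in favour of `load_assignment`. A minimal sketch of one cluster rewritten in that form (untested here; adjust for your target version):

```yaml
# Sketch of the configserver_rest cluster using load_assignment, which
# replaces the deprecated `hosts` field in newer Envoy releases:
- name: configserver_rest
  type: STRICT_DNS
  connect_timeout: 1s
  lb_policy: LEAST_REQUEST
  load_assignment:
    cluster_name: configserver_rest
    endpoints:
    - lb_endpoints:
      - endpoint:
          address:
            socket_address: {address: configserver, port_value: 9191}
```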
Ok, thanks for your advice. I will try. ^_^
Hello! ^_^ I have changed the type from LOGICAL_DNS to STRICT_DNS, but http_health_check still doesn't work.
The image version: envoy-alpine:v1.9.0
Config:
Log contents:
Entering the envoy container and executing the curl command:

```
/ # curl -I:/health
HTTP/1.1 200
Content-Type: <>
Transfer-Encoding: chunked
Date: Fri, 01 Nov 2019 06:14:30 GMT
```
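One variable worth ruling out is the Host header: Envoy's HTTP health checker sends the cluster name as the Host header by default (it can be overridden with `http_health_check.host`), so a closer reproduction of the actual probe would be something like the following, where `<upstream-ip>` is a placeholder for a resolved backend IP:

```sh
# Envoy's HTTP health checker sends Host: <cluster name> unless
# http_health_check.host is set; <upstream-ip> is a placeholder here.
curl -v -H 'Host: configserver_rest' http://<upstream-ip>:9191/health
```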
Grafana graph:
From the graph, the health check seemed to work only for a split second after I deployed the service.
Metrics: `envoy_cluster_health_check_healthy{instance="ip:port"}`
Prometheus graph:
So, I wonder whether this is due to the configuration or something else. I have looked at Envoy's official documentation for the health check configuration and tried Google, but couldn't find a solution.
Now, this may be my last hope. Thank you very much!
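If more detail would help, I can also raise the health-checker logging at runtime through the admin interface, e.g.:

```sh
# Raise the health-checker ("hc") logger to trace via the admin endpoint
# (port 9002 in my config) so each probe shows up in the Envoy logs:
curl -X POST 'http://localhost:9002/logging?hc=trace'
```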