Closed charlesxu2000 closed 3 years ago
It is hard to tell what is the problem with the information provided. The best approach is to collect a core dump and get a stack trace of where Envoy has stopped.
cc @ggreenway
would be good to get more information including what is trying to get entropy but not finding enough, maybe SSL libraries?
I'd never seen that ioctl before and didn't notice that it was crypto related. It looks like you're in this fips-only boringssl code:
#if defined(BORINGSSL_FIPS) && !defined(URANDOM_BLOCKS_FOR_ENTROPY)
// In FIPS mode on platforms where urandom doesn't block at startup, we ensure
// that the kernel has sufficient entropy before continuing. This is
// automatically handled by getrandom, which requires that the entropy pool
// has been initialised, but for urandom we have to poll.
for (;;) {
int entropy_bits;
if (ioctl(fd, RNDGETENTCNT, &entropy_bits)) {
fprintf(stderr,
"RNDGETENTCNT on /dev/urandom failed. We cannot continue in this "
"case when in FIPS mode.\n");
abort();
}
static const int kBitsNeeded = 256;
if (entropy_bits >= kBitsNeeded) {
break;
}
usleep(250000);
}
#endif // BORINGSSL_FIPS && !URANDOM_BLOCKS_FOR_ENTROPY
}
Out of curiosity, who is your CPU vendor? Are you running on a VM or on bare metal? Does the machine have spinning disks or only SSDs?
It is FIPS enabled Envoy, we run on ESXi. It is most likely due to the low available entropy number.
The root cause for this issue is due to the number of available bits for entropy is low. Envoy boring SSL requires at least 256 bits. In our VM, it is below that. By increase the available entropy number, Envoy comes up.
Title: Envoy process stuck at initialization stage
Description:
By using
strace
tool attached to envoy process, it looks like looping at get entroy. Here is the strace output:With the Envoy trace on, it seems it stuck at https://github.com/envoyproxy/envoy/blob/release/v1.13/source/server/configuration_impl.cc#L75
Envoy version: b67c14052c49890a7e3afe614d50979c346c024b/1.13.1/Clean/RELEASE/BoringSSL-FIPS
But I tried 1.17.1, it is the same result.
Linux version is: Linux 3.10.62-ltsi #1 SMP PREEMPT Sat May 1 03:03:01 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Repro steps:
Logs:
[2021-05-02 03:07:30.458][13820][info][main] [source/server/server.cc:251] initializing epoch 0 (hot restart version=11.104) [2021-05-02 03:07:30.458][13820][info][main] [source/server/server.cc:253] statically linked extensions: [2021-05-02 03:07:30.458][13820][info][main] [source/server/server.cc:255] envoy.thrift_proxy.filters: envoy.filters.thrift.rate_limit, envoy.filters.thrift.router [2021-05-02 03:07:30.458][13820][info][main] [source/server/server.cc:255] envoy.transport_sockets.upstream: envoy.transport_sockets.alts, envoy.transport_sockets.raw_buffer, envoy.transport_sockets.tap, envoy.transport_sockets.tls, raw_buffer, tls [2021-05-02 03:07:30.458][13820][info][main] [source/server/server.cc:255] envoy.grpc_credentials: envoy.grpc_credentials.aws_iam, envoy.grpc_credentials.default, envoy.grpc_credentials.file_based_metadata [2021-05-02 03:07:30.458][13820][info][main] [source/server/server.cc:255] envoy.clusters: envoy.cluster.eds, envoy.cluster.logical_dns, envoy.cluster.original_dst, envoy.cluster.static, envoy.cluster.strict_dns, envoy.clusters.aggregate, envoy.clusters.dynamic_forward_proxy, envoy.clusters.redis [2021-05-02 03:07:30.458][13820][info][main] [source/server/server.cc:255] envoy.dubbo_proxy.serializers: dubbo.hessian2 [2021-05-02 03:07:30.458][13820][info][main] [source/server/server.cc:255] envoy.resource_monitors: envoy.resource_monitors.fixed_heap, envoy.resource_monitors.injected_resource [2021-05-02 03:07:30.458][13820][info][main] [source/server/server.cc:255] envoy.stats_sinks: envoy.dog_statsd, envoy.metrics_service, envoy.stat_sinks.hystrix, envoy.statsd [2021-05-02 03:07:30.458][13820][info][main] [source/server/server.cc:255] envoy.resolvers: envoy.ip [2021-05-02 03:07:30.458][13820][info][main] [source/server/server.cc:255] envoy.transport_sockets.downstream: envoy.transport_sockets.alts, envoy.transport_sockets.raw_buffer, envoy.transport_sockets.tap, envoy.transport_sockets.tls, raw_buffer, tls [2021-05-02 03:07:30.458][13820][info][main] [source/server/server.cc:255] envoy.health_checkers: envoy.health_checkers.redis [2021-05-02 03:07:30.458][13820][info][main] [source/server/server.cc:255] envoy.dubbo_proxy.route_matchers: default [2021-05-02 03:07:30.458][13820][info][main] [source/server/server.cc:255] envoy.retry_host_predicates: envoy.retry_host_predicates.omit_canary_hosts, envoy.retry_host_predicates.previous_hosts [2021-05-02 03:07:30.458][13820][info][main] [source/server/server.cc:255] envoy.filters.listener: envoy.listener.http_inspector, envoy.listener.original_dst, envoy.listener.original_src, envoy.listener.proxy_protocol, envoy.listener.tls_inspector [2021-05-02 03:07:30.458][13820][info][main] [source/server/server.cc:255] envoy.thrift_proxy.transports: auto, framed, header, unframed [2021-05-02 03:07:30.458][13820][info][main] [source/server/server.cc:255] envoy.tracers: envoy.dynamic.ot, envoy.lightstep, envoy.tracers.datadog, envoy.tracers.opencensus, envoy.tracers.xray, envoy.zipkin [2021-05-02 03:07:30.458][13820][info][main] [source/server/server.cc:255] envoy.filters.udp_listener: envoy.filters.udp_listener.udp_proxy [2021-05-02 03:07:30.458][13820][info][main] [source/server/server.cc:255] envoy.access_loggers: envoy.file_access_log, envoy.http_grpc_access_log, envoy.tcp_grpc_access_log [2021-05-02 03:07:30.458][13820][info][main] [source/server/server.cc:255] envoy.retry_priorities: envoy.retry_priorities.previous_priorities [2021-05-02 03:07:30.458][13820][info][main] [source/server/server.cc:255] envoy.dubbo_proxy.protocols: dubbo [2021-05-02 03:07:30.458][13820][info][main] [source/server/server.cc:255] envoy.udp_listeners: raw_udp_listener [2021-05-02 03:07:30.458][13820][info][main] [source/server/server.cc:255] envoy.dubbo_proxy.filters: envoy.filters.dubbo.router [2021-05-02 03:07:30.458][13820][info][main] [source/server/server.cc:255] envoy.filters.http: envoy.buffer, envoy.cors, envoy.csrf, envoy.ext_authz, envoy.fault, envoy.filters.http.adaptive_concurrency, envoy.filters.http.dynamic_forward_proxy, envoy.filters.http.grpc_http1_reverse_bridge, envoy.filters.http.grpc_stats, envoy.filters.http.header_to_metadata, envoy.filters.http.jwt_authn, envoy.filters.http.on_demand, envoy.filters.http.original_src, envoy.filters.http.rbac, envoy.filters.http.tap, envoy.grpc_http1_bridge, envoy.grpc_json_transcoder, envoy.grpc_web, envoy.gzip, envoy.health_check, envoy.http_dynamo_filter, envoy.ip_tagging, envoy.lua, envoy.rate_limit, envoy.router, envoy.squash [2021-05-02 03:07:30.458][13820][info][main] [source/server/server.cc:255] envoy.thrift_proxy.protocols: auto, binary, binary/non-strict, compact, twitter [2021-05-02 03:07:30.458][13820][info][main] [source/server/server.cc:255] envoy.filters.network: envoy.client_ssl_auth, envoy.echo, envoy.ext_authz, envoy.filters.network.dubbo_proxy, envoy.filters.network.kafka_broker, envoy.filters.network.local_ratelimit, envoy.filters.network.mysql_proxy, envoy.filters.network.rbac, envoy.filters.network.sni_cluster, envoy.filters.network.thrift_proxy, envoy.filters.network.zookeeper_proxy, envoy.http_connection_manager, envoy.mongo_proxy, envoy.ratelimit, envoy.redis_proxy, envoy.tcp_proxy [2021-05-02 03:07:30.464][13820][warning][misc] [source/common/protobuf/utility.cc:441] Using deprecated option 'envoy.api.v2.listener.FilterChain.tls_context' from file listener_components.proto. This configuration will be removed from Envoy soon. Please see https://www.envoyproxy.io/docs/envoy/latest/intro/deprecated for details. [2021-05-02 03:07:30.464][13820][warning][misc] [source/common/protobuf/utility.cc:441] Using deprecated option 'envoy.api.v2.Cluster.tls_context' from file cluster.proto. This configuration will be removed from Envoy soon. Please see https://www.envoyproxy.io/docs/envoy/latest/intro/deprecated for details. [2021-05-02 03:07:30.464][13820][warning][main] [source/server/server.cc:342] No admin address given, so no admin HTTP server started. [2021-05-02 03:07:30.464][13820][debug][main] [source/server/overload_manager_impl.cc:184] No overload action is configured for envoy.overload_actions.shrink_heap. [2021-05-02 03:07:30.464][13820][debug][main] [source/server/overload_manager_impl.cc:184] No overload action is configured for envoy.overload_actions.stop_accepting_connections. [2021-05-02 03:07:30.464][13820][debug][main] [source/server/overload_manager_impl.cc:184] No overload action is configured for envoy.overload_actions.stop_accepting_connections. [2021-05-02 03:07:30.464][13820][debug][main] [source/server/overload_manager_impl.cc:184] No overload action is configured for envoy.overload_actions.stop_accepting_connections. [2021-05-02 03:07:30.464][13820][debug][main] [source/server/overload_manager_impl.cc:184] No overload action is configured for envoy.overload_actions.stop_accepting_connections. [2021-05-02 03:07:30.464][13820][debug][main] [source/server/overload_manager_impl.cc:184] No overload action is configured for envoy.overload_actions.stop_accepting_connections. [2021-05-02 03:07:30.464][13820][debug][main] [source/server/overload_manager_impl.cc:184] No overload action is configured for envoy.overload_actions.stop_accepting_connections. [2021-05-02 03:07:30.464][13820][debug][main] [source/server/overload_manager_impl.cc:184] No overload action is configured for envoy.overload_actions.stop_accepting_connections. [2021-05-02 03:07:30.464][13820][debug][main] [source/server/overload_manager_impl.cc:184] No overload action is configured for envoy.overload_actions.stop_accepting_connections. [2021-05-02 03:07:30.464][13820][info][main] [source/server/server.cc:455] runtime: layers: