Closed jmuia closed 3 years ago
Is there any more detail you can share about your use of Lua or any of the times that it crashes? Do you use LDS to deliver Lua config? Nothing immediately comes to mind from the core dump. cc @dio
ACK, more info on this will be very helpful.
@mattklein123 @dio I see a similar crash due to SDS problem (when certs are missing), may be related to this? https://github.com/envoyproxy/envoy/issues/9765#issuecomment-593965601
@jmuia do you have listeners with tls context configured?
Yes I think it's possible that the crash is due to listener teardown issues. We just fixed something similar in a different filter (cc @yuval-k), but it would be nice to confirm that is what is happening.
Is there any more detail you can share about your use of Lua or any of the times that it crashes?
Uses of Lua include:
Timing of crashes: I need to collect more data here, but the couple of crashes that I've spot checked so far have happened upon a cold start.
...
[2020-03-04 01:38:32.457][2678][info][config] [external/envoy/source/server/listener_manager_impl.cc:707] all dependencies initialized. starting workers
[2020-03-04 01:38:32.479][3604][critical][backtrace] [bazel-out/k8-opt/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:83] Caught Segmentation fault, suspect faulting address 0x0
Do you use LDS to deliver Lua config?
Yep, we use LDS for this.
Nothing immediately comes to mind from the core dump.
If it would be helpful I can provide the backtrace for all threads from the core dump.
do you have listeners with tls context configured?
Yep, the listener that is configured with a Lua filter also configures tls_context.
--
I've attached below the (sanitized) log output prior to a crash:
[2020-03-04 01:38:25.967][2678][info][main] [external/envoy/source/server/server.cc:251] initializing epoch 0 (hot restart version=11.104)
[2020-03-04 01:38:25.967][2678][info][main] [external/envoy/source/server/server.cc:253] statically linked extensions:
[2020-03-04 01:38:25.967][2678][info][main] [external/envoy/source/server/server.cc:255] envoy.transport_sockets.downstream: envoy.transport_sockets.alts, envoy.transport_sockets.raw_buffer, envoy.transport_sockets.tap, envoy.transport_sockets.tls, raw_buffer, tls
[2020-03-04 01:38:25.967][2678][info][main] [external/envoy/source/server/server.cc:255] envoy.filters.http: envoy.buffer, envoy.cors, envoy.csrf, envoy.ext_authz, envoy.fault, envoy.filters.http.adaptive_concurrency, envoy.filters.http.dynamic_forward_proxy, envoy.filters.http.grpc_http1_reverse_bridge, envoy.filters.http.grpc_stats, envoy.filters.http.header_to_metadata, envoy.filters.http.jwt_authn, envoy.filters.http.on_demand, envoy.filters.http.original_src, envoy.filters.http.rbac, envoy.filters.http.tap, envoy.grpc_http1_bridge, envoy.grpc_json_transcoder, envoy.grpc_web, envoy.gzip, envoy.health_check, envoy.http_dynamo_filter, envoy.ip_tagging, envoy.lua, envoy.rate_limit, envoy.router, envoy.squash
[2020-03-04 01:38:25.967][2678][info][main] [external/envoy/source/server/server.cc:255] envoy.thrift_proxy.filters: envoy.filters.thrift.rate_limit, envoy.filters.thrift.router
[2020-03-04 01:38:25.967][2678][info][main] [external/envoy/source/server/server.cc:255] envoy.dubbo_proxy.protocols: dubbo
[2020-03-04 01:38:25.967][2678][info][main] [external/envoy/source/server/server.cc:255] envoy.filters.network: envoy.client_ssl_auth, envoy.echo, envoy.ext_authz, envoy.filters.network.dubbo_proxy, envoy.filters.network.kafka_broker, envoy.filters.network.local_ratelimit, envoy.filters.network.mysql_proxy, envoy.filters.network.rbac, envoy.filters.network.sni_cluster, envoy.filters.network.thrift_proxy, envoy.filters.network.zookeeper_proxy, envoy.http_connection_manager, envoy.mongo_proxy, envoy.ratelimit, envoy.redis_proxy, envoy.tcp_proxy
[2020-03-04 01:38:25.967][2678][info][main] [external/envoy/source/server/server.cc:255] envoy.health_checkers: envoy.health_checkers.redis
[2020-03-04 01:38:25.967][2678][info][main] [external/envoy/source/server/server.cc:255] envoy.thrift_proxy.protocols: auto, binary, binary/non-strict, compact, twitter
[2020-03-04 01:38:25.967][2678][info][main] [external/envoy/source/server/server.cc:255] envoy.access_loggers: envoy.file_access_log, envoy.http_grpc_access_log, envoy.tcp_grpc_access_log
[2020-03-04 01:38:25.967][2678][info][main] [external/envoy/source/server/server.cc:255] envoy.retry_priorities: envoy.retry_priorities.previous_priorities
[2020-03-04 01:38:25.967][2678][info][main] [external/envoy/source/server/server.cc:255] envoy.resolvers: envoy.ip
[2020-03-04 01:38:25.967][2678][info][main] [external/envoy/source/server/server.cc:255] envoy.dubbo_proxy.route_matchers: default
[2020-03-04 01:38:25.967][2678][info][main] [external/envoy/source/server/server.cc:255] envoy.filters.udp_listener: envoy.filters.udp_listener.udp_proxy
[2020-03-04 01:38:25.967][2678][info][main] [external/envoy/source/server/server.cc:255] envoy.resource_monitors: envoy.resource_monitors.fixed_heap, envoy.resource_monitors.injected_resource
[2020-03-04 01:38:25.967][2678][info][main] [external/envoy/source/server/server.cc:255] envoy.udp_listeners: raw_udp_listener
[2020-03-04 01:38:25.967][2678][info][main] [external/envoy/source/server/server.cc:255] envoy.filters.listener: envoy.listener.http_inspector, envoy.listener.original_dst, envoy.listener.original_src, envoy.listener.proxy_protocol, envoy.listener.tls_inspector
[2020-03-04 01:38:25.967][2678][info][main] [external/envoy/source/server/server.cc:255] envoy.dubbo_proxy.filters: envoy.filters.dubbo.router
[2020-03-04 01:38:25.967][2678][info][main] [external/envoy/source/server/server.cc:255] envoy.retry_host_predicates: envoy.retry_host_predicates.omit_canary_hosts, envoy.retry_host_predicates.previous_hosts
[2020-03-04 01:38:25.967][2678][info][main] [external/envoy/source/server/server.cc:255] envoy.dubbo_proxy.serializers: dubbo.hessian2
[2020-03-04 01:38:25.967][2678][info][main] [external/envoy/source/server/server.cc:255] envoy.thrift_proxy.transports: auto, framed, header, unframed
[2020-03-04 01:38:25.967][2678][info][main] [external/envoy/source/server/server.cc:255] envoy.stats_sinks: envoy.dog_statsd, envoy.metrics_service, envoy.stat_sinks.hystrix, envoy.statsd
[2020-03-04 01:38:25.967][2678][info][main] [external/envoy/source/server/server.cc:255] envoy.tracers: envoy.dynamic.ot, envoy.lightstep, envoy.tracers.datadog, envoy.tracers.opencensus, envoy.tracers.xray, envoy.zipkin
[2020-03-04 01:38:25.967][2678][info][main] [external/envoy/source/server/server.cc:255] envoy.clusters: envoy.cluster.eds, envoy.cluster.logical_dns, envoy.cluster.original_dst, envoy.cluster.static, envoy.cluster.strict_dns, envoy.clusters.aggregate, envoy.clusters.dynamic_forward_proxy, envoy.clusters.redis
[2020-03-04 01:38:25.967][2678][info][main] [external/envoy/source/server/server.cc:255] envoy.transport_sockets.upstream: envoy.transport_sockets.alts, envoy.transport_sockets.raw_buffer, envoy.transport_sockets.tap, envoy.transport_sockets.tls, raw_buffer, tls
[2020-03-04 01:38:25.967][2678][info][main] [external/envoy/source/server/server.cc:255] envoy.grpc_credentials: envoy.grpc_credentials.aws_iam, envoy.grpc_credentials.default, envoy.grpc_credentials.file_based_metadata
[2020-03-04 01:38:26.069][2678][info][main] [external/envoy/source/server/server.cc:336] admin address: IP:PORT
[2020-03-04 01:38:26.078][2678][info][main] [external/envoy/source/server/server.cc:455] runtime: layers:
- name: static_layer
static_layer:
envoy.deprecated_features:envoy.api.v2.Cluster.tls_context: true
[2020-03-04 01:38:26.082][2678][info][config] [external/envoy/source/server/configuration_impl.cc:62] loading 0 static secret(s)
[2020-03-04 01:38:26.082][2678][info][config] [external/envoy/source/server/configuration_impl.cc:68] loading 5 cluster(s)
[2020-03-04 01:38:26.130][2678][info][config] [external/envoy/source/server/configuration_impl.cc:72] loading 0 listener(s)
[2020-03-04 01:38:26.133][2678][info][config] [external/envoy/source/server/configuration_impl.cc:97] loading tracing configuration
[2020-03-04 01:38:26.133][2678][info][config] [external/envoy/source/server/configuration_impl.cc:116] loading stats sink configuration
[2020-03-04 01:38:26.135][2678][info][main] [external/envoy/source/server/server.cc:550] starting main dispatch loop
[2020-03-04 01:38:26.189][2678][info][upstream] [external/envoy/source/common/upstream/cluster_manager_impl.cc:167] cm init: initializing cds
[2020-03-04 01:38:26.369][2678][info][upstream] [external/envoy/source/common/upstream/cds_api_impl.cc:74] cds: add 72 cluster(s), remove 5 cluster(s)
[2020-03-04 01:38:26.418][2678][info][upstream] [external/envoy/source/common/upstream/cds_api_impl.cc:90] cds: add/update cluster 'a-cluster'
<truncated for brevity cds: add/update cluster ....>
[2020-03-04 01:38:26.418][2678][info][upstream] [external/envoy/source/common/upstream/cds_api_impl.cc:90] cds: add/update cluster 'some-cluster'
[2020-03-04 01:38:28.352][2678][info][upstream] [external/envoy/source/common/upstream/cluster_manager_impl.cc:171] cm init: all clusters initialized
[2020-03-04 01:38:28.365][2678][info][main] [external/envoy/source/server/server.cc:529] all clusters initialized. initializing init manager
[2020-03-04 01:38:30.777][2678][info][upstream] [external/envoy/source/server/lds_api.cc:73] lds: add/update listener 'a-listener'
[2020-03-04 01:38:30.779][2678][info][upstream] [external/envoy/source/server/lds_api.cc:73] lds: add/update listener 'other-listener'
[2020-03-04 01:38:31.192][2678][info][lua] [external/envoy/source/extensions/filters/http/lua/lua_filter.cc:510] envoy_on_response() function not found. Lua filter will not hook responses.
[2020-03-04 01:38:31.194][2678][info][lua] [external/envoy/source/extensions/filters/http/lua/lua_filter.cc:510] envoy_on_response() function not found. Lua filter will not hook responses.
[2020-03-04 01:38:31.258][2678][info][lua] [external/envoy/source/extensions/filters/http/lua/lua_filter.cc:510] envoy_on_response() function not found. Lua filter will not hook responses.
[2020-03-04 01:38:31.259][2678][info][lua] [external/envoy/source/extensions/filters/http/lua/lua_filter.cc:510] envoy_on_response() function not found. Lua filter will not hook responses.
[2020-03-04 01:38:31.272][2678][info][upstream] [external/envoy/source/server/lds_api.cc:73] lds: add/update listener 'listener-with-lua-and-tls-context'
[2020-03-04 01:38:31.288][2678][info][upstream] [external/envoy/source/server/lds_api.cc:73] lds: add/update listener 'another-listener'
[2020-03-04 01:38:31.293][2678][info][lua] [external/envoy/source/extensions/filters/http/lua/lua_filter.cc:510] envoy_on_response() function not found. Lua filter will not hook responses.
[2020-03-04 01:38:31.294][2678][info][upstream] [external/envoy/source/server/lds_api.cc:73] lds: add/update listener 'another-listener-2'
[2020-03-04 01:38:31.297][2678][info][lua] [external/envoy/source/extensions/filters/http/lua/lua_filter.cc:510] envoy_on_response() function not found. Lua filter will not hook responses.
[2020-03-04 01:38:31.298][2678][info][upstream] [external/envoy/source/server/lds_api.cc:73] lds: add/update listener 'another-listener-3'
[2020-03-04 01:38:31.299][2678][info][upstream] [external/envoy/source/server/lds_api.cc:73] lds: add/update listener 'another-listener-4'
[2020-03-04 01:38:31.300][2678][info][upstream] [external/envoy/source/server/lds_api.cc:73] lds: add/update listener 'another-listener-5'
[2020-03-04 01:38:32.069][2678][info][lua] [external/envoy/source/extensions/filters/http/lua/lua_filter.cc:510] envoy_on_response() function not found. Lua filter will not hook responses.
[2020-03-04 01:38:32.070][2678][info][lua] [external/envoy/source/extensions/filters/http/lua/lua_filter.cc:510] envoy_on_response() function not found. Lua filter will not hook responses.
[2020-03-04 01:38:32.120][2678][info][lua] [external/envoy/source/extensions/filters/http/lua/lua_filter.cc:510] envoy_on_response() function not found. Lua filter will not hook responses.
[2020-03-04 01:38:32.120][2678][info][lua] [external/envoy/source/extensions/filters/http/lua/lua_filter.cc:510] envoy_on_response() function not found. Lua filter will not hook responses.
[2020-03-04 01:38:32.137][2678][info][upstream] [external/envoy/source/server/lds_api.cc:73] lds: add/update listener 'listener-with-lua-and-tls-context'
[2020-03-04 01:38:32.457][2678][info][config] [external/envoy/source/server/listener_manager_impl.cc:707] all dependencies initialized. starting workers
[2020-03-04 01:38:32.479][3604][critical][backtrace] [bazel-out/k8-opt/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:83] Caught Segmentation fault, suspect faulting address 0x0
Possibly related: https://github.com/envoyproxy/envoy/issues/9940
looking at the code and stack trace i imagine that the issue is capturing this
in this line:
https://github.com/envoyproxy/envoy/blob/eb894d9d0e5373469286c9fa1c91045139176a1e/source/extensions/filters/common/lua/lua.h#L390
an quick solution would be to pass tls_slot_
by copy instead of this
do note that note above is just from reviewing code, I didn't test this.
The reason is that when the worker executes registerType
, the ThreadLocalState object captured by the callback function has been destructed. I encountered this problem when trying to implement routing granularity Lua scripts #11235. This problem is triggered when you create a ThreadLocalState object during initialization and immediately destruct it. Your Listener listener-with-lua-and-tls-context
was updated (but why?) during the initialization phase, which means that the old listener configuration was destructed, so this problem was triggered.
[2020-03-04 01:38:31.272][2678][info][upstream] [external/envoy/source/server/lds_api.cc:73] lds: add/update listener 'listener-with-lua-and-tls-context'
.....................
[2020-03-04 01:38:32.137][2678][info][upstream] [external/envoy/source/server/lds_api.cc:73] lds: add/update listener 'listener-with-lua-and-tls-context'
@jmuia You can check to see if this problem has been fixed by #11944 . If it has been fixed, we can close this issue. If this issue still exists, I hope you can provide more information, such as how to reproduce it.
Thanks @wbpcode.
The issue was spurious and we never found a repro. I'm not sure yet what conditions changed in our environment, but we stopped triggering this segfault without deploying https://github.com/envoyproxy/envoy/pull/11944 internally.
I'll close this issue for now since you've merged https://github.com/envoyproxy/envoy/pull/11944 and will re-open if it re-surfaces.
Title: Segmentation faults in Lua filter (
LuaThreadLocal
)Description: Envoy occasionally crashes, which is unexpected.
Repro steps: Unsure how to reproduce at the moment, it seems spurious (<5 times per day in a QA fleet since collecting core dumps).
It has been observed in:
1.12.2
+ #9121 + #9509 + patches from1.12.3
1.13.1
We haven't collected core dumps from other versions.
Config: We have various Lua filters configured.
Call Stack: From the
1.12.2
build described above: