Closed: thelounge-zz closed this issue 3 years ago.
@thelounge-zz Did you see this issue on 8.0.1? How long did you run 8.0.1?
I am seeing the same issue with 8.0.3:
(gdb) bt full
#0 0x00002aaaadd121d7 in raise () from /lib64/libc.so.6
No symbol table info available.
#1 0x00002aaaadd138c8 in abort () from /lib64/libc.so.6
No symbol table info available.
#2 0x00002aaaaad29a3b in ink_abort (message_format=message_format@entry=0x2aaaaad906a7 "%s:%d: failed assertion `%s`") at ink_error.cc:99
ap = {{gp_offset = 32, fp_offset = 48, overflow_arg_area = 0x2aaab39fe570, reg_save_area = 0x2aaab39fe4b0}}
#3 0x00002aaaaad27525 in _ink_assert (expression=expression@entry=0x735706 "magic == HTTP_SM_MAGIC_ALIVE", file=file@entry=0x7350d6 "HttpSM.cc", line=line@entry=2516) at ink_assert.cc:37
No locals.
#4 0x000000000050f767 in HttpSM::main_handler (this=<optimized out>, event=<optimized out>, data=<optimized out>) at HttpSM.cc:2516
__FUNCTION__ = "main_handler"
#5 0x0000000000557969 in handleEvent (data=0x2aabe1f4c4f8, event=2301, this=<optimized out>) at /home/bcall/dev/yahoo/build/_build/build_debug_posix-x86_64_gcc_8/trafficserver8/iocore/eventsystem/I_Continuation.h:160
No locals.
#6 HttpTunnel::main_handler (this=0x2aabe1f4c4f8, event=<optimized out>, data=<optimized out>) at HttpTunnel.cc:1643
p = <optimized out>
c = <optimized out>
sm_callback = <optimized out>
#7 0x000000000071a4b2 in handleEvent (data=0x2aac1dcbd620, event=103, this=<optimized out>) at I_Continuation.h:160
Did you see this issue on 8.0.1?
No.
How long did you run 8.0.1?
I've run it the whole time. The "ab -c 50 -n 100000 -k" HTTPS issue doesn't seem that relevant in real life, but it's bad anyway, and I had to downgrade from 8.0.2 to 8.0.1 because this issue is not acceptable in production.
Can you please print HttpSM::history ?
@scw00
(gdb) frame 6
#6 HttpTunnel::main_handler (this=0x2aac4307cf58, event=<optimized out>, data=<optimized out>) at HttpTunnel.cc:1643
1643 sm->handleEvent(HTTP_TUNNEL_EVENT_DONE, this);
(gdb) p sm->history
$6 = {history = {{location = {file = 0x7350d6 "HttpSM.cc", func = 0x738dd0 <HttpSM::state_api_callback(int, void*)::__FUNCTION__> "state_api_callback", line = 1282},
event = 60000, reentrancy = 2}, {location = {file = 0x7350d6 "HttpSM.cc", func = 0x738db0 <HttpSM::state_api_callout(int, void*)::__FUNCTION__> "state_api_callout",
line = 1322}, event = 60000, reentrancy = 2}, {location = {file = 0x7350d6 "HttpSM.cc",
func = 0x738f40 <HttpSM::state_read_client_request_header(int, void*)::__FUNCTION__> "state_read_client_request_header", line = 578}, event = 100, reentrancy = 3}, {
location = {file = 0x7350d6 "HttpSM.cc", func = 0x738f40 <HttpSM::state_read_client_request_header(int, void*)::__FUNCTION__> "state_read_client_request_header",
line = 578}, event = 100, reentrancy = 1}, {location = {file = 0x7350d6 "HttpSM.cc",
func = 0x738dd0 <HttpSM::state_api_callback(int, void*)::__FUNCTION__> "state_api_callback", line = 1282}, event = 60000, reentrancy = 2}, {location = {
file = 0x7350d6 "HttpSM.cc", func = 0x738db0 <HttpSM::state_api_callout(int, void*)::__FUNCTION__> "state_api_callout", line = 1322}, event = 60000, reentrancy = 2}, {
location = {file = 0x7350d6 "HttpSM.cc", func = 0x738dd0 <HttpSM::state_api_callback(int, void*)::__FUNCTION__> "state_api_callback", line = 1282}, event = 60000,
reentrancy = 3}, {location = {file = 0x7350d6 "HttpSM.cc", func = 0x738db0 <HttpSM::state_api_callout(int, void*)::__FUNCTION__> "state_api_callout", line = 1322},
event = 60000, reentrancy = 3}, {location = {file = 0x7350d6 "HttpSM.cc", func = 0x738dd0 <HttpSM::state_api_callback(int, void*)::__FUNCTION__> "state_api_callback",
line = 1282}, event = 60000, reentrancy = 4}, {location = {file = 0x7350d6 "HttpSM.cc",
func = 0x738db0 <HttpSM::state_api_callout(int, void*)::__FUNCTION__> "state_api_callout", line = 1322}, event = 60000, reentrancy = 4}, {location = {
file = 0x7350d6 "HttpSM.cc", func = 0x738dd0 <HttpSM::state_api_callback(int, void*)::__FUNCTION__> "state_api_callback", line = 1282}, event = 60000, reentrancy = 5}, {
location = {file = 0x7350d6 "HttpSM.cc", func = 0x738db0 <HttpSM::state_api_callout(int, void*)::__FUNCTION__> "state_api_callout", line = 1322}, event = 60000,
reentrancy = 5}, {location = {file = 0x7350d6 "HttpSM.cc", func = 0x738dd0 <HttpSM::state_api_callback(int, void*)::__FUNCTION__> "state_api_callback", line = 1282},
event = 60000, reentrancy = 6}, {location = {file = 0x7350d6 "HttpSM.cc", func = 0x738db0 <HttpSM::state_api_callout(int, void*)::__FUNCTION__> "state_api_callout",
line = 1322}, event = 60000, reentrancy = 6}, {location = {file = 0x7350d6 "HttpSM.cc",
func = 0x738dd0 <HttpSM::state_api_callback(int, void*)::__FUNCTION__> "state_api_callback", line = 1282}, event = 60000, reentrancy = 7}, {location = {
file = 0x7350d6 "HttpSM.cc", func = 0x738db0 <HttpSM::state_api_callout(int, void*)::__FUNCTION__> "state_api_callout", line = 1322}, event = 60000, reentrancy = 7}, {
location = {file = 0x7350d6 "HttpSM.cc", func = 0x738dd0 <HttpSM::state_api_callback(int, void*)::__FUNCTION__> "state_api_callback", line = 1282}, event = 60000,
reentrancy = 8}, {location = {file = 0x7350d6 "HttpSM.cc", func = 0x738db0 <HttpSM::state_api_callout(int, void*)::__FUNCTION__> "state_api_callout", line = 1322},
event = 60000, reentrancy = 8}, {location = {file = 0x7350d6 "HttpSM.cc", func = 0x738dd0 <HttpSM::state_api_callback(int, void*)::__FUNCTION__> "state_api_callback",
line = 1282}, event = 60000, reentrancy = 9}, {location = {file = 0x7350d6 "HttpSM.cc",
func = 0x738db0 <HttpSM::state_api_callout(int, void*)::__FUNCTION__> "state_api_callout", line = 1322}, event = 60000, reentrancy = 9}, {location = {
file = 0x7350d6 "HttpSM.cc", func = 0x738dd0 <HttpSM::state_api_callback(int, void*)::__FUNCTION__> "state_api_callback", line = 1282}, event = 60000, reentrancy = 10}, {
location = {file = 0x7350d6 "HttpSM.cc", func = 0x738db0 <HttpSM::state_api_callout(int, void*)::__FUNCTION__> "state_api_callout", line = 1322}, event = 60000,
reentrancy = 10}, {location = {file = 0x7350d6 "HttpSM.cc", func = 0x738dd0 <HttpSM::state_api_callback(int, void*)::__FUNCTION__> "state_api_callback", line = 1282},
event = 60000, reentrancy = 11}, {location = {file = 0x7350d6 "HttpSM.cc", func = 0x738db0 <HttpSM::state_api_callout(int, void*)::__FUNCTION__> "state_api_callout",
line = 1322}, event = 60000, reentrancy = 11}, {location = {file = 0x7350d6 "HttpSM.cc",
func = 0x738dd0 <HttpSM::state_api_callback(int, void*)::__FUNCTION__> "state_api_callback", line = 1282}, event = 60000, reentrancy = 12}, {location = {
file = 0x7350d6 "HttpSM.cc", func = 0x738db0 <HttpSM::state_api_callout(int, void*)::__FUNCTION__> "state_api_callout", line = 1322}, event = 60000, reentrancy = 12}, {
location = {file = 0x7350d6 "HttpSM.cc", func = 0x738358 <HttpSM::set_next_state()::__FUNCTION__> "set_next_state", line = 7309}, event = 34463, reentrancy = 12}, {
location = {file = 0x743cfd "HttpCacheSM.cc", func = 0x743e90 <HttpCacheSM::state_cache_open_read(int, void*)::__FUNCTION__> "state_cache_open_read", line = 116},
event = 1102, reentrancy = -31073}, {location = {file = 0x7350d6 "HttpSM.cc",
func = 0x738be0 <HttpSM::state_cache_open_read(int, void*)::__FUNCTION__> "state_cache_open_read", line = 2455}, event = 1102, reentrancy = 13}, {location = {
file = 0x7350d6 "HttpSM.cc", func = 0x738dd0 <HttpSM::state_api_callback(int, void*)::__FUNCTION__> "state_api_callback", line = 1282}, event = 60000, reentrancy = 14}, {
location = {file = 0x7350d6 "HttpSM.cc", func = 0x738db0 <HttpSM::state_api_callout(int, void*)::__FUNCTION__> "state_api_callout", line = 1322}, event = 60000,
reentrancy = 14}, {location = {file = 0x7350d6 "HttpSM.cc", func = 0x738dd0 <HttpSM::state_api_callback(int, void*)::__FUNCTION__> "state_api_callback", line = 1282},
event = 60000, reentrancy = 15}, {location = {file = 0x7350d6 "HttpSM.cc", func = 0x738db0 <HttpSM::state_api_callout(int, void*)::__FUNCTION__> "state_api_callout",
line = 1322}, event = 60000, reentrancy = 15}, {location = {file = 0x7350d6 "HttpSM.cc",
func = 0x738dd0 <HttpSM::state_api_callback(int, void*)::__FUNCTION__> "state_api_callback", line = 1282}, event = 60000, reentrancy = 16}, {location = {
file = 0x7350d6 "HttpSM.cc", func = 0x738db0 <HttpSM::state_api_callout(int, void*)::__FUNCTION__> "state_api_callout", line = 1322}, event = 60000, reentrancy = 16}, {
location = {file = 0x7350d6 "HttpSM.cc", func = 0x738dd0 <HttpSM::state_api_callback(int, void*)::__FUNCTION__> "state_api_callback", line = 1282}, event = 60000,
reentrancy = 17}, {location = {file = 0x7350d6 "HttpSM.cc", func = 0x738db0 <HttpSM::state_api_callout(int, void*)::__FUNCTION__> "state_api_callout", line = 1322},
event = 60000, reentrancy = 17}, {location = {file = 0x7350d6 "HttpSM.cc", func = 0x738dd0 <HttpSM::state_api_callback(int, void*)::__FUNCTION__> "state_api_callback",
line = 1282}, event = 60000, reentrancy = 18}, {location = {file = 0x7350d6 "HttpSM.cc",
func = 0x738db0 <HttpSM::state_api_callout(int, void*)::__FUNCTION__> "state_api_callout", line = 1322}, event = 60000, reentrancy = 18}, {location = {
file = 0x7350d6 "HttpSM.cc", func = 0x738dd0 <HttpSM::state_api_callback(int, void*)::__FUNCTION__> "state_api_callback", line = 1282}, event = 60000, reentrancy = 19}, {
location = {file = 0x7350d6 "HttpSM.cc", func = 0x738db0 <HttpSM::state_api_callout(int, void*)::__FUNCTION__> "state_api_callout", line = 1322}, event = 60000,
reentrancy = 19}, {location = {file = 0x7350d6 "HttpSM.cc", func = 0x7385d0 <HttpSM::setup_cache_read_transfer()::__FUNCTION__> "setup_cache_read_transfer", line = 6065},
event = 34463, reentrancy = 19}, {location = {file = 0x7350d6 "HttpSM.cc",
func = 0x738a60 <HttpSM::tunnel_handler_cache_read(int, HttpTunnelProducer*)::__FUNCTION__> "tunnel_handler_cache_read", line = 3299}, event = 102, reentrancy = 12}, {
location = {file = 0x7350d6 "HttpSM.cc", func = 0x738aa0 <HttpSM::tunnel_handler_ua(int, HttpTunnelConsumer*)::__FUNCTION__> "tunnel_handler_ua", line = 3125}, event = 103,
reentrancy = 12}, {location = {file = 0x7350d6 "HttpSM.cc", func = 0x738b40 <HttpSM::tunnel_handler(int, void*)::__FUNCTION__> "tunnel_handler", line = 2810}, event = 2301,
reentrancy = 13}, {location = {file = 0x7350d6 "HttpSM.cc", func = 0x7387f0 <HttpSM::do_cache_lookup_and_read()::__FUNCTION__> "do_cache_lookup_and_read", line = 4507},
event = 0, reentrancy = 12}, {location = {file = 0x7350d6 "HttpSM.cc", func = 0x738dd0 <HttpSM::state_api_callback(int, void*)::__FUNCTION__> "state_api_callback",
line = 1282}, event = 60000, reentrancy = 2}, {location = {file = 0x7350d6 "HttpSM.cc",
func = 0x738db0 <HttpSM::state_api_callout(int, void*)::__FUNCTION__> "state_api_callout", line = 1322}, event = 60000, reentrancy = 2}, {location = {
file = 0x7350d6 "HttpSM.cc", func = 0x738dd0 <HttpSM::state_api_callback(int, void*)::__FUNCTION__> "state_api_callback", line = 1282}, event = 60000, reentrancy = 3}, {
location = {file = 0x7350d6 "HttpSM.cc", func = 0x738db0 <HttpSM::state_api_callout(int, void*)::__FUNCTION__> "state_api_callout", line = 1322}, event = 60000,
reentrancy = 3}, {location = {file = 0x7350d6 "HttpSM.cc", func = 0x738dd0 <HttpSM::state_api_callback(int, void*)::__FUNCTION__> "state_api_callback", line = 1282},
event = 60000, reentrancy = 4}, {location = {file = 0x7350d6 "HttpSM.cc", func = 0x738db0 <HttpSM::state_api_callout(int, void*)::__FUNCTION__> "state_api_callout",
line = 1322}, event = 60000, reentrancy = 4}, {location = {file = 0x7350d6 "HttpSM.cc", func = 0x7383e8 <HttpSM::kill_this()::__FUNCTION__> "kill_this", line = 6856},
event = 34463, reentrancy = 0}, {location = {file = 0x0, func = 0x0, line = 0}, event = 0, reentrancy = 0} <repeats 12 times>}, history_pos = 53}
@thelounge-zz
Could you please backport PR #4379 and test again?
There is an assert within HttpTunnel::main_handler:
int
HttpTunnel::main_handler(int event, void *data)
{
HttpTunnelProducer *p = nullptr;
HttpTunnelConsumer *c = nullptr;
bool sm_callback = false;
++reentrancy_count;
ink_assert(sm->magic == HTTP_SM_MAGIC_ALIVE);
and HttpSM::main_handler has its own assert:
int
HttpSM::main_handler(int event, void *data)
{
ink_release_assert(magic == HTTP_SM_MAGIC_ALIVE);
According to the backtrace above, ATS should assert at frame #6 if a debug build is enabled.
@bryancall Do you have debug enabled ?
@oknet No, I can't backport anything; I work with tarballs and rpmbuild, and have no business in C/C++ at all.
@oknet I am not running a debug build or else it would have asserted in the tunnel. It looks like the HttpSM has been killed and there are still some events happening in the tunnel.
@oknet I will try a test with #4379 and see what happens. This bug is very reproducible with me and happens within the first 1000 requests of the server starting.
Updated formatting on logs/backtraces for legibility.
And for the record, we've now seen this crash a few times.
Is this now fixed or not? https://raw.githubusercontent.com/apache/trafficserver/8.0.x/CHANGELOG-8.0.3
Frankly, it's not funny to have to go and find that out myself.
It's not fixed:
Mar 23 16:13:47 proxy traffic_manager[371818]: Fatal: HttpSM.cc:2516: failed assertion magic == HTTP_SM_MAGIC_ALIVE
Mar 23 16:13:47 proxy traffic_manager[371818]: traffic_server: received signal 6 (Aborted)
Mar 23 16:13:47 proxy traffic_manager[371818]: traffic_server - STACK TRACE:
Mar 23 16:13:47 proxy traffic_manager[371818]: /usr/bin/traffic_server(_Z19crash_logger_invokeiP9siginfo_tPv+0xa0)[0x55b9a165c9f3]
Mar 23 16:13:47 proxy traffic_manager[371818]: /usr/lib64/libpthread.so.0(+0x12080)[0x7fde995e2080]
Mar 23 16:13:47 proxy traffic_manager[371818]: /usr/lib64/libc.so.6(gsignal+0x10b)[0x7fde99247eab]
Mar 23 16:13:47 proxy traffic_manager[371818]: /usr/lib64/libc.so.6(abort+0x123)[0x7fde992325b9]
Mar 23 16:13:47 proxy traffic_manager[371818]: /usr/lib64/trafficserver/libtscore.so.8(_Z17ink_mutex_destroyP15pthread_mutex_t+0x0)[0x7fde9bb13ee6]
Mar 23 16:13:47 proxy traffic_manager[371818]: /usr/lib64/trafficserver/libtscore.so.8(ink_freelist_init_ops+0x0)[0x7fde9bb14e60]
Mar 23 16:13:47 proxy traffic_manager[371818]: /usr/bin/traffic_server(_ZN6HttpSM12main_handlerEiPv+0x3b)[0x55b9a1636c91]
Mar 23 16:13:47 proxy traffic_server[371826]: Fatal: HttpSM.cc:2516: failed assertion magic == HTTP_SM_MAGIC_ALIVE
Mar 23 16:13:47 proxy traffic_manager[371818]: /usr/bin/traffic_server(_ZN10HttpTunnel12main_handlerEiPv+0xd6)[0x55b9a1608050]
Mar 23 16:13:47 proxy traffic_manager[371818]: /usr/bin/traffic_server(_ZN7EThread13process_eventEP5Eventi+0x91)[0x55b9a15441e5]
Mar 23 16:13:47 proxy traffic_manager[371818]: /usr/bin/traffic_server(_ZN7EThread13process_queueEP5QueueI5EventNS1_9LinklinkEEPiS5+0x62)[0x55b9a15442ca]
Mar 23 16:13:47 proxy traffic_manager[371818]: /usr/bin/traffic_server(_ZN7EThread15execute_regularEv+0x161)[0x55b9a1544517]
Mar 23 16:13:47 proxy traffic_manager[371818]: /usr/bin/traffic_server(+0x94fa0)[0x55b9a1542fa0]
Mar 23 16:13:47 proxy traffic_manager[371818]: /usr/lib64/libpthread.so.0(+0x7594)[0x7fde995d7594]
Mar 23 16:13:47 proxy traffic_manager[371818]: /usr/lib64/libc.so.6(clone+0x3f)[0x7fde9930af4f]
Mar 23 16:13:50 proxy traffic_server[372230]: NOTE: --- traffic_server Starting ---
Mar 23 16:13:50 proxy traffic_server[372230]: NOTE: traffic_server Version: Apache Traffic Server - traffic_server - 8.0.3 - (build # 032315 on Mar 23 2019 at 15:46:18)
Mar 23 16:13:50 proxy traffic_server[372230]: NOTE: RLIMIT_NOFILE(7):cur(675000),max(675000)
Mar 23 16:41:23 proxy traffic_manager[371818]: Fatal: HttpSM.cc:2516: failed assertion magic == HTTP_SM_MAGIC_ALIVE
Mar 23 16:41:23 proxy traffic_manager[371818]: traffic_server: received signal 6 (Aborted)
Mar 23 16:41:23 proxy traffic_manager[371818]: traffic_server - STACK TRACE:
Mar 23 16:41:23 proxy traffic_manager[371818]: /usr/bin/traffic_server(_Z19crash_logger_invokeiP9siginfo_tPv+0xa0)[0x5587dd4049f3]
Mar 23 16:41:23 proxy traffic_manager[371818]: /usr/lib64/libpthread.so.0(+0x12080)[0x7fae28cf1080]
Mar 23 16:41:23 proxy traffic_manager[371818]: /usr/lib64/libc.so.6(gsignal+0x10b)[0x7fae28956eab]
Mar 23 16:41:23 proxy traffic_manager[371818]: /usr/lib64/libc.so.6(abort+0x123)[0x7fae289415b9]
Mar 23 16:41:23 proxy traffic_manager[371818]: /usr/lib64/trafficserver/libtscore.so.8(_Z17ink_mutex_destroyP15pthread_mutex_t+0x0)[0x7fae2b222ee6]
Mar 23 16:41:23 proxy traffic_manager[371818]: /usr/lib64/trafficserver/libtscore.so.8(ink_freelist_init_ops+0x0)[0x7fae2b223e60]
Mar 23 16:41:23 proxy traffic_server[372230]: Fatal: HttpSM.cc:2516: failed assertion magic == HTTP_SM_MAGIC_ALIVE
Mar 23 16:41:23 proxy traffic_manager[371818]: /usr/bin/traffic_server(_ZN6HttpSM12main_handlerEiPv+0x3b)[0x5587dd3dec91]
Mar 23 16:41:23 proxy traffic_manager[371818]: /usr/bin/traffic_server(_ZN10HttpTunnel12main_handlerEiPv+0xd6)[0x5587dd3b0050]
Mar 23 16:41:23 proxy traffic_manager[371818]: /usr/bin/traffic_server(_ZN7EThread13process_eventEP5Eventi+0x91)[0x5587dd2ec1e5]
Mar 23 16:41:23 proxy traffic_manager[371818]: /usr/bin/traffic_server(_ZN7EThread13process_queueEP5QueueI5EventNS1_9LinklinkEEPiS5+0x62)[0x5587dd2ec2ca]
Mar 23 16:41:23 proxy traffic_manager[371818]: /usr/bin/traffic_server(_ZN7EThread15execute_regularEv+0x161)[0x5587dd2ec517]
Mar 23 16:41:23 proxy traffic_manager[371818]: /usr/bin/traffic_server(+0x94fa0)[0x5587dd2eafa0]
Mar 23 16:41:23 proxy traffic_manager[371818]: /usr/lib64/libpthread.so.0(+0x7594)[0x7fae28ce6594]
Mar 23 16:41:23 proxy traffic_manager[371818]: /usr/lib64/libc.so.6(clone+0x3f)[0x7fae28a19f4f]
Mar 23 16:41:26 proxy traffic_server[372449]: NOTE: --- traffic_server Starting ---
Mar 23 16:41:26 proxy traffic_server[372449]: NOTE: traffic_server Version: Apache Traffic Server - traffic_server - 8.0.3 - (build # 032315 on Mar 23 2019 at 15:46:18)
Mar 23 16:41:26 proxy traffic_server[372449]: NOTE: RLIMIT_NOFILE(7):cur(675000),max(675000)
Mar 23 16:45:05 proxy traffic_manager[371818]: Fatal: HttpSM.cc:2516: failed assertion magic == HTTP_SM_MAGIC_ALIVE
@thelounge-zz is this reproducible with ab?
Nope, it happens randomly, but way too often for a production system, and I am not able to trigger it on demand for a clear "here, look" reproducer.
@thelounge-zz do you have h2 enabled?
Sure, it's on by default all the time, and we also use dual-stack certificates for every domain (i.e. RSA + EC certificates, to support older clients as well as efficient encryption for modern ones).
https://github.com/apache/trafficserver/blob/8.0.x/CHANGELOG-8.0.4 still doesn't mention this issue, which is not funny given how long 8.0.1 (without this issue) has been out there and how many bugs have been fixed since then.
And as expected, the same crash with 8.0.4 after less than 15 minutes of uptime. Downgrading again to 8.0.1 with its horrible TLS keep-alive bugs when using "ab -k", but at least it's not crashing all day long:
Aug 14 14:02:12 proxy traffic_manager[494641]: Fatal: HttpSM.cc:2516: failed assertion magic == HTTP_SM_MAGIC_ALIVE
Aug 14 14:02:12 proxy traffic_manager[494641]: traffic_server: received signal 6 (Aborted)
Aug 14 14:02:12 proxy traffic_manager[494641]: traffic_server - STACK TRACE:
Aug 14 14:02:12 proxy traffic_server[494647]: Fatal: HttpSM.cc:2516: failed assertion magic == HTTP_SM_MAGIC_ALIVE
Aug 14 14:02:12 proxy traffic_manager[494641]: /usr/bin/traffic_server(_Z19crash_logger_invokeiP9siginfo_tPv+0xa0)[0x56163969fa2c]
Aug 14 14:02:12 proxy traffic_manager[494641]: /usr/lib64/libpthread.so.0(+0x13070)[0x7f4fcff36070]
Aug 14 14:02:12 proxy traffic_manager[494641]: /usr/lib64/libc.so.6(gsignal+0x10f)[0x7f4fcfd9557f]
Aug 14 14:02:12 proxy traffic_manager[494641]: /usr/lib64/libc.so.6(abort+0x127)[0x7f4fcfd7f895]
Aug 14 14:02:12 proxy traffic_manager[494641]: /usr/lib64/trafficserver/libtscore.so.8(_Z17ink_mutex_destroyP15pthread_mutex_t+0x0)[0x7f4fd0aeff6a]
Aug 14 14:02:12 proxy traffic_manager[494641]: /usr/lib64/trafficserver/libtscore.so.8(ink_freelist_init_ops+0x0)[0x7f4fd0af0ee4]
Aug 14 14:02:12 proxy traffic_manager[494641]: /usr/bin/traffic_server(_ZN6HttpSM12main_handlerEiPv+0x3b)[0x561639676f81]
Aug 14 14:02:12 proxy traffic_manager[494641]: /usr/bin/traffic_server(_ZN10HttpTunnel12main_handlerEiPv+0xd6)[0x5616396448c0]
Aug 14 14:02:12 proxy traffic_manager[494641]: /usr/bin/traffic_server(_ZN7EThread13process_eventEP5Eventi+0x91)[0x56163957fcd5]
Aug 14 14:02:12 proxy traffic_manager[494641]: /usr/bin/traffic_server(_ZN7EThread13process_queueEP5QueueI5EventNS1_9LinklinkEEPiS5+0x62)[0x56163957fdba]
Aug 14 14:02:12 proxy traffic_manager[494641]: /usr/bin/traffic_server(_ZN7EThread15execute_regularEv+0x161)[0x561639580007]
Aug 14 14:02:12 proxy traffic_manager[494641]: /usr/bin/traffic_server(+0x95a90)[0x56163957ea90]
Aug 14 14:02:12 proxy traffic_manager[494641]: /usr/lib64/libpthread.so.0(+0x858e)[0x7f4fcff2b58e]
Aug 14 14:02:12 proxy traffic_manager[494641]: /usr/lib64/libc.so.6(clone+0x43)[0x7f4fcfe5a713]
Aug 14 14:02:15 proxy traffic_server[494918]: NOTE: --- traffic_server Starting ---
Aug 14 14:02:15 proxy traffic_server[494918]: NOTE: traffic_server Version: Apache Traffic Server - traffic_server - 8.0.4 - (build # 081413 on Aug 14 2019 at 13:46:42)
Aug 14 14:02:15 proxy traffic_server[494918]: NOTE: RLIMIT_NOFILE(7):cur(675000),max(675000)
The problem is that no one else is seeing this crasher, at least not that I know of. Does your setup include any plugins? Any other information on how to reproduce this, other than "ab"?
It's not reproducible with ab; it crashes in production under very low load.
What I don't understand is that it even shows the file and line where it happens (HttpSM.cc:2516), and given that it doesn't crash with 8.0.1, the offending changes between 8.0.1 and 8.0.2 shouldn't be that large in this area.
[root@proxy:/etc/trafficserver]$ cat plugin.config
compress.so /etc/trafficserver/compress.config
header_rewrite.so /etc/trafficserver/header_rewrite.config
[root@proxy:/etc/trafficserver]$ cat /etc/trafficserver/compress.config
enabled true
cache true
flush true
remove-accept-encoding true
supported-algorithms br,gzip,deflate
compressible-content-type text/*
compressible-content-type application/javascript
compressible-content-type application/json
compressible-content-type application/msaccess
compressible-content-type application/msexcel
compressible-content-type application/mshelp
compressible-content-type application/mspowerpoint
compressible-content-type application/msword
compressible-content-type application/pdf
compressible-content-type application/postscript
compressible-content-type application/rss+xml
compressible-content-type application/vnd.ms-excel.addin.macroEnabled.12
compressible-content-type application/vnd.ms-excel.sheet.binary.macroEnabled.12
compressible-content-type application/vnd.ms-excel.sheet.macroEnabled.12
compressible-content-type application/vnd.ms-excel.template.macroEnabled.12
compressible-content-type application/vnd.ms-fontobject
compressible-content-type application/vnd.ms-powerpoint.addin.macroEnabled.12
compressible-content-type application/vnd.ms-powerpoint.presentation.macroEnabled.12
compressible-content-type application/vnd.ms-powerpoint.slideshow.macroEnabled.12
compressible-content-type application/vnd.ms-powerpoint.template.macroEnabled.12
compressible-content-type application/vnd.ms-word.document.macroEnabled.12
compressible-content-type application/vnd.ms-word.template.macroEnabled.12
compressible-content-type application/vnd.wap.wmlc
compressible-content-type application/x-httpd-php-source
compressible-content-type application/x-javascript
compressible-content-type application/x-latex
compressible-content-type application/x-sh
compressible-content-type application/xhtml+xml
compressible-content-type application/xml
compressible-content-type application/xml+rss
compressible-content-type audio/x-wav
compressible-content-type font/otf
compressible-content-type font/ttf
compressible-content-type image/svg+xml
[root@proxy:/etc/trafficserver]$ cat /etc/trafficserver/header_rewrite.config
cond %{SEND_REQUEST_HDR_HOOK}
cond %{INCOMING-PORT}=443
set-header X-TL-TLS-Offload 1
set-header X-TL-TLS-Offload-Port %{INCOMING-PORT}
cond %{SEND_REQUEST_HDR_HOOK}
cond %{INCOMING-PORT}=8443
set-header X-TL-TLS-Offload 1
set-header X-TL-TLS-Offload-Port %{INCOMING-PORT}
@zwoop We saw it a few times, but it's very very rare. Also, it looked like @bryancall saw it as well.
The changes between 8.0.1 and 8.0.2 are relatively minimal:
git diff --stat 8.0.1 8.0.2
CHANGELOG-8.0.2 | 12 ++
STATUS | 8 +-
configure.ac | 4 +-
doc/Makefile.am | 2 +-
doc/admin-guide/introduction.en.rst | 154 ++++++++++-----------
doc/admin-guide/plugins/balancer.en.rst | 4 +
doc/admin-guide/plugins/sslheaders.en.rst | 2 +-
doc/ext/local-config.py.in | 13 +-
doc/{ => ext}/plantuml_fetch.sh | 0
doc/ext/traffic-server.py | 12 +-
doc/uml/Makefile.am | 2 +-
iocore/eventsystem/I_VConnection.h | 2 +-
iocore/eventsystem/P_UnixEventProcessor.h | 7 +-
plugins/experimental/sslheaders/expand.cc | 2 +-
plugins/experimental/sslheaders/sslheaders.cc | 82 ++++++++---
plugins/experimental/sslheaders/sslheaders.h | 8 +-
proxy/ProxyClientSession.cc | 3 +-
proxy/http/HttpSM.cc | 7 +-
tests/gold_tests/pluginTest/sslheaders/observer.py | 31 +++++
.../pluginTest/sslheaders/ssl/server.key | 28 ++++
.../pluginTest/sslheaders/ssl/server.pem | 21 +++
.../pluginTest/sslheaders/sslheaders.gold | 1 +
.../pluginTest/sslheaders/sslheaders.test.py | 91 ++++++++++++
tools/package/trafficserver.spec | 2 +-
24 files changed, 376 insertions(+), 122 deletions(-)
+Changes with Apache Traffic Server 8.0.2
+ #3601 - Add TLSv1.3 cipher suites for OpenSSL-1.1.1
+ #3927 - cppapi :InterceptPlugin fix
+ #3939 - Make sure the index stays positive
+ #4217 - Fix a regression in the traffic_ctl host status subcommand.
+ #4369 - Doc: Fix doc build to work with Sphinx 1.8.
+ #4541 - Doc minor fixes.
+ #4700 - sslheaders experimental plugin: fix doc typo, improve container use.
+ #4701 - Make sslheaders plugin better conform to documentation.
+ #4716 - Revert "Two more places to check whether attempting half_closed connecton logic is feasible"
+ #4752 - Added null value init for VConn args array
+ #4799 - Doc: parent config has more features than balancer plugin
+ #4716 - Revert "Two more places to check whether attempting half_closed connecton logic is feasible"
This looks suspicious, because it is the only commit that touches HttpSM and is related to keep-alive. But we can't simply revert this commit (6ac3b2ad39b9ca0c3fcb421a582978ca2a3c63), because the original change (653927f9aab9a3cbe8f09521dd8c8154ccbc1614) broke keep-alive for HTTP/1.1 over TLS.
Yeah, the broken keep-alive is terrible, but on the other hand I have lived with it the whole year, and until there is a better fix I prefer to keep living with it, given that there are serious issues with HTTP/2 where Debian even recommends a dist-upgrade because they don't plan to backport the changes.
8.0.4 has, for the first time in years, beautiful HTTPS keep-alive performance, but a regularly crashing server is unacceptable, and I doubt that I am the only one affected; most people just don't read their logs, and it heals itself, so nobody notices except the users with broken connections due to the internal restart.
Can you please print HttpSM::magic and HttpSM::handler?
Can you please print HttpSM::magic and HttpSM::handler?
Who is "you" in that context, and what does that even mean?
In my case I can't run anything above 8.0.1 for now, because I am unable to reproduce the issue on a test machine and production is affected unpredictably every few minutes.
Our current theory is that this continues to be an issue related to code changes around half-closed TCP connections. This previously caused Keep-Alive to fail, and one patch was reverted, but I'm now thinking that one (or more) of the following needs to be looked at (thanks @randall for making this list):
CHANGELOG-8.0.0: #3325 - Adding proxy.config.http.allow_half_open
CHANGELOG-8.0.0: #3714 - Two more places to check if attempting half_closed connection logic is feasible
CHANGELOG-8.0.0: #4213 - Disable the HttpSM half open logic if the underlying transport is TLS
CHANGELOG-8.0.1: #4267 - Set log code when closing half open connections.
CHANGELOG-8.0.2: #4716 - Revert "Two more places to check whether attempting half_closed connecton logic is feasible"
@thelounge-zz If you could try this package, that would really help us understand if we're on the right track with this:
https://ci.trafficserver.apache.org/src/tests/trafficserver-8.0.6-rc0.tar.bz2
Note that this is not a public / official release, and it will not be the final version.
Sadly, the same crash with your tarball ("CONFIG proxy.config.http.allow_half_open INT 2" is set). Rolling back to 8.0.1 :-( :-(
Oct 28 23:09:29 proxy traffic_manager[949120]: Fatal: HttpSM.cc:2516: failed assertion magic == HTTP_SM_MAGIC_ALIVE
Oct 28 23:09:29 proxy traffic_manager[949120]: traffic_server: received signal 6 (Aborted)
Oct 28 23:09:29 proxy traffic_manager[949120]: traffic_server - STACK TRACE:
Oct 28 23:09:29 proxy traffic_server[949127]: Fatal: HttpSM.cc:2516: failed assertion magic == HTTP_SM_MAGIC_ALIVE
Oct 28 23:09:29 proxy traffic_manager[949120]: /usr/bin/traffic_server(_Z19crash_logger_invokeiP9siginfo_tPv+0xa4)[0x558092b558aa]
Oct 28 23:09:29 proxy traffic_manager[949120]: /usr/lib64/libpthread.so.0(+0x12c60)[0x7fb3d08bec60]
Oct 28 23:09:29 proxy traffic_manager[949120]: /usr/lib64/libc.so.6(gsignal+0x145)[0x7fb3d071de35]
Oct 28 23:09:29 proxy traffic_manager[949120]: /usr/lib64/libc.so.6(abort+0x127)[0x7fb3d0708895]
Oct 28 23:09:29 proxy traffic_manager[949120]: /usr/lib64/trafficserver/libtscore.so.8(_Z17ink_mutex_destroyP15pthread_mutex_t+0x0)[0x7fb3d12a114c]
Oct 28 23:09:29 proxy traffic_manager[949120]: /usr/lib64/trafficserver/libtscore.so.8(ink_freelist_init_ops+0x0)[0x7fb3d12a20c4]
Oct 28 23:09:29 proxy traffic_manager[949120]: /usr/bin/traffic_server(_ZN6HttpSM12main_handlerEiPv+0x3c)[0x558092b2ce5e]
Oct 28 23:09:29 proxy traffic_manager[949120]: /usr/bin/traffic_server(_ZN10HttpTunnel12main_handlerEiPv+0xd3)[0x558092afa657]
Oct 28 23:09:29 proxy traffic_manager[949120]: /usr/bin/traffic_server(_ZN7EThread13process_eventEP5Eventi+0x92)[0x558092a35720]
Oct 28 23:09:29 proxy traffic_manager[949120]: /usr/bin/traffic_server(_ZN7EThread13process_queueEP5QueueI5EventNS1_9LinklinkEEPiS5+0x62)[0x558092a35808]
Oct 28 23:09:29 proxy traffic_manager[949120]: /usr/bin/traffic_server(_ZN7EThread15execute_regularEv+0x167)[0x558092a35a53]
Oct 28 23:09:29 proxy traffic_manager[949120]: /usr/bin/traffic_server(+0x96446)[0x558092a34446]
Oct 28 23:09:29 proxy traffic_manager[949120]: /usr/lib64/libpthread.so.0(+0x84c0)[0x7fb3d08b44c0]
Oct 28 23:09:29 proxy traffic_manager[949120]: /usr/lib64/libc.so.6(clone+0x43)[0x7fb3d07e2553]
Oct 28 23:09:32 proxy traffic_server[949205]: NOTE: --- traffic_server Starting ---
Oct 28 23:09:32 proxy traffic_server[949205]: NOTE: traffic_server Version: Apache Traffic Server - traffic_server - 8.0.6 - (build # 102822 on Oct 28 2019 at 22:57:56)
Oct 28 23:09:32 proxy traffic_server[949205]: NOTE: RLIMIT_NOFILE(7):cur(675000),max(675000)
The change here doesn't need that setting, but it shouldn't matter if you set it to "2" either. Does this reproduce quickly on your site, or do you have a way to trigger it?
well, we host some hundred websites and it happens pretty fast, but i would sell my soul for a clear "send this request and boom" - i don't have a clue what is going on in the usual mixed real-world traffic
no way to trigger it outside production :-(
@thelounge-zz We'll figure it out, the help you provide here is crucial. I'll have another tar-ball to try in a bit, but can you confirm that you have HTTP/2 enabled? I think that this only (or mostly) happens with HTTP/2 connections.
surely, and the brotli compression is done by ATS too
[root@proxy:/etc/trafficserver]$ cat plugin.config
compress.so /etc/trafficserver/compress.config
header_rewrite.so /etc/trafficserver/header_rewrite.config
HTTP/2.0 200 OK
date: Tue, 29 Oct 2019 15:38:46 GMT
strict-transport-security: max-age=31536000
expires: Thu, 19 Nov 1981 08:52:00 GMT
cache-control: private
etag: d0aade35746c42579be2f659f92960ba-df
last-modified: Thu, 19 Dec 2013 18:26:50 GMT
vary: Accept-Encoding,User-Agent
x-content-type-options: nosniff
x-response-time: D=5802 us
content-type: text/html; charset=ISO-8859-1
content-encoding: br
age: 0
X-Firefox-Spdy: h2
Content-Security-Policy-Report-Only: worker-src 'none'; report-uri about:blank
@thelounge-zz can you try https://ci.trafficserver.apache.org/src/tests/trafficserver-8.0.6-rc-zz1.tar.bz2 please ?
up for 10 minutes on the production server
"ab -c 250 -n 500000" and "ab -c 250 -n 500000 -k" against a small 650-byte gif image are fine (in 8.0.1 keep-alive is terribly broken, which - besides the H2 security issues - is one reason i want to get rid of it sooner rather than later) and until now no crash - knocking on wood - it's 9 PM here and load is very low
i should have kept my mouth shut :-(
Oct 29 21:12:39 proxy traffic_manager[1072131]: Fatal: HttpSM.cc:2516: failed assertion magic == HTTP_SM_MAGIC_ALIVE
Oct 29 21:12:39 proxy traffic_manager[1072131]: traffic_server: received signal 6 (Aborted)
Oct 29 21:12:39 proxy traffic_manager[1072131]: traffic_server - STACK TRACE:
Oct 29 21:12:39 proxy traffic_server[1072137]: Fatal: HttpSM.cc:2516: failed assertion magic == HTTP_SM_MAGIC_ALIVE
Oct 29 21:12:39 proxy traffic_manager[1072131]: /usr/bin/traffic_server(_Z19crash_logger_invokeiP9siginfo_tPv+0xa4)[0x563a4e753902]
Oct 29 21:12:39 proxy traffic_manager[1072131]: /usr/lib64/libpthread.so.0(+0x12c60)[0x7f05cb91cc60]
Oct 29 21:12:39 proxy traffic_manager[1072131]: /usr/lib64/libc.so.6(gsignal+0x145)[0x7f05cb77be35]
Oct 29 21:12:39 proxy traffic_manager[1072131]: /usr/lib64/libc.so.6(abort+0x127)[0x7f05cb766895]
Oct 29 21:12:39 proxy traffic_manager[1072131]: /usr/lib64/trafficserver/libtscore.so.8(_Z17ink_mutex_destroyP15pthread_mutex_t+0x0)[0x7f05cc2ff14c]
Oct 29 21:12:39 proxy traffic_manager[1072131]: /usr/lib64/trafficserver/libtscore.so.8(ink_freelist_init_ops+0x0)[0x7f05cc3000c4]
Oct 29 21:12:39 proxy traffic_manager[1072131]: /usr/bin/traffic_server(_ZN6HttpSM12main_handlerEiPv+0x3c)[0x563a4e72ae86]
Oct 29 21:12:39 proxy traffic_manager[1072131]: /usr/bin/traffic_server(_ZN10HttpTunnel12main_handlerEiPv+0xd3)[0x563a4e6f866f]
Oct 29 21:12:39 proxy traffic_manager[1072131]: /usr/bin/traffic_server(_ZN7EThread13process_eventEP5Eventi+0x92)[0x563a4e633720]
Oct 29 21:12:39 proxy traffic_manager[1072131]: /usr/bin/traffic_server(_ZN7EThread13process_queueEP5QueueI5EventNS1_9Link_linkEEPiS5_+0x62)[0x563a4e633808]
Oct 29 21:12:39 proxy traffic_manager[1072131]: /usr/bin/traffic_server(_ZN7EThread15execute_regularEv+0x167)[0x563a4e633a53]
Oct 29 21:12:39 proxy traffic_manager[1072131]: /usr/bin/traffic_server(+0x96446)[0x563a4e632446]
Oct 29 21:12:39 proxy traffic_manager[1072131]: /usr/lib64/libpthread.so.0(+0x84c0)[0x7f05cb9124c0]
Oct 29 21:12:39 proxy traffic_manager[1072131]: /usr/lib64/libc.so.6(clone+0x43)[0x7f05cb840553]
Oct 29 21:12:42 proxy traffic_server[1072623]: NOTE: --- traffic_server Starting ---
Oct 29 21:12:42 proxy traffic_server[1072623]: NOTE: traffic_server Version: Apache Traffic Server - traffic_server - 8.0.6 - (build # 102920 on Oct 29 2019 at 20:57:03)
Oct 29 21:12:42 proxy traffic_server[1072623]: NOTE: RLIMIT_NOFILE(7):cur(675000),max(675000)
aaaaargh...
what i often see in the logs with 8.0.1 are the following error messages - can this be related somehow? and even if not, is it a good idea to flood the logs that way? if i were an attacker i would use mixed load (many connections without any data that are then closed, many connections per second, expensive connections - anything that can flood the logs and so fill the disk and/or produce load)
[Oct 31 12:59:41.841] {0x7fed382b8700} ERROR: HTTP/2 connection error client_ip=194.208.63.67 session_id=177112 stream_id=43 recv headers compression error
[Oct 31 12:59:41.885] {0x7fed381b6700} ERROR: HTTP/2 connection error client_ip=194.208.63.67 session_id=177174 stream_id=1 recv headers compression error
[Oct 31 12:59:41.900] {0x7fed382b8700} ERROR: HTTP/2 connection error client_ip=194.208.63.67 session_id=177175 stream_id=1 recv headers compression error
[Oct 31 12:59:41.948] {0x7fed384bc700} ERROR: HTTP/2 connection error client_ip=194.208.63.67 session_id=177176 stream_id=1 recv headers compression error
[Oct 31 12:59:41.963] {0x7fed383ba700} ERROR: HTTP/2 connection error client_ip=194.208.63.67 session_id=177177 stream_id=1 recv headers compression error
[Oct 31 12:59:42.033] {0x7fed382b8700} ERROR: HTTP/2 connection error client_ip=194.208.63.67 session_id=177178 stream_id=1 recv headers compression error
[Oct 31 12:59:42.050] {0x7fed381b6700} ERROR: HTTP/2 connection error client_ip=194.208.63.67 session_id=177179 stream_id=1 recv headers compression error
[Oct 31 12:59:52.258] {0x7fed382b8700} ERROR: HTTP/2 connection error client_ip=194.208.63.67 session_id=177182 stream_id=3 recv headers compression error
[Oct 31 12:59:52.301] {0x7fed383ba700} ERROR: HTTP/2 connection error client_ip=194.208.63.67 session_id=177188 stream_id=1 recv headers compression error
[Oct 31 12:59:52.331] {0x7fed384bc700} ERROR: HTTP/2 connection error client_ip=194.208.63.67 session_id=177189 stream_id=1 recv headers compression error
[Oct 31 12:59:52.346] {0x7fed383ba700} ERROR: HTTP/2 connection error client_ip=194.208.63.67 session_id=177190 stream_id=1 recv headers compression error
[Oct 31 12:59:52.388] {0x7fed383ba700} ERROR: HTTP/2 connection error client_ip=194.208.63.67 session_id=177191 stream_id=1 recv headers compression error
[Oct 31 12:59:52.419] {0x7fed382b8700} ERROR: HTTP/2 connection error client_ip=194.208.63.67 session_id=177192 stream_id=1 recv headers compression error
[Oct 31 12:59:52.434] {0x7fed382b8700} ERROR: HTTP/2 connection error client_ip=194.208.63.67 session_id=177193 stream_id=1 recv headers compression error
[Oct 31 13:03:17.988] {0x7fed381b6700} ERROR: HTTP/2 connection error client_ip=194.208.63.67 session_id=177273 stream_id=15 recv headers compression error
[Oct 31 13:03:18.035] {0x7fed382b8700} ERROR: HTTP/2 connection error client_ip=194.208.63.67 session_id=177385 stream_id=1 recv headers compression error
[Oct 31 13:03:18.054] {0x7fed381b6700} ERROR: HTTP/2 connection error client_ip=194.208.63.67 session_id=177386 stream_id=1 recv headers compression error
[Oct 31 13:03:18.102] {0x7fed383ba700} ERROR: HTTP/2 connection error client_ip=194.208.63.67 session_id=177387 stream_id=1 recv headers compression error
[Oct 31 13:03:18.119] {0x7fed382b8700} ERROR: HTTP/2 connection error client_ip=194.208.63.67 session_id=177388 stream_id=1 recv headers compression error
[Oct 31 13:03:18.167] {0x7fed381b6700} ERROR: HTTP/2 connection error client_ip=194.208.63.67 session_id=177389 stream_id=1 recv headers compression error
Another thought I had is that maybe the half-open state was a red herring, and this issue has been in 8.0.x all along (introduced somewhere between 7.1.x and the v8.0.0 release). My thinking is that v8.0.0 broke Keep-Alive, so in that version this couldn't trigger because connections were closed so fast. Once Keep-Alive was fixed, the error occurs again.
Sudheer suggested a test, where if you turn off KA, this assertion would not happen (or at least not nearly as often). @thelounge-zz would you be able to test that? It's obviously not a solution, but helps us point in a new direction to look.
CONFIG proxy.config.http.keep_alive_enabled_in INT 0
I don't think proxy.config.http.keep_alive_enabled_out is the issue, but could possibly test with that turned off as well.
hmm - but wouldn't it then also be triggered with "ab -c 150 -n 250000 -k", which is broken in 8.0.1 (meaning it fails without errors on the server side, as far as i know) and works with later versions?
Hmmm, I've never been able to reproduce with a simple "ab" in my testing :-/. With or without "-k".
well, that's why i wonder how it can be keep-alive. however, i will give 8.0.5 with keep-alive disabled a try, but not now - 5:00 AM here and busy with coding stuff behind the proxy :-)
indeed, no crash for 7 hours. well, given that we now have https everywhere, running without keep-alive is really expensive when each and every image, javascript and so on triggers a new handshake :-(
maybe these errors would be the trigger when keep-alive is enabled?
[Nov 19 06:36:17.956] {0x7fcd11573700} ERROR: HTTP/2 stream error client_ip=119.94.111.110 session_id=13791 stream_id=37 refused to create new stream, because ua_session is in half_close state
[Nov 19 06:36:17.957] {0x7fcd11573700} ERROR: HTTP/2 connection error client_ip=119.94.111.110 session_id=13791 stream_id=37 window update stream invalid id
[Nov 19 06:41:59.381] {0x7fcd117b7700} ERROR: HTTP/2 session error client_ip=130.105.99.84 session_id=14214 closing a connection, because its stream error rate (1.000000) is too high
[Nov 19 09:13:58.493] {0x7fcd117b7700} ERROR: HTTP/2 stream error client_ip=89.144.211.115 session_id=33516 stream_id=19 refused to create new stream, because ua_session is in half_close state
[Nov 19 09:13:58.493] {0x7fcd117b7700} ERROR: HTTP/2 stream error client_ip=89.144.211.115 session_id=33516 stream_id=21 refused to create new stream, because ua_session is in half_close state
[Nov 19 09:13:58.493] {0x7fcd117b7700} ERROR: HTTP/2 session error client_ip=89.144.211.115 session_id=33516 closing a connection, because its stream error rate (0.222222) is too high
[Nov 19 09:56:53.512] {0x7fcd11573700} ERROR: HTTP/2 stream error client_ip=86.8.200.129 session_id=39448 stream_id=5 refused to create new stream, because ua_session is in half_close state
[Nov 19 09:56:53.512] {0x7fcd11573700} ERROR: HTTP/2 session error client_ip=86.8.200.129 session_id=39448 closing a connection, because its stream error rate (0.500000) is too high
[Nov 19 10:31:05.506] {0x7fcd118b9700} ERROR: HTTP/2 stream error client_ip=123.194.156.154 session_id=45129 stream_id=3 refused to create new stream, because ua_session is in half_close state
[Nov 19 10:31:05.506] {0x7fcd118b9700} ERROR: HTTP/2 session error client_ip=123.194.156.154 session_id=45129 closing a connection, because its stream error rate (1.000000) is too high
[Nov 19 11:02:46.782] {0x7fcd119bb700} ERROR: HTTP/2 stream error client_ip=80.110.104.188 session_id=50106 stream_id=17 refused to create new stream, because ua_session is in half_close state
[Nov 19 11:02:46.782] {0x7fcd119bb700} ERROR: HTTP/2 session error client_ip=80.110.104.188 session_id=50106 closing a connection, because its stream error rate (1.000000) is too high
@thelounge-zz Hey, don't know if you are still with us, or if you gave up, but any chance you can try this patch? @masaori335 found that we missed this commit in the 8.0.1 release, I don't know that it will change anything, but at this point, it's worth a shot.
commit 4cdb42b4b2cf175c3eb0ff4f567a3394ce37ff66
Author: Susan Hinrichs <shinrich@oath.com>
AuthorDate: Thu Aug 2 13:44:54 2018 +0000
Commit: Leif Hedstrom <zwoop@apache.org>
CommitDate: Mon Mar 30 09:05:36 2020 -0600
Remove the ignore_keep_alive method entirely
(cherry picked from commit da9edd5feba06da6ebbe47eaf9696c37c5483fa1)
diff --git a/proxy/ProxyClientTransaction.h b/proxy/ProxyClientTransaction.h
index 12fcf30ca..86f7b6528 100644
--- a/proxy/ProxyClientTransaction.h
+++ b/proxy/ProxyClientTransaction.h
@@ -196,12 +196,6 @@ public:
{
}
- virtual bool
- ignore_keep_alive()
- {
- return true;
- }
-
virtual void destroy();
virtual void transaction_done() = 0;
diff --git a/proxy/http/Http1ClientTransaction.h b/proxy/http/Http1ClientTransaction.h
index 0251b2c1b..8a876660f 100644
--- a/proxy/http/Http1ClientTransaction.h
+++ b/proxy/http/Http1ClientTransaction.h
@@ -81,12 +81,6 @@ public:
void release(IOBufferReader *r) override;
- bool
- ignore_keep_alive() override
- {
- return false;
- }
-
bool allow_half_open() const override;
void set_parent(ProxyClientSession *new_parent) override;
diff --git a/proxy/http/HttpTransact.cc b/proxy/http/HttpTransact.cc
index 562952179..0a7d71010 100644
--- a/proxy/http/HttpTransact.cc
+++ b/proxy/http/HttpTransact.cc
@@ -5451,7 +5451,7 @@ HttpTransact::initialize_state_variables_from_request(State *s, HTTPHdr *obsolet
}
// If this is an internal request, never keep alive
- if (!s->txn_conf->keep_alive_enabled_in || (s->state_machine->ua_txn && s->state_machine->ua_txn->ignore_keep_alive())) {
+ if (!s->txn_conf->keep_alive_enabled_in) {
s->client_info.keep_alive = HTTP_NO_KEEPALIVE;
} else if (vc && vc->get_is_internal_request()) {
// Following the trail of JIRAs back from TS-4960, there can be issues with
diff --git a/proxy/http2/Http2Stream.h b/proxy/http2/Http2Stream.h
index 3be1057a5..68099d246 100644
--- a/proxy/http2/Http2Stream.h
+++ b/proxy/http2/Http2Stream.h
@@ -142,13 +142,6 @@ public:
void signal_write_event(bool call_update);
void reenable(VIO *vio) override;
void transaction_done() override;
- bool
- ignore_keep_alive() override
- {
- // If we return true here, Connection header will always be "close".
- // It should be handled as the same as HTTP/1.1
- return false;
- }
void restart_sending();
void push_promise(URL &url, const MIMEField *accept_encoding);
sorry, for now i'll stay with keep-alive disabled - a ton of other work
BTW: https://trafficserver.apache.org/downloads Apache Traffic Server v8.0.6 was released on August 20th, 2019
this is nonsense, 8.0.5 was released at that time
given that you missed a commit, i don't get why it's still not part of https://raw.githubusercontent.com/apache/trafficserver/8.0.x/CHANGELOG-8.0.7 more than a year later
@masaori335 @zwoop
Got this core in our production with 8.0.8:

(gdb) bt
[backtrace garbled when pasted; the recoverable frames match the original report: ink_abort at ink_error.cc:99, _ink_assert at ink_assert.cc:37 (HttpSM.cc line 2516), handleEvent at /usr/src/debug/trafficserver-8.0.7/iocore/eventsystem/I_Continuation.h:160, and EThread at UnixEThread.cc:170]
(gdb)
Was able to verify #5934. Backported it to 8.0.8 and verified.
Let me close this issue as fixed, because @Vijay-1's backtrace looks exactly the same as the original report and Bryan's. He confirmed that #5934 fixed this issue. It has been backported to 8.1.x and 9.0.x (released in 8.1.0 and 9.0.0).
Please reopen this issue if anybody still has the crash with 8.1.0 and later.
Note: it's possible the same crash could happen if there is another scheduled but orphaned VC_EVENT_WRITE_COMPLETE (103) event.
@thelounge-zz please give a try 8.1.0, 8.1.1, or 9.0.0.
seems to be fixed in 8.1.x
hell, 8.0.1 broke https keep-alive and 8.0.2 has random segfaults a few times per hour
Feb 4 13:37:00 proxy traffic_server[752595]: Fatal: HttpSM.cc:2516: failed assertion magic == HTTP_SM_MAGIC_ALIVE