apache / trafficserver

Apache Traffic Server™ is a fast, scalable and extensible HTTP/1.1 and HTTP/2 compliant caching proxy server.
https://trafficserver.apache.org/
Apache License 2.0

8.0.2/8.0.3: traffic_manager[631826]: Fatal: HttpSM.cc:2516: failed assertion `magic == HTTP_SM_MAGIC_ALIVE` #4921

Closed. thelounge-zz closed this issue 3 years ago.

thelounge-zz commented 5 years ago

Hell, 8.0.1 broke HTTPS keep-alive and 8.0.2 has random segfaults a few times per hour:

Feb 4 13:37:00 proxy traffic_server[752595]: Fatal: HttpSM.cc:2516: failed assertion magic == HTTP_SM_MAGIC_ALIVE

Feb  4 13:37:05 proxy traffic_server[752634]: NOTE: --- traffic_server Starting ---
Feb  4 13:37:05 proxy traffic_server[752634]: NOTE: traffic_server Version: Apache Traffic Server - traffic_server - 8.0.2 - (build # 020320 on Feb  3 2019 at 20:33:49)
Feb  4 13:37:05 proxy traffic_server[752634]: NOTE: RLIMIT_NOFILE(7):cur(675000),max(675000)
Feb  4 13:38:54 proxy traffic_manager[631826]: Fatal: HttpSM.cc:2516: failed assertion `magic == HTTP_SM_MAGIC_ALIVE`
Feb  4 13:38:54 proxy traffic_manager[631826]: traffic_server: received signal 6 (Aborted)
Feb  4 13:38:54 proxy traffic_manager[631826]: traffic_server - STACK TRACE:
Feb  4 13:38:54 proxy traffic_manager[631826]: /usr/bin/traffic_server(_Z19crash_logger_invokeiP9siginfo_tPv+0xa0)[0x555d1ebfa3a3]
Feb  4 13:38:54 proxy traffic_manager[631826]: /usr/lib64/libpthread.so.0(+0x12080)[0x7f9a06c75080]
Feb  4 13:38:54 proxy traffic_manager[631826]: /usr/lib64/libc.so.6(gsignal+0x10b)[0x7f9a068daeab]
Feb  4 13:38:54 proxy traffic_manager[631826]: /usr/lib64/libc.so.6(abort+0x123)[0x7f9a068c55b9]
Feb  4 13:38:54 proxy traffic_manager[631826]: /usr/lib64/trafficserver/libtscore.so.8(_Z17ink_mutex_destroyP15pthread_mutex_t+0x0)[0x7f9a091a7e7c]
Feb  4 13:38:54 proxy traffic_manager[631826]: /usr/lib64/trafficserver/libtscore.so.8(ink_freelist_init_ops+0x0)[0x7f9a091a8df6]
Feb  4 13:38:54 proxy traffic_manager[631826]: /usr/bin/traffic_server(_ZN6HttpSM12main_handlerEiPv+0x3b)[0x555d1ebd46d1]
Feb  4 13:38:54 proxy traffic_manager[631826]: /usr/bin/traffic_server(_ZN10HttpTunnel12main_handlerEiPv+0xd6)[0x555d1eba5b54]
Feb  4 13:38:54 proxy traffic_manager[631826]: /usr/bin/traffic_server(_ZN7EThread13process_eventEP5Eventi+0x91)[0x555d1eae1d9b]
Feb  4 13:38:54 proxy traffic_manager[631826]: /usr/bin/traffic_server(_ZN7EThread13process_queueEP5QueueI5EventNS1_9Link_linkEEPiS5_+0x62)[0x555d1eae1e80]
Feb  4 13:38:54 proxy traffic_manager[631826]: /usr/bin/traffic_server(_ZN7EThread15execute_regularEv+0x161)[0x555d1eae20cd]
Feb  4 13:38:54 proxy traffic_server[752634]: Fatal: HttpSM.cc:2516: failed assertion `magic == HTTP_SM_MAGIC_ALIVE`
Feb  4 13:38:54 proxy traffic_manager[631826]: /usr/bin/traffic_server(+0x94b56)[0x555d1eae0b56]
Feb  4 13:38:54 proxy traffic_manager[631826]: /usr/lib64/libpthread.so.0(+0x7594)[0x7f9a06c6a594]
Feb  4 13:38:54 proxy traffic_manager[631826]: /usr/lib64/libc.so.6(clone+0x3f)[0x7f9a0699df4f]
bryancall commented 5 years ago

@thelounge-zz Did you see this issue with 8.0.1? How long did you run 8.0.1?

bryancall commented 5 years ago

I am seeing the same issue with 8.0.3:

(gdb) bt full
#0  0x00002aaaadd121d7 in raise () from /lib64/libc.so.6
No symbol table info available.
#1  0x00002aaaadd138c8 in abort () from /lib64/libc.so.6
No symbol table info available.
#2  0x00002aaaaad29a3b in ink_abort (message_format=message_format@entry=0x2aaaaad906a7 "%s:%d: failed assertion `%s`") at ink_error.cc:99
        ap = {{gp_offset = 32, fp_offset = 48, overflow_arg_area = 0x2aaab39fe570, reg_save_area = 0x2aaab39fe4b0}}
#3  0x00002aaaaad27525 in _ink_assert (expression=expression@entry=0x735706 "magic == HTTP_SM_MAGIC_ALIVE", file=file@entry=0x7350d6 "HttpSM.cc", line=line@entry=2516) at ink_assert.cc:37
No locals.
#4  0x000000000050f767 in HttpSM::main_handler (this=<optimized out>, event=<optimized out>, data=<optimized out>) at HttpSM.cc:2516
        __FUNCTION__ = "main_handler"
#5  0x0000000000557969 in handleEvent (data=0x2aabe1f4c4f8, event=2301, this=<optimized out>) at /home/bcall/dev/yahoo/build/_build/build_debug_posix-x86_64_gcc_8/trafficserver8/iocore/eventsystem/I_Continuation.h:160
No locals.
#6  HttpTunnel::main_handler (this=0x2aabe1f4c4f8, event=<optimized out>, data=<optimized out>) at HttpTunnel.cc:1643
        p = <optimized out>
        c = <optimized out>
        sm_callback = <optimized out>
#7  0x000000000071a4b2 in handleEvent (data=0x2aac1dcbd620, event=103, this=<optimized out>) at I_Continuation.h:160
thelounge-zz commented 5 years ago

Did you see this issue with 8.0.1?

No.

How long did you run 8.0.1?

I have been running it the whole time, since the "ab -c 50 -n 100000 -k" HTTPS issue doesn't seem to be that relevant in real life; it's still bad, but I had to downgrade from 8.0.2 back to 8.0.1 because this crash is not acceptable in production.

scw00 commented 5 years ago

Can you please print HttpSM::history?

bryancall commented 5 years ago

@scw00

(gdb) frame 6
#6  HttpTunnel::main_handler (this=0x2aac4307cf58, event=<optimized out>, data=<optimized out>) at HttpTunnel.cc:1643
1643          sm->handleEvent(HTTP_TUNNEL_EVENT_DONE, this);
(gdb) p sm->history
$6 = {history = {{location = {file = 0x7350d6 "HttpSM.cc", func = 0x738dd0 <HttpSM::state_api_callback(int, void*)::__FUNCTION__> "state_api_callback", line = 1282},
      event = 60000, reentrancy = 2}, {location = {file = 0x7350d6 "HttpSM.cc", func = 0x738db0 <HttpSM::state_api_callout(int, void*)::__FUNCTION__> "state_api_callout",
        line = 1322}, event = 60000, reentrancy = 2}, {location = {file = 0x7350d6 "HttpSM.cc",
        func = 0x738f40 <HttpSM::state_read_client_request_header(int, void*)::__FUNCTION__> "state_read_client_request_header", line = 578}, event = 100, reentrancy = 3}, {
      location = {file = 0x7350d6 "HttpSM.cc", func = 0x738f40 <HttpSM::state_read_client_request_header(int, void*)::__FUNCTION__> "state_read_client_request_header",
        line = 578}, event = 100, reentrancy = 1}, {location = {file = 0x7350d6 "HttpSM.cc",
        func = 0x738dd0 <HttpSM::state_api_callback(int, void*)::__FUNCTION__> "state_api_callback", line = 1282}, event = 60000, reentrancy = 2}, {location = {
        file = 0x7350d6 "HttpSM.cc", func = 0x738db0 <HttpSM::state_api_callout(int, void*)::__FUNCTION__> "state_api_callout", line = 1322}, event = 60000, reentrancy = 2}, {
      location = {file = 0x7350d6 "HttpSM.cc", func = 0x738dd0 <HttpSM::state_api_callback(int, void*)::__FUNCTION__> "state_api_callback", line = 1282}, event = 60000,
      reentrancy = 3}, {location = {file = 0x7350d6 "HttpSM.cc", func = 0x738db0 <HttpSM::state_api_callout(int, void*)::__FUNCTION__> "state_api_callout", line = 1322},
      event = 60000, reentrancy = 3}, {location = {file = 0x7350d6 "HttpSM.cc", func = 0x738dd0 <HttpSM::state_api_callback(int, void*)::__FUNCTION__> "state_api_callback",
        line = 1282}, event = 60000, reentrancy = 4}, {location = {file = 0x7350d6 "HttpSM.cc",
        func = 0x738db0 <HttpSM::state_api_callout(int, void*)::__FUNCTION__> "state_api_callout", line = 1322}, event = 60000, reentrancy = 4}, {location = {
        file = 0x7350d6 "HttpSM.cc", func = 0x738dd0 <HttpSM::state_api_callback(int, void*)::__FUNCTION__> "state_api_callback", line = 1282}, event = 60000, reentrancy = 5}, {
      location = {file = 0x7350d6 "HttpSM.cc", func = 0x738db0 <HttpSM::state_api_callout(int, void*)::__FUNCTION__> "state_api_callout", line = 1322}, event = 60000,
      reentrancy = 5}, {location = {file = 0x7350d6 "HttpSM.cc", func = 0x738dd0 <HttpSM::state_api_callback(int, void*)::__FUNCTION__> "state_api_callback", line = 1282},
      event = 60000, reentrancy = 6}, {location = {file = 0x7350d6 "HttpSM.cc", func = 0x738db0 <HttpSM::state_api_callout(int, void*)::__FUNCTION__> "state_api_callout",
        line = 1322}, event = 60000, reentrancy = 6}, {location = {file = 0x7350d6 "HttpSM.cc",
        func = 0x738dd0 <HttpSM::state_api_callback(int, void*)::__FUNCTION__> "state_api_callback", line = 1282}, event = 60000, reentrancy = 7}, {location = {
        file = 0x7350d6 "HttpSM.cc", func = 0x738db0 <HttpSM::state_api_callout(int, void*)::__FUNCTION__> "state_api_callout", line = 1322}, event = 60000, reentrancy = 7}, {
      location = {file = 0x7350d6 "HttpSM.cc", func = 0x738dd0 <HttpSM::state_api_callback(int, void*)::__FUNCTION__> "state_api_callback", line = 1282}, event = 60000,
      reentrancy = 8}, {location = {file = 0x7350d6 "HttpSM.cc", func = 0x738db0 <HttpSM::state_api_callout(int, void*)::__FUNCTION__> "state_api_callout", line = 1322},
      event = 60000, reentrancy = 8}, {location = {file = 0x7350d6 "HttpSM.cc", func = 0x738dd0 <HttpSM::state_api_callback(int, void*)::__FUNCTION__> "state_api_callback",
        line = 1282}, event = 60000, reentrancy = 9}, {location = {file = 0x7350d6 "HttpSM.cc",
        func = 0x738db0 <HttpSM::state_api_callout(int, void*)::__FUNCTION__> "state_api_callout", line = 1322}, event = 60000, reentrancy = 9}, {location = {
        file = 0x7350d6 "HttpSM.cc", func = 0x738dd0 <HttpSM::state_api_callback(int, void*)::__FUNCTION__> "state_api_callback", line = 1282}, event = 60000, reentrancy = 10}, {
      location = {file = 0x7350d6 "HttpSM.cc", func = 0x738db0 <HttpSM::state_api_callout(int, void*)::__FUNCTION__> "state_api_callout", line = 1322}, event = 60000,
      reentrancy = 10}, {location = {file = 0x7350d6 "HttpSM.cc", func = 0x738dd0 <HttpSM::state_api_callback(int, void*)::__FUNCTION__> "state_api_callback", line = 1282},
      event = 60000, reentrancy = 11}, {location = {file = 0x7350d6 "HttpSM.cc", func = 0x738db0 <HttpSM::state_api_callout(int, void*)::__FUNCTION__> "state_api_callout",
        line = 1322}, event = 60000, reentrancy = 11}, {location = {file = 0x7350d6 "HttpSM.cc",
        func = 0x738dd0 <HttpSM::state_api_callback(int, void*)::__FUNCTION__> "state_api_callback", line = 1282}, event = 60000, reentrancy = 12}, {location = {
        file = 0x7350d6 "HttpSM.cc", func = 0x738db0 <HttpSM::state_api_callout(int, void*)::__FUNCTION__> "state_api_callout", line = 1322}, event = 60000, reentrancy = 12}, {
      location = {file = 0x7350d6 "HttpSM.cc", func = 0x738358 <HttpSM::set_next_state()::__FUNCTION__> "set_next_state", line = 7309}, event = 34463, reentrancy = 12}, {
      location = {file = 0x743cfd "HttpCacheSM.cc", func = 0x743e90 <HttpCacheSM::state_cache_open_read(int, void*)::__FUNCTION__> "state_cache_open_read", line = 116},
      event = 1102, reentrancy = -31073}, {location = {file = 0x7350d6 "HttpSM.cc",
        func = 0x738be0 <HttpSM::state_cache_open_read(int, void*)::__FUNCTION__> "state_cache_open_read", line = 2455}, event = 1102, reentrancy = 13}, {location = {
        file = 0x7350d6 "HttpSM.cc", func = 0x738dd0 <HttpSM::state_api_callback(int, void*)::__FUNCTION__> "state_api_callback", line = 1282}, event = 60000, reentrancy = 14}, {
      location = {file = 0x7350d6 "HttpSM.cc", func = 0x738db0 <HttpSM::state_api_callout(int, void*)::__FUNCTION__> "state_api_callout", line = 1322}, event = 60000,
      reentrancy = 14}, {location = {file = 0x7350d6 "HttpSM.cc", func = 0x738dd0 <HttpSM::state_api_callback(int, void*)::__FUNCTION__> "state_api_callback", line = 1282},
      event = 60000, reentrancy = 15}, {location = {file = 0x7350d6 "HttpSM.cc", func = 0x738db0 <HttpSM::state_api_callout(int, void*)::__FUNCTION__> "state_api_callout",
        line = 1322}, event = 60000, reentrancy = 15}, {location = {file = 0x7350d6 "HttpSM.cc",
        func = 0x738dd0 <HttpSM::state_api_callback(int, void*)::__FUNCTION__> "state_api_callback", line = 1282}, event = 60000, reentrancy = 16}, {location = {
        file = 0x7350d6 "HttpSM.cc", func = 0x738db0 <HttpSM::state_api_callout(int, void*)::__FUNCTION__> "state_api_callout", line = 1322}, event = 60000, reentrancy = 16}, {
      location = {file = 0x7350d6 "HttpSM.cc", func = 0x738dd0 <HttpSM::state_api_callback(int, void*)::__FUNCTION__> "state_api_callback", line = 1282}, event = 60000,
      reentrancy = 17}, {location = {file = 0x7350d6 "HttpSM.cc", func = 0x738db0 <HttpSM::state_api_callout(int, void*)::__FUNCTION__> "state_api_callout", line = 1322},
      event = 60000, reentrancy = 17}, {location = {file = 0x7350d6 "HttpSM.cc", func = 0x738dd0 <HttpSM::state_api_callback(int, void*)::__FUNCTION__> "state_api_callback",
        line = 1282}, event = 60000, reentrancy = 18}, {location = {file = 0x7350d6 "HttpSM.cc",
        func = 0x738db0 <HttpSM::state_api_callout(int, void*)::__FUNCTION__> "state_api_callout", line = 1322}, event = 60000, reentrancy = 18}, {location = {
        file = 0x7350d6 "HttpSM.cc", func = 0x738dd0 <HttpSM::state_api_callback(int, void*)::__FUNCTION__> "state_api_callback", line = 1282}, event = 60000, reentrancy = 19}, {
      location = {file = 0x7350d6 "HttpSM.cc", func = 0x738db0 <HttpSM::state_api_callout(int, void*)::__FUNCTION__> "state_api_callout", line = 1322}, event = 60000,
      reentrancy = 19}, {location = {file = 0x7350d6 "HttpSM.cc", func = 0x7385d0 <HttpSM::setup_cache_read_transfer()::__FUNCTION__> "setup_cache_read_transfer", line = 6065},
      event = 34463, reentrancy = 19}, {location = {file = 0x7350d6 "HttpSM.cc",
        func = 0x738a60 <HttpSM::tunnel_handler_cache_read(int, HttpTunnelProducer*)::__FUNCTION__> "tunnel_handler_cache_read", line = 3299}, event = 102, reentrancy = 12}, {
      location = {file = 0x7350d6 "HttpSM.cc", func = 0x738aa0 <HttpSM::tunnel_handler_ua(int, HttpTunnelConsumer*)::__FUNCTION__> "tunnel_handler_ua", line = 3125}, event = 103,
      reentrancy = 12}, {location = {file = 0x7350d6 "HttpSM.cc", func = 0x738b40 <HttpSM::tunnel_handler(int, void*)::__FUNCTION__> "tunnel_handler", line = 2810}, event = 2301,
      reentrancy = 13}, {location = {file = 0x7350d6 "HttpSM.cc", func = 0x7387f0 <HttpSM::do_cache_lookup_and_read()::__FUNCTION__> "do_cache_lookup_and_read", line = 4507},
      event = 0, reentrancy = 12}, {location = {file = 0x7350d6 "HttpSM.cc", func = 0x738dd0 <HttpSM::state_api_callback(int, void*)::__FUNCTION__> "state_api_callback",
        line = 1282}, event = 60000, reentrancy = 2}, {location = {file = 0x7350d6 "HttpSM.cc",
        func = 0x738db0 <HttpSM::state_api_callout(int, void*)::__FUNCTION__> "state_api_callout", line = 1322}, event = 60000, reentrancy = 2}, {location = {
        file = 0x7350d6 "HttpSM.cc", func = 0x738dd0 <HttpSM::state_api_callback(int, void*)::__FUNCTION__> "state_api_callback", line = 1282}, event = 60000, reentrancy = 3}, {
      location = {file = 0x7350d6 "HttpSM.cc", func = 0x738db0 <HttpSM::state_api_callout(int, void*)::__FUNCTION__> "state_api_callout", line = 1322}, event = 60000,
      reentrancy = 3}, {location = {file = 0x7350d6 "HttpSM.cc", func = 0x738dd0 <HttpSM::state_api_callback(int, void*)::__FUNCTION__> "state_api_callback", line = 1282},
      event = 60000, reentrancy = 4}, {location = {file = 0x7350d6 "HttpSM.cc", func = 0x738db0 <HttpSM::state_api_callout(int, void*)::__FUNCTION__> "state_api_callout",
        line = 1322}, event = 60000, reentrancy = 4}, {location = {file = 0x7350d6 "HttpSM.cc", func = 0x7383e8 <HttpSM::kill_this()::__FUNCTION__> "kill_this", line = 6856},
      event = 34463, reentrancy = 0}, {location = {file = 0x0, func = 0x0, line = 0}, event = 0, reentrancy = 0} <repeats 12 times>}, history_pos = 53}
oknet commented 5 years ago

@thelounge-zz

Could you please backport PR #4379 and test again?
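
For reference, a rough sketch of how the PR could be applied on top of a release tarball without a git checkout; the file and directory names below are assumptions, not tested instructions:

curl -LO https://github.com/apache/trafficserver/pull/4379.patch   # GitHub exposes every PR as a plain patch
tar xjf trafficserver-8.0.2.tar.bz2
cd trafficserver-8.0.2
patch -p1 < ../4379.patch                                          # apply the PR on top of the release sources
# then rebuild as usual (rpmbuild with the existing spec, or ./configure && make)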

oknet commented 5 years ago

There is an assert within HttpTunnel::main_handler:

int
HttpTunnel::main_handler(int event, void *data)
{
  HttpTunnelProducer *p = nullptr;
  HttpTunnelConsumer *c = nullptr;
  bool sm_callback      = false;

  ++reentrancy_count;

  ink_assert(sm->magic == HTTP_SM_MAGIC_ALIVE);

and HttpSM::main_handler has its own assert:

int
HttpSM::main_handler(int event, void *data)
{
  ink_release_assert(magic == HTTP_SM_MAGIC_ALIVE);

According to the backtrace:

(gdb) bt full
#0  0x00002aaaadd121d7 in raise () from /lib64/libc.so.6
No symbol table info available.
#1  0x00002aaaadd138c8 in abort () from /lib64/libc.so.6
No symbol table info available.
#2  0x00002aaaaad29a3b in ink_abort (message_format=message_format@entry=0x2aaaaad906a7 "%s:%d: failed assertion `%s`") at ink_error.cc:99
        ap = {{gp_offset = 32, fp_offset = 48, overflow_arg_area = 0x2aaab39fe570, reg_save_area = 0x2aaab39fe4b0}}
#3  0x00002aaaaad27525 in _ink_assert (expression=expression@entry=0x735706 "magic == HTTP_SM_MAGIC_ALIVE", file=file@entry=0x7350d6 "HttpSM.cc", line=line@entry=2516) at ink_assert.cc:37
No locals.
#4  0x000000000050f767 in HttpSM::main_handler (this=<optimized out>, event=<optimized out>, data=<optimized out>) at HttpSM.cc:2516
        __FUNCTION__ = "main_handler"
#5  0x0000000000557969 in handleEvent (data=0x2aabe1f4c4f8, event=2301, this=<optimized out>) at /home/bcall/dev/yahoo/build/_build/build_debug_posix-x86_64_gcc_8/trafficserver8/iocore/eventsystem/I_Continuation.h:160
No locals.
#6  HttpTunnel::main_handler (this=0x2aabe1f4c4f8, event=<optimized out>, data=<optimized out>) at HttpTunnel.cc:1643
        p = <optimized out>
        c = <optimized out>
        sm_callback = <optimized out>
#7  0x000000000071a4b2 in handleEvent (data=0x2aac1dcbd620, event=103, this=<optimized out>) at I_Continuation.h:160

ATS should have asserted at frame #6 if debug was enabled.
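
A rough illustration (not the actual ATS macros) of why only the HttpSM check fires in a release build, assuming, as implied above, that ink_assert compiles to a no-op outside debug builds while ink_release_assert is always active:

#include <cstdlib>

#ifdef DEBUG
#define ink_assert(expr) ((expr) ? (void)0 : std::abort())
#else
#define ink_assert(expr) ((void)0)                                   // compiled out in release builds, so frame #6 stays silent
#endif
#define ink_release_assert(expr) ((expr) ? (void)0 : std::abort())   // always active, so frame #4 aborts

int main()
{
  bool magic_alive = false;          // stand-in for magic == HTTP_SM_MAGIC_ALIVE on a dead HttpSM
  ink_assert(magic_alive);           // HttpTunnel::main_handler's check: a no-op here
  ink_release_assert(magic_alive);   // HttpSM::main_handler's check: raises SIGABRT (signal 6)
  return 0;
}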

@bryancall Do you have debug enabled?

thelounge-zz commented 5 years ago

@oknet No, I can't backport anything; I work with tarballs and rpmbuild and don't do C/C++ at all.

bryancall commented 5 years ago

@oknet I am not running a debug build or else it would have asserted in the tunnel. It looks like the HttpSM has been killed and there are still some events happening in the tunnel.
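
A minimal, self-contained sketch (not ATS code) of the failure mode described above: the state machine is torn down, its magic value is poisoned, and an event that was already scheduled for the tunnel fires afterwards and walks into the dead state machine:

#include <cstdio>
#include <cstdlib>
#include <functional>
#include <queue>

enum Magic : unsigned { MAGIC_ALIVE = 0xfeedbabe, MAGIC_DEAD = 0xdeadbeef };

struct StateMachine {
  Magic magic = MAGIC_ALIVE;
  void kill() { magic = MAGIC_DEAD; }   // analogue of the state machine being killed
  void main_handler(int event) {
    if (magic != MAGIC_ALIVE) {         // analogue of the release assert at HttpSM.cc:2516
      std::printf("failed assertion: magic == MAGIC_ALIVE (event %d)\n", event);
      std::abort();
    }
    std::printf("handled event %d\n", event);
  }
};

int main()
{
  StateMachine sm;
  std::queue<std::function<void()>> events;       // stand-in for the per-thread event queue
  events.push([&] { sm.main_handler(103); });     // a write-complete style event is already queued
  sm.kill();                                      // the SM is killed before the event runs
  while (!events.empty()) {                       // the orphaned event now fires -> abort
    events.front()();
    events.pop();
  }
  return 0;
}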

bryancall commented 5 years ago

@oknet I will try a test with #4379 and see what happens. This bug is very reproducible for me and happens within the first 1000 requests after the server starts.

randall commented 5 years ago

Updated formatting on logs/backtraces for legibility.

And for the record, we've now seen this crash a few times.

thelounge-zz commented 5 years ago

Is this now fixed or not? https://raw.githubusercontent.com/apache/trafficserver/8.0.x/CHANGELOG-8.0.3

Frankly, it's not funny to have to go and find that out myself.

thelounge-zz commented 5 years ago

It's not:

Mar 23 16:13:47 proxy traffic_manager[371818]: Fatal: HttpSM.cc:2516: failed assertion `magic == HTTP_SM_MAGIC_ALIVE`
Mar 23 16:13:47 proxy traffic_manager[371818]: traffic_server: received signal 6 (Aborted)
Mar 23 16:13:47 proxy traffic_manager[371818]: traffic_server - STACK TRACE:
Mar 23 16:13:47 proxy traffic_manager[371818]: /usr/bin/traffic_server(_Z19crash_logger_invokeiP9siginfo_tPv+0xa0)[0x55b9a165c9f3]
Mar 23 16:13:47 proxy traffic_manager[371818]: /usr/lib64/libpthread.so.0(+0x12080)[0x7fde995e2080]
Mar 23 16:13:47 proxy traffic_manager[371818]: /usr/lib64/libc.so.6(gsignal+0x10b)[0x7fde99247eab]
Mar 23 16:13:47 proxy traffic_manager[371818]: /usr/lib64/libc.so.6(abort+0x123)[0x7fde992325b9]
Mar 23 16:13:47 proxy traffic_manager[371818]: /usr/lib64/trafficserver/libtscore.so.8(_Z17ink_mutex_destroyP15pthread_mutex_t+0x0)[0x7fde9bb13ee6]
Mar 23 16:13:47 proxy traffic_manager[371818]: /usr/lib64/trafficserver/libtscore.so.8(ink_freelist_init_ops+0x0)[0x7fde9bb14e60]
Mar 23 16:13:47 proxy traffic_manager[371818]: /usr/bin/traffic_server(_ZN6HttpSM12main_handlerEiPv+0x3b)[0x55b9a1636c91]
Mar 23 16:13:47 proxy traffic_server[371826]: Fatal: HttpSM.cc:2516: failed assertion `magic == HTTP_SM_MAGIC_ALIVE`
Mar 23 16:13:47 proxy traffic_manager[371818]: /usr/bin/traffic_server(_ZN10HttpTunnel12main_handlerEiPv+0xd6)[0x55b9a1608050]
Mar 23 16:13:47 proxy traffic_manager[371818]: /usr/bin/traffic_server(_ZN7EThread13process_eventEP5Eventi+0x91)[0x55b9a15441e5]
Mar 23 16:13:47 proxy traffic_manager[371818]: /usr/bin/traffic_server(_ZN7EThread13process_queueEP5QueueI5EventNS1_9Link_linkEEPiS5_+0x62)[0x55b9a15442ca]
Mar 23 16:13:47 proxy traffic_manager[371818]: /usr/bin/traffic_server(_ZN7EThread15execute_regularEv+0x161)[0x55b9a1544517]
Mar 23 16:13:47 proxy traffic_manager[371818]: /usr/bin/traffic_server(+0x94fa0)[0x55b9a1542fa0]
Mar 23 16:13:47 proxy traffic_manager[371818]: /usr/lib64/libpthread.so.0(+0x7594)[0x7fde995d7594]
Mar 23 16:13:47 proxy traffic_manager[371818]: /usr/lib64/libc.so.6(clone+0x3f)[0x7fde9930af4f]
Mar 23 16:13:50 proxy traffic_server[372230]: NOTE: --- traffic_server Starting ---
Mar 23 16:13:50 proxy traffic_server[372230]: NOTE: traffic_server Version: Apache Traffic Server - traffic_server - 8.0.3 - (build # 032315 on Mar 23 2019 at 15:46:18)
Mar 23 16:13:50 proxy traffic_server[372230]: NOTE: RLIMIT_NOFILE(7):cur(675000),max(675000)
Mar 23 16:41:23 proxy traffic_manager[371818]: Fatal: HttpSM.cc:2516: failed assertion `magic == HTTP_SM_MAGIC_ALIVE`
Mar 23 16:41:23 proxy traffic_manager[371818]: traffic_server: received signal 6 (Aborted)
Mar 23 16:41:23 proxy traffic_manager[371818]: traffic_server - STACK TRACE:
Mar 23 16:41:23 proxy traffic_manager[371818]: /usr/bin/traffic_server(_Z19crash_logger_invokeiP9siginfo_tPv+0xa0)[0x5587dd4049f3]
Mar 23 16:41:23 proxy traffic_manager[371818]: /usr/lib64/libpthread.so.0(+0x12080)[0x7fae28cf1080]
Mar 23 16:41:23 proxy traffic_manager[371818]: /usr/lib64/libc.so.6(gsignal+0x10b)[0x7fae28956eab]
Mar 23 16:41:23 proxy traffic_manager[371818]: /usr/lib64/libc.so.6(abort+0x123)[0x7fae289415b9]
Mar 23 16:41:23 proxy traffic_manager[371818]: /usr/lib64/trafficserver/libtscore.so.8(_Z17ink_mutex_destroyP15pthread_mutex_t+0x0)[0x7fae2b222ee6]
Mar 23 16:41:23 proxy traffic_manager[371818]: /usr/lib64/trafficserver/libtscore.so.8(ink_freelist_init_ops+0x0)[0x7fae2b223e60]
Mar 23 16:41:23 proxy traffic_server[372230]: Fatal: HttpSM.cc:2516: failed assertion `magic == HTTP_SM_MAGIC_ALIVE`
Mar 23 16:41:23 proxy traffic_manager[371818]: /usr/bin/traffic_server(_ZN6HttpSM12main_handlerEiPv+0x3b)[0x5587dd3dec91]
Mar 23 16:41:23 proxy traffic_manager[371818]: /usr/bin/traffic_server(_ZN10HttpTunnel12main_handlerEiPv+0xd6)[0x5587dd3b0050]
Mar 23 16:41:23 proxy traffic_manager[371818]: /usr/bin/traffic_server(_ZN7EThread13process_eventEP5Eventi+0x91)[0x5587dd2ec1e5]
Mar 23 16:41:23 proxy traffic_manager[371818]: /usr/bin/traffic_server(_ZN7EThread13process_queueEP5QueueI5EventNS1_9Link_linkEEPiS5_+0x62)[0x5587dd2ec2ca]
Mar 23 16:41:23 proxy traffic_manager[371818]: /usr/bin/traffic_server(_ZN7EThread15execute_regularEv+0x161)[0x5587dd2ec517]
Mar 23 16:41:23 proxy traffic_manager[371818]: /usr/bin/traffic_server(+0x94fa0)[0x5587dd2eafa0]
Mar 23 16:41:23 proxy traffic_manager[371818]: /usr/lib64/libpthread.so.0(+0x7594)[0x7fae28ce6594]
Mar 23 16:41:23 proxy traffic_manager[371818]: /usr/lib64/libc.so.6(clone+0x3f)[0x7fae28a19f4f]
Mar 23 16:41:26 proxy traffic_server[372449]: NOTE: --- traffic_server Starting ---
Mar 23 16:41:26 proxy traffic_server[372449]: NOTE: traffic_server Version: Apache Traffic Server - traffic_server - 8.0.3 - (build # 032315 on Mar 23 2019 at 15:46:18)
Mar 23 16:41:26 proxy traffic_server[372449]: NOTE: RLIMIT_NOFILE(7):cur(675000),max(675000)
Mar 23 16:45:05 proxy traffic_manager[371818]: Fatal: HttpSM.cc:2516: failed assertion `magic == HTTP_SM_MAGIC_ALIVE`

vmamidi commented 5 years ago

@thelounge-zz is this reproducible with ab?

thelounge-zz commented 5 years ago

Nope, it happens randomly but way too often for a production system, and I am not able to trigger it on demand for a clear "here, look" reproducer.

vmamidi commented 5 years ago

@thelounge-zz do you have h2 enabled?

thelounge-zz commented 5 years ago

Sure, it has been enabled by default the whole time, and we use dual-stack certificates for every domain (i.e. RSA + EC certificates, to support older clients as well as efficient encryption for modern ones).

thelounge-zz commented 5 years ago

https://github.com/apache/trafficserver/blob/8.0.x/CHANGELOG-8.0.4 still doesn't mention this issue, which is not funny given how long 8.0.1 (which doesn't have this issue) has been out and how many bugs have been fixed since then.

thelounge-zz commented 5 years ago

And as expected, the same crash with 8.0.4 after less than 15 minutes of uptime. Downgrading again to 8.0.1, with its horrible TLS keep-alive bugs when using "ab -k", but at least it doesn't crash all day long:

Aug 14 14:02:12 proxy traffic_manager[494641]: Fatal: HttpSM.cc:2516: failed assertion `magic == HTTP_SM_MAGIC_ALIVE`
Aug 14 14:02:12 proxy traffic_manager[494641]: traffic_server: received signal 6 (Aborted)
Aug 14 14:02:12 proxy traffic_manager[494641]: traffic_server - STACK TRACE:
Aug 14 14:02:12 proxy traffic_server[494647]: Fatal: HttpSM.cc:2516: failed assertion `magic == HTTP_SM_MAGIC_ALIVE`
Aug 14 14:02:12 proxy traffic_manager[494641]: /usr/bin/traffic_server(_Z19crash_logger_invokeiP9siginfo_tPv+0xa0)[0x56163969fa2c]
Aug 14 14:02:12 proxy traffic_manager[494641]: /usr/lib64/libpthread.so.0(+0x13070)[0x7f4fcff36070]
Aug 14 14:02:12 proxy traffic_manager[494641]: /usr/lib64/libc.so.6(gsignal+0x10f)[0x7f4fcfd9557f]
Aug 14 14:02:12 proxy traffic_manager[494641]: /usr/lib64/libc.so.6(abort+0x127)[0x7f4fcfd7f895]
Aug 14 14:02:12 proxy traffic_manager[494641]: /usr/lib64/trafficserver/libtscore.so.8(_Z17ink_mutex_destroyP15pthread_mutex_t+0x0)[0x7f4fd0aeff6a]
Aug 14 14:02:12 proxy traffic_manager[494641]: /usr/lib64/trafficserver/libtscore.so.8(ink_freelist_init_ops+0x0)[0x7f4fd0af0ee4]
Aug 14 14:02:12 proxy traffic_manager[494641]: /usr/bin/traffic_server(_ZN6HttpSM12main_handlerEiPv+0x3b)[0x561639676f81]
Aug 14 14:02:12 proxy traffic_manager[494641]: /usr/bin/traffic_server(_ZN10HttpTunnel12main_handlerEiPv+0xd6)[0x5616396448c0]
Aug 14 14:02:12 proxy traffic_manager[494641]: /usr/bin/traffic_server(_ZN7EThread13process_eventEP5Eventi+0x91)[0x56163957fcd5]
Aug 14 14:02:12 proxy traffic_manager[494641]: /usr/bin/traffic_server(_ZN7EThread13process_queueEP5QueueI5EventNS1_9Link_linkEEPiS5_+0x62)[0x56163957fdba]
Aug 14 14:02:12 proxy traffic_manager[494641]: /usr/bin/traffic_server(_ZN7EThread15execute_regularEv+0x161)[0x561639580007]
Aug 14 14:02:12 proxy traffic_manager[494641]: /usr/bin/traffic_server(+0x95a90)[0x56163957ea90]
Aug 14 14:02:12 proxy traffic_manager[494641]: /usr/lib64/libpthread.so.0(+0x858e)[0x7f4fcff2b58e]
Aug 14 14:02:12 proxy traffic_manager[494641]: /usr/lib64/libc.so.6(clone+0x43)[0x7f4fcfe5a713]
Aug 14 14:02:15 proxy traffic_server[494918]: NOTE: --- traffic_server Starting ---
Aug 14 14:02:15 proxy traffic_server[494918]: NOTE: traffic_server Version: Apache Traffic Server - traffic_server - 8.0.4 - (build # 081413 on Aug 14 2019 at 13:46:42)
Aug 14 14:02:15 proxy traffic_server[494918]: NOTE: RLIMIT_NOFILE(7):cur(675000),max(675000)

zwoop commented 5 years ago

The problem is that no one else is seeing this crash, at least not that I know of. Does your setup include any plugins? Any other information on how to reproduce this, other than "ab"?

thelounge-zz commented 5 years ago

It's not reproducible with ab; it crashes in production under very low load.

What I don't understand is that it even shows the file and line where it happens (HttpSM.cc:2516), and given that it doesn't crash with 8.0.1, the offending changes between 8.0.1 and 8.0.2 shouldn't be that large in this area.

[root@proxy:/etc/trafficserver]$ cat plugin.config
compress.so /etc/trafficserver/compress.config
header_rewrite.so /etc/trafficserver/header_rewrite.config

[root@proxy:/etc/trafficserver]$ cat /etc/trafficserver/compress.config
enabled true
cache true
flush true
remove-accept-encoding true
supported-algorithms br,gzip,deflate
compressible-content-type text/*
compressible-content-type application/javascript
compressible-content-type application/json
compressible-content-type application/msaccess
compressible-content-type application/msexcel
compressible-content-type application/mshelp
compressible-content-type application/mspowerpoint
compressible-content-type application/msword
compressible-content-type application/pdf
compressible-content-type application/postscript
compressible-content-type application/rss+xml
compressible-content-type application/vnd.ms-excel.addin.macroEnabled.12
compressible-content-type application/vnd.ms-excel.sheet.binary.macroEnabled.12
compressible-content-type application/vnd.ms-excel.sheet.macroEnabled.12
compressible-content-type application/vnd.ms-excel.template.macroEnabled.12
compressible-content-type application/vnd.ms-fontobject
compressible-content-type application/vnd.ms-powerpoint.addin.macroEnabled.12
compressible-content-type application/vnd.ms-powerpoint.presentation.macroEnabled.12
compressible-content-type application/vnd.ms-powerpoint.slideshow.macroEnabled.12
compressible-content-type application/vnd.ms-powerpoint.template.macroEnabled.12
compressible-content-type application/vnd.ms-word.document.macroEnabled.12
compressible-content-type application/vnd.ms-word.template.macroEnabled.12
compressible-content-type application/vnd.wap.wmlc
compressible-content-type application/x-httpd-php-source
compressible-content-type application/x-javascript
compressible-content-type application/x-latex
compressible-content-type application/x-sh
compressible-content-type application/xhtml+xml
compressible-content-type application/xml
compressible-content-type application/xml+rss
compressible-content-type audio/x-wav
compressible-content-type font/otf
compressible-content-type font/ttf
compressible-content-type image/svg+xml

[root@proxy:/etc/trafficserver]$ cat /etc/trafficserver/header_rewrite.config
cond %{SEND_REQUEST_HDR_HOOK}
cond %{INCOMING-PORT}=443
set-header X-TL-TLS-Offload 1
set-header X-TL-TLS-Offload-Port %{INCOMING-PORT}

cond %{SEND_REQUEST_HDR_HOOK}
cond %{INCOMING-PORT}=8443
set-header X-TL-TLS-Offload 1
set-header X-TL-TLS-Offload-Port %{INCOMING-PORT}

randall commented 5 years ago

@zwoop We saw it a few times, but it's very very rare. Also, it looked like @bryancall saw it as well.

The changes between 8.0.1 and 8.0.2 are relatively minimal:

git diff --stat 8.0.1 8.0.2

CHANGELOG-8.0.2                                    |  12 ++
 STATUS                                             |   8 +-
 configure.ac                                       |   4 +-
 doc/Makefile.am                                    |   2 +-
 doc/admin-guide/introduction.en.rst                | 154 ++++++++++-----------
 doc/admin-guide/plugins/balancer.en.rst            |   4 +
 doc/admin-guide/plugins/sslheaders.en.rst          |   2 +-
 doc/ext/local-config.py.in                         |  13 +-
 doc/{ => ext}/plantuml_fetch.sh                    |   0
 doc/ext/traffic-server.py                          |  12 +-
 doc/uml/Makefile.am                                |   2 +-
 iocore/eventsystem/I_VConnection.h                 |   2 +-
 iocore/eventsystem/P_UnixEventProcessor.h          |   7 +-
 plugins/experimental/sslheaders/expand.cc          |   2 +-
 plugins/experimental/sslheaders/sslheaders.cc      |  82 ++++++++---
 plugins/experimental/sslheaders/sslheaders.h       |   8 +-
 proxy/ProxyClientSession.cc                        |   3 +-
 proxy/http/HttpSM.cc                               |   7 +-
 tests/gold_tests/pluginTest/sslheaders/observer.py |  31 +++++
 .../pluginTest/sslheaders/ssl/server.key           |  28 ++++
 .../pluginTest/sslheaders/ssl/server.pem           |  21 +++
 .../pluginTest/sslheaders/sslheaders.gold          |   1 +
 .../pluginTest/sslheaders/sslheaders.test.py       |  91 ++++++++++++
 tools/package/trafficserver.spec                   |   2 +-
 24 files changed, 376 insertions(+), 122 deletions(-)
+Changes with Apache Traffic Server 8.0.2
+  #3601 - Add TLSv1.3 cipher suites for OpenSSL-1.1.1
+  #3927 - cppapi :InterceptPlugin fix
+  #3939 - Make sure the index stays positive
+  #4217 - Fix a regression in the traffic_ctl host status subcommand. 
+  #4369 - Doc: Fix doc build to work with Sphinx 1.8.
+  #4541 - Doc minor fixes.
+  #4700 - sslheaders experimental plugin:  fix doc typo, improve container use.
+  #4701 - Make sslheaders plugin better conform to documentation.
+  #4716 - Revert "Two more places to check whether attempting half_closed connecton logic is feasible"
+  #4752 - Added null value init for VConn args array
+  #4799 - Doc: parent config has more features than balancer plugin
masaori335 commented 5 years ago
+  #4716 - Revert "Two more places to check whether attempting half_closed connecton logic is feasible"

This looks suspicious, because it is the only commit in that list that touches HttpSM and is related to keep-alive. But we can't simply revert this commit (6ac3b2ad39b9ca0c3fcb421a582978ca2a3c63), because the original change (653927f9aab9a3cbe8f09521dd8c8154ccbc1614) broke keep-alive for HTTP/1.1 over TLS.

thelounge-zz commented 5 years ago

Yeah, the broken keep-alive is terrible, but on the other hand I have lived with it the whole year, and until there is a better fix I prefer to live with it a bit longer, especially given that there are serious issues with HTTP/2 where Debian even recommends a dist-upgrade because they don't plan to backport the changes.

8.0.4 has, for the first time in years, beautiful HTTPS keep-alive performance, but a regularly crashing server is unacceptable. I doubt I am the only one affected; most people just don't read their logs, and since the server restarts itself nobody notices except the users whose connections break during the internal restart.

scw00 commented 5 years ago

Can you please print HttpSM::magic and HttpSM::handler?
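
(That is, from the core file in gdb, along the lines of the earlier history dump; the frame number below is taken from that backtrace and may differ in another core:)

(gdb) frame 6          # the HttpTunnel::main_handler frame
(gdb) p sm->magic
(gdb) p sm->handler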

thelounge-zz commented 5 years ago

Can you please print HttpSM::magic and HttpSM::handler?

Who is "you" in that context, and what does that even mean?

In my case I can't run anything above 8.0.1 for now, because I am unable to reproduce the issue on a test machine, and production is affected unpredictably every few minutes.

zwoop commented 5 years ago

Our current theory is that this continues to be an issue related to the code changes around half-closed TCP connections. Those previously caused keep-alive to fail, and one patch was reverted, but I'm now thinking that one (or more) of the following needs to be looked at (thanks @randall for making this list; a records.config sketch for toggling the relevant setting follows the list):

CHANGELOG-8.0.0:  #3325 - Adding proxy.config.http.allow_half_open
CHANGELOG-8.0.0:  #3714 - Two more places to check if attempting half_closed connection logic is feasible
CHANGELOG-8.0.0:  #4213 - Disable the HttpSM half open logic if the underlying transport is TLS
CHANGELOG-8.0.1:  #4267 - Set log code when closing half open connections.
CHANGELOG-8.0.2:  #4716 - Revert "Two more places to check whether attempting half_closed connecton logic is feasible"
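
A sketch only: the setting those changes revolve around can be flipped in records.config to help bisect. The value semantics are an assumption here (a later comment in this thread sets it to 2), so check the 8.0.x docs before relying on it:

CONFIG proxy.config.http.allow_half_open INT 0
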
zwoop commented 4 years ago

@thelounge-zz If you could try this package, that would really help us understand if we're on the right track with this:

https://ci.trafficserver.apache.org/src/tests/trafficserver-8.0.6-rc0.tar.bz2

Note that this is not a public/official release and will not be the final version.

thelounge-zz commented 4 years ago

Sadly, the same crash ("CONFIG proxy.config.http.allow_half_open INT 2" was set with your tarball). Rolling back to 8.0.1 :-( :-(

Oct 28 23:09:29 proxy traffic_manager[949120]: Fatal: HttpSM.cc:2516: failed assertion `magic == HTTP_SM_MAGIC_ALIVE`
Oct 28 23:09:29 proxy traffic_manager[949120]: traffic_server: received signal 6 (Aborted)
Oct 28 23:09:29 proxy traffic_manager[949120]: traffic_server - STACK TRACE:
Oct 28 23:09:29 proxy traffic_server[949127]: Fatal: HttpSM.cc:2516: failed assertion `magic == HTTP_SM_MAGIC_ALIVE`
Oct 28 23:09:29 proxy traffic_manager[949120]: /usr/bin/traffic_server(_Z19crash_logger_invokeiP9siginfo_tPv+0xa4)[0x558092b558aa]
Oct 28 23:09:29 proxy traffic_manager[949120]: /usr/lib64/libpthread.so.0(+0x12c60)[0x7fb3d08bec60]
Oct 28 23:09:29 proxy traffic_manager[949120]: /usr/lib64/libc.so.6(gsignal+0x145)[0x7fb3d071de35]
Oct 28 23:09:29 proxy traffic_manager[949120]: /usr/lib64/libc.so.6(abort+0x127)[0x7fb3d0708895]
Oct 28 23:09:29 proxy traffic_manager[949120]: /usr/lib64/trafficserver/libtscore.so.8(_Z17ink_mutex_destroyP15pthread_mutex_t+0x0)[0x7fb3d12a114c]
Oct 28 23:09:29 proxy traffic_manager[949120]: /usr/lib64/trafficserver/libtscore.so.8(ink_freelist_init_ops+0x0)[0x7fb3d12a20c4]
Oct 28 23:09:29 proxy traffic_manager[949120]: /usr/bin/traffic_server(_ZN6HttpSM12main_handlerEiPv+0x3c)[0x558092b2ce5e]
Oct 28 23:09:29 proxy traffic_manager[949120]: /usr/bin/traffic_server(_ZN10HttpTunnel12main_handlerEiPv+0xd3)[0x558092afa657]
Oct 28 23:09:29 proxy traffic_manager[949120]: /usr/bin/traffic_server(_ZN7EThread13process_eventEP5Eventi+0x92)[0x558092a35720]
Oct 28 23:09:29 proxy traffic_manager[949120]: /usr/bin/traffic_server(_ZN7EThread13process_queueEP5QueueI5EventNS1_9Link_linkEEPiS5_+0x62)[0x558092a35808]
Oct 28 23:09:29 proxy traffic_manager[949120]: /usr/bin/traffic_server(_ZN7EThread15execute_regularEv+0x167)[0x558092a35a53]
Oct 28 23:09:29 proxy traffic_manager[949120]: /usr/bin/traffic_server(+0x96446)[0x558092a34446]
Oct 28 23:09:29 proxy traffic_manager[949120]: /usr/lib64/libpthread.so.0(+0x84c0)[0x7fb3d08b44c0]
Oct 28 23:09:29 proxy traffic_manager[949120]: /usr/lib64/libc.so.6(clone+0x43)[0x7fb3d07e2553]
Oct 28 23:09:32 proxy traffic_server[949205]: NOTE: --- traffic_server Starting ---
Oct 28 23:09:32 proxy traffic_server[949205]: NOTE: traffic_server Version: Apache Traffic Server - traffic_server - 8.0.6 - (build # 102822 on Oct 28 2019 at 22:57:56)
Oct 28 23:09:32 proxy traffic_server[949205]: NOTE: RLIMIT_NOFILE(7):cur(675000),max(675000)

zwoop commented 4 years ago

The change here doesn't need that setting, but it shouldn't matter that you set it to "2". Does it reproduce this fast on your site, or do you have a way to trigger it?

thelounge-zz commented 4 years ago

Well, we host a few hundred websites and it happens pretty fast, but I would sell my soul for a clear "send this request and boom"; I don't have a clue what is going on in the usual mixed real-world traffic.

No way to trigger it outside production :-(

zwoop commented 4 years ago

@thelounge-zz We'll figure it out; the help you provide here is crucial. I'll have another tarball to try in a bit, but can you confirm that you have HTTP/2 enabled? I think that this only (or mostly) happens with HTTP/2 connections.

thelounge-zz commented 4 years ago

Sure, and the Brotli compression is done by ATS too:

[root@proxy:/etc/trafficserver]$ cat plugin.config
compress.so /etc/trafficserver/compress.config
header_rewrite.so /etc/trafficserver/header_rewrite.config

HTTP/2.0 200 OK
date: Tue, 29 Oct 2019 15:38:46 GMT
strict-transport-security: max-age=31536000
expires: Thu, 19 Nov 1981 08:52:00 GMT
cache-control: private
etag: d0aade35746c42579be2f659f92960ba-df
last-modified: Thu, 19 Dec 2013 18:26:50 GMT
vary: Accept-Encoding,User-Agent
x-content-type-options: nosniff
x-response-time: D=5802 us
content-type: text/html; charset=ISO-8859-1
content-encoding: br
age: 0
X-Firefox-Spdy: h2
Content-Security-Policy-Report-Only: worker-src 'none'; report-uri about:blank

zwoop commented 4 years ago

@thelounge-zz can you try https://ci.trafficserver.apache.org/src/tests/trafficserver-8.0.6-rc-zz1.tar.bz2 please ?

thelounge-zz commented 4 years ago

@thelounge-zz can you try https://ci.trafficserver.apache.org/src/tests/trafficserver-8.0.6-rc-zz1.tar.bz2 please ?

It has been running on the production server for 10 minutes now.

"ab -c 250 -n 500000" and "ab -c 250 -n 500000 -k" against a small 650-byte GIF image are fine (in 8.0.1 keep-alive is terribly broken, which, besides the H2 security issues, is one reason I want to get rid of it sooner rather than later), and no crash so far (knocking on wood), but it's 9 PM here and load is very low.

thelounge-zz commented 4 years ago

@thelounge-zz can you try https://ci.trafficserver.apache.org/src/tests/trafficserver-8.0.6-rc-zz1.tar.bz2 please ?

I should have kept my mouth shut :-(

Oct 29 21:12:39 proxy traffic_manager[1072131]: Fatal: HttpSM.cc:2516: failed assertion `magic == HTTP_SM_MAGIC_ALIVE`
Oct 29 21:12:39 proxy traffic_manager[1072131]: traffic_server: received signal 6 (Aborted)
Oct 29 21:12:39 proxy traffic_manager[1072131]: traffic_server - STACK TRACE:
Oct 29 21:12:39 proxy traffic_server[1072137]: Fatal: HttpSM.cc:2516: failed assertion `magic == HTTP_SM_MAGIC_ALIVE`
Oct 29 21:12:39 proxy traffic_manager[1072131]: /usr/bin/traffic_server(_Z19crash_logger_invokeiP9siginfo_tPv+0xa4)[0x563a4e753902]
Oct 29 21:12:39 proxy traffic_manager[1072131]: /usr/lib64/libpthread.so.0(+0x12c60)[0x7f05cb91cc60]
Oct 29 21:12:39 proxy traffic_manager[1072131]: /usr/lib64/libc.so.6(gsignal+0x145)[0x7f05cb77be35]
Oct 29 21:12:39 proxy traffic_manager[1072131]: /usr/lib64/libc.so.6(abort+0x127)[0x7f05cb766895]
Oct 29 21:12:39 proxy traffic_manager[1072131]: /usr/lib64/trafficserver/libtscore.so.8(_Z17ink_mutex_destroyP15pthread_mutex_t+0x0)[0x7f05cc2ff14c]
Oct 29 21:12:39 proxy traffic_manager[1072131]: /usr/lib64/trafficserver/libtscore.so.8(ink_freelist_init_ops+0x0)[0x7f05cc3000c4]
Oct 29 21:12:39 proxy traffic_manager[1072131]: /usr/bin/traffic_server(_ZN6HttpSM12main_handlerEiPv+0x3c)[0x563a4e72ae86]
Oct 29 21:12:39 proxy traffic_manager[1072131]: /usr/bin/traffic_server(_ZN10HttpTunnel12main_handlerEiPv+0xd3)[0x563a4e6f866f]
Oct 29 21:12:39 proxy traffic_manager[1072131]: /usr/bin/traffic_server(_ZN7EThread13process_eventEP5Eventi+0x92)[0x563a4e633720]
Oct 29 21:12:39 proxy traffic_manager[1072131]: /usr/bin/traffic_server(_ZN7EThread13process_queueEP5QueueI5EventNS1_9Link_linkEEPiS5_+0x62)[0x563a4e633808]
Oct 29 21:12:39 proxy traffic_manager[1072131]: /usr/bin/traffic_server(_ZN7EThread15execute_regularEv+0x167)[0x563a4e633a53]
Oct 29 21:12:39 proxy traffic_manager[1072131]: /usr/bin/traffic_server(+0x96446)[0x563a4e632446]
Oct 29 21:12:39 proxy traffic_manager[1072131]: /usr/lib64/libpthread.so.0(+0x84c0)[0x7f05cb9124c0]
Oct 29 21:12:39 proxy traffic_manager[1072131]: /usr/lib64/libc.so.6(clone+0x43)[0x7f05cb840553]
Oct 29 21:12:42 proxy traffic_server[1072623]: NOTE: --- traffic_server Starting ---
Oct 29 21:12:42 proxy traffic_server[1072623]: NOTE: traffic_server Version: Apache Traffic Server - traffic_server - 8.0.6 - (build # 102920 on Oct 29 2019 at 20:57:03)
Oct 29 21:12:42 proxy traffic_server[1072623]: NOTE: RLIMIT_NOFILE(7):cur(675000),max(675000)

zwoop commented 4 years ago

aaaaargh...

thelounge-zz commented 4 years ago

What I often see in the logs with 8.0.1 are the following error messages. Can this be related somehow? And even if not, is it a good idea to flood the logs that way? If I were an attacker I would use a mixed load (many connections without any data that are then closed, many connections per second, expensive connections, anything that can flood the logs and so fill the disk and/or produce load).

[Oct 31 12:59:41.841] {0x7fed382b8700} ERROR: HTTP/2 connection error client_ip=194.208.63.67 session_id=177112 stream_id=43 recv headers compression error
[Oct 31 12:59:41.885] {0x7fed381b6700} ERROR: HTTP/2 connection error client_ip=194.208.63.67 session_id=177174 stream_id=1 recv headers compression error
[Oct 31 12:59:41.900] {0x7fed382b8700} ERROR: HTTP/2 connection error client_ip=194.208.63.67 session_id=177175 stream_id=1 recv headers compression error
[Oct 31 12:59:41.948] {0x7fed384bc700} ERROR: HTTP/2 connection error client_ip=194.208.63.67 session_id=177176 stream_id=1 recv headers compression error
[Oct 31 12:59:41.963] {0x7fed383ba700} ERROR: HTTP/2 connection error client_ip=194.208.63.67 session_id=177177 stream_id=1 recv headers compression error
[Oct 31 12:59:42.033] {0x7fed382b8700} ERROR: HTTP/2 connection error client_ip=194.208.63.67 session_id=177178 stream_id=1 recv headers compression error
[Oct 31 12:59:42.050] {0x7fed381b6700} ERROR: HTTP/2 connection error client_ip=194.208.63.67 session_id=177179 stream_id=1 recv headers compression error
[Oct 31 12:59:52.258] {0x7fed382b8700} ERROR: HTTP/2 connection error client_ip=194.208.63.67 session_id=177182 stream_id=3 recv headers compression error
[Oct 31 12:59:52.301] {0x7fed383ba700} ERROR: HTTP/2 connection error client_ip=194.208.63.67 session_id=177188 stream_id=1 recv headers compression error
[Oct 31 12:59:52.331] {0x7fed384bc700} ERROR: HTTP/2 connection error client_ip=194.208.63.67 session_id=177189 stream_id=1 recv headers compression error
[Oct 31 12:59:52.346] {0x7fed383ba700} ERROR: HTTP/2 connection error client_ip=194.208.63.67 session_id=177190 stream_id=1 recv headers compression error
[Oct 31 12:59:52.388] {0x7fed383ba700} ERROR: HTTP/2 connection error client_ip=194.208.63.67 session_id=177191 stream_id=1 recv headers compression error
[Oct 31 12:59:52.419] {0x7fed382b8700} ERROR: HTTP/2 connection error client_ip=194.208.63.67 session_id=177192 stream_id=1 recv headers compression error
[Oct 31 12:59:52.434] {0x7fed382b8700} ERROR: HTTP/2 connection error client_ip=194.208.63.67 session_id=177193 stream_id=1 recv headers compression error
[Oct 31 13:03:17.988] {0x7fed381b6700} ERROR: HTTP/2 connection error client_ip=194.208.63.67 session_id=177273 stream_id=15 recv headers compression error
[Oct 31 13:03:18.035] {0x7fed382b8700} ERROR: HTTP/2 connection error client_ip=194.208.63.67 session_id=177385 stream_id=1 recv headers compression error
[Oct 31 13:03:18.054] {0x7fed381b6700} ERROR: HTTP/2 connection error client_ip=194.208.63.67 session_id=177386 stream_id=1 recv headers compression error
[Oct 31 13:03:18.102] {0x7fed383ba700} ERROR: HTTP/2 connection error client_ip=194.208.63.67 session_id=177387 stream_id=1 recv headers compression error
[Oct 31 13:03:18.119] {0x7fed382b8700} ERROR: HTTP/2 connection error client_ip=194.208.63.67 session_id=177388 stream_id=1 recv headers compression error
[Oct 31 13:03:18.167] {0x7fed381b6700} ERROR: HTTP/2 connection error client_ip=194.208.63.67 session_id=177389 stream_id=1 recv headers compression error

zwoop commented 4 years ago

Another thought I had is that maybe the half-open logic was a red herring, and this issue has been in 8.0.x all along (introduced somewhere between 7.1.x and the v8.0.0 release). My thinking is that with v8.0.0 we broke keep-alive, so maybe in that version this can't trigger because the connections are closed so fast. But once keep-alive was fixed, the error occurs again.

Sudheer suggested a test: if you turn off KA, this assertion should not happen (or at least not nearly as often). @thelounge-zz would you be able to test that? It's obviously not a solution, but it helps point us in a new direction to look.

CONFIG proxy.config.http.keep_alive_enabled_in INT 0

I don't think proxy.config.http.keep_alive_enabled_out is the issue, but you could possibly test with that turned off as well.

thelounge-zz commented 4 years ago

Hmm, but wouldn't it then also be triggered by "ab -c 150 -n 250000 -k", which is broken in 8.0.1 (meaning it fails without errors on the server side, as far as I know) and works with later versions?

zwoop commented 4 years ago

Hmmm, I've never been able to reproduce with a simple "ab" in my testing :-/. With or without "-k".

thelounge-zz commented 4 years ago

Well, that's why I wonder how it can be keep-alive. However, I will give 8.0.5 with keep-alive disabled a try, but not now; it's 5:00 AM here and I'm busy coding stuff behind the proxy :-)

thelounge-zz commented 4 years ago

Indeed, no crash for 7 hours. Well, given that we now have HTTPS everywhere, running without keep-alive is really expensive when each and every image, JavaScript file and so on triggers a new handshake :-(

CONFIG proxy.config.http.keep_alive_enabled_in INT 0
CONFIG proxy.config.http.keep_alive_enabled_out INT 0

Maybe these errors, with keep-alive enabled, would be the trigger?

[Nov 19 06:36:17.956] {0x7fcd11573700} ERROR: HTTP/2 stream error client_ip=119.94.111.110 session_id=13791 stream_id=37 refused to create new stream, because ua_session is in half_close state
[Nov 19 06:36:17.957] {0x7fcd11573700} ERROR: HTTP/2 connection error client_ip=119.94.111.110 session_id=13791 stream_id=37 window update stream invalid id
[Nov 19 06:41:59.381] {0x7fcd117b7700} ERROR: HTTP/2 session error client_ip=130.105.99.84 session_id=14214 closing a connection, because its stream error rate (1.000000) is too high
[Nov 19 09:13:58.493] {0x7fcd117b7700} ERROR: HTTP/2 stream error client_ip=89.144.211.115 session_id=33516 stream_id=19 refused to create new stream, because ua_session is in half_close state
[Nov 19 09:13:58.493] {0x7fcd117b7700} ERROR: HTTP/2 stream error client_ip=89.144.211.115 session_id=33516 stream_id=21 refused to create new stream, because ua_session is in half_close state
[Nov 19 09:13:58.493] {0x7fcd117b7700} ERROR: HTTP/2 session error client_ip=89.144.211.115 session_id=33516 closing a connection, because its stream error rate (0.222222) is too high
[Nov 19 09:56:53.512] {0x7fcd11573700} ERROR: HTTP/2 stream error client_ip=86.8.200.129 session_id=39448 stream_id=5 refused to create new stream, because ua_session is in half_close state
[Nov 19 09:56:53.512] {0x7fcd11573700} ERROR: HTTP/2 session error client_ip=86.8.200.129 session_id=39448 closing a connection, because its stream error rate (0.500000) is too high
[Nov 19 10:31:05.506] {0x7fcd118b9700} ERROR: HTTP/2 stream error client_ip=123.194.156.154 session_id=45129 stream_id=3 refused to create new stream, because ua_session is in half_close state
[Nov 19 10:31:05.506] {0x7fcd118b9700} ERROR: HTTP/2 session error client_ip=123.194.156.154 session_id=45129 closing a connection, because its stream error rate (1.000000) is too high
[Nov 19 11:02:46.782] {0x7fcd119bb700} ERROR: HTTP/2 stream error client_ip=80.110.104.188 session_id=50106 stream_id=17 refused to create new stream, because ua_session is in half_close state
[Nov 19 11:02:46.782] {0x7fcd119bb700} ERROR: HTTP/2 session error client_ip=80.110.104.188 session_id=50106 closing a connection, because its stream error rate (1.000000) is too high

zwoop commented 4 years ago

@thelounge-zz Hey, I don't know if you are still with us or if you gave up, but any chance you can try this patch? @masaori335 found that we missed this commit in the 8.0.1 release. I don't know whether it will change anything, but at this point it's worth a shot.

commit 4cdb42b4b2cf175c3eb0ff4f567a3394ce37ff66
Author:     Susan Hinrichs <shinrich@oath.com>
AuthorDate: Thu Aug 2 13:44:54 2018 +0000
Commit:     Leif Hedstrom <zwoop@apache.org>
CommitDate: Mon Mar 30 09:05:36 2020 -0600

    Remove the ignore_keep_alive method entirely

    (cherry picked from commit da9edd5feba06da6ebbe47eaf9696c37c5483fa1)

diff --git a/proxy/ProxyClientTransaction.h b/proxy/ProxyClientTransaction.h
index 12fcf30ca..86f7b6528 100644
--- a/proxy/ProxyClientTransaction.h
+++ b/proxy/ProxyClientTransaction.h
@@ -196,12 +196,6 @@ public:
   {
   }

-  virtual bool
-  ignore_keep_alive()
-  {
-    return true;
-  }
-
   virtual void destroy();

   virtual void transaction_done() = 0;
diff --git a/proxy/http/Http1ClientTransaction.h b/proxy/http/Http1ClientTransaction.h
index 0251b2c1b..8a876660f 100644
--- a/proxy/http/Http1ClientTransaction.h
+++ b/proxy/http/Http1ClientTransaction.h
@@ -81,12 +81,6 @@ public:

   void release(IOBufferReader *r) override;

-  bool
-  ignore_keep_alive() override
-  {
-    return false;
-  }
-
   bool allow_half_open() const override;

   void set_parent(ProxyClientSession *new_parent) override;
diff --git a/proxy/http/HttpTransact.cc b/proxy/http/HttpTransact.cc
index 562952179..0a7d71010 100644
--- a/proxy/http/HttpTransact.cc
+++ b/proxy/http/HttpTransact.cc
@@ -5451,7 +5451,7 @@ HttpTransact::initialize_state_variables_from_request(State *s, HTTPHdr *obsolet
   }

   // If this is an internal request, never keep alive
-  if (!s->txn_conf->keep_alive_enabled_in || (s->state_machine->ua_txn && s->state_machine->ua_txn->ignore_keep_alive())) {
+  if (!s->txn_conf->keep_alive_enabled_in) {
     s->client_info.keep_alive = HTTP_NO_KEEPALIVE;
   } else if (vc && vc->get_is_internal_request()) {
     // Following the trail of JIRAs back from TS-4960, there can be issues with
diff --git a/proxy/http2/Http2Stream.h b/proxy/http2/Http2Stream.h
index 3be1057a5..68099d246 100644
--- a/proxy/http2/Http2Stream.h
+++ b/proxy/http2/Http2Stream.h
@@ -142,13 +142,6 @@ public:
   void signal_write_event(bool call_update);
   void reenable(VIO *vio) override;
   void transaction_done() override;
-  bool
-  ignore_keep_alive() override
-  {
-    // If we return true here, Connection header will always be "close".
-    // It should be handled as the same as HTTP/1.1
-    return false;
-  }

   void restart_sending();
   void push_promise(URL &url, const MIMEField *accept_encoding);
thelounge-zz commented 4 years ago

Sorry, for now I'm staying with keep-alive disabled; a ton of other work.

BTW: https://trafficserver.apache.org/downloads Apache Traffic Server v8.0.6 was released on August 20th, 2019

This is wrong; 8.0.5 was released at that time.

thelounge-zz commented 4 years ago

If you missed a commit, I don't get why it's not part of https://raw.githubusercontent.com/apache/trafficserver/8.0.x/CHANGELOG-8.0.7 more than a year later.

Vijay-1 commented 3 years ago

@masaori335 @zwoop

Got this core in our production with 8.0.8:

(gdb) bt
#0  0x00002b6d5a1a5337 in raise () from /lib64/libc.so.6
#1  0x00002b6d5a1a6a28 in abort () from /lib64/libc.so.6
#2  0x00002b6d57b85a1b in ink_abort (message_format=message_format@entry=0x2b6d57c03c67 "%s:%d: failed assertion `%s`") at ink_error.cc:99
#3  0x00002b6d57b82675 in _ink_assert (expression=expression@entry=0x76fc4e "magic == HTTP_SM_MAGIC_ALIVE", file=file@entry=0x76f61e "HttpSM.cc", line=line@entry=2516) at ink_assert.cc:37
#4  0x0000000000506107 in HttpSM::main_handler (this=<optimized out>, event=<optimized out>, data=<optimized out>) at HttpSM.cc:2516
#5  0x0000000000555221 in handleEvent (data=0x2b6da013eeb0, event=2301, this=<optimized out>) at /usr/src/debug/trafficserver-8.0.7/iocore/eventsystem/I_Continuation.h:160
#6  HttpTunnel::main_handler (this=0x2b6da013eeb0, event=<optimized out>, data=<optimized out>) at HttpTunnel.cc:1692
#7  0x0000000000752bb2 in handleEvent (data=0x2b6d941fd020, event=103, this=<optimized out>) at I_Continuation.h:160
#8  EThread::process_event (this=this@entry=0x2b6d61b3b010, e=e@entry=0x2b6d941fd020, calling_code=103) at UnixEThread.cc:131
#9  0x000000000075347e in EThread::process_queue (this=this@entry=0x2b6d61b3b010, NegativeQueue=NegativeQueue@entry=0x2b6d7892ee70, ev_count=ev_count@entry=0x2b6d7892ee6c, nq_count=nq_count@entry=0x2b6d7892ee68) at UnixEThread.cc:170
#10 0x0000000000753a68 in EThread::execute_regular (this=0x2b6d61b3b010) at UnixEThread.cc:230
#11 0x00000000007525aa in spawn_thread_internal (a=0x2a8ff60) at Thread.cc:85
#12 0x00002b6d59537e65 in start_thread () from /lib64/libpthread.so.0
#13 0x00002b6d5a26d88d in clone () from /lib64/libc.so.6
(gdb)

Was able to verify #5934. Backported it to 8.0.8 and verified that it fixes the crash.

masaori335 commented 3 years ago

Let me close this issue as fixed, because @Vijay-1's backtrace looks exactly the same as the original report and Bryan's. He confirmed that #5934 fixes this issue. It has been backported to 8.1.x and 9.0.x (released in 8.1.0 and 9.0.0).

Please reopen this issue if anybody still sees the crash with 8.1.0 or later.

Note: it's possible the same crash could happen if there is another scheduled but orphaned VC_EVENT_WRITE_COMPLETE (103) event.

masaori335 commented 3 years ago

@thelounge-zz please give 8.1.0, 8.1.1, or 9.0.0 a try.

thelounge-zz commented 3 years ago

Seems to be fixed in 8.1.x.