apache / trafficserver

Apache Traffic Server™ is a fast, scalable and extensible HTTP/1.1 and HTTP/2 compliant caching proxy server.
https://trafficserver.apache.org/
Apache License 2.0
1.82k stars 804 forks source link

quiche CI build failing #10999

Closed brbzull0 closed 7 months ago

brbzull0 commented 9 months ago

It's been failing for quite some time now, this is the list of failing tests.

brbzull0 commented 9 months ago

active_timeout there seems to be a crash un rockylinux, could be related to some library version, etc:

[Jan 22 15:35:32.154] [ET_NET 10] DIAG: <QUICNetVConnection.cc:373 (handle_received_packet)> (v_quic_net) [] failed to process packet: -9
[Jan 22 15:35:32.154] [ET_NET 10] DIAG: <QUICNetVConnection.cc:664 (_handle_interval)> (quic_net) [] is_app=0 error_code=10 reason=
[Jan 22 15:35:32.154] [ET_NET 11] DIAG: <QUICNetVConnection.cc:373 (handle_received_packet)> (v_quic_net) [] failed to process packet: -9
[Jan 22 15:35:32.155] [ET_NET 11] DIAG: <QUICNetVConnection.cc:664 (_handle_interval)> (quic_net) [] is_app=0 error_code=10 reason=
[Jan 22 15:35:33.982] [ET_NET 10] DIAG: <QUICNetVConnection.cc:118 (free_thread)> (quic_net) [] Free connection
[Jan 22 15:35:33.989] [ET_NET 11] DIAG: <QUICNetVConnection.cc:118 (free_thread)> (quic_net) [] Free connection
[Jan 22 15:35:34.056] [ET_UDP 0] DIAG: <QUICNetVConnection.cc:102 (destroy)> (quic_net) [] Destroy connection
[Jan 22 15:35:34.056] [ET_UDP 0] DIAG: <QUICNetVConnection.cc:102 (destroy)> (quic_net) [] Destroy connection
[Jan 22 15:35:40.264] [ET_UDP 0] DIAG: <QUICPacketHandler.cc:267 (_recv_packet)> (quic_sec) [e91af5e3-224fe664] client initial dcid=0x224fe66471f09d1fbc1fc9611a6fbc891ff7
[Jan 22 15:35:40.265] [ET_UDP 0] DIAG: <QUICPacketHandler.cc:267 (_recv_packet)> (quic_sec) [e91af5e3-224fe664] client initial dcid=0x224fe66471f09d1fbc1fc9611a6fbc891ff7
[Jan 22 15:35:40.265] [ET_NET 0] DIAG: <QUICNextProtocolAccept.cc:53 (mainEvent)> (v_quic) [] event 202 netvc 0x7f1761243220
[Jan 22 15:35:40.265] [ET_NET 1] DIAG: <QUICNextProtocolAccept.cc:53 (mainEvent)> (v_quic) [] event 202 netvc 0x7f1761242bd0
[Jan 22 15:35:40.266] [ET_NET 0] DIAG: <QUICNetVConnection.cc:373 (handle_received_packet)> (v_quic_net) [] failed to process packet: -9
[Jan 22 15:35:40.266] [ET_NET 0] DIAG: <QUICNetVConnection.cc:664 (_handle_interval)> (quic_net) [] is_app=0 error_code=10 reason=
[Jan 22 15:35:40.266] [ET_NET 1] DIAG: <QUICNetVConnection.cc:373 (handle_received_packet)> (v_quic_net) [] failed to process packet: -9
[Jan 22 15:35:40.266] [ET_NET 1] DIAG: <QUICNetVConnection.cc:664 (_handle_interval)> (quic_net) [] is_app=0 error_code=10 reason=
[Jan 22 15:35:41.972] [ET_NET 0] DIAG: <QUICNetVConnection.cc:118 (free_thread)> (quic_net) [] Free connection
[Jan 22 15:35:42.003] [ET_NET 1] DIAG: <QUICNetVConnection.cc:118 (free_thread)> (quic_net) [] Free connection
[Jan 22 15:35:42.068] [ET_UDP 0] DIAG: <QUICNetVConnection.cc:102 (destroy)> (quic_net) [] Destroy connection
[Jan 22 15:35:42.068] [ET_UDP 0] DIAG: <QUICNetVConnection.cc:102 (destroy)> (quic_net) [] Destroy connection
traffic_server: received signal 11 (Segmentation fault)
traffic_server - STACK TRACE: 
/tmp/sandbox/active_timeout/ts/bin/traffic_server(_Z19crash_logger_invokeiP9siginfo_tPv+0xce)[0x8dc520]
/lib64/libpthread.so.0(+0x12cf0)[0x7f178a07ccf0]
/lib64/libc.so.6(cfree+0x21)[0x7f1789d40e31]
/tmp/sandbox/active_timeout/ts/bin/traffic_server(_Z8ats_freePv+0x28)[0x91700f]
/tmp/sandbox/active_timeout/ts/bin/traffic_server(_ZN16HttpConfigParamsD1Ev+0x2c)[0x8e4a1a]
/lib64/libc.so.6(+0x5126c)[0x7f1789cf626c]
/lib64/libc.so.6(on_exit+0x0)[0x7f1789cf63a0]
/lib64/libc.so.6(__libc_start_main+0xec)[0x7f1789cdfd8c]
/tmp/sandbox/active_timeout/ts/bin/traffic_server(_start+0x2e)[0x8dba7e]
brbzull0 commented 9 months ago

txn_type ( and some others ) seems to be related to bssl.

rocky8:

#0  __libc_write (nbytes=218, buf=0x7f1ff4029788, fd=54) at ../sysdeps/unix/sysv/linux/write.c:26
#1  __libc_write (fd=54, buf=0x7f1ff4029788, nbytes=218) at ../sysdeps/unix/sysv/linux/write.c:24
#2  0x0000000000b07734 in SocketManager::write (fd=54, buf=0x7f1ff4029788, size=218) at /home/jenkins/trafficserver/src/iocore/dns/../eventsystem/P_UnixSocketManager.h:142
#3  0x0000000000c6e65e in fastopen_bwrite (bio=0x7f1ff4029ef8, in=0x7f1ff4029788 "\026\003\001", insz=218) at /home/jenkins/trafficserver/src/iocore/net/BIO_fastopen.cc:131
#4  0x00007f2029ded799 in BIO_write (bio=0x7f1ff4029ef8, in=<optimized out>, inl=<optimized out>) at /root/build_h3_tools/boringssl/crypto/bio/bio.c:173
#5  0x00007f202a1bf789 in bssl::tls_flush_flight (ssl=0x7f1ff402afe8) at /root/build_h3_tools/boringssl/ssl/s3_both.cc:333
#6  0x00007f202a1b6a3a in bssl::ssl_run_handshake (hs=0x7f1ff402fd48, out_early_return=out_early_return@entry=0x7f202560266f) at /root/build_h3_tools/boringssl/ssl/handshake.cc:602
#7  0x00007f202a1d028a in SSL_do_handshake (ssl=0x7f1ff402afe8) at /root/build_h3_tools/boringssl/ssl/ssl_lib.cc:865
#8  SSL_do_handshake (ssl=0x7f1ff402afe8) at /root/build_h3_tools/boringssl/ssl/ssl_lib.cc:849
#9  0x0000000000c13f5e in SSLNetVConnection::_ssl_connect (this=0x7f202b0d83c0) at /home/jenkins/trafficserver/src/iocore/net/SSLNetVConnection.cc:2462
#10 0x0000000000c1155e in SSLNetVConnection::sslClientHandShakeEvent (this=0x7f202b0d83c0, err=@0x7f2025602fe8: 0) at /home/jenkins/trafficserver/src/iocore/net/SSLNetVConnection.cc:1608
#11 0x0000000000c105e1 in SSLNetVConnection::sslStartHandShake (this=0x7f202b0d83c0, event=1, err=@0x7f2025602fe8: 0) at /home/jenkins/trafficserver/src/iocore/net/SSLNetVConnection.cc:1272
#12 0x0000000000c0d548 in SSLNetVConnection::net_read_io (this=0x7f202b0d83c0, nh=0x7f202675aba0, lthread=0x7f202675a010) at /home/jenkins/trafficserver/src/iocore/net/SSLNetVConnection.cc:611
#13 0x0000000000c8c3da in NetHandler::process_ready_list (this=0x7f202675aba0) at /home/jenkins/trafficserver/src/iocore/net/NetHandler.cc:276
#14 0x0000000000c8c733 in NetHandler::waitForActivity (this=0x7f202675aba0, timeout=10000000) at /home/jenkins/trafficserver/src/iocore/net/NetHandler.cc:364
#15 0x0000000000cc6fd9 in EThread::execute_regular (this=0x7f202675a010) at /home/jenkins/trafficserver/src/iocore/eventsystem/UnixEThread.cc:299
#16 0x0000000000cc716f in EThread::execute (this=0x7f202675a010) at /home/jenkins/trafficserver/src/iocore/eventsystem/UnixEThread.cc:348
#17 0x0000000000cc5af1 in spawn_thread_internal (a=0x205beb0) at /home/jenkins/trafficserver/src/iocore/eventsystem/Thread.cc:68
#18 0x00007f20283e41ca in start_thread (arg=<optimized out>) at pthread_create.c:479
#19 0x00007f2028050e73 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
brbzull0 commented 7 months ago

stek_share:

[Mar 20 14:07:18.869] traffic_server NOTE: plugin.config loading ...
[Mar 20 14:07:18.869] traffic_server NOTE: loading plugin '/tmp/sandbox/stek_share/ts1/plugin/stek_share.so'
[Mar 20 14:07:18.873] traffic_server ERROR: unable to load '/tmp/sandbox/stek_share/ts1/plugin/stek_share.so': /opt/lib/libnuraft.so: undefined symbol: SSL_ctrl
[Mar 20 14:07:18.873] traffic_server FATAL: unable to load '/tmp/sandbox/stek_share/ts1/plugin/stek_share.so': /opt/lib/libnuraft.so: undefined symbol: SSL_ctrl

Could be an issue in the way we build nuraft I did try that, using the same CI commands to build it and found the same error. Weird this does not happens in other CI's

duke8253 commented 7 months ago

stek_share:

[Mar 20 14:07:18.869] traffic_server NOTE: plugin.config loading ...
[Mar 20 14:07:18.869] traffic_server NOTE: loading plugin '/tmp/sandbox/stek_share/ts1/plugin/stek_share.so'
[Mar 20 14:07:18.873] traffic_server ERROR: unable to load '/tmp/sandbox/stek_share/ts1/plugin/stek_share.so': /opt/lib/libnuraft.so: undefined symbol: SSL_ctrl
[Mar 20 14:07:18.873] traffic_server FATAL: unable to load '/tmp/sandbox/stek_share/ts1/plugin/stek_share.so': /opt/lib/libnuraft.so: undefined symbol: SSL_ctrl

Could be an issue in the way we build nuraft I did try that, using the same CI commands to build it and found the same error. Weird this does not happens in other CI's

Yeah, I'm tinkering around to see I can get a nuraft build with boringssl, getting some build errors for now.

duke8253 commented 7 months ago

I have a fix for the stek_share plugin now, ~I'll reference a PR once that's ready.~ #11185

bneradt commented 7 months ago

I have a fix for the stek_share plugin now, ~I'll reference a PR once that's ready.~ #11185

Latest master quiche autests are passing now: https://ci.trafficserver.apache.org/view/master/job/master/job/quiche/315/consoleFull

Running Test stek_share:.. Passed
maskit commented 7 months ago

The cause of 0rtt test failure is probably https://github.com/apache/trafficserver/issues/11203

brbzull0 commented 7 months ago

Closing this issue. All fixed now. Thanks to everyone!