Closed moisesguimaraes closed 4 years ago
Test failed cause TLS tests take too long to execute.
cc @dormando -- we've got the TLS tests seeming to just hang for > 10 minutes on a stock amd64 platform (and thus Travis killing our builds because they appear idle for 10+ minutes); is that something you've seen before? Any ideas what we can do to get more information about what's happening?
@tianon got some logs? You can check our repo for the travis config for TLS tests... it downloads and builds an openssl because of a minimum version issue, so the tests do take a long time. I haven't see any hang myself though.
I ran the build locally so I could let it sit much longer, and got the following logs: (right after ok 11 - strtoull
is where it hangs for a long time before that eventual alarm line)
I've got a second build running with IO::Socket::SSL
installed now, but I think those test failures are unrelated to the hang -- it's hanging in the same spot (I'll report back when it's complete).
It took forever, but it did "succeed", even though the SSL tests all failed...
Need to roll back through and make that less jank... With TLS compiled in the test suite runs twice, with the first failing halfway through as you can see.
Also another issue with the crufty timeout.c daemon manager I think; you see the "failed to start" errors, those might be timing out. I'll go on a wild guess that memcached is crashing right away during the TLS specific tests.
I'd recommend a couple things, since I don't have access to any of this:
1) You can use make test_basic_tls
to shorten things a bit and get a better PASS/FAIL while figuring this out.
2) just try starting memcached-debug with -Z -o ssl_chain_cert=server_crt.pem -o ssl_key=server_key.pem
added (those files are in the t/ directory)
3) SSL_TEST=1 prove t/00-startup.t
then checking dmesg/etc might be useful here.
it's probably a simple error... the somewhat crufty test system is unfortunately hiding in this case, sorry :(
Huh, starting memcached
(or memcached-debug
) with -Z
exits immediately with an exit code of 64 and no output: :confused:
root@98b54462bf60:/usr/src/memcached# ./memcached-debug -Z -o ssl_chain_cert=t/server_crt.pem -o ssl_key=t/server_key.pem
root@98b54462bf60:/usr/src/memcached# echo $?
64
Edit: same if I add -u
to make sure it doesn't run as root:
root@98b54462bf60:/usr/src/memcached# ./memcached-debug -u memcached -Z -o ssl_chain_cert=t/server_crt.pem -o ssl_key=t/server_key.pem
root@98b54462bf60:/usr/src/memcached# echo $?
64
The output of prove t/00-startup.t
is unfortunately not much more enlightening, but likely has the same root cause as the server start failing:
root@98b54462bf60:/usr/src/memcached# SSL_TEST=1 prove t/00-startup.t
t/00-startup.t .. 1/20
# Failed test 'Basic startup works'
# at t/00-startup.t line 13.
# got: 'Failed to startup/connect to memcached server. at t/00-startup.t line 10.
# '
# expected: ''
t/00-startup.t .. 3/20
# Failed test '-l 127.0.0.1 works'
# at t/00-startup.t line 23.
# got: 'Failed to startup/connect to memcached server. at t/00-startup.t line 21.
# '
# expected: ''
t/00-startup.t .. 4/20
# Failed test '-C works'
# at t/00-startup.t line 30.
# got: 'Failed to startup/connect to memcached server. at t/00-startup.t line 26.
# '
# expected: ''
t/00-startup.t .. 5/20
# Failed test '-b works'
# at t/00-startup.t line 37.
# got: 'Failed to startup/connect to memcached server. at t/00-startup.t line 33.
# '
# expected: ''
t/00-startup.t .. 6/20
# Failed test 'auto works'
# at t/00-startup.t line 45.
# got: 'Failed to startup/connect to memcached server. at t/00-startup.t line 41.
# '
# expected: ''
t/00-startup.t .. 7/20
# Failed test 'ascii works'
# at t/00-startup.t line 45.
# got: 'Failed to startup/connect to memcached server. at t/00-startup.t line 41.
# '
# expected: ''
t/00-startup.t .. 8/20
# Failed test 'binary works'
# at t/00-startup.t line 52.
# got: 'Failed to startup/connect to memcached server. at t/00-startup.t line 50.
# '
# expected: ''
Error loading the certificate chain: /usr/src/memcached/t/server_crt.pem
t/00-startup.t .. 9/20
# Failed test 'auto works'
# at t/00-startup.t line 57.
# got: 'Failed to startup/connect to memcached server. at t/00-startup.t line 55.
# '
# expected: ''
Error loading the certificate chain: /usr/src/memcached/t/server_crt.pem
t/00-startup.t .. 10/20
# Failed test 'ascii works'
# at t/00-startup.t line 62.
# got: 'Failed to startup/connect to memcached server. at t/00-startup.t line 60.
# '
# expected: ''
Error loading the certificate chain: /usr/src/memcached/t/server_crt.pem
t/00-startup.t .. 11/20
# Failed test 'binary works'
# at t/00-startup.t line 69.
# got: 'Failed to startup/connect to memcached server. at t/00-startup.t line 67.
# '
# expected: ''
Invalid value for binding protocol: http
-- should be one of auto, binary, or ascii
t/00-startup.t .. 12/20 Maximum connections must be greater than 0
t/00-startup.t .. 13/20 Maximum connections must be greater than 0
t/00-startup.t .. 14/20 Number of threads must be greater than 0
t/00-startup.t .. 15/20 # Looks like you planned 20 tests but ran 15.
# Looks like you failed 10 tests of 15 run.
t/00-startup.t .. Dubious, test returned 10 (wstat 2560, 0xa00)
Failed 15/20 subtests
Test Summary Report
-------------------
t/00-startup.t (Wstat: 2560 Tests: 15 Failed: 10)
Failed tests: 1, 3-11
Non-zero exit status: 10
Parse errors: Bad plan. You planned 20 tests but ran 15.
Files=1, Tests=15, 31 wallclock secs ( 0.02 usr 0.01 sys + 0.63 cusr 0.23 csys = 0.89 CPU)
Result: FAIL
Yeah; tests won't help you until we get that first thing figured out. The same command starts fine locally.
not sure what to try offhand first, I'm a bit busy and will have to circle back; you can try strace'ing it to see what it was doing before it died at least (ensure to include -f and -s 999 or whatnot)
Sure, totally fair! :+1: :heart:
I've started trying to dig through it, but here's the strace
from the memcached-debug
startup (nothing seems obvious to me near the end, but hopefully there's a clue in here somewhere):
When I use the non-debug variant, the strace
logs are much shorter and strangely more readable:
blah dammit. there's an if (settings.verbose) before an exit. I should've caught that.
remove strace, add -v, try again.
Hah! I certainly can't fault you for being human. :smile:
root@63267e7a34de:/usr/src/memcached# ./memcached-debug -v -u memcached -Z -o ssl_chain_cert=t/server_crt.pem -o ssl_key=t/server_key.pem
Error loading the certificate chain: t/server_crt.pem
ok now it's just eating the damn SSL_Error().
Can you try on a few platforms and see what the library differences are? like a local machine vs your docker thing here. I'll have to circle back to this. You might also try using the OpenSSL tooling to try parsing the cert files to see if they throw errors.
I applied the trailing patch here to just get a quick-n-dirty method of printing the SSL error and got the following:
root@3258c428917d:/usr/src/memcached# ./memcached-debug -v -u memcached -Z -o ssl_chain_cert=t/server_crt.pem -o ssl_key=t/server_key.pem
140130205252480:error:140AB18F:SSL routines:SSL_CTX_use_certificate:ee key too small:../ssl/ssl_rsa.c:310:
Error loading the certificate chain: t/server_crt.pem
I think it's probably the same root cause as https://bugs.debian.org/927461, namely that the openssl
package in Debian Buster and above now has stricter defaults for OpenSSL. As noted in that report, it's possible to do sed -i 's/SECLEVEL=2/SECLEVEL=1/g' /etc/ssl/openssl.cnf
and relax those requirements which does indeed fix my ability to run memcached-debug
-- do you think that's something that makes sense for us to apply globally in the image, or would you think it's something we should perhaps instead apply temporarily for running the memcached
tests and then revert afterwards so that users have to opt-in explicitly to these "less secure" keys?
diff --git a/tls.c b/tls.c
index 8e66a89..f21252c 100644
--- a/tls.c
+++ b/tls.c
@@ -7,6 +7,8 @@
#include <sysexits.h>
#include <sys/param.h>
+#include <openssl/err.h>
+
#ifndef MAXPATHLEN
#define MAXPATHLEN 4096
#endif
@@ -97,6 +99,7 @@ static bool load_server_certificates(char **errmsg) {
SSL_LOCK();
if (!SSL_CTX_use_certificate_chain_file(settings.ssl_ctx,
settings.ssl_chain_cert)) {
+ ERR_print_errors_fp(stderr);
snprintf(error_msg, errmax, "Error loading the certificate chain: %s\r\n",
settings.ssl_chain_cert);
success = false;
We should fix these test certs so they just work... I wish I'd have time to get more up to date buildbots so this wouldn't end up being surprises to people... but this is life with OpenSSL.
I added an issue on our end to fix the certs; I would certainly not advise globally lowering the SECLEVEL. If you're doing this just to build an image, you could do it temporarily while the tests run... though I also hope there's a way of doing it via environment rather than only via editing the global file there.
I couldn't find any way to convince OpenSSL to adjust it less dramatically than editing openssl.cnf
, so we modify it temporarily and restore it afterwards and it completes successfully and in a completely reasonable amount of time. :+1:
Thanks for kicking this off, @moisesguimaraes! (and as always, thanks a ton for your help and your upstream work, @dormando :heart:)
@tianon, do you know how long does it takes to see this change in dockerhub? Is it an automated process?
which memcache client support TLS
which memcache client support TLS
https://github.com/jaysonsantos/python-binary-memcached/releases
v0.29.0 of python-binary-memcached supports TLS.
Memcached support TLS connections since version 1.5.13[0] which needs an extra build option.
[0] https://github.com/memcached/memcached/wiki/TLS