cyrusimap / cassandane

Other
6 stars 11 forks source link

Idle - there's a race condition in idled startup when running under valgrind #41

Closed brong closed 6 years ago

brong commented 6 years ago

brong@bat:~/src/cassandane$ sudo -u cyrus ./testrunner.pl -f pretty -j 2 --valgrind Idle Cyrus::Idle.basic_idled [ OK ] Cyrus::Idle.basic_noidled [ OK ] Cyrus::Idle.basic_abortedidled [ OK ] Cyrus::Idle.delivery_idled [ OK ] Cyrus::Idle.delivery_abortedidled [ OK ] Cyrus::Idle.disabled [ OK ] Cyrus::Idle.idled_default_timeout [ OK ] Cyrus::Idle.delivery_noidled [ OK ] Cyrus::Idle.shutdownfile_abortedidled [ OK ] Cyrus::Idle.shutdownfile_idled [ OK ] Cyrus::Idle.shutdownfile_noidled [ OK ] Cyrus::Idle.sigterm [ OK ] Cyrus::Idle.sigterm_many [ OK ]

brong@bat:~/src/cassandane$ sudo -u cyrus ./testrunner.pl -f pretty -j 8 --valgrind Idle Cyrus::Idle.basic_noidled [ OK ] Cyrus::Idle.disabled [ OK ] Cyrus::Idle.delivery_abortedidled [FAILED] Cyrus::Idle.basic_abortedidled [FAILED] Cyrus::Idle.basic_idled [ OK ] Cyrus::Idle.idled_default_timeout [ OK ] Cyrus::Idle.shutdownfile_abortedidled [FAILED] Cyrus::Idle.delivery_noidled [ OK ] Cyrus::Idle.delivery_idled [ OK ] Cyrus::Idle.shutdownfile_noidled [ OK ] Cyrus::Idle.sigterm [ OK ] Cyrus::Idle.shutdownfile_idled [ OK ] Cyrus::Idle.sigterm_many [ OK ]

There were 3 failures: 1) test_delivery_abortedidled(Cassandane::Cyrus::Idle) /tmpfs/cas/224407D1/conf/socket/idle exists and is a socket at /usr/share/perl5/Test/Unit/Exception.pm line 13. Test::Unit::Exception::throw_new(Test::Unit::Failure=HASH(0x51fcee8), "-package", "Cassandane::Cyrus::Idle", "-file", "Cassandane/Cyrus/Idle.pm", "-line", 109, "-object", Cassandane::Cyrus::Idle=HASH(0x52e0d30), ...) called at /usr/share/perl5/Test/Unit/Assert.pm line 85

2) test_basic_abortedidled(Cassandane::Cyrus::Idle) /tmpfs/cas/224407A1/conf/socket/idle exists and is a socket at /usr/share/perl5/Test/Unit/Exception.pm line 13. Test::Unit::Exception::throw_new(Test::Unit::Failure=HASH(0x51fcb88), "-package", "Cassandane::Cyrus::Idle", "-file", "Cassandane/Cyrus/Idle.pm", "-line", 109, "-object", Cassandane::Cyrus::Idle=HASH(0x52d9208), ...) called at /usr/share/perl5/Test/Unit/Assert.pm line 85

3) test_shutdownfile_abortedidled(Cassandane::Cyrus::Idle) /tmpfs/cas/224407C2/conf/socket/idle exists and is a socket at /usr/share/perl5/Test/Unit/Exception.pm line 13. Test::Unit::Exception::throw_new(Test::Unit::Failure=HASH(0x5c3fea0), "-package", "Cassandane::Cyrus::Idle", "-file", "Cassandane/Cyrus/Idle.pm", "-line", 109, "-object", Cassandane::Cyrus::Idle=HASH(0x52e67b0), ...) called at /usr/share/perl5/Test/Unit/Assert.pm line 85

... in all cases, it looks like the socket isn't up and running in time, which means we need to wait until idled is running in the tests.

ajaysusarla commented 6 years ago

This is a potential fix(althought I'm not sure if it is the right one). The tests pass on multiple runs(albeit with a delay) with this patch.

diff --git a/Cassandane/Cyrus/Idle.pm b/Cassandane/Cyrus/Idle.pm
index 6122949..c85a5dc 100644
--- a/Cassandane/Cyrus/Idle.pm
+++ b/Cassandane/Cyrus/Idle.pm
@@ -106,6 +106,8 @@ sub start_and_abort_idled
     # Let's check that our assumption is correct.
     xlog "check that idle left a socket lying around";
     my $idle_sock = $self->{instance}->{basedir} . "/conf/socket/idle";
+    sleep 1 while not -e $idle_sock;
+
     $self->assert( -S $idle_sock, "$idle_sock exists and is a socket");
 }
brong commented 6 years ago

It does half defeat the purpose of the assert afterwards, but yep - it's a reasonable thing to have.

elliefm commented 6 years ago

The "sleep until the socket has been created" should really be dropped in just after xlog "giving idled some time to start up"; and before we kill it.

The assert at the end is to make sure idled's filesystem state is still around after being killed, but to be correct we should also ensure that idled has finished starting (and created its socket) before we kill it. Will have a patch soon.

elliefm commented 6 years ago

What's a reasonable maximum time to spend waiting for the socket, I wonder? I don't want this thing just looping forever if something has gone wrong. Will start with 60s and see if that's long enough...

ajaysusarla commented 6 years ago

60 seconds sounds reasonable. A few of the Idle tests have sleeps in there already.

elliefm commented 6 years ago

Yeah 60 seconds seems to be enough for me too. For comparison, I also tested with only 10s delay and it was still failing. We can always tune later if it turns out inadequate somewhere :)