bigpresh / Dancer-Plugin-Database

Dancer::Plugin::Database - easy database support for Dancer applications
http://search.cpan.org/dist/Dancer-Plugin-Database
37 stars 36 forks source link

Failed test 'database_connection_failed hook fires' #102

Open eserte opened 2 years ago

eserte commented 2 years ago

t/01-basic.t seems to fail sometimes like this:

#   Failed test 'database_connection_failed hook fires'
#   at t/01-basic.t line 247.
#          got: ''
#     expected: '1'
# Looks like you failed 1 test of 43.
t/01-basic.t ...... 
Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/43 subtests 

This seems to be rare, by looking at http://matrix.cpantesters.org/?dist=Dancer2-Plugin-Database%202.17;reports=1#sl=7,1

gregoa commented 1 year ago

We're also seeing this in Debian: https://bugs.debian.org/923824

emollier commented 1 year ago

Looking into details, it seems the test is not running the hook because it doesn't enter the _get_connection function from Dancer::Plugin::Database::Core in the first place. A few well placed prints shown that sometimes a non-expired handle is obtained from the previous connection, making the function return early without attempting to get a new one, thus having no chance to branch to the hook.

If I force disconnect the existing handle and wait for expiration, then the test is reliably passing on my end; I'm not entirely certain of the relevance of the resulting test though:

--- libdancer2-plugin-database-perl.orig/t/lib/TestApp.pm
+++ libdancer2-plugin-database-perl/t/lib/TestApp.pm
@@ -209,6 +209,9 @@
     var connection_failed => 1;
 };
 get '/database_connection_failed_fires' => sub {
+    # Avoid catching an old handle which may not have expired yet
+    database()->disconnect;
+    sleep 0.2;
     # Give a ridiculous database filename which should never exist in order to
     # force a connection failure
     my $handle = database({ 

In hope this helps, Étienne.

bigpresh commented 1 year ago

A few well placed prints shown that sometimes a non-expired handle is obtained from the previous connection, making the function return early without attempting to get a new one

That's both interesting and worrying, as the intention of that test is to give new connection details for a connection that will always fail - so there should never have been a handle available. I shall investigate further!

bigpresh commented 1 year ago

So far, I've been unable to reproduce this. I've been running the test suite repeatedly in a tight loop, 500+ iterations so far and not a single failure, so I think there must be a bit more to it than just random chance. (This was on a Debian 11 box running perl 5.32, for reference.)

EDIT: also another 250 iterations on a Ubuntu 22.04.1 LTS box with perl 5.34, no failures there either.

I'll look at the build logs supplied and see if it's down to a particular perl version or particular DBI version.

bigpresh commented 1 year ago

Ooh - I can reproduce it on my workstation box, but very rarely.

With a serious thrashing, all 8 cores running in parallel, I recorded 2 failures out of 1504 test runs (0.132% failure rate) - that was the Ubuntu 22.04 box with perl 5.34. So I know it's reproducible on that box, at least!

emollier commented 1 year ago

For further context, this is observed in Debian testing (the upcoming Debian 12 bookworm) and Sid. Continuous integration for Debian 12 bookworm only kept around the last ten test logs, but two instances failed with this error, so we are observing a roughly 20% error rate now. This also matches my own observation when running this test a lot of times in Sid. Interestingly, the initial report on Debian side dates back from 2019, meaning something went repaired at some point, then broke again perhaps.

Continuous integration for Debian 11 bullseye recorded 0 error out of 21 runs for the past three years, matching what you could observe.

emollier commented 1 year ago

For information, the unstability caused some difficulties to the Debian release team during the stabilisation of bookworm; see Debian Bug#923824 message 26. I applied the patch for the Debian 13 release cycle.

If you have the opportunity to try Debian 12, you may be able to reproduce the bug with a much higher failure rate than on Ubuntu 22.04.