LibreCat / Catmandu-DBI

A Catmandu::Store plugin for DBI based interfaces
Other
0 stars 1 forks source link

DBD::mysql::db begin_work failed #32

Closed petrakohorst closed 2 years ago

petrakohorst commented 3 years ago

We are seeing the following error message every morning in a worker using publication->add:

"DBD::mysql::db begin_work failed: Turning off AutoCommit failed at [....]/LibreCat/local/lib/perl5/Catmandu/Store/DBI.pm line 117"

Restarting the worker will fix the problem, but only until the next morning. The worker itself keeps running and working fine. Only writing to the db fails.

Google led me to this bug report: https://github.com/perl5-dbi/DBD-mysql/issues/202 which, unfortunately, does not have a solution yet.

Any idea on how to fix this? (As a temporary fix we are restarting workers nightly via cron.)

nicolasfranck commented 3 years ago

I'm not sure, but I think we should stop storing database handles (even in state variables!), and use statements like connect_cached or prepare_cached so that the underlying can invalidate handles properly. Where exactly do you see that happen in LibreCat?

petrakohorst commented 3 years ago

This is an error in our Orcid Worker.

The worker automatically imports new records from our users' Orcid profiles into our local database. Orcid sends a ping to one of our routes when a certain Orcid profile has been updated and this route then calls the worker.

The worker itself keeps running fine, even with the error occurring. Only this part (see below) fails over night. And it's neither the on_validation_error, nor the on_error... it's the catch block that's executed.

try {
  h->hook('import-new-' . $source)->fix_around(
    $record,
    sub {
      publication->add(
        $record,
        on_validation_error => sub {
          my ($error_rec, $errors) = @_;
          $orcid_logger->error("$orcid: Validation error: " . to_yaml($errors));
          $orcid_logger->error("$orcid: Error Rec: " . to_yaml($error_rec));
        },
        on_error => sub {
          my ($error_rec, $errors) = @_;
          $orcid_logger->error("$orcid: Other error: " . to_yaml($errors));
          $orcid_logger->error("$orcid: other error rec: " . to_yaml($error_rec));
        }
      );
    }
  );
}
catch {
  $orcid_logger->error("$orcid: Import Error: There was an error adding the rec with message $_.");
  $orcid_logger->error("$orcid: Error rec was: " . to_yaml($record));
};
nics commented 3 years ago

Try to make the trycatch block as small as possible by only wrapping the publication->add. Maybe you'll get better errors then.

nicolasfranck commented 3 years ago

It can also be caused by the age of the database handler, which is stored in a Moo attribute and not invalidated by DBI itself. That is why I proposed to use connect_cached. I'm working on that btw

nicolasfranck commented 3 years ago

Mm, DBI->connect_cached cannot resolve this: its cache key depends on all connection parameters, and that changes during a transaction (AutoCommit is false during a transaction).

It seems like this only happens for the DBD::mysql. Mysql has a connection attribute called mysql_auto_reconnect, which should trigger an auto reconnect, but that does not seems to work.

In this line https://github.com/LibreCat/Catmandu-DBI/blob/master/lib/Catmandu/Store/DBI.pm#L86 I let DBI::mysql do the reconnection by itself. For other database drivers that should work.

Maybe remove that one line, and try to do the reconnection for mysql too?

nicolasfranck commented 3 years ago

This triggers the same error:

use Catmandu::Sane;
use DBI;

my $dbh = DBI->connect(
    "dbi:mysql:database=librecat_main",
    "librecat",
    "librecat",
    +{
        AutoCommit => 1,
        RaiseError => 1,
        mysql_auto_reconnect => 1,
        mysql_enable_utf8 => 1
    }
);

$dbh->disconnect;
$dbh->begin_work;
nicolasfranck commented 3 years ago

@petrakohorst another solution may also be, to update these values in /etc/my.cnf:

# way too small!
wait_timeout=10
connect_timeout=10

What probably happens during a transaction:

DBD::mysql has a "reconnect" feature that handles the so-called MySQL "morning bug": If the server has disconnected, most probably due to a timeout, then by default the driver will reconnect and attempt to execute the same SQL statement again. However, this behaviour is disabled when AutoCommit is off: Otherwise the transaction state would be completely unpredictable after a reconnect.

So what you can also do:

nicolasfranck commented 3 years ago

@petrakohorst I hope https://github.com/LibreCat/Catmandu-DBI/tree/0.10.1 solves your issue? Options timeout and reconnect_timeout have been replaced (they generate a warning if you call their methods) by a proper reconnect on failure. The only time it does not reconnect is during a transaction, but that only happens if there is too much time between actions inside the transaction blok.

vpeil commented 2 years ago

@petrakohorst I hope https://github.com/LibreCat/Catmandu-DBI/tree/0.10.1 solves your issue? Options timeout and reconnect_timeout have been replaced (they generate a warning if you call their methods) by a proper reconnect on failure. The only time it does not reconnect is during a transaction, but that only happens if there is too much time between actions inside the transaction blok.

yes, this version works for us perfectly.

Hence, I'm closing this issue.