New 5XX errors from Metabase API

cpan-testers / cpantesters-api

An API into data held by CPAN Testers: test reports and CPAN uploads

New 5XX errors from Metabase API #16

Closed preaction closed 6 years ago

preaction commented 6 years ago

Andreas König is reporting a lot of 500 Internal Server Errors, which Fastly is saying started Nov 10 (above). Check the Metabase API logs to see what could be causing this, and add new logging if necessary.

preaction commented 6 years ago

Nigel Horne also reported some issues:

[error] DBIx::Class::Storage::DBI::_dbh_execute(): DBI Exception: DBD::mysql::st execute failed: MySQL server has gone away [for Statement "SELECT me.id, me.resource, me.fullname, me.email FROM metabase_user me WHERE ( resource = ? )" with ParamValues: 0='metabase:user:30f4dfbe-2aae-11df-837a-5e0a49663a4f'] at /home/cpantesters/perl5/bin/cpantesters-legacy-metabase line 93

preaction commented 6 years ago

This seems to have been fixed by restarting the services so they reconnect to the database. It seems like a minor hiccup perhaps caused by a restart or some other severing of the connection. We should set up some kind of auto-reconnect to fix these little hiccups.
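For illustration only, here is a minimal sketch of what "auto-reconnect" could look like in general terms. It is written in Python rather than the Perl the service actually uses, and `ReconnectingClient`, its `connect` factory, and the `retries`/`delay` parameters are all hypothetical names, not part of cpantesters-api or DBIx::Class. The idea is to drop a dead handle when the connection fails and open a fresh one before retrying the same operation.

```python
import time


class ReconnectingClient:
    """Retry an operation on a fresh connection when the old one has died.

    Hypothetical helper: `connect` is any zero-argument callable that
    returns a new connection object.
    """

    def __init__(self, connect, retries=2, delay=0.0):
        self._connect = connect   # factory for new connections (assumption)
        self._retries = retries   # reconnect attempts allowed after a failure
        self._delay = delay       # seconds to wait before each reconnect
        self._conn = None

    def run(self, operation):
        """Call operation(conn); on ConnectionError, reconnect and try again."""
        attempt = 0
        while True:
            try:
                if self._conn is None:
                    self._conn = self._connect()
                return operation(self._conn)
            except ConnectionError:
                self._conn = None  # discard the dead handle
                if attempt >= self._retries:
                    raise          # give up and surface the error
                attempt += 1
                time.sleep(self._delay)
```

A caller would wrap each query, for example `client.run(lambda conn: conn.query(...))`, where `query` stands in for whatever the real driver exposes. Retrying is only safe for operations that can be repeated without side effects, such as the SELECT in the error above.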

preaction commented 6 years ago

This is absolutely not fixed. The MySQL error log is filling up with:

2017-11-13T18:41:31.018742Z 678207 [Note] Aborted connection 678207 to db: 'metabase' user: 'cpantesters' host: 'cpantesters3.dh.bytemark.co.uk' (Got an error reading communication packets)
2017-11-13T18:41:31.021920Z 678204 [Note] Aborted connection 678204 to db: 'cpanstats' user: 'cpantesters' host: 'cpantesters3.dh.bytemark.co.uk' (Got an error reading communication packets)
2017-11-13T18:41:31.023906Z 678212 [Note] Aborted connection 678212 to db: 'cpanstats' user: 'cpantesters' host: 'cpantesters3.dh.bytemark.co.uk' (Got an error reading communication packets)

All from cpantesters3.dh.bytemark.co.uk. I have Fastly configured now to only send to cpantesters1.barnyard.co.uk until we can figure this out (or to see if that box also demonstrates the problem).
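To confirm that the aborted connections really do come from a single host, the log lines can be tallied by client host and database. This is a rough Python sketch that assumes the log format matches the three lines quoted above. `count_aborted` and the `ABORTED` pattern are hypothetical names, not an existing tool.

```python
import re
from collections import Counter

# Matches the "[Note] Aborted connection ..." lines quoted above.
ABORTED = re.compile(
    r"Aborted connection (?P<id>\d+) to db: '(?P<db>[^']*)' "
    r"user: '(?P<user>[^']*)' host: '(?P<host>[^']*)' \((?P<reason>.*)\)"
)


def count_aborted(lines):
    """Count aborted connections per (client host, database) pair."""
    counts = Counter()
    for line in lines:
        match = ABORTED.search(line)
        if match:
            counts[(match.group("host"), match.group("db"))] += 1
    return counts
```

Feeding it the MySQL error log, for example `count_aborted(open(path))` with `path` pointing at wherever the log lives, gives a per-host breakdown that makes a single misbehaving client easy to spot.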

preaction commented 6 years ago

Since this is localized to the one machine, I'm going to try using iperf to run some networking tests between the two machines. It's possible that something somewhere is dropping a lot of packets or otherwise interfering with the connection between the two machines.

preaction commented 6 years ago

api-1 is now stuck trying to install iperf, and is making me remove icinga2-common before I can install iperf. Hopefully this completes before I have to leave the coffee shop here...

preaction commented 6 years ago

This appeared to be from the server being overloaded: The MySQL client was taking too long to connect, and that was causing it to drop the connections. We've now moved the CPAN/BackPAN mirrors to another machine, as well as the main backend processes, and everything seems much more stable.
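A slow connect like this can be spotted by timing how long it takes to open a TCP connection to the database port. The sketch below is only an illustration in Python. `time_tcp_connect` is a hypothetical helper, and it measures the TCP handshake only, not the MySQL protocol handshake or authentication that follow it.

```python
import socket
import time


def time_tcp_connect(host, port, timeout=5.0):
    """Return seconds taken to open a TCP connection, or None on failure.

    A timeout, a refused connection, and a DNS failure all raise OSError
    subclasses, so they all return None here.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return None
```

Running this periodically from the API host against the database host and port (3306 is the MySQL default, though the port used here is not stated in the thread) would show whether connect times climb when the machine is under load.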