dragonresearch / rpki.net

Dragon Research Labs rpki.net RPKI toolkit
54 stars 30 forks source link

rpkid can't reestablish a connection to mysqld #283

Closed sraustein closed 11 years ago

sraustein commented 12 years ago

I noticed this on work0 as well, so I thought I'd report it. rpkid seems to have lost its database connection. However, mysqld seems to be running properly so far as I can tell. I can use the mysql cli to inspect the databases without error.

{{{ 2012-07-11 18:49:19 rpkid[15322]: Scheduling next cron run at 2012-07-11T18:51:19Z 2012-07-11 18:49:19 rpkid[15322]: Self SPARTA[1] polling parents 2012-07-11 18:49:19 rpkid[15322]: Sending "list" request to parent arin 2012-07-11 18:49:19 rpkid[15322]: Signing with cert issuer /commonName=SPARTA BPKI Resource Trust Anchor subject /commonName=52C71874B96B5E107F48085E7F951A57DDAED7E4 SKI 52:C7:18:74:B9:6B:5E:10:7F:48:08:5E:7F:95:1A:57:DD:AE:D7:E4 2012-07-11 18:49:20 rpkid[15322]: Received CMS cert issuer /commonName=arin TA subject /commonName=E599339AE37A97344A4719DCD976C4B87E838C32 SKI E5:99:33:9A:E3:7A:97:34:4A:47:19:DC:D9:76:C4:B8:7E:83:8C:32 2012-07-11 18:49:20 rpkid[15322]: Received CMS CRL issuer <rpki.x509.X501DN object at 0x28476d0> 2012-07-11 18:49:20 rpkid[15322]: CMS trusted cert issuer /commonName=elkins-dt.isi.edu BPKI Server Trust Anchor subject /commonName=elkins-dt.isi.edu BPKI Server Trust Anchor SKI 3F:48:A5:66:57:29:0C:01:6E:F4:47:C8:F6:FD:E3:38:AB:F6:DA:ED 2012-07-11 18:49:20 rpkid[15322]: CMS trusted cert issuer /commonName=SPARTA BPKI Resource Trust Anchor subject /commonName=SPARTA BPKI Resource Trust Anchor SKI 53:AB:72:51:3C:63:19:EE:F1:BC:2D:83:29:06:A9:59:4B:40:20:DC 2012-07-11 18:49:20 rpkid[15322]: CMS trusted cert issuer /commonName=SPARTA BPKI Resource Trust Anchor subject /commonName=arin TA SKI 11:D9:85:33:74:67:F7:56:D0:C8:CC:89:80:91:00:91:D9:E7:59:44 2012-07-11 18:49:20 rpkid[15322]: Parent arin agrees that SPARTA has SKI yryZulBe_oEX6ydPZTqUZO55Dio in resource class 6 2012-07-11 18:49:20 rpkid[15322]: Sweeping <rpki.left_right.parent_elt object at 0x26079d0> 2012-07-11 18:49:20 rpkid[15322]: Self SPARTA[1] updating children 2012-07-11 18:49:20 rpkid[15322]: Self SPARTA[1] updating ROAs 2012-07-11 18:49:20 rpkid[15322]: Signing with cert issuer /commonName=elkins-dt.isi.edu BPKI Server Trust Anchor subject /commonName=elkins-dt.isi.edu BPKI rpkid EE SKI 79:AE:36:20:C2:F8:5C:C6:07:5E:6C:2C:6F:E3:2A:50:7F:8F:50:70 2012-07-11 18:49:20 rpkid[15322]: : Error on HTTP client connection localhost:4403 <class 'rpki.exceptions.HTTPRequestFailed'> HTTP request failed with status 500, reason Unhandled exception OperationalError: (2006, 'MySQL server has gone away') 2012-07-11 18:49:20 rpkid[15322]: : Closing due to error 2012-07-11 18:49:20 rpkid[15322]: : Returning exception HTTPRequestFailed("HTTP request failed with status 500, reason Unhandled exception OperationalError: (2006, 'MySQL server has gone away')",) to caller: HTTP request failed with status 500, reason Unhandled exception OperationalError: (2006, 'MySQL server has gone away') 2012-07-11 18:49:20 rpkid[15322]: Could not fetch ROA requests for SPARTA, skipping: HTTP request failed with status 500, reason Unhandled exception OperationalError: (2006, 'MySQL server has gone away') }}}

Trac ticket #267 component rpkid priority minor, owner sra, created by melkins on 2012-07-11T18:57:19Z, last modified 2012-11-28T04:43:52Z

sraustein commented 12 years ago

Every case of this I've ever seen really has been the MySQL server going away or becoming unreachable (eg, something deletes /tmp/mysql.sock, which seems to happen occasionally on some of Randy's machines due to cleanup scripts gone amok or some such). work0's logs show the last bout of this on 21 June, which, not by coincidence, is when /bin/ps gauxww shows the current mysqld process as having started.

Not saying you're wrong, but so far all the evidence I have says rpkid is reporting this condition correctly.

Trac comment by sra on 2012-07-11T19:37:14Z

sraustein commented 12 years ago

Just checked, and you are correct that the mysqld process was restarted on July 9th on my server. I guess the question is if mysqld is restarted, should I expect that rpkid can recover with a new database connection?

Trac comment by melkins on 2012-07-11T19:51:38Z

sraustein commented 12 years ago

Sigh. Apparently somebody cleverly changed the MySQLdb API so that the function which used to reconnect now just whines unless given an extra parameter indicating that one really did want to reconnect. Twits.

{{{ $ pydoc MySQLdb.connections.Connection.ping

MySQLdb.connections.Connection.ping = ping(...) Checks whether or not the connection to the server is working. If it has gone down, an automatic reconnection is attempted.

This function can be used by clients that remain idle for a
long while, to check whether or not the server has closed the
connection and reconnect if necessary.

New in 1.2.2: Accepts an optional reconnect parameter. If True,
then the client will attempt reconnection. Note that this setting
is persistent. By default, this is on in MySQL<5.0.3, and off
thereafter.

Non-standard. You should assume that ping() performs an
implicit rollback; use only when starting a new transaction.
You have been warned.

}}}

Trac comment by sra on 2012-07-11T20:04:03Z

sraustein commented 12 years ago

Ignore comment 3. rpki.sql already sets the required flag and has for years.

Trac comment by sra on 2012-07-11T20:20:15Z

sraustein commented 11 years ago

Addressed in [4632].

Trac comment by sra on 2012-11-28T04:43:52Z

sraustein commented 11 years ago

Closed with resolution fixed