MI-DPLA / combine

Combine /kämˌbīn/ - Metadata Aggregator Platform
MIT License

Livy sessions can acquire stale MySQL connection #304

Closed by ghukill 6 years ago

ghukill commented 6 years ago

After a long enough period of inactivity, Livy's DB connection goes stale, and any Spark-related activity then fails.

e.g.

Traceback (most recent call last):
  File "/usr/local/anaconda/envs/combine/lib/python3.5/site-packages/django/db/backends/utils.py", line 65, in execute
    return self.cursor.execute(sql, params)
  File "/usr/local/anaconda/envs/combine/lib/python3.5/site-packages/django/db/backends/mysql/base.py", line 101, in execute
    return self.cursor.execute(query, args)
  File "/usr/local/anaconda/envs/combine/lib/python3.5/site-packages/MySQLdb/cursors.py", line 250, in execute
    self.errorhandler(self, exc, value)
  File "/usr/local/anaconda/envs/combine/lib/python3.5/site-packages/MySQLdb/connections.py", line 50, in defaulterrorhandler
    raise errorvalue
  File "/usr/local/anaconda/envs/combine/lib/python3.5/site-packages/MySQLdb/cursors.py", line 247, in execute
    res = self._query(query)
  File "/usr/local/anaconda/envs/combine/lib/python3.5/site-packages/MySQLdb/cursors.py", line 411, in _query
    rowcount = self._do_query(q)
  File "/usr/local/anaconda/envs/combine/lib/python3.5/site-packages/MySQLdb/cursors.py", line 374, in _do_query
    db.query(q)
  File "/usr/local/anaconda/envs/combine/lib/python3.5/site-packages/MySQLdb/connections.py", line 292, in query
    _mysql.connection.query(self, query)
_mysql_exceptions.OperationalError: (2006, 'MySQL server has gone away')

Now that Livy restarts automatically (a relatively new change), this issue is scoped to stale connections to the database.

Related re: background tasks: https://github.com/WSULib/combine/issues/224

ghukill commented 6 years ago

Proposing to bump MySQL's wait_timeout to the maximum value of 31536000 seconds (1 year).

Since this is not a high-volume site, and MySQL is no longer under heavy I/O, we don't need to worry much about open connections.

Since Spark and background tasks both open connections to the SQL database, it becomes complex to run connection.close() before every point where the database might be accessed. That access happens outside the normal Django request/response cycle, which is believed to be where connections are opened and closed, so it would be complex to address.
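For reference, a minimal sketch of the alternative that was judged too complex here: wrapping each out-of-cycle entry point so that a cleanup hook runs before and after it. This is not code from Combine. In a Django project the hook would presumably be django.db.close_old_connections, which Django itself runs at the start and end of each request. The names with_fresh_connections and run_background_task are hypothetical, and a stand-in hook is used so the sketch runs without Django installed.

```python
import functools


def with_fresh_connections(close_old_connections):
    """Build a decorator that runs the given cleanup hook before and
    after the wrapped function (hypothetical helper, not Combine code)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Drop connections that may have gone stale while idle.
            close_old_connections()
            try:
                return func(*args, **kwargs)
            finally:
                # Do not leave a connection open until the next call.
                close_old_connections()
        return wrapper
    return decorator


# In a Django project the hook would be the real cleanup function:
#   from django.db import close_old_connections
#   fresh = with_fresh_connections(close_old_connections)

# Stand-in hook so the sketch runs without Django installed.
calls = []
fresh = with_fresh_connections(lambda: calls.append("closed"))


@fresh
def run_background_task(job_id):
    # Hypothetical task body; real code would query the ORM here.
    calls.append("ran job %s" % job_id)
    return job_id


result = run_background_task(7)
```

The cost noted above still applies: every Spark and background-task entry point would need the wrapper, which is why raising the timeout is the simpler fix for a low-traffic site.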

ghukill commented 6 years ago

Added to Combine-Playbook, updating /etc/mysql/my.cnf for builds:

[mysqld]
wait_timeout = 31536000
interactive_timeout = 31536000
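A small sanity check of the values above, as a sketch only: it parses a my.cnf-style fragment with Python's configparser and converts the timeouts to days. The helper name read_timeouts is hypothetical, and a real /etc/mysql/my.cnf may contain directives (such as !includedir before any section header) that configparser cannot read, so the fragment is inlined here instead of being read from disk.

```python
import configparser

# Inlined copy of the fragment added to /etc/mysql/my.cnf.
MY_CNF_FRAGMENT = """
[mysqld]
wait_timeout = 31536000
interactive_timeout = 31536000
"""

SECONDS_PER_DAY = 24 * 60 * 60


def read_timeouts(cnf_text):
    """Return the [mysqld] timeout settings, in seconds, from my.cnf-style text."""
    parser = configparser.ConfigParser(allow_no_value=True, interpolation=None)
    parser.read_string(cnf_text)
    return {
        name: parser.getint("mysqld", name)
        for name in ("wait_timeout", "interactive_timeout")
    }


timeouts = read_timeouts(MY_CNF_FRAGMENT)
for name, seconds in timeouts.items():
    print(name, seconds, "seconds =", seconds // SECONDS_PER_DAY, "days")
```

On a running server, the effective values can be checked with SHOW VARIABLES LIKE '%timeout%' after MySQL has been restarted.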