MetricsGrimoire / MailingListStats

Mailing List Stats is a command line based tool used to analyze mboxes
http://metricsgrimoire.github.com/MailingListStats/
GNU General Public License v2.0

MySQL timeout for big lists #64

Closed: geekygirldawn closed this issue 8 years ago

geekygirldawn commented 8 years ago

While trying to download the Linux Kernel mailing list, I'm getting a MySQL timeout error (see below). It timed out after about 12-16 hours, I think. I didn't note exactly when I started it, but it was sometime in the early morning and I noticed the timeout before I went to bed.

This happened before, and I increased the wait_timeout on the mysql server thinking that would solve the issue, but it didn't. It still failed before the wait_timeout was reached. I was tired last night and forgot to check to see if the db connection was still live. I checked this morning and it hadn't timed out in the screen where I left it running from the work I was doing yesterday afternoon. Usually I find it timed out in the morning.

Details:

Command:

 mlstats --db-user ** --db-password ** --db-name mlstats_linux --report-file=/home/dawn/log_files/lkml.txt "http://dir.gmane.org/gmane.linux.kernel"
Traceback (most recent call last):
  File "/usr/local/bin/mlstats", line 4, in <module>
    __import__('pkg_resources').run_script('mlstats==0.4', 'mlstats')
  File "/usr/local/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 724, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/local/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 1657, in run_script
    exec(script_code, namespace, namespace)
  File "/usr/local/lib/python2.7/dist-packages/mlstats-0.4-py2.7.egg/EGG-INFO/scripts/mlstats", line 38, in <module>

  File "build/bdist.linux-x86_64/egg/pymlstats/__init__.py", line 190, in start
  File "build/bdist.linux-x86_64/egg/pymlstats/main.py", line 102, in __init__
  File "build/bdist.linux-x86_64/egg/pymlstats/main.py", line 196, in __analyze_mailing_list
  File "build/bdist.linux-x86_64/egg/pymlstats/main.py", line 235, in __set_archives_to_analyze
  File "build/bdist.linux-x86_64/egg/pymlstats/db/session.py", line 217, in set_visited_url
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 1709, in merge
    load=load, _recursive=_recursive)
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 1752, in _merge
    merged = self.query(mapper.class_).get(key[1])
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 831, in get
    return self._get_impl(ident, loading.load_on_ident)
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 864, in _get_impl
    return fallback_fn(self, key)
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/loading.py", line 219, in load_on_ident
    return q.one()
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 2693, in one
    ret = list(self)
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 2736, in __iter__
    return self._execute_and_instances(context)
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 2751, in _execute_and_instances
    result = conn.execute(querycontext.statement, self._params)
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 914, in execute
    return meth(self, multiparams, params)
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/sql/elements.py", line 323, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 1010, in _execute_clauseelement
    compiled_sql, distilled_params
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 1146, in _execute_context
    context)
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 1341, in _handle_dbapi_exception
    exc_info
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/util/compat.py", line 200, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb, cause=cause)
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 1139, in _execute_context
    context)
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/default.py", line 450, in do_execute
    cursor.execute(statement, parameters)
  File "/usr/lib/python2.7/dist-packages/MySQLdb/cursors.py", line 174, in execute
    self.errorhandler(self, exc, value)
  File "/usr/lib/python2.7/dist-packages/MySQLdb/connections.py", line 36, in defaulterrorhandler
    raise errorclass, errorvalue
sqlalchemy.exc.OperationalError: (_mysql_exceptions.OperationalError) (2006, 'MySQL server has gone away') [SQL: u'SELECT compressed_files.url AS compressed_files_url, compressed_files.mailing_list_url AS compressed_files_mailing_list_url, compressed_files.status AS compressed_files_status, compressed_files.last_analysis AS compressed_files_last_analysis \nFROM compressed_files \nWHERE compressed_files.url = %s'] [parameters: ('http://download.gmane.org/gmane.linux.kernel/7380',)]

sduenas commented 8 years ago

In my experience, the "MySQL server has gone away" error is caused by two things:

Try changing the value of max_allowed_packet to something bigger. I think the default is 4 or 8 MB. Set it to something like 128MB to be sure.
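
For reference, a minimal sketch of how a size like 128MB could be turned into a statement the server accepts at runtime. SET takes a plain number of bytes; the K/M/G suffixes only work in option files and on the command line. The helper names `size_to_bytes` and `set_packet_statement` are hypothetical, not part of mlstats or MySQL, and accepting a trailing "B" is a convenience assumed here.

```python
import re

# Multipliers MySQL accepts for sizes in option files and on the command line.
_SUFFIXES = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}

# Documented upper bound for max_allowed_packet (1 GB).
MAX_PACKET_LIMIT = 1024 ** 3


def size_to_bytes(size):
    """Convert a size such as '16M' or '128MB' to a number of bytes."""
    match = re.fullmatch(r"\s*(\d+)\s*([KMG]?)B?\s*", size.upper())
    if match is None:
        raise ValueError("unrecognized size: %r" % size)
    number, suffix = match.groups()
    return int(number) * _SUFFIXES[suffix]


def set_packet_statement(size):
    """Build a SET GLOBAL statement for max_allowed_packet (hypothetical helper)."""
    value = size_to_bytes(size)
    if value > MAX_PACKET_LIMIT:
        raise ValueError("max_allowed_packet cannot exceed 1 GB")
    # SET at runtime needs the value in bytes, without a K/M/G suffix.
    return "SET GLOBAL max_allowed_packet = %d" % value


if __name__ == "__main__":
    print(set_packet_statement("128MB"))
```

A value set with SET GLOBAL only applies to connections opened afterwards and is lost when the server restarts. To keep it, put `max_allowed_packet=128M` under the `[mysqld]` section of the server's option file.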

geekygirldawn commented 8 years ago

Ah, good point, thanks!

It was set to 16MB. I upped it to 128MB. I kicked it off again, so I should know by tomorrow morning whether it's working or not :)

geekygirldawn commented 8 years ago

OK, this seemed to work to solve the problem in this issue. And then I ran into another one of those strftime date not valid errors on the third file. :(