calpoly-csai / api

Official API for the NIMBUS Voice Assistant accessible via HTTP REST protocol.
https://nimbus.api.calpolycsai.com/
GNU General Public License v3.0
9 stars 4 forks source link

MySql sessions dying on GCP #179

Open snekiam opened 4 years ago

snekiam commented 4 years ago

Describe the bug This is quite the curious bug - its not clear to me why this occurs, but once it does the worker doesn't respond anymore. Perhaps the OsError is related to the transaction being trashed. I'm mostly pointing this issue at the sqlalchemy.exc.InvalidRequestError:, but I'm guessing some of the other log content is relevant.

To Reproduce Unfortunately...I can only tell you to run this on GCP (or in docker somewhere else, most likely), leave it running for an extended period of time, and ask it questions periodically.

Expected behavior A clear and concise description of what you expected to happen.

Screenshots If applicable, add screenshots to help explain your problem.

Desktop (please complete the following information):

Additional context Here's the (possibly) relevant log content:

[2020-07-14 08:14:18 +0000] [7] [ERROR] Socket error processing request.
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/gunicorn/workers/sync.py", line 133, in handle
    req = next(parser)
  File "/usr/local/lib/python3.8/dist-packages/gunicorn/http/parser.py", line 41, in __next__
    self.mesg = self.mesg_class(self.cfg, self.unreader, self.req_count)
  File "/usr/local/lib/python3.8/dist-packages/gunicorn/http/message.py", line 186, in __init__
    super().__init__(cfg, unreader)
  File "/usr/local/lib/python3.8/dist-packages/gunicorn/http/message.py", line 53, in __init__
    unused = self.parse(self.unreader)
  File "/usr/local/lib/python3.8/dist-packages/gunicorn/http/message.py", line 198, in parse
    self.get_data(unreader, buf, stop=True)
  File "/usr/local/lib/python3.8/dist-packages/gunicorn/http/message.py", line 189, in get_data
    data = unreader.read()
  File "/usr/local/lib/python3.8/dist-packages/gunicorn/http/unreader.py", line 37, in read
    d = self.chunk()
  File "/usr/local/lib/python3.8/dist-packages/gunicorn/http/unreader.py", line 64, in chunk
    return self.sock.recv(self.mxchunk)
  File "/usr/lib/python3.8/ssl.py", line 1226, in recv
    return self.read(buflen)
  File "/usr/lib/python3.8/ssl.py", line 1101, in read
    return self._sslobj.read(len)
OSError: [Errno 0] Error
[2020-07-14 15:51:00 +0000] [7] [ERROR] Socket error processing request.
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/gunicorn/workers/sync.py", line 133, in handle
    req = next(parser)
  File "/usr/local/lib/python3.8/dist-packages/gunicorn/http/parser.py", line 41, in __next__
    self.mesg = self.mesg_class(self.cfg, self.unreader, self.req_count)
  File "/usr/local/lib/python3.8/dist-packages/gunicorn/http/message.py", line 186, in __init__
    super().__init__(cfg, unreader)
  File "/usr/local/lib/python3.8/dist-packages/gunicorn/http/message.py", line 53, in __init__
    unused = self.parse(self.unreader)
  File "/usr/local/lib/python3.8/dist-packages/gunicorn/http/message.py", line 198, in parse
    self.get_data(unreader, buf, stop=True)
  File "/usr/local/lib/python3.8/dist-packages/gunicorn/http/message.py", line 189, in get_data
    data = unreader.read()
  File "/usr/local/lib/python3.8/dist-packages/gunicorn/http/unreader.py", line 37, in read
    d = self.chunk()
  File "/usr/local/lib/python3.8/dist-packages/gunicorn/http/unreader.py", line 64, in chunk
    return self.sock.recv(self.mxchunk)
  File "/usr/lib/python3.8/ssl.py", line 1226, in recv
    return self.read(buflen)
  File "/usr/lib/python3.8/ssl.py", line 1101, in read
    return self._sslobj.read(len)
OSError: [Errno 0] Error
Exception during reset or similar
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/mysql/connector/network.py", line 249, in recv_plain
    chunk = self.sock.recv(4 - packet_len)
  File "/usr/lib/python3.8/ssl.py", line 1226, in recv
    return self.read(buflen)
  File "/usr/lib/python3.8/ssl.py", line 1101, in read
    return self._sslobj.read(len)
ConnectionResetError: [Errno 104] Connection reset by peer

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/pool/base.py", line 693, in _finalize_fairy
    fairy._reset(pool)
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/pool/base.py", line 880, in _reset
    pool._dialect.do_rollback(self)
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/dialects/mysql/base.py", line 2302, in do_rollback
    dbapi_connection.rollback()
  File "/usr/local/lib/python3.8/dist-packages/mysql/connector/connection.py", line 958, in rollback
    self._execute_query("ROLLBACK")
  File "/usr/local/lib/python3.8/dist-packages/mysql/connector/connection.py", line 970, in _execute_query
    self.cmd_query(query)
  File "/usr/local/lib/python3.8/dist-packages/mysql/connector/connection.py", line 590, in cmd_query
    result = self._handle_result(self._send_cmd(ServerCmd.QUERY, query))
  File "/usr/local/lib/python3.8/dist-packages/mysql/connector/connection.py", line 350, in _send_cmd
    return self._socket.recv()
  File "/usr/local/lib/python3.8/dist-packages/mysql/connector/network.py", line 277, in recv_plain
    raise errors.OperationalError(
mysql.connector.errors.OperationalError: 2055: Lost connection to MySQL server at '35.247.107.51:3306', system error: 104 Connection reset by peer

[2020-07-15 00:48:51,841] ERROR in app: Exception on /ask [POST]
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 2446, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 1951, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/lib/python3.8/dist-packages/flask_cors/extension.py", line 161, in wrapped_function
    return cors_after_request(app.make_response(f(*args, **kwargs)))
  File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 1820, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/local/lib/python3.8/dist-packages/flask/_compat.py", line 39, in reraise
    raise value
  File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 1949, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 1935, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/nimbus/flask_api.py", line 147, in handle_question
    response = {"answer": nimbus.answer_question(question)}
  File "/nimbus/nimbus.py", line 42, in answer_question
    answer = qa.answer(ans_dict)
  File "/nimbus/QA.py", line 52, in answer
    db_data = self.db_query(extracted_vars, self.db)
  File "/nimbus/QA.py", line 231, in _chain_db_access
    for key, val in fn(extracted_vars, db).items():
  File "/nimbus/QA.py", line 95, in _get_property
    value = db.get_property_from_entity(
  File "/nimbus/database_wrapper.py", line 533, in get_property_from_entity
    props = self._get_property_from_entity(
  File "/nimbus/database_wrapper.py", line 505, in _get_property_from_entity
    for row in query_obj.all():
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/orm/query.py", line 3233, in all
    return list(self)
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/orm/query.py", line 3388, in __iter__
    self.session._autoflush()
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/orm/session.py", line 1586, in _autoflush
    self.flush()
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/orm/session.py", line 2479, in flush
    self._flush(objects)
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/orm/session.py", line 2571, in _flush
    flush_context.transaction = transaction = self.begin(
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/orm/session.py", line 947, in begin
    self.transaction = self.transaction._begin(nested=nested)
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/orm/session.py", line 316, in _begin
    self._assert_active()
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/orm/session.py", line 288, in _assert_active
    raise sa_exc.InvalidRequestError(
sqlalchemy.exc.InvalidRequestError: This Session's transaction has been rolled back due to a previous exception during flush. To begin a new transaction with this Session, first issue Session.rollback(). Original exception was: (mysql.connector.errors.OperationalError) MySQL Connection not available.

[2020-07-15 00:57:51,144] ERROR in app: Exception on /ask [POST]
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 2446, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 1951, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/lib/python3.8/dist-packages/flask_cors/extension.py", line 161, in wrapped_function
    return cors_after_request(app.make_response(f(*args, **kwargs)))
  File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 1820, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/local/lib/python3.8/dist-packages/flask/_compat.py", line 39, in reraise
    raise value
  File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 1949, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 1935, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/nimbus/flask_api.py", line 147, in handle_question
    response = {"answer": nimbus.answer_question(question)}
  File "/nimbus/nimbus.py", line 42, in answer_question
    answer = qa.answer(ans_dict)
  File "/nimbus/QA.py", line 52, in answer
    db_data = self.db_query(extracted_vars, self.db)
  File "/nimbus/QA.py", line 231, in _chain_db_access
    for key, val in fn(extracted_vars, db).items():
  File "/nimbus/QA.py", line 95, in _get_property
    value = db.get_property_from_entity(
  File "/nimbus/database_wrapper.py", line 533, in get_property_from_entity
    props = self._get_property_from_entity(
  File "/nimbus/database_wrapper.py", line 505, in _get_property_from_entity
    for row in query_obj.all():
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/orm/query.py", line 3233, in all
    return list(self)
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/orm/query.py", line 3388, in __iter__
    self.session._autoflush()
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/orm/session.py", line 1586, in _autoflush
    self.flush()
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/orm/session.py", line 2479, in flush
    self._flush(objects)
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/orm/session.py", line 2571, in _flush
    flush_context.transaction = transaction = self.begin(
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/orm/session.py", line 947, in begin
    self.transaction = self.transaction._begin(nested=nested)
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/orm/session.py", line 316, in _begin
    self._assert_active()
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/orm/session.py", line 288, in _assert_active
    raise sa_exc.InvalidRequestError(
sqlalchemy.exc.InvalidRequestError: This Session's transaction has been rolled back due to a previous exception during flush. To begin a new transaction with this Session, first issue Session.rollback(). Original exception was: (mysql.connector.errors.OperationalError) MySQL Connection not available.
[SQL: INSERT INTO `QuestionLog` (question, timestamp) VALUES (%(question)s, %(timestamp)s)]
[parameters: [{'question': 'what is?', 'timestamp': datetime.datetime(2020, 7, 15, 0, 44, 11, 221947)}]]
(Background on this error at: http://sqlalche.me/e/e3q8) (Background on this error at: http://sqlalche.me/e/7s2a)
chidiewenike commented 3 years ago

Interesting addressing of the problem here: https://github.com/benoitc/gunicorn/issues/1885

snekiam commented 3 years ago

Very interesting - I wonder if GCP is doing a 'health ping' which is breaking things. I'll check that this Sunday.

mfekadu commented 3 years ago

@snekiam I am unsure if GCP does health checks by default. I believe it would need to be set up by an administrator && I don't believe we have done that yet.

snekiam commented 3 years ago

195 should help this not kill the api completely, but doesn't solve the root issue.