ceph / paddles

RESTful API to store (and report) on Ceph tests

controllers/nodes: Add retry and rollback for read requests #95

Closed amathuria closed 3 years ago

amathuria commented 3 years ago

When a read/write dependency arises between concurrent transactions, the affected requests need to be rolled back and retried. Pecan takes care of rolling back all write requests (PUT/POST) on error, but it does not do this for read requests. This fix adds a session rollback and a request retry for GET requests that fail because of a read/write dependency while nodes are being unlocked.

The test file adds a test that reproduces the read/write dependency by unlocking nodes (similar to test_nodes_race.py).
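
The rollback-and-retry idea described above can be sketched roughly as follows. This is only an illustration and not the code from this PR: `retry_read`, `TransientDBError`, and `FakeSession` are hypothetical names, and `TransientDBError` stands in for whatever exception the database layer raises on a serialization failure. The one assumption about the session is that it exposes a `rollback()` method.

```python
import functools
import time


class TransientDBError(Exception):
    """Hypothetical stand-in for a serialization failure raised by the DB layer."""


class FakeSession(object):
    """Hypothetical session used for illustration; it only counts rollbacks."""

    def __init__(self):
        self.rollbacks = 0

    def rollback(self):
        self.rollbacks += 1


def retry_read(session, attempts=3, delay=0.0, retry_on=(TransientDBError,)):
    """Retry a read handler, rolling the session back after each failure."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except retry_on:
                    # Discard the failed transaction so the next attempt
                    # (or the next request) starts from a clean state.
                    session.rollback()
                    if attempt == attempts:
                        # Out of attempts: let the caller see the error.
                        raise
                    time.sleep(delay)
        return wrapper
    return decorator
```

A GET handler wrapped with `@retry_read(session)` would be called again after a transient failure, up to `attempts` times in total, and the last error is re-raised if every attempt fails. Rolling back before re-raising is a design choice in this sketch, so that a failed read does not leave the session in an aborted transaction.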

Signed-off-by: Aishwarya Mathuria <amathuri@redhat.com>

kshtsk commented 3 years ago

paddles/tests/controllers/test_read_write_dependency.py:6:1: F401 'datetime.datetime' imported but unused
paddles/tests/controllers/test_read_write_dependency.py:67:13: F841 local variable 'response' is assigned to but never used

djgalloway commented 3 years ago

This seems to be new since deploying this change. Is it related?

2021-08-03 19:45:18,808 INFO  [paddles.controllers.jobs] Job nojha-2021-08-03_18:59:59-rados-wip-yuri-testing-2021-07-27-0830-pacific-distro-basic-smithi/6309435 status changed from waiting to running
[2021-08-03 19:45:18 +0000] [10059] [ERROR] Error handling request
Traceback (most recent call last):
  File "/home/ubuntu/.virtualenvs/paddles/local/lib/python2.7/site-packages/gunicorn/workers/sync.py", line 130, in handle
    self.handle_request(listener, req, client, addr)
  File "/home/ubuntu/.virtualenvs/paddles/local/lib/python2.7/site-packages/gunicorn/workers/sync.py", line 171, in handle_request
    respiter = self.wsgi(environ, resp.start_response)
  File "/home/ubuntu/.virtualenvs/paddles/local/lib/python2.7/site-packages/pecan/middleware/recursive.py", line 56, in __call__
    return self.application(environ, start_response)
  File "/home/ubuntu/.virtualenvs/paddles/local/lib/python2.7/site-packages/pecan/core.py", line 810, in __call__
    return super(Pecan, self).__call__(environ, start_response)
  File "/home/ubuntu/.virtualenvs/paddles/local/lib/python2.7/site-packages/pecan/core.py", line 659, in __call__
    self.invoke_controller(controller, args, kwargs, state)
  File "/home/ubuntu/.virtualenvs/paddles/local/lib/python2.7/site-packages/pecan/core.py", line 559, in invoke_controller
    result = controller(*args, **kwargs)
  File "/home/ubuntu/paddles/paddles/controllers/jobs.py", line 53, in index_post
    self.job.update(request.json)
  File "/home/ubuntu/paddles/paddles/models/jobs.py", line 206, in update
    self.set_or_update(json_data)
  File "/home/ubuntu/paddles/paddles/models/jobs.py", line 150, in set_or_update
    self.run.set_status()
  File "/home/ubuntu/paddles/paddles/models/runs.py", line 243, in set_status
    results = results or self.get_results()
  File "/home/ubuntu/paddles/paddles/models/runs.py", line 215, in get_results
    jobs_status = [value[0] for value in self.jobs.values(Job.status)]
  File "/home/ubuntu/.virtualenvs/paddles/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 1024, in values
    return iter(q)
  File "/home/ubuntu/.virtualenvs/paddles/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2515, in __iter__
    self.session._autoflush()
  File "/home/ubuntu/.virtualenvs/paddles/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 1292, in _autoflush
    util.raise_from_cause(e)
  File "/home/ubuntu/.virtualenvs/paddles/local/lib/python2.7/site-packages/sqlalchemy/util/compat.py", line 199, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb)
  File "/home/ubuntu/.virtualenvs/paddles/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 1282, in _autoflush
    self.flush()
  File "/home/ubuntu/.virtualenvs/paddles/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 2004, in flush
    self._flush(objects)
  File "/home/ubuntu/.virtualenvs/paddles/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 2122, in _flush
    transaction.rollback(_capture_exception=True)
  File "/home/ubuntu/.virtualenvs/paddles/local/lib/python2.7/site-packages/sqlalchemy/util/langhelpers.py", line 60, in __exit__
    compat.reraise(exc_type, exc_value, exc_tb)
  File "/home/ubuntu/.virtualenvs/paddles/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 2086, in _flush
    flush_context.execute()
  File "/home/ubuntu/.virtualenvs/paddles/local/lib/python2.7/site-packages/sqlalchemy/orm/unitofwork.py", line 373, in execute
    rec.execute(self)
  File "/home/ubuntu/.virtualenvs/paddles/local/lib/python2.7/site-packages/sqlalchemy/orm/unitofwork.py", line 532, in execute
    uow
  File "/home/ubuntu/.virtualenvs/paddles/local/lib/python2.7/site-packages/sqlalchemy/orm/persistence.py", line 170, in save_obj
    mapper, table, update)
  File "/home/ubuntu/.virtualenvs/paddles/local/lib/python2.7/site-packages/sqlalchemy/orm/persistence.py", line 692, in _emit_update_statements
    execute(statement, multiparams)
  File "/home/ubuntu/.virtualenvs/paddles/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 914, in execute
    return meth(self, multiparams, params)
  File "/home/ubuntu/.virtualenvs/paddles/local/lib/python2.7/site-packages/sqlalchemy/sql/elements.py", line 323, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/home/ubuntu/.virtualenvs/paddles/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1010, in _execute_clauseelement
    compiled_sql, distilled_params
  File "/home/ubuntu/.virtualenvs/paddles/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1146, in _execute_context
    context)
  File "/home/ubuntu/.virtualenvs/paddles/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1341, in _handle_dbapi_exception
    exc_info
  File "/home/ubuntu/.virtualenvs/paddles/local/lib/python2.7/site-packages/sqlalchemy/util/compat.py", line 199, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb)
  File "/home/ubuntu/.virtualenvs/paddles/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1139, in _execute_context
    context)
  File "/home/ubuntu/.virtualenvs/paddles/local/lib/python2.7/site-packages/sqlalchemy/engine/default.py", line 450, in do_execute
    cursor.execute(statement, parameters)
OperationalError: (raised as a result of Query-invoked autoflush; consider using a session.no_autoflush block if this flush is occurring prematurely) (psycopg2.extensions.TransactionRollbackError) could not serialize access due to concurrent update
 [SQL: 'UPDATE runs SET updated=%(updated)s WHERE runs.id = %(runs_id)s'] [parameters: {'runs_id': 105383, 'updated': datetime.datetime(2021, 8, 3, 19, 45, 18, 805169)}]

amathuria commented 3 years ago

From the logs, this doesn't look related to the change, but I'll check it out and let you know. Does the error occur every time a job runs?

djgalloway commented 3 years ago

No, not every time. Not sure what the trigger is.

amathuria commented 3 years ago

Yeah, I am pretty sure this fix doesn't cause the error you mentioned. Can we track it separately?

amathuria commented 3 years ago

Looks similar to the issue being tracked here: https://tracker.ceph.com/issues/52101.