OasisLMF / OasisPlatform

Loss modelling platform.
BSD 3-Clause "New" or "Revised" License
42 stars 16 forks source link

Multile workflows running on single analysis id ~ bug #1057

Open sambles opened 1 month ago

sambles commented 1 month ago

Issue Description

Somehow two separate Input generation requests running on the same analysis.

14455 [2024-05-27 15:26:49,173: INFO/ForkPoolWorker-7] set_task_status_v2[d30c4a2a-9371-4574-a49c-f1c228fd3afd]: Task Status Update: analysis_pk: 2035, status: INPUTS_GENERATION_STARTED, time: 2024-05-27 15:26:49.149990+00:00                                                           
14456 [2024-05-27 15:26:49,180: INFO/ForkPoolWorker-1] set_task_status_v2[cd8d7837-9520-4d91-b2ca-15b24b993398]: Task Status Update: analysis_pk: 2035, status: INPUTS_GENERATION_STARTED, time: 2024-05-27 15:26:49.159613+00:00

One with celery task id d30c4a2a-9371-4574-a49c-f1c228fd3afd and another with cd8d7837-9520-4d91-b2ca-15b24b993398 Not clear on how this happened.

Logs

SQL error

TimeoutError: [Errno 110] Connection timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/server/.local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1900, in _execute_context
    self.dialect.do_execute(
  File "/home/server/.local/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 736, in do_execute
    cursor.execute(statement, parameters)
  File "/home/server/.local/lib/python3.10/site-packages/pymysql/cursors.py", line 153, in execute
    result = self._query(query)
  File "/home/server/.local/lib/python3.10/site-packages/pymysql/cursors.py", line 322, in _query
    conn.query(q)
  File "/home/server/.local/lib/python3.10/site-packages/pymysql/connections.py", line 562, in query
    self._execute_command(COMMAND.COM_QUERY, sql)
  File "/home/server/.local/lib/python3.10/site-packages/pymysql/connections.py", line 864, in _execute_command
    self._write_bytes(packet)
  File "/home/server/.local/lib/python3.10/site-packages/pymysql/connections.py", line 809, in _write_bytes
    raise err.OperationalError(
pymysql.err.OperationalError: (2006, "MySQL server has gone away (TimeoutError(110, 'Connection timed out'))")

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/server/.local/lib/python3.10/site-packages/celery/backends/database/__init__.py", line 47, in _inner
    return fun(*args, **kwargs)
  File "/home/server/.local/lib/python3.10/site-packages/celery/backends/database/__init__.py", line 117, in _store_result
    task = list(session.query(self.task_cls).filter(self.task_cls.task_id == task_id))
  File "/home/server/.local/lib/python3.10/site-packages/sqlalchemy/orm/query.py", line 2901, in __iter__
    result = self._iter()
  File "/home/server/.local/lib/python3.10/site-packages/sqlalchemy/orm/query.py", line 2916, in _iter
    result = self.session.execute(
  File "/home/server/.local/lib/python3.10/site-packages/sqlalchemy/orm/session.py", line 1714, in execute
    result = conn._execute_20(statement, params or {}, execution_options)
  File "/home/server/.local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1705, in _execute_20
    return meth(self, args_10style, kwargs_10style, execution_options)
  File "/home/server/.local/lib/python3.10/site-packages/sqlalchemy/sql/elements.py", line 334, in _execute_on_connection
    return connection._execute_clauseelement(
  File "/home/server/.local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1572, in _execute_clauseelement
    ret = self._execute_context(
  File "/home/server/.local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1943, in _execute_context
    self._handle_dbapi_exception(
  File "/home/server/.local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 2124, in _handle_dbapi_exception
    util.raise_(
  File "/home/server/.local/lib/python3.10/site-packages/sqlalchemy/util/compat.py", line 211, in raise_
    raise exception
  File "/home/server/.local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1900, in _execute_context
    self.dialect.do_execute(
  File "/home/server/.local/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 736, in do_execute
    cursor.execute(statement, parameters)
  File "/home/server/.local/lib/python3.10/site-packages/pymysql/cursors.py", line 153, in execute
    result = self._query(query)
  File "/home/server/.local/lib/python3.10/site-packages/pymysql/cursors.py", line 322, in _query
    conn.query(q)
  File "/home/server/.local/lib/python3.10/site-packages/pymysql/connections.py", line 562, in query
    self._execute_command(COMMAND.COM_QUERY, sql)
  File "/home/server/.local/lib/python3.10/site-packages/pymysql/connections.py", line 864, in _execute_command
    self._write_bytes(packet)
  File "/home/server/.local/lib/python3.10/site-packages/pymysql/connections.py", line 809, in _write_bytes
    raise err.OperationalError(
sqlalchemy.exc.OperationalError: (pymysql.err.OperationalError) (2006, "MySQL server has gone away (TimeoutError(110, 'Connection timed out'))")
[SQL: SELECT celery_taskmeta.id AS celery_taskmeta_id, celery_taskmeta.task_id AS celery_taskmeta_task_id, celery_taskmeta.status AS celery_taskmeta_status, celery_taskmeta.result AS celery_taskmeta_result, celery

Worker monitor

[2024-05-27 17:07:45,868: INFO/ForkPoolWorker-8] record_input_files[133ae36f-e00a-4296-b4dc-d45184b61736]: record_input_files: analysis_id: 2035, initiator_id: 1
[2024-05-27 17:07:45,868: INFO/ForkPoolWorker-8] record_input_files[133ae36f-e00a-4296-b4dc-d45184b61736]: results: {'lookup_error_location': 'oasis/worker/d7f1545bfe4a4e44a4a7c8c2b03790b6.csv', 'lookup_success_location': 'oasis/worker/0626a8d56bad4ee299fb36827c56a8d0.csv', 'lookup_validation_location': 'oasis/worker/14d40bf7b7da4a359c3ae2ba25542c6c.json', 'summary_levels_location': 'oasis/worker/68fa2de09e264ef895a1129e93
[2024-05-27 17:07:45,868: INFO/ForkPoolWorker-8] record_input_files[133ae36f-e00a-4296-b4dc-d45184b61736]: args: {'output_location': 'oasis/worker/80c59cc3c2a941ce80bfe3843232ee2d.tar.gz', 'lookup_error_location': 'oasis/worker/d7f1545bfe4a4e44a4a7c8c2b03790b6.csv', 'lookup_success_location': 'oasis/worker/0626a8d56bad4ee299fb36827c56a8d0.csv', 'lookup_validation_location': 'oasis/worker/14d40bf7b7da4a359c3ae2ba25542c6c.js
[2024-05-27 17:07:51,173: INFO/ForkPoolWorker-8] Task record_input_files[133ae36f-e00a-4296-b4dc-d45184b61736] succeeded in 5.306860452052206s: {'lookup_error_location': 'oasis/worker/d7f1545bfe4a4e44a4a7c8c2b03790b6.csv', 'lookup_success_location': 'oasis/worker/0626a8d56bad4ee299fb36827c56a8d0.csv', 'lookup_validation_location': 'oasis/worker/14d40bf7b7da4a359c3ae2ba25542c6c.json', 'summary_levels_location': 'oasis/worke
[2024-05-27 17:07:51,242: INFO/ForkPoolWorker-8] cancel_subtasks[0c410967-2049-4911-a564-6e76ae3a5589]: subtask revoked: analysis_id=2035, task_id=133ae36f-e00a-4296-b4dc-d45184b61736, status=STARTED
[2024-05-27 17:07:51,253: INFO/ForkPoolWorker-8] cancel_subtasks[0c410967-2049-4911-a564-6e76ae3a5589]: subtask revoked: analysis_id=2035, task_id=ce2c0427-fcf2-4ff2-9f5e-de4017e0dfa7, status=QUEUED
[2024-05-27 17:07:51,272: INFO/ForkPoolWorker-8] Task cancel_subtasks[0c410967-2049-4911-a564-6e76ae3a5589] succeeded in 0.04625087697058916s: None
[2024-05-27 17:07:57,191: INFO/ForkPoolWorker-8] Task record_sub_task_start[cb0a23b6-abb5-4d25-a958-e051c17ee7b9] succeeded in 0.02722559799440205s: None
[2024-05-27 17:07:59,130: INFO/ForkPoolWorker-8] Task update_task_id[91729006-6c6b-422f-9beb-df33939e2ce7] succeeded in 0.0432868079515174s: None
[2024-05-27 17:07:59,182: INFO/ForkPoolWorker-8] Task record_sub_task_start[2cadd0f0-18c0-496d-822f-6e5148e418ed] succeeded in 0.02756127796601504s: None
[2024-05-27 17:08:12,684: INFO/ForkPoolWorker-8] record_input_files[0ecca812-b7d7-4437-b0c4-1b510d63ebe1]: record_input_files: analysis_id: 2035, initiator_id: 1
[2024-05-27 17:08:12,684: INFO/ForkPoolWorker-8] record_input_files[0ecca812-b7d7-4437-b0c4-1b510d63ebe1]: results: {'lookup_error_location': 'oasis/worker/81e78562d0ba43fdb993e0331b950020.csv', 'lookup_success_location': 'oasis/worker/59c277fd373d48c298ef936b64546e2c.csv', 'lookup_validation_location': 'oasis/worker/1d8b3b653e5c41d1b3def847bcb88f92.json', 'summary_levels_location': 'oasis/worker/31c6d7af0a06486fa8b3b668cd
[2024-05-27 17:08:12,684: INFO/ForkPoolWorker-8] record_input_files[0ecca812-b7d7-4437-b0c4-1b510d63ebe1]: args: {'output_location': 'oasis/worker/50e9dc918f3346a2a7053e4e3dc13010.tar.gz', 'lookup_error_location': 'oasis/worker/81e78562d0ba43fdb993e0331b950020.csv', 'lookup_success_location': 'oasis/worker/59c277fd373d48c298ef936b64546e2c.csv', 'lookup_validation_location': 'oasis/worker/1d8b3b653e5c41d1b3def847bcb88f92.js
[2024-05-27 17:08:14,584: INFO/ForkPoolWorker-8] Task record_input_files[0ecca812-b7d7-4437-b0c4-1b510d63ebe1] succeeded in 1.9016667669638991s: {'lookup_error_location': 'oasis/worker/81e78562d0ba43fdb993e0331b950020.csv', 'lookup_success_location': 'oasis/worker/59c277fd373d48c298ef936b64546e2c.csv', 'lookup_validation_location': 'oasis/worker/1d8b3b653e5c41d1b3def847bcb88f92.json', 'summary_levels_location': 'oasis/work
[2024-05-27 17:08:14,618: INFO/ForkPoolWorker-8] cancel_subtasks[eae162de-ed7d-4858-a17e-7c27e3a1ba68]: subtask revoked: analysis_id=2035, task_id=aa347a23-e396-486f-a918-ecb5be4fe52d, status=STARTED
[2024-05-27 17:08:14,629: INFO/ForkPoolWorker-8] cancel_subtasks[eae162de-ed7d-4858-a17e-7c27e3a1ba68]: subtask revoked: analysis_id=2035, task_id=af410450-d738-411a-97bb-e137352cac39, status=QUEUED
[2024-05-27 17:08:14,640: INFO/ForkPoolWorker-8] cancel_subtasks[eae162de-ed7d-4858-a17e-7c27e3a1ba68]: subtask revoked: analysis_id=2035, task_id=6107f027-dcca-46b3-bf0b-efd695d01770, status=QUEUED
[2024-05-27 17:08:14,654: INFO/ForkPoolWorker-8] cancel_subtasks[eae162de-ed7d-4858-a17e-7c27e3a1ba68]: subtask revoked: analysis_id=2035, task_id=34fd1353-bb0a-4243-a5e7-a2450b9de3e2, status=QUEUED
[2024-05-27 17:08:14,668: INFO/ForkPoolWorker-8] cancel_subtasks[eae162de-ed7d-4858-a17e-7c27e3a1ba68]: subtask revoked: analysis_id=2035, task_id=b2c3b220-3d96-4b18-8648-12ca07f12b24, status=QUEUED
[2024-05-27 17:08:14,680: INFO/ForkPoolWorker-8] cancel_subtasks[eae162de-ed7d-4858-a17e-7c27e3a1ba68]: subtask revoked: analysis_id=2035, task_id=77db9560-cdb3-4d75-b9d9-38b26ee64e8c, status=QUEUED
[2024-05-27 17:08:14,691: INFO/ForkPoolWorker-8] cancel_subtasks[eae162de-ed7d-4858-a17e-7c27e3a1ba68]: subtask revoked: analysis_id=2035, task_id=f30a50b0-7e3f-4b12-b3bf-a7ce73e9cd63, status=QUEUED
[2024-05-27 17:08:14,702: INFO/ForkPoolWorker-8] cancel_subtasks[eae162de-ed7d-4858-a17e-7c27e3a1ba68]: subtask revoked: analysis_id=2035, task_id=369440d5-2dc1-4c77-a828-ce4de3dd09c6, status=QUEUED
[2024-05-27 17:08:14,716: INFO/ForkPoolWorker-8] cancel_subtasks[eae162de-ed7d-4858-a17e-7c27e3a1ba68]: subtask revoked: analysis_id=2035, task_id=96055d36-3251-47db-ba45-bf16445da441, status=QUEUED
[2024-05-27 17:08:14,728: INFO/ForkPoolWorker-8] cancel_subtasks[eae162de-ed7d-4858-a17e-7c27e3a1ba68]: subtask revoked: analysis_id=2035, task_id=0429bd40-0674-4b21-b0c2-178db075f77e, status=QUEUED
[2024-05-27 17:08:14,743: INFO/ForkPoolWorker-8] cancel_subtasks[eae162de-ed7d-4858-a17e-7c27e3a1ba68]: subtask revoked: analysis_id=2035, task_id=735bf51a-198b-47e5-9595-070e75b0dba5, status=QUEUED
[2024-05-27 17:08:14,755: INFO/ForkPoolWorker-8] cancel_subtasks[eae162de-ed7d-4858-a17e-7c27e3a1ba68]: subtask revoked: analysis_id=2035, task_id=ef2f0ef0-ae12-45c0-8aaa-40ce3cc75b3b, status=QUEUED
[2024-05-27 17:08:14,765: INFO/ForkPoolWorker-8] cancel_subtasks[eae162de-ed7d-4858-a17e-7c27e3a1ba68]: subtask revoked: analysis_id=2035, task_id=60f1bea4-7740-4e44-9767-9cdb5d305b40, status=QUEUED
[2024-05-27 17:08:14,793: INFO/ForkPoolWorker-8] Task cancel_subtasks[eae162de-ed7d-4858-a17e-7c27e3a1ba68] succeeded in 0.19143770402297378s: None
sambles commented 1 month ago

issue might be linked to the cancellation call?

15:23:19,299 - POST /api/v2/analyses/2035/generate_inputs/
15:23:44,066 - POST /api/v2/analyses/2035/cancel_generate_inputs/
15:24:58,574 - POST /api/v2/analyses/2035/generate_inputs/