codalab / codabench

Codabench is a flexible, easy-to-use and reproducible benchmarking platform. Check out our paper in Patterns (Cell Press): https://hubs.li/Q01fwRWB0
Apache License 2.0

"Failure updating submission data" and Queue Congestion #1657

Closed: archettialberto closed this issue 1 week ago.

archettialberto commented 1 week ago

Dear Codabench Team,

We are running the competition https://www.codabench.org/competitions/4430. Submissions appear to be getting stuck on an Internal Server Error (500). All workers on our servers return errors like the following. We tried

How should we handle this issue? This looks very similar to #1471 and #1446. Thanks for the support!

an2dl-worker0  | [2024-11-11 14:28:52,863: INFO/ForkPoolWorker-1] Updating submission @ https://www.codabench.org/api/submissions/138752/ with data = {'status': 'Preparing', 'status_details': None, 'secret': '...'}
an2dl-worker0  | [2024-11-11 14:28:53,150: INFO/ForkPoolWorker-1] Submission patch failed with status = 500, and response =
an2dl-worker0  | b'<h1>Server Error (500)</h1>'
an2dl-worker0  | [2024-11-11 14:28:53,150: INFO/ForkPoolWorker-1] Updating submission @ https://www.codabench.org/api/submissions/138752/ with data = {'status': 'Failed', 'status_details': 'Failure updating submission data.', 'secret': '...'}
an2dl-worker0  | [2024-11-11 14:28:53,216: INFO/ForkPoolWorker-1] Submission patch failed with status = 500, and response =
an2dl-worker0  | b'<h1>Server Error (500)</h1>'
an2dl-worker0  | [2024-11-11 14:28:53,217: INFO/ForkPoolWorker-1] Destroying submission temp dir: /codabench/tmpih0msgw6
an2dl-worker0  | [2024-11-11 14:28:53,220: ERROR/ForkPoolWorker-1] Task compute_worker_run[3a3bfe66-e7db-484d-8ee3-d5bde8162a4c] raised unexpected: SubmissionException('Failure updating submission data.')
an2dl-worker0  | Traceback (most recent call last):
an2dl-worker0  |   File "/compute_worker.py", line 112, in run_wrapper
an2dl-worker0  |     run.prepare()
an2dl-worker0  |   File "/compute_worker.py", line 765, in prepare
an2dl-worker0  |     self._update_status(STATUS_PREPARING)
an2dl-worker0  |   File "/compute_worker.py", line 356, in _update_status
an2dl-worker0  |     self._update_submission(data)
an2dl-worker0  |   File "/compute_worker.py", line 339, in _update_submission
an2dl-worker0  |     raise SubmissionException("Failure updating submission data.")
an2dl-worker0  | compute_worker.SubmissionException: Failure updating submission data.
an2dl-worker0  |
an2dl-worker0  | During handling of the above exception, another exception occurred:
an2dl-worker0  |
an2dl-worker0  | Traceback (most recent call last):
an2dl-worker0  |   File "/usr/local/lib/python3.9/site-packages/celery/app/trace.py", line 385, in trace_task
an2dl-worker0  |     R = retval = fun(*args, **kwargs)
an2dl-worker0  |   File "/usr/local/lib/python3.9/site-packages/celery/app/trace.py", line 650, in __protected_call__
an2dl-worker0  |     return self.run(*args, **kwargs)
an2dl-worker0  |   File "/compute_worker.py", line 120, in run_wrapper
an2dl-worker0  |     run._update_status(STATUS_FAILED, str(e))
an2dl-worker0  |   File "/compute_worker.py", line 356, in _update_status
an2dl-worker0  |     self._update_submission(data)
an2dl-worker0  |   File "/compute_worker.py", line 339, in _update_submission
an2dl-worker0  |     raise SubmissionException("Failure updating submission data.")
an2dl-worker0  | compute_worker.SubmissionException: Failure updating submission data.
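The log contains two tracebacks because the worker's error handler calls the same failing endpoint again. The first PATCH (status "Preparing") returns 500 and raises SubmissionException. run_wrapper then catches it and tries to PATCH status "Failed", which also returns 500 and raises a second SubmissionException while the first is still being handled. The sketch below reproduces only that control flow. It is hypothetical and is not Codabench's compute_worker.py: fake_patch, update_submission and run_wrapper are illustrative stand-ins, and the HTTP call is simulated so that it always returns 500.

```python
# Hypothetical sketch of the control flow seen in the traceback above.
# This is NOT Codabench's compute_worker.py; all names are illustrative.


class SubmissionException(Exception):
    pass


def fake_patch(data):
    """Hypothetical stand-in for PATCH /api/submissions/<id>/; always fails with 500."""
    return 500, b"<h1>Server Error (500)</h1>"


def update_submission(data):
    status_code, _body = fake_patch(data)
    if status_code >= 300:
        raise SubmissionException("Failure updating submission data.")


def run_wrapper():
    try:
        # First update (like run.prepare() -> _update_status(STATUS_PREPARING)).
        update_submission({"status": "Preparing", "status_details": None})
    except SubmissionException as e:
        # The error handler hits the same failing endpoint, so a second
        # SubmissionException is raised while the first is being handled.
        update_submission({"status": "Failed", "status_details": str(e)})


if __name__ == "__main__":
    try:
        run_wrapper()
    except SubmissionException as err:
        # Python stores the first error as __context__; this implicit chaining
        # is what prints "During handling of the above exception, ...".
        print(type(err).__name__, "->", type(err.__context__).__name__)
```

Because the fallback "Failed" update uses the same API, a server-side 500 also prevents the worker from marking the submission as failed. That is consistent with submissions staying stuck rather than showing an error status.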
ndido98 commented 1 week ago

Apparently we're having the same problem in the competition I'm running, and we're using the public default queue, so I suspect the problem lies upstream.

ObadaS commented 1 week ago

Hello, the problem is fixed.