We are running the competition https://www.codabench.org/competitions/4430. Submissions are getting stuck because status updates sent to the server fail with an Internal Server Error (500). All workers on our servers return errors like the ones in the log below. We tried:
Restarting the workers
Changing the queue
How should we handle this issue? This looks very similar to #1471 and #1446.
Thanks for the support!
an2dl-worker0 | [2024-11-11 14:28:52,863: INFO/ForkPoolWorker-1] Updating submission @ https://www.codabench.org/api/submissions/138752/ with data = {'status': 'Preparing', 'status_details': None, 'secret': '...'}
an2dl-worker0 | [2024-11-11 14:28:53,150: INFO/ForkPoolWorker-1] Submission patch failed with status = 500, and response =
an2dl-worker0 | b'<h1>Server Error (500)</h1>'
an2dl-worker0 | [2024-11-11 14:28:53,150: INFO/ForkPoolWorker-1] Updating submission @ https://www.codabench.org/api/submissions/138752/ with data = {'status': 'Failed', 'status_details': 'Failure updating submission data.', 'secret': '...'}
an2dl-worker0 | [2024-11-11 14:28:53,216: INFO/ForkPoolWorker-1] Submission patch failed with status = 500, and response =
an2dl-worker0 | b'<h1>Server Error (500)</h1>'
an2dl-worker0 | [2024-11-11 14:28:53,217: INFO/ForkPoolWorker-1] Destroying submission temp dir: /codabench/tmpih0msgw6
an2dl-worker0 | [2024-11-11 14:28:53,220: ERROR/ForkPoolWorker-1] Task compute_worker_run[3a3bfe66-e7db-484d-8ee3-d5bde8162a4c] raised unexpected: SubmissionException('Failure updating submission data.')
an2dl-worker0 | Traceback (most recent call last):
an2dl-worker0 | File "/compute_worker.py", line 112, in run_wrapper
an2dl-worker0 | run.prepare()
an2dl-worker0 | File "/compute_worker.py", line 765, in prepare
an2dl-worker0 | self._update_status(STATUS_PREPARING)
an2dl-worker0 | File "/compute_worker.py", line 356, in _update_status
an2dl-worker0 | self._update_submission(data)
an2dl-worker0 | File "/compute_worker.py", line 339, in _update_submission
an2dl-worker0 | raise SubmissionException("Failure updating submission data.")
an2dl-worker0 | compute_worker.SubmissionException: Failure updating submission data.
an2dl-worker0 |
an2dl-worker0 | During handling of the above exception, another exception occurred:
an2dl-worker0 |
an2dl-worker0 | Traceback (most recent call last):
an2dl-worker0 | File "/usr/local/lib/python3.9/site-packages/celery/app/trace.py", line 385, in trace_task
an2dl-worker0 | R = retval = fun(*args, **kwargs)
an2dl-worker0 | File "/usr/local/lib/python3.9/site-packages/celery/app/trace.py", line 650, in __protected_call__
an2dl-worker0 | return self.run(*args, **kwargs)
an2dl-worker0 | File "/compute_worker.py", line 120, in run_wrapper
an2dl-worker0 | run._update_status(STATUS_FAILED, str(e))
an2dl-worker0 | File "/compute_worker.py", line 356, in _update_status
an2dl-worker0 | self._update_submission(data)
an2dl-worker0 | File "/compute_worker.py", line 339, in _update_submission
an2dl-worker0 | raise SubmissionException("Failure updating submission data.")
an2dl-worker0 | compute_worker.SubmissionException: Failure updating submission data.
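In case it helps with triage, here is a minimal sketch for pulling the affected submission IDs out of a worker log like the one above. It assumes the log format shown here ("Updating submission @ ..." followed by "Submission patch failed with status = ...") and a single worker stream, so that each failure belongs to the most recent update line. The function name `failed_patches` and the abbreviated sample lines are made up for illustration and are not part of Codabench.

```python
import re
from collections import Counter

# Patterns based on the log lines in this report (an assumption about the format).
UPDATE_RE = re.compile(r"Updating submission @ \S+/api/submissions/(\d+)/")
FAILED_RE = re.compile(r"Submission patch failed with status = (\d+)")


def failed_patches(lines):
    """Count failed PATCH attempts per (submission id, HTTP status).

    Assumes lines come from one worker, so a failure line refers to the
    most recent "Updating submission" line seen before it.
    """
    counts = Counter()
    current_id = None
    for line in lines:
        match = UPDATE_RE.search(line)
        if match:
            current_id = match.group(1)
            continue
        match = FAILED_RE.search(line)
        if match and current_id is not None:
            counts[(current_id, int(match.group(1)))] += 1
    return counts


# Abbreviated copies of the log lines above (the data payload is shortened).
sample = [
    "an2dl-worker0 | [...] Updating submission @ https://www.codabench.org/api/submissions/138752/ with data = {'status': 'Preparing'}",
    "an2dl-worker0 | [...] Submission patch failed with status = 500, and response =",
    "an2dl-worker0 | [...] Updating submission @ https://www.codabench.org/api/submissions/138752/ with data = {'status': 'Failed'}",
    "an2dl-worker0 | [...] Submission patch failed with status = 500, and response =",
]

for (submission_id, status), n in failed_patches(sample).items():
    print(f"submission {submission_id}: {n} failed update(s) with HTTP {status}")
```

Run against a full worker log (for example by passing an open file object as `lines`), this should give a list of submission IDs that could be attached to the report. It only summarizes the failures and does not change how the worker behaves.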
Apparently we're having the same problem in the competition I'm running and we're using the public default queue, so I guess the problem lies upstream somehow.