codalab / codabench

Codabench is a flexible, easy-to-use and reproducible benchmarking platform. Check out our paper in Patterns (Cell Press): https://hubs.li/Q01fwRWB0
Apache License 2.0

"Failure updating submission data" on both public and private queue #1471

Closed: johanneskruse closed this issue 3 months ago.

johanneskruse commented 3 months ago

Hi,

I'm running a competition.

I started encountering the following error on my remote workers:

/usr/local/lib/python3.8/site-packages/celery/platforms.py:800: RuntimeWarning: You're running the worker with superuser privileges: this is
absolutely not recommended!

Please specify a different user using the --uid option.

User information: uid=0 euid=0 gid=0 egid=0

  warnings.warn(RuntimeWarning(ROOT_DISCOURAGED.format(

 -------------- compute-worker@63e894b26d8f v4.4.0 (cliffs)
--- ***** ----- 
-- ******* ---- Linux-5.15.0-1062-aws-x86_64-with-glibc2.34 2024-06-07 05:39:21
- *** --- * --- 
- ** ---------- [config]
- ** ---------- .> app:         __main__:0x7f2131447a30
- ** ---------- .> transport:   amqp://63a35e45-cb28-4eed-9c2c-af8072bf9d9c:**@www.codabench.org:5672/572d5689-cb2d-4da9-a09c-cb9a0b0284ef
- ** ---------- .> results:     disabled://
- *** --- * --- .> concurrency: 1 (prefork)
-- ******* ---- .> task events: OFF (enable -E to monitor tasks in this worker)
--- ***** ----- 
 -------------- [queues]
                .> compute-worker   exchange=compute-worker(direct) key=compute-worker

[tasks]
  . compute_worker_run

[2024-06-07 05:39:21,941: INFO/MainProcess] Connected to amqp://63a35e45-cb28-4eed-9c2c-af8072bf9d9c:**@www.codabench.org:5672/572d5689-cb2d-4da9-a09c-cb9a0b0284ef
[2024-06-07 05:39:22,114: INFO/MainProcess] mingle: searching for neighbors
[2024-06-07 05:39:23,500: INFO/MainProcess] mingle: all alone
[2024-06-07 05:39:23,859: INFO/MainProcess] compute-worker@63e894b26d8f ready.
[2024-06-07 05:39:23,860: INFO/MainProcess] Received task: compute_worker_run[1acefb0a-613c-407e-97f1-3cf154e246ae]  
[2024-06-07 05:39:23,983: INFO/ForkPoolWorker-1] Received run arguments: {'user_pk': 6655, 'submissions_api_url': 'https://www.codabench.org/api', 'secret': 'c5a0eb20-fff9-44e7-b564-249169646bb4', 'docker_image': 'codalab/codalab-legacy:py39', 'execution_time_limit': 172800, 'id': 68433, 'is_scoring': False, 'prediction_result': 'https://miniodis-rproxy.lisn.upsaclay.fr/coda-v2-prod-private/prediction_result/2024-06-07-1717738652/a3392ac9edd0/prediction_result.zip?AWSAccessKeyId=EASNOMJFX9QFW4QIY4SL&Signature=vF9HvMIWWGTJm3pjWiJ9L7fZ210%3D&content-type=application%2Fzip&Expires=1717825053', 'input_data': 'https://miniodis-rproxy.lisn.upsaclay.fr/coda-v2-prod-private/dataset/2024-04-04-1712207092/b24e0df11261/input_data.zip?AWSAccessKeyId=EASNOMJFX9QFW4QIY4SL&Signature=XKd7EXpMFP6peWuiQrqohb7b7YE%3D&Expires=1717825053', 'ingestion_only_during_scoring': False, 'program_data': 'https://miniodis-rproxy.lisn.upsaclay.fr/coda-v2-prod-private/dataset/2024-06-06-1717708585/bd4f130fdb04/random_ranking.zip?AWSAccessKeyId=EASNOMJFX9QFW4QIY4SL&Signature=zjUjb3MaPYOtogB39Wbm34wLc3U%3D&Expires=1717825053', 'prediction_stdout': 'https://miniodis-rproxy.lisn.upsaclay.fr/coda-v2-prod-private/submission_details/2024-06-07-1717738653/b1943afa9ea7/prediction_stdout.txt?AWSAccessKeyId=EASNOMJFX9QFW4QIY4SL&Signature=OcT60osyU%2FCvCm928PVqGvdO6%2B8%3D&content-type=application%2Fzip&Expires=1717825053', 'prediction_stderr': 'https://miniodis-rproxy.lisn.upsaclay.fr/coda-v2-prod-private/submission_details/2024-06-07-1717738653/8b8b1af78be3/prediction_stderr.txt?AWSAccessKeyId=EASNOMJFX9QFW4QIY4SL&Signature=u82ROiOKQ4QMEn4pc7BE%2BGr9bro%3D&content-type=application%2Fzip&Expires=1717825053', 'prediction_ingestion_stdout': 
'https://miniodis-rproxy.lisn.upsaclay.fr/coda-v2-prod-private/submission_details/2024-06-07-1717738653/9624471121ec/prediction_ingestion_stdout.txt?AWSAccessKeyId=EASNOMJFX9QFW4QIY4SL&Signature=V6E54npsbPNMXuI6Q46heMfWVv0%3D&content-type=application%2Fzip&Expires=1717825053', 'prediction_ingestion_stderr': 'https://miniodis-rproxy.lisn.upsaclay.fr/coda-v2-prod-private/submission_details/2024-06-07-1717738653/21a770e56477/prediction_ingestion_stderr.txt?AWSAccessKeyId=EASNOMJFX9QFW4QIY4SL&Signature=O6923QJ%2FqINgdeGyL503Fos3kUA%3D&content-type=application%2Fzip&Expires=1717825053'}
[2024-06-07 05:39:23,984: INFO/ForkPoolWorker-1] Updating submission @ https://www.codabench.org/api/submissions/68433/ with data = {'status': 'Preparing', 'status_details': None, 'secret': 'c5a0eb20-fff9-44e7-b564-249169646bb4'}
[2024-06-07 05:39:24,325: INFO/ForkPoolWorker-1] Submission patch failed with status = 500, and response = 
b'<h1>Server Error (500)</h1>'
[2024-06-07 05:39:24,326: INFO/ForkPoolWorker-1] Updating submission @ https://www.codabench.org/api/submissions/68433/ with data = {'status': 'Failed', 'status_details': 'Failure updating submission data.', 'secret': 'c5a0eb20-fff9-44e7-b564-249169646bb4'}
[2024-06-07 05:39:24,430: INFO/ForkPoolWorker-1] Submission patch failed with status = 500, and response = 
b'<h1>Server Error (500)</h1>'
[2024-06-07 05:39:24,431: INFO/ForkPoolWorker-1] Destroying submission temp dir: /codabench/tmprkx556ah
[2024-06-07 05:39:24,432: ERROR/ForkPoolWorker-1] Task compute_worker_run[1acefb0a-613c-407e-97f1-3cf154e246ae] raised unexpected: SubmissionException('Failure updating submission data.')
Traceback (most recent call last):
  File "/compute_worker.py", line 115, in run_wrapper
    run.prepare()
  File "/compute_worker.py", line 764, in prepare
    self._update_status(STATUS_PREPARING)
  File "/compute_worker.py", line 359, in _update_status
    self._update_submission(data)
  File "/compute_worker.py", line 342, in _update_submission
    raise SubmissionException("Failure updating submission data.")
compute_worker.SubmissionException: Failure updating submission data.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/celery/app/trace.py", line 385, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/celery/app/trace.py", line 650, in __protected_call__
    return self.run(*args, **kwargs)
  File "/compute_worker.py", line 123, in run_wrapper
    run._update_status(STATUS_FAILED, str(e))
  File "/compute_worker.py", line 359, in _update_status
    self._update_submission(data)
  File "/compute_worker.py", line 342, in _update_submission
    raise SubmissionException("Failure updating submission data.")
compute_worker.SubmissionException: Failure updating submission data.

The submission goes from Submitting to Submitted.

I tried:

None of the above seems to work. Is there an explanation or a solution to the problem?

Best, Johannes
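The log and traceback above show the failure sequence: `run.prepare()` tries to set the status to `Preparing` by PATCHing `https://www.codabench.org/api/submissions/68433/`, the server answers 500, `_update_submission` raises `SubmissionException`, and `run_wrapper` then tries to report `Failed` through the same endpoint, which also returns 500. Because that second exception is raised inside the `except` block, Python prints "During handling of the above exception, another exception occurred". The following is a minimal, simplified sketch of that flow, assuming only what the log shows (the payload keys `status`, `status_details`, `secret` and the URL pattern). The `Run` class layout and the injected `patch` callable are hypothetical stand-ins, not the actual `compute_worker.py` code; the real worker sends an HTTP PATCH instead.

```python
import logging

logger = logging.getLogger(__name__)

# Status strings as they appear in the log above.
STATUS_PREPARING = "Preparing"
STATUS_FAILED = "Failed"


class SubmissionException(Exception):
    """Raised when the submission API rejects a status update."""


class Run:
    """Hypothetical, simplified model of one compute-worker run.

    `patch` is an injected callable (url, data) -> (status_code, body), so the
    flow can be exercised without a network connection.
    """

    def __init__(self, api_url, submission_id, secret, patch):
        self.url = f"{api_url}/submissions/{submission_id}/"
        self.secret = secret
        self.patch = patch

    def _update_submission(self, data):
        status_code, body = self.patch(self.url, data)
        if not 200 <= status_code < 300:
            logger.info("Submission patch failed with status = %s, and response = %r",
                        status_code, body)
            raise SubmissionException("Failure updating submission data.")

    def _update_status(self, status, extra_information=None):
        data = {"status": status, "status_details": extra_information, "secret": self.secret}
        self._update_submission(data)

    def prepare(self):
        # First API call of a run: a server-side 500 here fails the run immediately.
        self._update_status(STATUS_PREPARING)


def run_wrapper(run):
    try:
        run.prepare()
    except SubmissionException as e:
        # Reporting the failure uses the same endpoint. If it also returns 500,
        # this raise happens while handling `e`, so the new exception's
        # __context__ is the first one (the chained traceback seen in the log).
        run._update_status(STATUS_FAILED, str(e))
```

With a `patch` that always returns `(500, b"<h1>Server Error (500)</h1>")`, `run_wrapper` attempts `Preparing` then `Failed` and raises a `SubmissionException` whose `__context__` is the first one, matching the two failed PATCH lines in the log. With a `patch` that returns 200, `prepare()` succeeds and nothing is raised. In both cases the worker can do nothing useful on its own, because every status report depends on the same server endpoint.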

johanneskruse commented 3 months ago

This error was also reported in #1446.

ObadaS commented 3 months ago

@johanneskruse The problem should be fixed for now.

johanneskruse commented 3 months ago

Thank you for the quick action. It seems to be working again!

Didayolo commented 3 months ago

I'll close this issue, then. We still need to find a long-term solution to this problem, but we'll keep track of it in #1446.