impossible to use a local worker

codalab / codabench

Codabench is a flexible, easy-to-use and reproducible benchmarking platform. Check our paper at Patterns Cell Press https://hubs.li/Q01fwRWB0

Apache License 2.0

impossible to use a local worker #1559

Closed: dagecc-challenge closed this issue 3 months ago

dagecc-challenge commented 3 months ago

Because of congestion on the public queue, some evaluations are failing during the final phase of our competition (ending tonight!). We tried to restart a local worker that had started successfully a couple of months ago. Unfortunately, when the local worker starts ingesting submissions, we get this error:

WS: b'exec /opt/conda/bin/python: operation not permitted\n'

We tried installing another worker from scratch and we got the exact same thing. Do you have any idea about what could go wrong here?

Thanks

Didayolo commented 3 months ago

Hi @dagecc-challenge,

We recently updated the compute worker Docker image, so the problem may come from that. To set up the worker, you used this command, right?

docker run \
    -v /codabench:/codabench \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -d \
    --env-file .env \
    --name compute_worker \
    --restart unless-stopped \
    --log-opt max-size=50m \
    --log-opt max-file=3 \
    codalab/competitions-v2-compute-worker:latest

Please try using this one (the docker tag is different):

docker run \
    -v /codabench:/codabench \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -d \
    --env-file .env \
    --name compute_worker \
    --restart unless-stopped \
    --log-opt max-size=50m \
    --log-opt max-file=3 \
    codalab/competitions-v2-compute-worker:cpu1.0

dagecc-challenge commented 3 months ago

Thank you for your answer.

Unfortunately, using the alternative Docker tag did not resolve the issue. However, I re-installed Docker and am now encountering a different error:

python: can't open file '/app/program/ingestion.py': [Errno 2] No such file or directory.

Do you have any suggestions on how to address this new error?

Didayolo commented 3 months ago

And when you run the same submissions on the default queue, do they work fine?

dagecc-challenge commented 3 months ago

Yes, the problem is that the submission reaches the execution time limit (1200 s). That's why we need to use our own queue.

dagecc-challenge commented 3 months ago

The problem has been solved. There was an error in our configuration file. Thank you very much.
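
The thread does not say which setting in the configuration file was wrong. For readers debugging a similar setup, here is a minimal sketch of one sanity check, assuming the worker's .env file contains a HOST_DIRECTORY key that should match the host side of the `-v /codabench:/codabench` mount. The key name and both helper functions are illustrative assumptions, not something confirmed in this thread.

```shell
# Hypothetical helper: print the value of a key from an env file.
# $1 = key name, $2 = path to the env file
env_value() {
    # Keep the last matching line and everything after the first "=".
    grep -E "^$1=" "$2" | tail -n 1 | cut -d= -f2-
}

# Hypothetical check: compare HOST_DIRECTORY (assumed key name) with the
# host path passed to `docker run -v <host_path>:/codabench`.
# $1 = path to the env file, $2 = host path used in the -v option
check_host_directory() {
    actual=$(env_value HOST_DIRECTORY "$1")
    if [ "$actual" = "$2" ]; then
        echo "ok: HOST_DIRECTORY matches $2"
    else
        echo "mismatch: HOST_DIRECTORY='$actual', mount uses '$2'"
        return 1
    fi
}
```

With the commands quoted above, this would be called as `check_host_directory .env /codabench`. A mismatch is only one possible configuration problem, so a passing check does not rule out others.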