Status: Closed (johanneskruse closed this issue 4 weeks ago)
Hi @johanneskruse,
Is your competition using the default queue?
@ObadaS Could this be due to the new setup of workers with 60 GB allocated for Docker?
It is possible; it could explain why it sometimes works, since not all workers would have the same images stored. I did change the crontab to prune every 6 hours, though, so it would be odd for the same problem to happen multiple days in a row unless a competition is using very large images.
Hi @Didayolo,
I am running the competition using my own remote workers. I had 2-3 days where it was bad, then it got good, and now it happens all the time again.
@ObadaS - should I remove the crontab? It is currently quite a big issue.
@johanneskruse
If you are using your own compute workers, you should try to find more logs by connecting to the machines and running the following command:
docker logs -f compute_worker
This should help you understand why the docker pull command sometimes fails. It may be due to connection issues, lack of storage, etc.
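As a rough sketch of that check, the shell helpers below gather the recent worker logs together with disk usage figures. They assume the container is named compute_worker, as in the command above, and that Docker stores its data under the default /var/lib/docker, which may differ on your machine. The function names are illustrative only and are not part of Codabench.

```shell
#!/bin/sh
# Hypothetical helpers; function names and the default path are assumptions.

# Print the percentage of used space on the filesystem holding the given path.
disk_used_percent() {
    df -P "$1" | awk 'NR==2 { sub("%", "", $5); print $5 }'
}

# Show recent worker logs plus disk and Docker storage usage.
worker_diagnostics() {
    data_root="${1:-/var/lib/docker}"    # assumed default Docker data directory
    docker logs --tail 200 compute_worker
    echo "Disk used on ${data_root}: $(disk_used_percent "$data_root")%"
    docker system df                      # space taken by images, containers, volumes
}
```

Running `worker_diagnostics` on the worker machine right after a failed submission should show whether the disk was close to full at the time the pull failed.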
The goal of the crontab is to remove the unused docker images and avoid cluttering the disk. It should not be an issue. However, if your workers are linked to only one competition, only 1 docker image will be used so you can indeed remove the crontab.
It seemed to have run out of storage. I've deleted that worker and started a new one - it seems to be working again.
Is there a way to prevent it from running out of storage? It is good for a period, but then suddenly it's all full.
@Didayolo thanks for the quick reply.
@johanneskruse Docker images are the main objects taking up space. How big is the storage that the worker has access to?
The default storage on the worker is 45 GB. This can be increased.
I recommend increasing it to 100 GB; that should fix your storage issues.
Also, if your worker is pulling different docker images (from different competitions) it is important to include the crontab (see https://github.com/codalab/codabench/wiki/Compute-Worker-Management---Setup).
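One possible way to keep a worker from filling up, sketched under assumptions: prune unused images only when the disk is nearly full, and run that check from cron. The 80% threshold, the function names, the script path in the cron comment, and the /var/lib/docker path are all illustrative choices, not values taken from the Codabench wiki.

```shell
#!/bin/sh
# Hypothetical sketch; threshold, names, and paths are assumptions.

# Print the percentage of used space on the filesystem holding the given path.
disk_used_percent() {
    df -P "$1" | awk 'NR==2 { sub("%", "", $5); print $5 }'
}

# Succeed (exit status 0) when usage is at or above the threshold.
needs_prune() {
    used="$1"
    threshold="$2"
    [ "$used" -ge "$threshold" ]
}

# Remove all images not used by any container once the disk is nearly full.
prune_if_needed() {
    data_root="${1:-/var/lib/docker}"    # assumed default Docker data directory
    threshold="${2:-80}"                 # assumed threshold, in percent
    if needs_prune "$(disk_used_percent "$data_root")" "$threshold"; then
        docker image prune -af
    fi
}

# Example crontab entry (hypothetical script path), running the check hourly:
# 0 * * * * /usr/local/bin/prune_if_needed.sh
```

To use it from cron, the script would need a final line calling `prune_if_needed`. Note that pruned images have to be pulled again on the next submission that needs them, so a very low threshold trades disk space for extra pulls.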
I marked this issue as solved, but feel free to come back to us if you are still experiencing issues.
The recommendation to increase the storage to 100 GB could be mentioned in the Compute Worker Management Setup page, as a recommendation/consideration.
Thank you for the help; it has been running smoothly since.
Hi,
I am running a competition using the default Docker image (codalab/codalab-legacy).
The submissions have started to fail, giving me this error:
When resubmitting, it sometimes goes through; other times, it doesn't. Any explanation or suggestions on what I can do?
I started to notice this on May 29. I haven't had this issue before. Competition has been running since April.
Best, Johannes