Open aofarrel opened 2 years ago
I 1000% admire the hustle here, but I worry you are exceeding the reasonably expected capabilities of Docker Desktop and the Cromwell local backend. As a first step, can you tell whether your Docker is still functional and accepting new work? When the task requested by Cromwell is frozen, does a "hello world" run in a separate terminal complete correctly?
I already ruled that out -- Docker itself is still functional. I didn't run hello world specifically, but I was able to docker run -it and run a few commands. This is different from the behavior I see when I do not set the concurrent job limit to 1 on a local backend -- in that scenario I can't run any images at all, and I need to force-quit and restart Docker to use it.
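For anyone who wants to repeat the check asked for above, here is a minimal sketch that can be run in a second terminal while the Cromwell task appears stuck. The function name docker_accepts_work is hypothetical, and the sketch assumes the public hello-world image can be pulled or is already cached locally.

```shell
# Hypothetical helper (not part of Docker or Cromwell): reports whether the
# Docker daemon can still start and finish a trivial container.
docker_accepts_work() {
  # --rm removes the container once it exits, so repeated checks leave no clutter.
  if docker run --rm hello-world >/dev/null 2>&1; then
    echo "docker: ok"
  else
    echo "docker: not accepting work"
    return 1
  fi
}

# Usage, from a separate terminal while the workflow looks frozen:
#   docker_accepts_work
```

If the call itself never returns, that points at the daemon being wedged, rather than only the one container Cromwell asked for.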
For comparison, I ran the same WDL with the same inputs in miniwdl to see if it'd also get stuck, but it did not have this issue. miniwdl was able to complete the 1000x scattered task + the final task that gathers the scattered input. So it seems that Docker itself can handle launching a thousand containers one at a time.
I set up a heavily scattered (~1000x) workflow to run overnight on my local machine (macOS Catalina, Intel hardware). The machine is set up to never sleep and was connected to AC power. My Cromwell config is set to only run one task at a time, i.e., only one shard of a scattered task runs at a time. The workflow stopped processing on shard 885, which was the 233rd shard to start (shards appear to start in a random order; that's not an issue).
It looks like the Docker container in question is getting created, but not used. The container is not running according to Docker Desktop and the Docker CLI tools (see output below).
Workflow
I've seen this happen with a few workflows, but this time around it's this one (failure is occurring on second task): https://github.com/aofarrel/SRANWRP/blob/bioproject_stuff/workflows/is_this_tuberculosis.wdl
Ruled out
docker run -it works in a new terminal window
curl command failed)
Docker container logs
docker logs cf6f4828adc61eacf06337ce3caf2c110df6cc04937530a90bbfb0843acbb528 gives no output.
Entering the container
docker exec -it cf6f4828adc61eacf06337ce3caf2c110df6cc04937530a90bbfb0843acbb528 /bin/sh returns
Error response from daemon: Container cf6f4828adc61eacf06337ce3caf2c110df6cc04937530a90bbfb0843acbb528 is not running
Docker inspect
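As a minimal sketch, the state field alone can be pulled out of docker inspect, and containers that were created but never started can be listed with a status filter. The function names container_state and list_created_containers are hypothetical. The Docker flags and Go-template fields used are standard CLI features, but whether the stuck container reports "created" is an assumption based on the symptoms described, not something confirmed here.

```shell
# Hypothetical helper: print only the state of one container
# (e.g. created, running, exited) instead of the full inspect JSON.
container_state() {
  docker inspect --format '{{.State.Status}}' "$1"
}

# Hypothetical helper: list every container that was created but never
# started, with its image and creation time.
list_created_containers() {
  docker ps -a --filter status=created \
    --format '{{.ID}} {{.Image}} {{.CreatedAt}}'
}

# Usage:
#   container_state cf6f4828adc61eacf06337ce3caf2c110df6cc04937530a90bbfb0843acbb528
#   list_created_containers
```

A container that stays in the created state would be consistent with the "is not running" error from docker exec and the empty docker logs output.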
Terminal output (first couple shards trimmed of course)