DataBiosphere / dsub

Open-source command-line tool to run batch computing tasks and workflows on backend services such as Google Cloud.
Apache License 2.0

Job fails with docker exit status 255 #184

Closed. gcheon closed this issue 4 years ago.

gcheon commented 4 years ago

Some of my jobs fail with the status message:

Execution failed: action 4: waiting for container: starting container: running ["docker" "start" "-ai" "9c5ba104cf1fe58325c9905a11de15cd2d03dc6dee636e51b48f536865a07765"]: exit status 255

When I check the logs from these jobs, the program running in the Docker container appears to run normally and print output right up until the job dies. The logs then simply stop, with no error message.

Do you have any idea what might be causing this? Also, is there any way to keep the Docker container alive for debugging after the dsub job dies?

mbookman commented 4 years ago

This has most commonly been caused by the user command running out of memory. A bug has been filed with the Pipelines API team to make the error message clearer.
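Because the logs stop without an error when memory runs out, it can help to have the job record its own memory headroom. The following is a minimal sketch, not part of dsub, assuming Python 3 is available in the job's image. It parses `/proc/meminfo` and periodically prints `MemAvailable` against `MemTotal`. The function names and the 30-second default are illustrative. Note that `/proc/meminfo` inside a container generally reports the memory of the whole VM rather than any per-container limit.

```python
import time
from typing import Dict, Optional


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value}, values in kB (or a raw count)."""
    values: Dict[str, int] = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        parts = rest.split()
        # The first token after the colon is the numeric value; the unit ("kB") is optional.
        if parts and parts[0].isdigit():
            values[key.strip()] = int(parts[0])
    return values


def available_fraction(info: Dict[str, int]) -> Optional[float]:
    """Return MemAvailable / MemTotal, or None if either field is missing."""
    total = info.get("MemTotal")
    avail = info.get("MemAvailable")
    if not total or avail is None:
        return None
    return avail / total


def log_memory(interval_s: float = 30.0, iterations: Optional[int] = None) -> None:
    """Print a memory line every interval_s seconds (forever if iterations is None)."""
    count = 0
    while iterations is None or count < iterations:
        with open("/proc/meminfo") as f:
            info = parse_meminfo(f.read())
        frac = available_fraction(info)
        frac_text = "unknown" if frac is None else f"{frac:.1%}"
        # flush=True so the line reaches the dsub log even if the VM dies shortly after.
        print(
            f"[memlog] MemAvailable={info.get('MemAvailable')} kB "
            f"of MemTotal={info.get('MemTotal')} kB ({frac_text} available)",
            flush=True,
        )
        count += 1
        time.sleep(interval_s)
```

Assuming the code is saved as a hypothetical `memlog.py` in the job's working directory, the dsub command could start it in the background before the real program, e.g. `python3 -c 'from memlog import log_memory; log_memory()' &`. If the last few lines before the log ends show `MemAvailable` near zero, memory is the likely culprit.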

I suggest trying a VM with more memory.
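A sketch of relaunching with more memory, assuming the `google-v2` provider. dsub's `--min-ram` flag (in GB) asks for a VM with at least that much memory, and `--machine-type` can name a specific machine type instead. The helper below only assembles the argument list. The function name `dsub_args` is hypothetical, and the project, bucket, image, command, and region values are placeholders.

```python
import shlex
from typing import List, Optional


def dsub_args(project: str, logging: str, image: str, command: str,
              min_ram_gb: Optional[float] = None,
              machine_type: Optional[str] = None,
              regions: str = "us-central1",
              provider: str = "google-v2") -> List[str]:
    """Assemble a dsub command line that requests more memory (hypothetical helper)."""
    args = [
        "dsub",
        "--provider", provider,
        "--project", project,
        "--regions", regions,
        "--logging", logging,
        "--image", image,
        "--command", command,
    ]
    if machine_type is not None:
        # Request a specific machine type; min_ram_gb is ignored in this case.
        args += ["--machine-type", machine_type]
    elif min_ram_gb is not None:
        # Let dsub choose a machine with at least this much memory, in GB.
        args += ["--min-ram", str(min_ram_gb)]
    return args


if __name__ == "__main__":
    # Print a shell-quoted command line built from placeholder values.
    print(shlex.join(dsub_args(
        project="my-project",
        logging="gs://my-bucket/logs",
        image="ubuntu:20.04",
        command="my_program --input data.txt",
        min_ram_gb=15,
    )))
```

Building the arguments as a list (rather than one string) avoids quoting problems when the `--command` value contains spaces. The list can be passed directly to `subprocess.run`.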

gcheon commented 4 years ago

Makes sense, thank you for the fast reply!