Closed hylkedonker closed 3 years ago
Hi @hylkedonker,
That is a very interesting error, and one that I haven't ran across before. When using the local provider, it writes a timestamp to a tmp file named 'end-time.txt' in /tmp. The error seems to indicate that that either that file is empty, or was read incorrectly. There are a few things you could do to help debug:
1) Do you still get the same error if you run the hello world example again?
2) Perhaps there is a problem getting the timestamp? Run this command:
python -c 'import datetime; print(datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S.%f"))'
You should see a timestamp. If not, then we'll need to debug your system python.
3) Perhaps there is a problem writing to the file? Run this command:
cat /tmp/dsub-local/echo--hylke--201220-165144-45/task/end-time.txt
(or replace echo--hylke--201220-165144-45
with any job id).
Let me know what is output here.
Thanks for getting in touch.
python3 -c 'import datetime; print(datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S.%f"))'
gives the output 2020-12-21 12:01:42.924656
.
$ cat /tmp/dsub-local/echo--hylke--201221-115840-24/task/end-time.txt
$
4. When I uninstall the package and reinstall in the virtual environment:
```bash
python3 -m venv dsub_libs
source dsub_libs/bin/activate
pip3 install dsub
dsub \
--provider local \
--logging "${TMPDIR:-/tmp}/dsub-test/logging/" \
--output OUT="${TMPDIR:-/tmp}/dsub-test/output/out.txt" \
--command 'echo "Hello World" > "${OUT}"' \
--wait
I get a slightly different error:
***WARNING: No Docker image specified. The default, `ubuntu:14.04` will be used.
***WARNING: For reproducible pipelines, specify an image with the `--image` flag.
Job properties:
job-id: echo--hylke--201221-123051-85
job-name: echo
user-id: hylke
Launched job-id: echo--hylke--201221-123051-85
To check the status, run:
dstat --provider local --jobs 'echo--hylke--201221-123051-85' --users 'hylke' --status '*'
To cancel the job, run:
ddel --provider local --jobs 'echo--hylke--201221-123051-85' --users 'hylke'
Waiting for job to complete...
Waiting for: echo--hylke--201221-123051-85.
echo--hylke--201221-123051-85: FAILURE
['2020-12-21 12:30:54 I: Copying the logs before cleanup']
JobExecutionError: One or more jobs finished with status FAILURE or CANCELED during wait.
echo--hylke--201221-123051-85
with the stderr output:
$ cat stderr.txt
Error: No such container: dsub-echo--hylke--201221-123051-85.task
I hope this helps.
Hi @hylkedonker, do you have Docker installed? The Hello World example uses the local
provider, and installing Docker is necessary if using the local
provider.
Going back to your first error, I imagine looking at cat /tmp/dsub-local/echo--hylke--201220-165144-45/task/stderr.txt
may also reveal information about whether Docker is installed.
The error we are seeing here is certainly not helpful. In the past, I recall seeing errors about Docker not being installed. I will work on a fix about improving the error handling here.
I do have docker installed on my system.
On Mon, 21 Dec 2020, 18:37 Willy Nojopranoto, notifications@github.com wrote:
Hi @hylkedonker https://github.com/hylkedonker, do you have Docker installed? The Hello World example uses the local provider, and installing Docker is necessary https://github.com/DataBiosphere/dsub#post-installation-steps if using the local provider.
Going back to your first error, I imagine looking at cat /tmp/dsub-local/echo--hylke--201220-165144-45/task/stderr.txt may also reveal information about whether Docker is installed.
The error we are seeing here is certainly not helpful. In the past, I recall seeing errors about Docker not being installed. I will work on a fix about improving the error handling here.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/DataBiosphere/dsub/issues/219#issuecomment-749102867, or unsubscribe https://github.com/notifications/unsubscribe-auth/ALRD5VBYQIEEY6UHK6OMHNTSV6BV5ANCNFSM4VDFZCMQ .
I was able to get a machine with ubuntu 20.04 running, and was able to reproduce your error regardless of whether Docker was installed.
The root issue is that the local provider makes a call to python
(not python3
) to get timestamps. On ubuntu 20.04, python3
is installed, but python
is not. This is not a problem if you are using dsub from a venv, but we'll fix it in a future release. Using a venv would be my recommended workaround, but if you're unable to resolve the 'No such container' error, you could also try creating a symlink named 'python' that points to 'python3'. This command would do that:
update-alternatives --install /usr/bin/python python /usr/bin/python3 10
I tried the command and was able to verify that python
now points to python3
. However the "No such container" error still persists (both with and without venv). Any suggestions for further debugging?
I'm unable to reproduce the "no such container" error, and doing a couple searches mostly shows that its a transient issue, which isn't too helpful here. There are a couple things we could do to debug further:
1) In the same /tmp/.../task dir, can you share the output of the log.txt
and runner-log.txt
files? There may be more detailed error messages here.
2) Can we confirm that docker is set up correctly? Does running docker run hello-world
produce any errors?
3) Are there any unexpected containers showing when you run docker ps -a
?
2020-12-23 13:19:19 I: Localizing inputs.
2020-12-23 13:19:19 I: Checking image userid.
2020-12-23 13:19:19 I: Task image: ubuntu:14.04
2020-12-23 13:19:21 I: Running Docker image.
docker: open /tmp/dsub-local/echo--hylke--201223-131919-18/task/docker.env: no such file or directory.
See 'docker run --help'.
Error response from daemon: No such container: dsub-echo--hylke--201223-131919-18.task
2020-12-23 13:19:21 I: Docker exit code .
2020-12-23 13:19:21 I: Ensure host user (1000:1000) owns Docker-written data
2020-12-23 13:19:22 I: Copying outputs.
CommandException: No URLs matched: /tmp/dsub-local/echo--hylke--201223-131919-18/task/data/output/file/tmp/dsub-test/output/out.txt
CommandException: 1 file/object could not be transferred.
2020-12-23 13:19:23 E: Error on or near line 47; /tmp/dsub-local/echo--hylke--201223-131919-18/task/runner.sh exiting with status 1
2020-12-23 13:19:23 I: Copying the logs before cleanup
As for the error docker: open /tmp/dsub-local/echo--hylke--201223-131919-18/task/docker.env: no such file or directory.
, that file does seem to exist:
$ cat /tmp/dsub-local/echo--hylke--201223-131919-18/task/docker.env
DATA_ROOT=/mnt/data
TMPDIR=/mnt/data/tmp
OUT=/mnt/data/output/file/tmp/dsub-test/output/out.txt
$
Output of log.txt
2020-12-23 13:19:19 I: Localizing inputs.
2020-12-23 13:19:19 I: Checking image userid.
2020-12-23 13:19:19 I: Task image: ubuntu:14.04
2020-12-23 13:19:21 I: Running Docker image.
2020-12-23 13:19:21 I: Docker exit code .
2020-12-23 13:19:21 I: Ensure host user (1000:1000) owns Docker-written data
2020-12-23 13:19:22 I: Copying outputs.
2020-12-23 13:19:23 E: Error on or near line 47; /tmp/dsub-local/echo--hylke--201223-131919-18/task/runner.sh exiting with status 1
2020-12-23 13:19:23 I: Copying the logs before cleanup
$ docker run hello-world
Hello from Docker! This message shows that your installation appears to be working correctly.
To generate this message, Docker took the following steps:
To try something more ambitious, you can run an Ubuntu container with: $ docker run -it ubuntu bash
Share images, automate workflows, and more with a free Docker ID: https://hub.docker.com/
For more examples and ideas, visit: https://docs.docker.com/get-started/ $
Also, I haven't experienced any problems with Docker lately. No problems building, running and pushing containers.
I am running
```Bash
$ docker --version
Docker version 19.03.11, build dd360c7
I would be more inclined to think that some arguments may be empty, resulting in arguments being passed in the incorrect order.
The missing docker.env file looks to be the most suspicious to me, and I'm not sure why we're seeing that. It is a little bit interesting that it is the last file dsub
writes before executing the runner.sh script. I wonder if the file is still buffered and not flushed or synced before command to run docker is executed.
I believe the expectation is that because we're using with open(env_fname, 'wt') as f:
construct, Python guarantees that the file descriptor close
method is called, which flushes the file. This should mean that the subprocess for runner.sh should be able to read the docker.env file, but I'm not sure...
Thanks for the pointer. I'll try to take a closer look after the Christmas holidays.
On Thu, 24 Dec 2020, 00:35 Willy Nojopranoto, notifications@github.com wrote:
The missing docker.env file looks to be the most suspicious to me, and I'm not sure why we're seeing that. It is a little bit interesting that it is the last file dsub writes https://github.com/DataBiosphere/dsub/blob/master/dsub/providers/local.py#L377 before executing the runner.sh script. I wonder if the file is still buffered and not flushed or synced before command to run docker is executed.
I believe the expectation is that because we're using with open(env_fname, 'wt') as f: construct, Python guarantees that the file descriptor close method is called, which flushes the file. This should mean that the subprocess for runner.sh should be able to read the docker.env file, but I'm not sure...
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/DataBiosphere/dsub/issues/219#issuecomment-750570092, or unsubscribe https://github.com/notifications/unsubscribe-auth/ALRD5VHL6CACEQB6EM24C43SWJ5FJANCNFSM4VDFZCMQ .
It seems that the problem is indeed related to Docker, which does not like the /tmp folder. For example, this does not work
$ docker run \
--env-file=/tmp/docker.env \
"ubuntu:14.04"
docker: open /tmp/docker.env: no such file or directory.
See 'docker run --help'.
But if I copy /tmp/docker.env
to my $HOME
folder, it works okay:
$ docker run \
--env-file=$HOME/docker.env \
"ubuntu:14.04"
$
Solution: I removed the snap Docker package
sudo snap remove docker
and installed the apt Docker package instead (after adding the Docker apt repository)
sudo apt-get install docker-ce docker-ce-cli containerd.io
Now it seems to be working as expected (I am now running Docker version 20.10.1, build 831ebea
):
$ dsub \
--provider local \
--logging "${TMPDIR:-/tmp}/dsub-test/logging/" \
--output OUT="${TMPDIR:-/tmp}/dsub-test/output/out.txt" \
--command 'echo "Hello World" > "${OUT}"' \
--wait
***WARNING: No Docker image specified. The default, `ubuntu:14.04` will be used.
***WARNING: For reproducible pipelines, specify an image with the `--image` flag.
Job properties:
job-id: echo--hylke--201230-204230-26
job-name: echo
user-id: hylke
Launched job-id: echo--hylke--201230-204230-26
To check the status, run:
dstat --provider local --jobs 'echo--hylke--201230-204230-26' --users 'hylke' --status '*'
To cancel the job, run:
ddel --provider local --jobs 'echo--hylke--201230-204230-26' --users 'hylke'
Waiting for job to complete...
Waiting for: echo--hylke--201230-204230-26.
echo--hylke--201230-204230-26: SUCCESS
echo--hylke--201230-204230-26
I'm happy to see you were able to get it working. I am not familiar with snap
, but it looks like the way it containerizes docker prevents it from accessing /tmp
. I can't think of a workaround with dsub other than to use the apt package of docker as you have done.
On our end we'll update the documentation to make this clear, and provider better error handling in the local provider.
When I run the hello world example
it crashes as follows:
Any ideas how to get the Hello World example to work?
Versions: dsub: 0.4.3 Ubuntu: 20.04 Python: 3.8.5 Docker: 19.03.11
Thanks,
Hylke