blobtoolkit / blobtoolkit-docker

[Archived] Docker images for BlobToolKit

error: docker: invalid reference format. #1

Closed angelajmcd closed 1 year ago

angelajmcd commented 4 years ago

Hi there,

I'm trying to run blobtoolkit in a docker container locally by modifying your suggested code. Here's what I'm entering:

docker run -it --rm --name btk \
           -u $UID:$GROUPS \
      - v /path/to/datasets:/blobtoolkit/datasets \
           -v /path/to/input/data:/blobtoolkit/data \
           genomehubs/blobtoolkit:latest \
           ./blobtools2/blobtools create \
           --fasta /Users/amcdonnell/Analyses/blobtools/membranacea.contigs.fasta/membranacea.contigs.fasta \
           --taxid 1234 \
           --taxdump taxdump \
           datasets/membranacea_dataset

I've tried editing the above in various ways including deleting the -u part (I suspect I don't need it, but I don't really know). I keep getting the error:

docker: invalid reference format.

Any suggestions on what I could do differently to get this software to run?

rjchallis commented 4 years ago

Hi

Your command should probably look more like:

docker run -it --rm --name btk \
           -u $UID:$GROUPS \
       -v /Users/amcdonnell/Analyses/blobtools/datasets:/blobtoolkit/datasets \
           -v /Users/amcdonnell/Analyses/blobtools/membranacea.contigs.fasta:/blobtoolkit/data \
           genomehubs/blobtoolkit:latest \
           ./blobtools2/blobtools create \
           --fasta /blobtoolkit/data/membranacea.contigs.fasta \
           --taxid 1234 \
           --taxdump taxdump \
           /blobtoolkit/datasets/membranacea_dataset

The -v options bind directories on your local system to directories in the btk container. Because BlobToolKit runs inside the container, it can't see the external file paths, so any files you refer to in the blobtools create command have to be given as paths in the container filesystem.

By default, commands in the container run in a /blobtoolkit directory that already contains blobtools2, data and datasets subdirectories, so it's easiest to bind to these.
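As a rough illustration of the path mapping described above, the sketch below translates a host path into the path the container would see for a single -v bind. The function name to_container_path and the example paths are hypothetical; they are not part of Docker or BlobToolKit.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Docker or BlobToolKit.
# Given one bind mount (-v host_dir:container_dir), print the path that a
# file under host_dir would have inside the container.
to_container_path() {
  local host_dir=$1 container_dir=$2 host_path=$3
  local rel
  case "$host_path" in
    "$host_dir"/*)
      # Strip the host directory prefix and re-root under the container dir.
      rel=${host_path#"$host_dir"/}
      printf '%s/%s\n' "$container_dir" "$rel"
      ;;
    *)
      # Paths outside the bound directory are invisible to the container.
      echo "not under $host_dir: $host_path" >&2
      return 1
      ;;
  esac
}
```

For example, to_container_path /path/to/input/data /blobtoolkit/data /path/to/input/data/assembly.fasta prints /blobtoolkit/data/assembly.fasta, while a path outside the bound directory returns a non-zero status.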

The docker: invalid reference format. error is likely caused by the space in one of the - v flags, so it hopefully won't happen again with the suggested command above.
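A stray space like this can be caught before Docker ever sees the command. The sketch below flags any lone - token in an argument list. has_split_flag is a hypothetical helper, and the idea that Docker then reads the lone - as the image name is an assumption about the cause, not something confirmed in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: return 0 if any argument is a lone "-", which usually
# means a flag such as "-v" was typed as "- v".
has_split_flag() {
  local arg
  for arg in "$@"; do
    if [ "$arg" = "-" ]; then
      return 0
    fi
  done
  return 1
}
```

Usage might look like has_split_flag docker run --rm - v /a:/b image && echo "check for a split flag".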

angelajmcd commented 4 years ago

Hey, thanks, Richard! I'm new to docker, so thanks for the explanation.

I tried running it a few different times with different paths, and the most successful try so far gave me a ModuleNotFoundError: No module named 'docopt':

(base) mac-amcdonnell-p:~ amcdonnell$ docker run -it --rm --name btk -u $UID:$groups -v /Users/amcdonnell/Analyses/blobtools/datasets:/blobtoolkit/datasets -v /Users/amcdonnell/Analyses/blobtools/membranacea.contigs.fasta:/blobtoolkit/data genomehubs/blobtoolkit:latest ./blobtools2/blobtools create --fasta /blobtoolkit/data/membranacea.contigs.fasta --taxid 1234 --taxdump taxdump /blobtoolkit/datasets/membranacea_dataset
Traceback (most recent call last):
  File "./blobtools2/blobtools", line 48, in <module>
    from docopt import docopt
ModuleNotFoundError: No module named 'docopt'

Is this something that needs to be corrected in the blobtoolkit docker image/container?

Thanks again, Angela

rjchallis commented 4 years ago

This looks like a problem with the PYTHONPATH variable. Docopt is installed and when I test the container it runs fine but on some systems it seems that there can be an issue with finding the package.

Could you try adding -e PYTHONPATH=/home/blobtoolkit/miniconda3/envs/btk_env/lib/python3.6/site-packages to your docker command and see if that helps? (It should be at the same level as the -u and -v options.) I can try to find a more robust solution once I know if that works.

angelajmcd commented 4 years ago

Hi again,

I tried that, and get the same error... any other suggestions?

Traceback (most recent call last):
  File "./blobtools2/blobtools", line 48, in <module>
    from docopt import docopt
ModuleNotFoundError: No module named 'docopt'

rjchallis commented 4 years ago

I've updated the container image with the latest blobtools code and set an explicit PYTHONPATH, so hopefully this will start working for you once you run docker pull genomehubs/blobtoolkit:latest to get the most recent version.

angelajmcd commented 4 years ago

ok, it looks like that worked! Now I'm getting another error about an empty file. Is it a problem with my -v paths? I have the file membranacea.contigs.fasta (almost 1.2 GB, so not empty) in /Users/amcdonnell/Analyses/blobtools/datasets

amcdonnell$ docker run -it --rm --name btk -u $UID:$groups -v /Users/amcdonnell/Analyses/blobtools/datasets:/blobtoolkit/datasets -v /Users/amcdonnell/Analyses/blobtools/membranacea.contigs.fasta:/blobtoolkit/data genomehubs/blobtoolkit:latest ./blobtools2/blobtools create --fasta /blobtoolkit/data/membranacea.contigs.fasta --taxid 1234 --taxdump taxdump /blobtoolkit/datasets/membranacea_dataset
Loading sequences from /blobtoolkit/data/membranacea.contigs.fasta
0it [00:00, ?it/s]cat: /blobtoolkit/data/membranacea.contigs.fasta: No such file or directory
0it [00:00, ?it/s]
Traceback (most recent call last):
  File "/blobtoolkit/blobtools2/lib/add.py", line 165, in <module>
    main()
  File "/blobtoolkit/blobtools2/lib/add.py", line 132, in main
    meta=meta)
  File "/blobtoolkit/blobtools2/lib/fasta.py", line 77, in parse
    'range': [min(gc_portions), max(gc_portions)]
ValueError: min() arg is an empty sequence
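The cat: ... No such file or directory line suggests that the path passed to --fasta does not exist inside the container. One way to catch this earlier is to check on the host that the file really sits in the directory being bound. The sketch below does that. require_file_in is a hypothetical helper, and which host directory actually holds the FASTA file is an assumption to verify locally.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: confirm that a file exists in the host
# directory that will be bound with -v, before starting the container.
require_file_in() {
  local host_dir=$1 file_name=$2
  if [ ! -d "$host_dir" ]; then
    # -v host:container expects a directory here if the target is a directory.
    echo "not a directory: $host_dir" >&2
    return 1
  fi
  if [ ! -f "$host_dir/$file_name" ]; then
    echo "missing on host: $host_dir/$file_name" >&2
    return 1
  fi
}
```

If require_file_in succeeds for the directory given on the left-hand side of a -v option, the same file name should be visible under the container directory on the right-hand side.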

angelajmcd commented 4 years ago

Hey again! I figured out what was wrong and fixed it. It looks like it ran; it parsed all my contigs, but then I got this error: Parsing taxdump ERROR: Unable to parse /blobtoolkit/taxdump/nodes.dmp.

I'm not sure what nodes.dmp is or where it should come from... Should I provide a file for taxdump? And for taxid? I'll look it up too, but I wanted to update.

rjchallis commented 4 years ago

Hi - sorry for the delay in replying.

I made some changes to the directory structure in the container and it looks like I forgot to update the path in the example command. The taxdump is available in the container at /blobtoolkit/databases/ncbi_taxdump, so the command should contain

--taxdump databases/ncbi_taxdump

angelajmcd commented 4 years ago

Hey again. No problem! Thanks for letting me know about taxdump. It worked. I'm now trying to generate plots and getting stopped by a "waiting for element cumulative_save_png" or "waiting for element snail_save_png" and I'm not sure what to do with it or how long to let it sit. Does this part take a while? Maybe I'm missing some code?

(base) mac-amcdonnell-p:blobtools amcdonnell$ docker exec -it btk ./blobtools2/blobtools view --host http://localhost:8080 --out output --view cumulative membranacea_dataset
Loading http://localhost:8080/view/dataset/membranacea_dataset/cumulative?staticThreshold=Infinity&nohitThreshold=Infinity&plotGraphics=svg
Fetching membranacea_dataset.cumulative.png
waiting for element cumulative_save_png

(& the same thing happens for snail_save_png)

rjchallis commented 4 years ago

Hi

This step should run quite quickly so there is clearly something not working as expected. I'm having trouble working out where the problem is so I can't suggest an immediate fix. You could try viewing the plots interactively in a web browser outside the container if you bind ports for the API and client when starting the btk container, so:

Remove the existing container:

docker rm -f btk

Start the container again with bound ports:

docker run -d --rm --name btk \
-p 8080:8080 \
-p 8000:8000 \
-v /path/to/datasets:/blobtoolkit/datasets \
-e VIEWER=true \
genomehubs/blobtoolkit:latest

Then view the plots in a browser by visiting http://localhost:8080/view#Datasets

angelajmcd commented 4 years ago

Hey again, Richard.

I'm sorry if I keep making silly errors. Using this through Docker is just not intuitive to me, I think. Here is what I'm doing and what I'm getting out. I am still getting the same error as my last post. Perhaps I didn't incorporate your suggestion correctly?

After removing the container, I ran:

docker run -it --rm --name btk -u $UID:$groups -v /Users/amcdonnell/Analyses/blobtools/datasets:/blobtoolkit/datasets -v /Users/amcdonnell/Analyses/blobtools/membranacea.contigs.tab.edited.fasta:/blobtoolkit/data genomehubs/blobtoolkit:latest ./blobtools2/blobtools create --fasta /blobtoolkit/datasets/membranacea.contigs.tab.edited.fasta --taxid 1234 --taxdump databases/ncbi_taxdump /blobtoolkit/datasets/membranacea_dataset

Output to screen:

Loading sequences from /blobtoolkit/datasets/membranacea.contigs.tab.edited.fasta

Then I ran:

docker run -d --rm --name btk -p 8080:8080 -p 8000:8000 -v /Users/amcdonnell/Analyses/blobtools/datasets:/blobtoolkit/datasets -e VIEWER=true genomehubs/blobtoolkit:latest

Output to screen:

8b3c47dbcaf798de734b8954496aa4c676e9e54a96b36a1bb268c446c6ad1fe9

Then I ran this to make a plot:

docker exec -it btk ./blobtools2/blobtools view --host http://localhost:8080 --out output --view cumulative membranacea_dataset

Output to screen (it stays stuck here):

Loading http://localhost:8080/view/dataset/membranacea_dataset/cumulative?staticThreshold=Infinity&nohitThreshold=Infinity&plotGraphics=svg
Fetching membranacea_dataset.cumulative.png
waiting for element cumulative_save_png

rjchallis commented 4 years ago

Hi

Sorry if I wasn't very clear last time. I tried to run the blobtools view command in Docker to see where things may have gone wrong. I didn't hit the same error, but I couldn't see any output files on my local filesystem even after the command appeared to run successfully. I'll need to debug that, but I'm currently unsure where to start.

For an alternative that should get around this, you are on the right track with running

docker run -d --rm --name btk \
                   -p 8080:8080 -p 8000:8000 \
                   -v /Users/amcdonnell/Analyses/blobtools/datasets:/blobtoolkit/datasets \
                   -e VIEWER=true \
                   genomehubs/blobtoolkit:latest

This should give you access to the viewer from outside the Docker container so now instead of using docker exec to run blobtools view, you should be able to open a web browser on your local machine and visit http://localhost:8080/view/dataset/membranacea_dataset/cumulative to see the cumulative plot.

If you are running docker on a remote server, you can forward the ports over ssh by connecting with something like ssh -L 8080:localhost:8080 -L 8000:localhost:8000 username@remote_server, then you can connect as if it was running locally with the link above.

angelajmcd commented 4 years ago

Hey Richard,

I've been running this through docker locally and still can't get any plots to appear, through any browser, even after following your above advice (I get this: "This page isn't working. localhost didn't send any data. ERR_EMPTY_RESPONSE"). It looks like the tutorials may have been updated in the last few weeks, so I will take a look there for now.

Angela

rjchallis commented 4 years ago

Sorry to hear this still isn't working. I've not seen that error before, so I don't know what is causing it. I've rebuilt the container with updated dependencies, so the latest container version is 1.3.1. If you still can't see anything with:

docker run -d \
       --name btk \
       -e VIEWER=true \
       -v /Users/amcdonnell/Analyses/blobtools/datasets:/blobtoolkit/datasets \
       -p 8080:8080 \
       -p 8000:8000 \
       genomehubs/blobtoolkit:1.3.1

Could you send me the results of ls -al /Users/amcdonnell/Analyses/blobtools/datasets and attach the output of docker logs btk and docker exec -it btk curl 'http://localhost:8080' so I can try to work out what is going wrong?

The new container can run the entire pipeline without creating additional conda environments (though including all the dependencies has made it rather large). There are some notes on the new version in a GitHub gist that may be useful. The notes swap between Singularity and Docker as it can run in either. Let me know if you want any examples translated from one to the other.

angelajmcd commented 4 years ago

Hi again! Ok thanks. I updated the container and restarted analyses. I got the taxdump error again: ERROR: Unable to parse /blobtoolkit/databases/ncbi_taxdump/nodes.dmp.

I used the docker run command as above with both --taxdump taxdump and --taxdump databases/ncbi_taxdump. Does that need to be corrected on your end or mine?

rjchallis commented 4 years ago

Ah, looks like I'd missed out on adding the taxdump when rebuilding the Docker image with all the other dependencies inside.

I've updated the image (1.3.2) with the tax dump restored and tested a few commands that should hopefully help you get some plots out this time.

Run blobtools create with genomehubs/blobtoolkit:1.3.2 so --taxdump databases/ncbi_taxdump will work:

docker run -it \
           --rm \
           --name btk \
           -u $UID:$groups \
           -v `pwd`/datasets:/blobtoolkit/datasets \
           -v `pwd`:/blobtoolkit/data \
           genomehubs/blobtoolkit:1.3.2 \
           blobtools create \
               --fasta data/membranacea.contigs.fasta \
               --taxid 1234 \
               --taxdump databases/ncbi_taxdump \
               datasets/membranacea_dataset

Start up the Viewer with ports bound so you have the option of opening it up in a browser:

docker run -d --rm --name btk \
           -p 8080:8080 \
           -p 8000:8000 \
           -v `pwd`/datasets:/blobtoolkit/datasets \
           -v `pwd`/output:/blobtoolkit/output \
           -e VIEWER=true \
           genomehubs/blobtoolkit:1.3.2

Use docker exec to run blobtools view to get a snail plot:

docker exec -it btk \
            blobtools view \
                --host http://localhost:8080 \
                --out output \
                --view snail \
                membranacea_dataset

This will show an error when it tries to stop the viewer, which is controlled by a different process, but this happens after the plot is generated so it can be ignored:

Traceback (most recent call last):
  File "/blobtoolkit/blobtools2/lib/view.py", line 309, in <module>
    main()
  File "/blobtoolkit/blobtools2/lib/view.py", line 302, in main
    viewer.send_signal(signal.SIGINT)
AttributeError: 'NoneType' object has no attribute 'send_signal'

Alternatively, omit the --host option and it will spin up a new viewer instance inside the container for this one command. This will take slightly longer but won't hit the error above:

docker exec -it btk \
            blobtools view \
                --out output \
                --view snail \
                datasets/membranacea_dataset

Your create command doesn't include any BLAST hits so the cumulative plot won't work. You can get around this by passing an empty file to blobtools add, which will label everything as no-hit:

touch output/blank.hits.out

docker exec -it btk \
            blobtools add \
            --hits output/blank.hits.out \
            --taxdump databases/ncbi_taxdump \
            datasets/membranacea_dataset

Then the cumulative plot should work:

docker exec -it btk \
            blobtools view \
                --out output \
                --view cumulative \
                datasets/membranacea_dataset

Hope this works!

angelajmcd commented 4 years ago

Hey again, Richard,

The newer image (1.3.2) was pulled but I am still getting a taxdump error with the first command:

docker run -it --rm --name btk -u $UID:$groups -v /Users/amcdonnell/Analyses/blobtools/datasets:/blobtoolkit/datasets -v /Users/amcdonnell/Analyses/blobtools/membranacea.contigs.tab.edited.fasta:/blobtoolkit/data genomehubs/blobtoolkit:latest ./blobtools2/blobtools create --fasta /blobtoolkit/datasets/membranacea.contigs.tab.edited.fasta --taxid 1234 --taxdump databases/ncbi_taxdump /blobtoolkit/datasets/membranacea_dataset

Then lots of tigs get processed, and the output to screen ends with:

- processing tig00030305=37045=1suggestBubble=noCircular=no: : 2122it [00:55, 38.16it/s]
Parsing taxdump
ERROR: Unable to parse /blobtoolkit/databases/ncbi_taxdump/nodes.dmp.

rjchallis commented 4 years ago

Just checking: did you run

docker pull genomehubs/blobtoolkit:1.3.2

or

docker pull genomehubs/blobtoolkit:latest

I just checked and the latest image tag was still pointing to 1.3.1 so the tax dump issue was still there. I've retagged latest so it is now the most up to date, but if you use genomehubs/blobtoolkit:1.3.2 in your command instead of genomehubs/blobtoolkit:latest it will definitely use that version.
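Whether two tags resolve to the same local image can be checked by comparing their image IDs. The sketch below wraps docker image inspect for that purpose. tags_match is a hypothetical helper, and it only compares images that have already been pulled locally, not what the registry currently serves.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (0) if two locally pulled tags point at the
# same image ID, fail with 1 if they differ, and with 2 if either tag
# cannot be inspected (for example, because it has not been pulled).
tags_match() {
  local id_a id_b
  id_a=$(docker image inspect --format '{{.Id}}' "$1") || return 2
  id_b=$(docker image inspect --format '{{.Id}}' "$2") || return 2
  [ "$id_a" = "$id_b" ]
}
```

For instance, tags_match genomehubs/blobtoolkit:latest genomehubs/blobtoolkit:1.3.2 || echo "tags differ or an image is missing" would point to a stale latest tag on the local machine.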

If so you could run

angelajmcd commented 4 years ago

Sure.

docker pull genomehubs/blobtoolkit:1.3.2
1.3.2: Pulling from genomehubs/blobtoolkit
Digest: sha256:0a81055d1ba4fcdd918c549355419718abef00b8c4771d9cc87be17bfff91e02
Status: Image is up to date for genomehubs/blobtoolkit:1.3.2
docker.io/genomehubs/blobtoolkit:1.3.2

I ran it with genomehubs/blobtoolkit:latest and got the error again. When I swapped in the 1.3.2 it worked! Thanks so much again.

angelajmcd commented 4 years ago

Hi again. So, when I try to generate the snail plot using either method, I end up with the message "waiting for element snail_save_png".

(base) mac-amcdonnell-p:blobtools amcdonnell$ docker exec -it btk \
        blobtools view \
            --out output \
            --view snail \
            datasets/membranacea_dataset

Initializing viewer |███████████████████████████████████████| 15/15 seconds
Loading http://localhost:8002/view/dataset/membranacea_dataset/snail?staticThreshold=Infinity&nohitThreshold=Infinity&plotGraphics=svg
Fetching datasets/membranacea_dataset.snail.png
waiting for element snail_save_png

(and then nothing happens)

When I attempt the cumulative plot:

(base) mac-amcdonnell-p:blobtools amcdonnell$ touch output/blank.hits.out
(base) mac-amcdonnell-p:blobtools amcdonnell$ docker exec -it btk \
        blobtools add \
        --hits output/blank.hits.out \
        --taxdump databases/ncbi_taxdump \
        datasets/membranacea_dataset

ERROR: 'identifiers.json' was not found in the BlobDir.
ERROR: You may need to rebuild the BlobDir to run this command.

I think I have some blast output somewhere, I will find it and try the cumulative plot again.

rjchallis commented 4 years ago

That's odd. The view command getting stuck at that point suggests it is not able to connect to the Viewer being hosted in the container. I can't reproduce that with the latest container versions (1.3.2, or a 1.3.3 version that I uploaded today), so I'm not sure what to suggest.

Could you try

docker exec -it btk curl -I "http://localhost:8080"

to check that it is able to find the Viewer? The response should look like:

HTTP/1.1 200 OK
X-Powered-By: Express
Accept-Ranges: bytes
Content-Type: text/html; charset=UTF-8
Content-Length: 13950
ETag: W/"367e-bBsVcppC4N2kH3mrJNTN97JlzLQ"
Vary: Accept-Encoding
Date: Fri, 21 Aug 2020 13:51:21 GMT
Connection: keep-alive
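To script this check rather than read the headers by eye, the status code can be pulled from the first response line. This is a minimal sketch, assuming curl and awk are available wherever it runs (for the container case it would need to be run via docker exec). viewer_status is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: send a HEAD request and print the HTTP status code
# taken from the first header line (e.g. "HTTP/1.1 200 OK" gives 200).
# Prints nothing if the request fails or returns no headers.
viewer_status() {
  local url=$1
  curl -sI "$url" | awk 'NR == 1 { print $2 }'
}
```

A call such as [ "$(viewer_status http://localhost:8080)" = "200" ] && echo "Viewer is responding" then gives a simple yes/no answer.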

github-actions[bot] commented 1 year ago

This issue has been marked as dormant because blobtoolkit-docker is being archived. The code is now part of the main BlobToolKit repository.

If you feel the issue has not been resolved, please follow the updated BlobToolKit installation instructions to first confirm that you are using the latest version, and then open a new issue at the main BlobToolKit repository if necessary.

This issue will be automatically closed in 7 days.

github-actions[bot] commented 1 year ago

This issue was closed because it has been inactive for 7 days since being marked as dormant.