Hi @colin-heberling, unfortunately I didn't create that Docker image; a contributor did, and I believe it may be a few versions behind at this point (1.4.1 vs 1.4.6). Are you able to compile PALADIN from source? It's a pretty easy compile with only a couple of dependencies. Let me know if that works okay. Thanks
Thanks for the quick response. I'm able to compile from source on my local machine very easily, but unfortunately when I tried to do so on Amazon EC2 I ran into issues with the package manager (yum) not being able to locate some of the dependencies:
[ec2-user@ip-10-60-4-189 ~]$ sudo yum install build-essential libcurl4-openssl-dev git make gcc zlib1g-dev
Loaded plugins: extras_suggestions, langpacks, priorities, update-motd
amzn2-core | 3.7 kB 00:00:00
No package build-essential available.
No package libcurl4-openssl-dev available.
Package git-2.32.0-1.amzn2.0.1.x86_64 already installed and latest version
Package 1:make-3.82-24.amzn2.x86_64 already installed and latest version
Package gcc-7.3.1-13.amzn2.x86_64 already installed and latest version
No package zlib1g-dev available.
Nothing to do
I'm not sure what the issue is here, but I may want to reach out to AWS to troubleshoot, unless you have any suggestions?
You may not need the build-essential package, as it just contains gcc and the other GNU tools for compiling C/C++ code, and those are probably already installed on your instance (try typing gcc and see if that works). For libcurl, you can try yum install libcurl-devel and see if that works; if it doesn't, try yum search libcurl, which should list some matching packages.
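The package names in the failed yum command above are Debian/Ubuntu names, which is why yum cannot find them. As a minimal sketch, the snippet below maps them to the names typically used on yum-based systems such as Amazon Linux 2. The helpers deb_to_rpm and yum_install_cmd are hypothetical, the mapping is an assumption worth confirming with yum search, and the script only prints the command rather than running it.

```shell
#!/bin/sh
# Hypothetical helper: translate a Debian/Ubuntu package name to the
# name typically used on yum-based distributions (assumed mapping;
# confirm with `yum search <name>` on your instance).
deb_to_rpm() {
    case "$1" in
        build-essential)      echo "gcc gcc-c++ make" ;;
        libcurl4-openssl-dev) echo "libcurl-devel" ;;
        zlib1g-dev)           echo "zlib-devel" ;;
        *)                    echo "$1" ;;   # e.g. git, make, gcc keep their names
    esac
}

# Hypothetical helper: build a yum command from an apt-style package list.
yum_install_cmd() {
    pkgs=""
    for p in "$@"; do
        pkgs="$pkgs $(deb_to_rpm "$p")"
    done
    echo "sudo yum install -y$pkgs"
}

# Print (do not run) the translated command for the list used in the thread.
yum_install_cmd build-essential libcurl4-openssl-dev git make gcc zlib1g-dev
```

Printing the command first makes it easy to review before installing anything; duplicate names in the list (gcc and make appear twice) are harmless to yum.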
When I try to run paladin with prepare it complains about not having zlib1g-dev installed.
Output being stored at /var/lib/docker/overlay2/* means that Docker's default storage location is being used. Running the container interactively with the working directory mounted fixes this issue.
Start a container from the image with your current working directory mounted and set as the container's working directory, and run bash as the command:
[ec2-user@ip-10-60-5-31 Paladin]$ docker run --rm -it --workdir $(pwd) --volume $(pwd):$(pwd) quay.io/biocontainers/paladin:1.4.6--h1b8c3c0_2 bash
Unable to find image 'quay.io/biocontainers/paladin:1.4.6--h1b8c3c0_2' locally
1.4.6--h1b8c3c0_2: Pulling from biocontainers/paladin
cefc4d495539: Pull complete
4ca545ee6d5d: Pull complete
d4065764faf9: Pull complete
Digest: sha256:7f88401242e6e89fbf555743fe64672e556ee060a658bc57eeee44c048ac99d1
Status: Downloaded newer image for quay.io/biocontainers/paladin:1.4.6--h1b8c3c0_2
Test run (I ran this on a tiny instance; please ignore the memory error):
root@36aefb5db1b5:/mnt/efs/databases/Paladin# paladin prepare -r1
[M::downloadUniprotReference] Downloading ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.fasta.gz...
[M::cleanUniprotReference] Cleaning UniProt reference...
[M::command_index] Translating protein sequence...0.00 sec
[M::command_index] Packing protein sequence... 4.14 sec
[M::command_index] Constructing BWT for the packed sequence... [is_bwt] Failed to allocate 3261613144 bytes at is.c line 212: Cannot allocate memory
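To put the failed allocation in the log above into perspective, here is a small sketch that converts the requested byte count to MiB and compares it with the memory the kernel reports as available. It assumes a Linux host with /proc/meminfo; the helper names are hypothetical, and this is a single allocation request, so the real memory requirement of the indexing step is likely higher.

```shell
#!/bin/sh
# Bytes requested in the failed allocation, copied from the log above.
needed_bytes=3261613144

# Hypothetical helper: convert bytes to whole MiB (integer division).
bytes_to_mib() {
    echo $(( $1 / 1048576 ))
}

# Hypothetical helper: read MemAvailable (reported in kB) from
# /proc/meminfo and print it in MiB. Linux only.
available_mib() {
    awk '/^MemAvailable:/ { print int($2 / 1024) }' /proc/meminfo
}

needed=$(bytes_to_mib "$needed_bytes")
avail=$(available_mib)
echo "BWT step asked for ${needed} MiB; ${avail:-unknown} MiB available"
```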
Check where the files were downloaded. They are in the current working directory!
root@36aefb5db1b5:/mnt/efs/databases/Paladin# ls
uniprot_sprot.fasta.gz uniprot_sprot.fasta.gz.amb uniprot_sprot.fasta.gz.ann uniprot_sprot.fasta.gz.pac uniprot_sprot.fasta.gz.pro
Press Ctrl+D to exit the container.
The files are still here!!
[ec2-user@ip-10-60-5-31 Paladin]$ ls
uniprot_sprot.fasta.gz uniprot_sprot.fasta.gz.amb uniprot_sprot.fasta.gz.ann uniprot_sprot.fasta.gz.pac uniprot_sprot.fasta.gz.pro
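The interactive session above could probably also be run non-interactively with the same mount flags. The sketch below only prints the docker command so it can be reviewed first. The paladin_docker_cmd helper is hypothetical, and it assumes the image accepts paladin and its arguments as the container command, which has not been verified in this thread.

```shell
#!/bin/sh
# Image tag taken from the thread.
PALADIN_IMAGE="quay.io/biocontainers/paladin:1.4.6--h1b8c3c0_2"

# Hypothetical helper: print the docker command that runs a PALADIN
# subcommand with the current directory bind-mounted at the same path,
# so outputs land on the host instead of in Docker's overlay storage.
# Assumes the image accepts `paladin ...` as the container command.
paladin_docker_cmd() {
    dir=$(pwd)
    echo "docker run --rm --workdir \"$dir\" --volume \"$dir:$dir\" $PALADIN_IMAGE paladin $*"
}

# Print the command for review; copy and run it once it looks right.
paladin_docker_cmd prepare -r1
```

Mounting the directory at the same path inside the container, as in the original command, keeps file paths identical on the host and in the container.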
Glad you found the issue with Docker @sunitj (I don't use Docker much so I'm not a big help). @colin-heberling - does this work for you too?
Yes, this works for me too! Thanks again @sunitj!
Thanks @sunitj and @colin-heberling. I'll put a note in the Docker section of the README too. You may still want to consider upgrading to the latest version for the bug fixes and, I think, additional items in the TSV report. Feel free to open another ticket if you do go this route and it's not finding zlib. Thanks again.
@ToniWestbrook Note that the quay.io/biocontainers/paladin:1.4.6--h1b8c3c0_2 Docker image (not mine) seems to be pinned to the latest version (v1.4.6), in case you want to update this in your README as well.
Also, thank you for this amazing tool!
Oh that's good, thanks for checking that out!
On my system I found the downloaded swiss-prot reference here:
/var/lib/docker/overlay2/ccc85725455b9c3bd02131e0874eb05efbbc31d45891f6beddd0b5efff047d4c/diff/uniprot_sprot.fasta.gz
This could prove to be problematic when building the uniref90 database because I likely wouldn't have enough disk space. Is there a way to make the docker image run like the fresh installation where references are downloaded and indexed in the working directory? Or give an option to specify output directory?