ToniWestbrook / paladin

Protein Alignment and Detection Interface
MIT License

docker image downloads and indexes references to installation directory #46

Closed: colin-heberling closed this issue 3 years ago

colin-heberling commented 3 years ago

On my system I found the downloaded Swiss-Prot reference here:

/var/lib/docker/overlay2/ccc85725455b9c3bd02131e0874eb05efbbc31d45891f6beddd0b5efff047d4c/diff/uniprot_sprot.fasta.gz

This could be a problem when building the UniRef90 database, because I likely wouldn't have enough disk space there. Is there a way to make the Docker image behave like a fresh installation, where references are downloaded and indexed in the working directory? Or could you add an option to specify the output directory?

ToniWestbrook commented 3 years ago

Hi @colin-heberling, unfortunately I actually didn't create that Docker image, a contributor did and I believe it may be a few versions back at this point (1.4.1 vs 1.4.6). Are you able to compile PALADIN from source? It's a pretty easy compile with only a couple of dependencies. Let me know if that works okay. Thanks
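For reference, building from source is roughly a clone followed by make. A minimal sketch, assuming the GitHub repository for ToniWestbrook/paladin and a plain make build with no configure step: it first checks that git, make and gcc are on PATH, and only clones and compiles when the hypothetical BUILD=1 switch is set. It does not check for the zlib and libcurl development headers, the other dependencies named in this thread.

```shell
#!/usr/bin/env bash
# Sketch: check for basic build tools, then (only when BUILD=1) clone and
# compile PALADIN. The repository URL is inferred from the project name
# ToniWestbrook/paladin; BUILD is a hypothetical opt-in switch.
# Only commands are checked here, not the zlib/libcurl headers.
set -euo pipefail

REPO_URL="https://github.com/ToniWestbrook/paladin.git"

# Print each named command that is not found on PATH, one per line.
missing_tools() {
  local t
  for t in "$@"; do
    command -v "$t" >/dev/null 2>&1 || echo "$t"
  done
}

missing=$(missing_tools git make gcc | paste -s -d ' ' -)

if [[ -n "$missing" ]]; then
  echo "install these first: $missing" >&2
elif [[ "${BUILD:-0}" == 1 ]]; then
  git clone "$REPO_URL"
  cd paladin
  make            # expected to build the paladin binary in the repository root
else
  echo "git, make and gcc found; rerun with BUILD=1 to clone and compile"
fi
```

Splitting the check from the build means the script can be run safely first to see what is missing before anything is downloaded.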

colin-heberling commented 3 years ago

Thanks for the quick response. I'm able to compile from source on my local machine very easily, but unfortunately when I tried to do so on Amazon EC2 I ran into issues with the package manager (yum) not being able to locate some of the dependencies:

[ec2-user@ip-10-60-4-189 ~]$ sudo yum install build-essential libcurl4-openssl-dev git make gcc zlib1g-dev
Loaded plugins: extras_suggestions, langpacks, priorities, update-motd
amzn2-core                                                                                                                                                                           | 3.7 kB  00:00:00     
No package build-essential available.
No package libcurl4-openssl-dev available.
Package git-2.32.0-1.amzn2.0.1.x86_64 already installed and latest version
Package 1:make-3.82-24.amzn2.x86_64 already installed and latest version
Package gcc-7.3.1-13.amzn2.x86_64 already installed and latest version
No package zlib1g-dev available.
Nothing to do

I'm not sure what the issue is here, but I may want to reach out to AWS to troubleshoot, unless you have any suggestions?

ToniWestbrook commented 3 years ago

You may not need the build-essential package: it just bundles gcc and the other GNU tools for compiling C/C++ code, and those are probably already installed on your instance (your yum output shows gcc and make as already installed, or try typing gcc and see if that works). For libcurl, you can try a yum install libcurl-devel, or if that doesn't work, try a yum search libcurl and it should list some matching packages.
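The names in the original install command (build-essential, libcurl4-openssl-dev, zlib1g-dev) are Debian/Ubuntu package names, which is why yum reports them as unavailable on Amazon Linux 2. A minimal sketch, assuming the usual RPM-style equivalents (libcurl-devel, zlib-devel, and gcc plus make standing in for build-essential), that translates the list and prints the resulting yum command rather than running it:

```shell
#!/usr/bin/env bash
# Sketch: translate the Debian/Ubuntu dependency names used for PALADIN into
# the RPM-style names yum uses on Amazon Linux 2. The mapping is an assumption
# based on common package naming, not something from the PALADIN docs.
set -euo pipefail

deb_to_rpm() {
  case "$1" in
    build-essential)      echo "gcc make" ;;   # rough stand-in; yum groupinstall "Development Tools" is the fuller equivalent
    libcurl4-openssl-dev) echo "libcurl-devel" ;;
    zlib1g-dev)           echo "zlib-devel" ;;
    *)                    echo "$1" ;;         # git, make, gcc keep the same name
  esac
}

# Translate each name, split multi-package results onto lines, drop duplicates.
pkgs=$(for p in build-essential libcurl4-openssl-dev git make gcc zlib1g-dev; do
         deb_to_rpm "$p"
       done | tr ' ' '\n' | LC_ALL=C sort -u | paste -s -d ' ' -)

# Print the command instead of running it; copy it into a shell to install.
echo "sudo yum install -y $pkgs"
```

This prints sudo yum install -y gcc git libcurl-devel make zlib-devel. Since git, make and gcc were already reported as installed, yum would only need to fetch the two -devel packages.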

colin-heberling commented 3 years ago

When I try to run paladin with the prepare command, it complains that zlib1g-dev is not installed.

sunitj commented 3 years ago

Output being stored at /var/lib/docker/overlay2/* means that Docker's default storage location is being used. Running the Docker container interactively fixes this issue.
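One way to act on this, assuming the goal is to keep the downloaded references on a host directory with enough free space: bind-mount that directory into the container and make it the working directory, so whatever paladin prepare writes there persists outside /var/lib/docker. The image tag is the biocontainers one mentioned in this thread; REF_DIR, DRY_RUN and the -r1 reference argument are assumptions to check against paladin's own usage output.

```shell
#!/usr/bin/env bash
# Sketch: run the PALADIN Docker image with a host directory bind-mounted as
# the container's working directory, so references downloaded and indexed by
# `paladin prepare` land on the host rather than under /var/lib/docker/overlay2.
# REF_DIR, DRY_RUN and the "-r1" argument are assumptions, not from the thread.
set -euo pipefail

IMAGE="quay.io/biocontainers/paladin:1.4.6--h1b8c3c0_2"
REF_DIR="${REF_DIR:-$PWD/paladin_refs}"   # hypothetical host directory with enough free space
mkdir -p "$REF_DIR"

# -v mounts REF_DIR at /data; -w makes /data the working directory;
# -it runs interactively; --rm removes the container when it exits.
cmd=(docker run --rm -it -v "$REF_DIR:/data" -w /data "$IMAGE" paladin prepare -r1)

if [[ "${DRY_RUN:-1}" == 1 ]]; then
  echo "${cmd[*]}"        # default: only show the command
else
  "${cmd[@]}"
fi
```

With DRY_RUN=0 and REF_DIR pointing at a larger volume, the container would actually start, and the reference files would remain in REF_DIR after it exits.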

ToniWestbrook commented 3 years ago

Glad you found the issue with Docker @sunitj (I don't use Docker much so I'm not a big help). @colin-heberling - does this work for you too?

colin-heberling commented 3 years ago

Yes, this works for me too! Thanks again @sunitj!

ToniWestbrook commented 3 years ago

Thanks @sunitj and @colin-heberling. I'll put a note in the Docker section of the README too. You may still want to consider upgrading to the latest version for the bug fixes and, I think, some additional items in the TSV report. Feel free to open another ticket if you do go this route and it's not finding zlib. Thanks again!

sunitj commented 3 years ago

@ToniWestbrook Note that the quay.io/biocontainers/paladin:1.4.6--h1b8c3c0_2 Docker image (not mine) seems to be pinned to the latest version (v1.4.6), in case you want to mention it in your README as well.

Also, thank you for this amazing tool!

ToniWestbrook commented 3 years ago

Oh that's good, thanks for checking that out!