SystemsGenetics / EnTAPnf

Functional Annotation of Gene Lists
MIT License

Python script for data downloads. #5

Open spficklin opened 4 years ago

spficklin commented 4 years ago

Currently there are several scripts that download the data needed for AnnoTater. Data sets that will be searched with Diamond must also be indexed, and the scripts currently hardcode a Docker command for that step. A hardcoded Docker command does not work on HPC systems that only support Singularity, nor on Kubernetes.

A more flexible solution would be a single Python script that can run these steps under different container environments.

Variables to consider passing to the script:

  1. The AnnoTater version: used to build the Docker image names (e.g. annotater/diamond:0.9.25-[version])
  2. The container manager: Singularity or Docker
  3. Any other settings needed to specify mount points (e.g. for Kubernetes).
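A minimal sketch of how a script could accept these variables, assuming Python's standard argparse. The option names (--annotater-version, --container, --bind), the helper functions, and the fixed Diamond version 0.9.25 (taken from the example image tag) are hypothetical illustrations, not the interface of the actual script.

```python
import argparse

# Hypothetical: Diamond version taken from the example tag annotater/diamond:0.9.25-[version].
DIAMOND_VERSION = "0.9.25"


def build_parser() -> argparse.ArgumentParser:
    """Command-line options for a hypothetical data download/indexing script."""
    parser = argparse.ArgumentParser(
        description="Download and index AnnoTater reference data (sketch)."
    )
    # 1. AnnoTater version, used only to build image tags.
    parser.add_argument("--annotater-version", required=True,
                        help="AnnoTater version used in image tags, e.g. 0.9")
    # 2. Container manager available on this system.
    parser.add_argument("--container", choices=["docker", "singularity"],
                        default="docker", help="Container manager to use")
    # 3. Extra host paths to mount (repeatable), e.g. shared volumes on Kubernetes.
    parser.add_argument("--bind", action="append", default=[], metavar="PATH",
                        help="Extra host path to mount into the container")
    return parser


def diamond_image(annotater_version: str, container: str) -> str:
    """Return the Diamond image name in the form each container manager expects."""
    tag = f"annotater/diamond:{DIAMOND_VERSION}-{annotater_version}"
    # Singularity pulls Docker Hub images through the docker:// URI scheme.
    return f"docker://{tag}" if container == "singularity" else tag
```

With `--annotater-version 0.9 --container singularity`, `diamond_image` yields `docker://annotater/diamond:0.9.25-0.9`, the same image reference used in the Kamiak command.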
spficklin commented 4 years ago

Here is the Singularity command for Kamiak:

For the UniProt Swiss-Prot database: this command is meant to be run in the same directory as the data file. The -B ${PWD} argument bind-mounts the current directory into the Singularity container, and Singularity then runs Diamond in the current working directory (the same directory mounted with -B).

singularity exec -B ${PWD} docker://annotater/diamond:0.9.25-0.9 diamond makedb --threads 4 --in uniprot_sprot.fasta 
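A minimal sketch of how the script might produce the command above for either container manager. The function names are hypothetical; the Docker branch is an assumed equivalent (mount the directory at the same path with -v and make it the working directory with -w), and the sketch adds `--db` to name the output index explicitly, which the command above does not pass.

```python
import os
import subprocess
from pathlib import Path
from typing import List, Optional, Sequence


def makedb_command(
    container: str,
    image: str,
    fasta: str,
    threads: int = 4,
    workdir: Optional[str] = None,
    binds: Sequence[str] = (),
) -> List[str]:
    """Build (but do not run) a containerized `diamond makedb` command line."""
    workdir = workdir or os.getcwd()
    # Assumption: name the index after the FASTA file (uniprot_sprot.fasta -> uniprot_sprot).
    db_name = Path(fasta).stem
    diamond = ["diamond", "makedb", "--threads", str(threads),
               "--in", fasta, "--db", db_name]
    mounts = [workdir, *binds]
    if container == "singularity":
        # Same shape as the Kamiak command: bind-mount the data directory with -B.
        flags = [arg for path in mounts for arg in ("-B", path)]
        return ["singularity", "exec", *flags, image, *diamond]
    if container == "docker":
        # Docker does not mount the host directory by default, so mount it at
        # the same path (-v) and make it the working directory (-w).
        flags = [arg for path in mounts for arg in ("-v", f"{path}:{path}")]
        return ["docker", "run", "--rm", *flags, "-w", workdir, image, *diamond]
    raise ValueError(f"unsupported container manager: {container!r}")


def run_makedb(container: str, image: str, fasta: str, **kwargs) -> None:
    """Run the command from the data directory, raising if it exits non-zero."""
    cmd = makedb_command(container, image, fasta, **kwargs)
    subprocess.run(cmd, cwd=kwargs.get("workdir") or os.getcwd(), check=True)
```

Run from the data directory, `run_makedb("singularity", "docker://annotater/diamond:0.9.25-0.9", "uniprot_sprot.fasta")` would correspond to the Kamiak command plus `--db uniprot_sprot`. Building the argument list separately from running it keeps the container logic testable without Docker or Singularity installed.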
spficklin commented 4 years ago

See PR #8