genomicsITER / NanoCLUST

NanoCLUST is an analysis pipeline for UMAP-based classification of amplicon-based full-length 16S rRNA nanopore reads
MIT License

Installing NanoCLUST as lmod module #14

Closed eyashiro closed 3 years ago

eyashiro commented 4 years ago

Hi, I have a couple of users who requested that NanoCLUST be installed, and I would like to ask for your help to understand how I can install it as an Lmod module, so that I can make this and future versions available on our compute nodes. I am not very familiar with Nextflow, and trying to install all the dependencies in a single conda environment that is loadable from our software stack onto our different machines is taking unusually long. I also tried to install a local instance using your standard Nextflow method, but I don't see any way to make the NanoCLUST git clone and work directories read-only prior to porting them into our Lmod software stack. Any attempt to run NanoCLUST from a fresh working folder crashed, because either the entire work/ directory or just the conda/ directory had been made read+execute-only. Thank you for your help. Erika

genomicsITER commented 4 years ago

Hi Erika

Thank you for opening this issue. At this time we haven't tested the pipeline as an lmod module and we don't have prior experience with lmod, but we will try to help you and get familiar with lmod if problems persist. The work directory (which stores temporary and intermediate files, conda environments...) can be specified with "-w" in the pipeline command. Pointing both the work and results ("--outdir") directories to a path with write permissions should work.

Regarding dependency installation using conda/docker, building a Conda environment comprising every tool and its dependencies turned out to be very problematic for us, so we ended up building an environment for each pipeline step. The same happens with Docker containers. If problems persist, we could find a solution (maybe one BIG Docker container with Nextflow and pipeline environments could work for you). Let us know.

Regards Héctor

eyashiro commented 4 years ago

Hello Héctor,

Thank you very much for responding to my cry for help. I tried to run the test using the "-w" and "--outdir" options but I still get an error. Essentially, I ran the original installation under a trial directory called NanoCLUST (ended with conda files in NanoCLUST/work/), made that directory read-only because users should not be able to change the software installation, then ran a test from a directory called testwork/.

At the end of this message, I copy paste the error message.

Essentially, I really can't let the users modify the installation files because they are accessed by multiple users from different machines. The point for us of using the lmod system has been to be able to have static versions installed from a single dev machine that users can access from all of our nodes. I'm hoping that we can maintain a similar structure with NanoCLUST with your help. (If it's not possible, then I'll have to try to frankenstein something on a local machine, although that is not ideal.)

Thank you again for your help and I look forward to your ideas on a possible solution for this.

Kind regards, Erika

Input and output:
$ nextflow run ../NanoCLUST/main.nf -profile test,conda -w ../NanoCLUST/work/ --outdir ./results
Run Name : irreverent_ptolemy
Reads : /home/admin/ey/Software/nanoclust/NanoCLUST/test_datasets/mock4_run3bc08_5000.fastq
Max Resources : 128 GB memory, 16 cpus, 10d time per job
Output dir : ./results
Launch dir : /home/admin/ey/Software/nanoclust/testwork
Working dir : /home/admin/ey/Software/nanoclust/NanoCLUST/work
Script dir : /home/admin/ey/Software/nanoclust/NanoCLUST
User : ey@bio.aau.dk
Config Profile : test,conda
Config Description: Minimal test dataset to check pipeline function

[- ] process > QC -
[- ] process > QC -
[- ] process > fastqc -
[- ] process > kmer_freqs -
[- ] process > read_clustering -
[- ] process > split_by_cluster -
[- ] process > read_correction -
[- ] process > draft_selection -
[- ] process > racon_pass -
[- ] process > medaka_pass -
[- ] process > consensus_classification -
[- ] process > join_results -
[- ] process > get_abundances -
[- ] process > plot_abundances -
[- ] process > output_documentation -
Error executing process > 'QC (1)'

Caused by: java.io.FileNotFoundException: /home/admin/ey/Software/nanoclust/NanoCLUST/work/conda/.fastp-e59775771c24589528475fae521efb6e.lock (Permission denied)
[nf-core/nanoclust] Pipeline completed with errors
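The "Permission denied" on a lock file under work/conda/ is consistent with the work directory passed to "-w" not being writable by the user running the pipeline. A minimal pre-flight check along these lines could catch that before launching. This is only a sketch: check_workdir is a hypothetical helper name, not part of NanoCLUST or Nextflow, and it only tests the top-level directory, not subdirectories such as conda/.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of NanoCLUST or Nextflow): report whether a
# directory looks usable as a work directory, i.e. it exists and the current
# user may create files in it.
check_workdir() {
    local dir="$1"
    if [ ! -d "$dir" ]; then
        echo "missing: $dir"
        return 2
    fi
    # Creating files in a directory needs both write and search (execute) permission.
    if [ -w "$dir" ] && [ -x "$dir" ]; then
        echo "writable: $dir"
        return 0
    fi
    echo "read-only: $dir"
    return 1
}
```

For example, running check_workdir ../NanoCLUST/work as a regular user against the read-only installation would be expected to report the directory as read-only, whereas a directory under the user's own testwork/ should pass.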

genomicsITER commented 4 years ago

Hi Erika,

The command above points the work and output directories to a path under "NanoCLUST" (e.g. "-w ../NanoCLUST/work/"), which I assume contains the pipeline code, such as main.nf and the script files (which should not be accessed or modified by the users).

I recommend creating an independent directory outside "NanoCLUST" that can be used to store intermediate files (work) and outputs (results):

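Example (layout matching the paths used in the command below):

/home/hector/NanoCLUST/ (pipeline code, read-only)
/home/hector/public_files_NanoCLUST/work/ (intermediate files, writable)
/home/hector/public_files_NanoCLUST/results/ (outputs, writable)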

With the defined structure above, you could use the following command (I'm using absolute paths here):

$ nextflow run /home/hector/NanoCLUST/main.nf -profile test,conda -w /home/hector/public_files_NanoCLUST/work/ --outdir /home/hector/public_files_NanoCLUST/results/
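The separation described here (read-only pipeline code, writable work and results elsewhere) can be sketched as two small shell functions. This is an illustration under assumptions, not a tested installation recipe: prepare_run_root and nanoclust_cmd are hypothetical names, and the command is only printed, not executed. The flags are the ones already used in this thread.

```shell
#!/usr/bin/env bash
# Sketch only: keep a read-only NanoCLUST checkout separate from a writable
# run area. Function names are hypothetical; flags are taken from the thread.

# Create the writable work/ and results/ directories under a run root.
prepare_run_root() {
    local run_root="$1"
    mkdir -p "$run_root/work" "$run_root/results"
}

# Print (do not execute) the launch command for a given install directory
# (containing main.nf) and a writable run root.
# Note: paths containing spaces are not quoted in the printed command.
nanoclust_cmd() {
    local install_dir="$1"
    local run_root="$2"
    printf 'nextflow run %s/main.nf -profile test,conda -w %s/work/ --outdir %s/results/\n' \
        "$install_dir" "$run_root" "$run_root"
}
```

Calling nanoclust_cmd /home/hector/NanoCLUST /home/hector/public_files_NanoCLUST prints the same command as shown above. On a shared system, each user would pass a run root they own, so the installation directory itself never needs write permission.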

Final tip: If you get an error at an intermediate step of the pipeline, you can run it again by adding the "-resume" flag to the command.