MaestSi / nf-m6anet

A NextFlow pipeline for m6A detection from Nanopore direct RNA-seq data
GNU General Public License v3.0

Process `minimap2 (2)` terminated with an error exit status (1) #1

Closed kwonej0617 closed 1 year ago

kwonej0617 commented 1 year ago

Hi @MaestSi! Thank you for providing the Nextflow pipeline for m6anet. I tried to run your pipeline but got an error. As I am not familiar with Nextflow, I don't know how to deal with it. Could you please help me?

(/home/euijin.kwon-umw/Euijin/env/nextflow) [euijin.kwon-umw@c6525c14 nf-m6anet]$ ../nextflow -c nf-m6anet.conf run nf-m6anet.nf 
N E X T F L O W  ~  version 22.10.7
Launching `nf-m6anet.nf` [condescending_rubens] DSL1 - revision: be1f496eb1
WARN: Access to undefined parameter `help` -- Initialise it to a default value eg. `params.help = some_value`
executor >  local (1)
[7c/9cdfd6] process > minimap2 (2)   [  0%] 0 of 2
[-        ] process > nanopolish     -
executor >  local (2)
[49/f12068] process > minimap2 (1)   [100%] 1 of 1, failed: 1
[-        ] process > nanopolish     -
[-        ] process > m6anet1        -
[-        ] process > m6anet2        -
[-        ] process > postprocessing -
Error executing process > 'minimap2 (2)'

Caused by:
  Process `minimap2 (2)` terminated with an error exit status (1)

Command executed:

  mkdir -p /home/euijin.kwon-umw/Euijin/nf-m6anet/data/HEK293T-WT_KO-rep1/KO/Test/transcriptomeAlignment/
  minimap2 -x map-ont -k14 -t 1 -a transcriptome.fa /pi/chan.zhou-umw/SeqData/3rd_seq/xPore/HEK293T-Mettl3-KO-rep1/fastq/HEK293T-Mettl3-KO-rep1_basecalled.fastq.gz | samtools view -hSb | samtools sort -@ 1 -o /home/euijin.kwon-umw/Euijin/nf-m6anet/data/HEK293T-WT_KO-rep1/KO/Test/transcriptomeAlignment/minimapT.bam
  samtools view /home/euijin.kwon-umw/Euijin/nf-m6anet/data/HEK293T-WT_KO-rep1/KO/Test/transcriptomeAlignment/minimapT.bam -bh -F 2324 | samtools sort -@ 1 -o /home/euijin.kwon-umw/Euijin/nf-m6anet/data/HEK293T-WT_KO-rep1/KO/Test/transcriptomeAlignment/minimap.filt.sortT.bamip
  samtools index -@ 1 /home/euijin.kwon-umw/Euijin/nf-m6anet/data/HEK293T-WT_KO-rep1/KO/Test/transcriptomeAlignment/minimap.filt.sortT.bam
  ln -s /home/euijin.kwon-umw/Euijin/nf-m6anet/data/HEK293T-WT_KO-rep1/KO/Test/transcriptomeAlignment/minimap.filt.sort.bam ./minimap.filt.sortT.bam
  ln -s /home/euijin.kwon-umw/Euijin/nf-m6anet/data/HEK293T-WT_KO-rep1/KO/Test/transcriptomeAlignment/minimap.filt.sort.bam.bai ./minimap.filt.sortT.bam.bai

Command exit status:
  1

Command output:
  (empty)

Command error:
  .command.sh: line 3: minimap2: command not found
  [main_samview] fail to read the header from "-".
  samtools sort: failed to read header from "-"

Work dir:
  /home/euijin.kwon-umw/Euijin/nf-m6anet/work/7c/9cdfd6f3203208329fe5e494072533

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`

In my nf-m6anet.conf, I set the parameters as follows.

 params{
        // Path to the sample description file
        samples = "samples.txt"

        // Path to a folder where to store results
        resultsDir = "/home/euijin.kwon-umw/Euijin/nf-m6anet/data/HEK293T-WT_KO-rep1"

        // Path to the transcriptome fasta
        transcriptome_fasta = "/home/euijin.kwon-umw/Euijin/xpore/reference/Homo_sapiens.GRCh38.cdna.ncrna_wtChrIs_modified.fa"

        // Gtf file
        gtf = "/home/euijin.kwon-umw/Euijin/xpore/reference/Homo_sapiens.GRCh38.91.gtf"

        // Probability modification threshold for calling a site as m6A+
        prob_mod_thr = 0.9

        // Path to post-processing R script
        postprocessingScript = "Transcript_to_genome.R"

        //Path to bulk level m6A estimator script
        bulkLevelScript = "Calculate_m6anet_bulk.R"

        // Flags to select which process to run
        minimap2 = true
        nanopolish = true
        m6anet1 = true
        m6anet2 = true
        postprocessing = true
}

I am looking forward to hearing from you! Thank you!

MaestSi commented 1 year ago

Hi, would you like to run the pipeline using Docker or Singularity? Depending on this, you should add either -profile docker or -profile singularity to the command line. Of course, Docker or Singularity must already be installed. SM
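
As a rough sketch of that choice, the snippet below checks which container engine is on the PATH and prints the matching Nextflow command. The helper name pick_profile and the preference for Singularity over Docker are assumptions made for illustration; neither is part of nf-m6anet.

```shell
#!/usr/bin/env bash
# Sketch: choose a Nextflow profile from the container engine found on PATH.
# pick_profile is a hypothetical helper, not part of nf-m6anet.
pick_profile() {
  if command -v singularity >/dev/null 2>&1; then
    echo "singularity"
  elif command -v docker >/dev/null 2>&1; then
    echo "docker"
  else
    echo "none"
  fi
}

profile="$(pick_profile)"
if [ "$profile" = "none" ]; then
  echo "Neither Singularity nor Docker was found; install one first." >&2
else
  # Print the command instead of running it, so the sketch has no side effects.
  echo "nextflow -c nf-m6anet.conf run nf-m6anet.nf -profile $profile"
fi
```

Without one of the two profiles, the processes run directly on the host, which is why minimap2 was reported as "command not found" in the log above.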

kwonej0617 commented 1 year ago

Thank you for your reply, @MaestSi! Actually, I am not really familiar with Singularity and Docker, so I am wondering whether you provide a Singularity or Docker image with your package. There is a Dockerfile in the nf-m6anet package. Is that different from what you are talking about? If so, could you please let me know how to install the image? I really appreciate your help!

MaestSi commented 1 year ago

Hi, yes, I have already provided a Docker image, which I built using the Dockerfile you see in the repo, and it is hosted on Docker Hub. When you run the pipeline, Nextflow automatically downloads the image from Docker Hub, converts it to a Singularity image (if needed), and runs the pipeline, so you do not need to install the tools manually. All you need is a working Docker or Singularity installation, namely the “engine” capable of handling these images. If you are running the pipeline on your local desktop or laptop, I’d suggest Docker; if you are running it on an HPC cluster, Singularity is probably the better choice, and it may already be installed (ask the system admin if unsure). Please click on either the Docker or the Singularity link in the README and follow the installation instructions. Best, SM

kwonej0617 commented 1 year ago

Thank you @MaestSi for your reply. I ran the pipeline on an HPC cluster, and my script was as follows.

#! /bin/bash
#BSUB -L /bin/bash
#BSUB -J nf-m6Anet
#BSUB -q large
#BSUB -o LSF/out.HEK293T-WT_KO-rep1
#BSUB -e LSF/err.HEK293T-WT_KO-rep1
#BSUB -n 1 -W 72:00
#BSUB -R span[hosts=1]
#BSUB -R rusage[mem=8000]

module load java/default
module load conda/init
source activate /home/euijin.kwon-umw/Euijin/env/nextflow
../nextflow -c nf-m6anet.conf run nf-m6anet.nf -profile singularity

However, the job was killed in the middle of the process and I got the following in the log file.

N E X T F L O W  ~  version 22.10.7
Launching `nf-m6anet.nf` [shrivelled_marconi] DSL1 - revision: be1f496eb1
WARN: Access to undefined parameter `help` -- Initialise it to a default value eg. `params.help = some_value`
[-        ] process > minimap2       -
[-        ] process > nanopolish     -
[-        ] process > m6anet1        -
[-        ] process > m6anet2        -
[-        ] process > postprocessing -

[-        ] process > minimap2       [  0%] 0 of 2
[-        ] process > nanopolish     -
[-        ] process > m6anet1        -
[-        ] process > m6anet2        -
[-        ] process > postprocessing -

[-        ] process > minimap2       [  0%] 0 of 2
[-        ] process > nanopolish     -
[-        ] process > m6anet1        -
[-        ] process > m6anet2        -
[-        ] process > postprocessing -
Error executing process > 'minimap2 (2)'

Caused by:
  Failed to submit process to grid scheduler for execution

Command executed:

  qsub -N nf-minimap2_2 .command.run

Command exit status:
  255
Command output:
  qsub: illegal option -- N
  Usage: qsub [-a date-time] [-e stderr-filename] [-eo] [-i]
              [-l] [-lc core-limit] [-ld data-limit] [-lf file-limit]
              [-lm memory-limit] [-ls stack-limit] [-lt cpu-limit] [-mb] [-me]
              [-o stdout-filename] [-q queue-name] [-r request-name]
              [-s shell-name] [-x] [-z]

Work dir:
  /home/euijin.kwon-umw/Euijin/nf-m6anet/work/16/26586aba342401333799be9aa027e6

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

[16/26586a] process > minimap2 (2)   [ 50%] 1 of 2, failed: 1
[-        ] process > nanopolish     -
[-        ] process > m6anet1        -
[-        ] process > m6anet2        -
[-        ] process > postprocessing -
Error executing process > 'minimap2 (2)'

Caused by:
  Failed to submit process to grid scheduler for execution

Command executed:

  qsub -N nf-minimap2_2 .command.run

Command exit status:
  255

Command output:
  qsub: illegal option -- N
  Usage: qsub [-a date-time] [-e stderr-filename] [-eo] [-i]
              [-l] [-lc core-limit] [-ld data-limit] [-lf file-limit]
              [-lm memory-limit] [-ls stack-limit] [-lt cpu-limit] [-mb] [-me]
              [-o stdout-filename] [-q queue-name] [-r request-name]
              [-s shell-name] [-x] [-z]

Work dir:
  /home/euijin.kwon-umw/Euijin/nf-m6anet/work/16/26586aba342401333799be9aa027e6

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

Based on the tip above, I moved to /home/euijin.kwon-umw/Euijin/nf-m6anet/work/16/26586aba342401333799be9aa027e6 and ran bash .command.run. It worked, but I only got the minimap2 result for one sample. The following steps, including nanopolish, m6anet1, m6anet2 and postprocessing, did not run. Could you please help me deal with this problem?

Thank you so much for your help!

MaestSi commented 1 year ago

Hi, does your HPC have a job scheduler? Please check in the conf file that you set the appropriate executor (set it to “local” if you don’t have a job scheduler) and that you mounted the directories where your data are and where you want to store the results. Best, SM
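
One way to find out which scheduler a cluster uses is to look for its submission command. The sketch below is only a heuristic under that assumption: suggest_executor is a hypothetical helper, and the mapping covers just the cases named in its comments. A plain qsub is ambiguous, as the "illegal option -- N" error above shows, so that case is left for the system admin to confirm.

```shell
#!/usr/bin/env bash
# Sketch: guess a Nextflow executor value from the submission commands on PATH.
# suggest_executor is a hypothetical helper, not part of Nextflow or nf-m6anet.
suggest_executor() {
  if command -v bsub >/dev/null 2>&1; then
    echo "lsf"          # bsub is the LSF submission command
  elif command -v sbatch >/dev/null 2>&1; then
    echo "slurm"        # sbatch is the Slurm submission command
  elif command -v qsub >/dev/null 2>&1; then
    echo "qsub-based"   # could be pbs, pbspro or sge: ask the system admin
  else
    echo "local"        # no scheduler command found
  fi
}

echo "Suggested executor: $(suggest_executor)"
```

On a cluster where jobs are submitted with #BSUB directives, bsub is present and the sketch prints lsf.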

kwonej0617 commented 1 year ago

Hi @MaestSi! I ran this script on the cluster. Is this the job scheduler you mentioned?

#! /bin/bash
#BSUB -L /bin/bash
#BSUB -J nf-m6Anet
#BSUB -q large
#BSUB -o LSF/out.HEK293T-WT_KO-rep1
#BSUB -e LSF/err.HEK293T-WT_KO-rep1
#BSUB -n 20 -W 72:00
#BSUB -R span[hosts=1]
#BSUB -R rusage[mem=6000]
#BSUB -u Euijin.kwon@umassmed.edu

module load java/default
module load conda/init
source activate /home/euijin.kwon-umw/Euijin/env/nextflow
../nextflow -c nf-m6anet.conf run nf-m6anet.nf --samples="samples.txt" --resultsDir="data/HEK293T-WT_KO-rep1/." -profile singularity

Sorry to bother you, but could you please let me know how to set the appropriate one? Here is my nf-m6anet.conf.

params{
        // Path to the sample description file
        samples = "samples.txt"

        // Path to a folder where to store results
        resultsDir = "/home/euijin.kwon-umw/Euijin/nf-m6anet/data/HEK293T-WT_KO-rep1"

        // Path to the transcriptome fasta
        transcriptome_fasta = "/home/euijin.kwon-umw/Euijin/xpore/reference/Homo_sapiens.GRCh38.cdna.ncrna_wtChrIs_modified.fa"

        // Gtf file
        gtf = "/home/euijin.kwon-umw/Euijin/xpore/reference/Homo_sapiens.GRCh38.91.gtf"

        // Probability modification threshold for calling a site as m6A+
        prob_mod_thr = 0.9

        // Path to post-processing R script
        postprocessingScript = "Transcript_to_genome.R"

        //Path to bulk level m6A estimator script
        bulkLevelScript = "Calculate_m6anet_bulk.R"

        // Flags to select which process to run
        minimap2 = true
        nanopolish = true
        m6anet1 = true
        m6anet2 = true
        postprocessing = true
}

profiles {
        singularity {
                singularity.enabled = true
                singularity.autoMounts = false
                //singularity.cacheDir = "/path/to/singularity/cacheDir" // if commented, work dir is going to be used
                process{
                        containerOptions = '--bind /home/:/home'
                        cpus = 1
                        executor = 'pbspro'
                        queue = 'workq'
                        perJobMemLimit = true
                withName:minimap2{
                        container = 'maestsi/nf-m6anet:latest'
                        cpus = { params.minimap2 ? 6 : 1 }
                        memory = { params.minimap2 ? 10.GB + (2.GB * (task.attempt-1)) : 1.GB }
                        errorStrategy = { task.exitStatus == 130 ? 'retry' : 'terminate' }
                        maxRetries = 3
                }
                withName:nanopolish{
                        container = 'maestsi/nf-m6anet:latest'
                        cpus = { params.nanopolish ? 6 : 1 }
                        memory = { params.nanopolish ? 5.GB + (2.GB * (task.attempt-1)) : 1.GB }
                        errorStrategy = { task.exitStatus == 130 ? 'retry' : 'terminate' }
                        maxRetries = 3
                }
                withName:m6anet1{
                        container = 'maestsi/nf-m6anet:latest'
                        cpus = { params.m6anet1 ? 6 : 1 }
                        memory = { params.m6anet1 ? 10.GB + (2.GB * (task.attempt-1)) : 1.GB }
                        errorStrategy = { task.exitStatus == 130 ? 'retry' : 'terminate' }
                        maxRetries = 3
                }
                withName:m6anet2{
                        container = 'maestsi/nf-m6anet:latest'
                        cpus = { params.m6anet2 ? 6 : 1 }
                        memory = { params.m6anet2 ? 10.GB + (2.GB * (task.attempt-1)) : 1.GB }
                        errorStrategy = { task.exitStatus == 130 ? 'retry' : 'terminate' }
                        maxRetries = 3
                }
                withName:postprocessing{
                        container = 'maestsi/nf-m6anet:latest'
                        cpus = { params.postprocessing ? 6 : 1 }
                        memory = { params.postprocessing ? 10.GB + (2.GB * (task.attempt-1)) : 1.GB }
                        errorStrategy = { task.exitStatus == 130 ? 'retry' : 'terminate' }
                        maxRetries = 3
                }

        }
}

docker {
            docker.enabled = true
            docker.autoMounts = false
            //docker.cacheDir = "/path/to/docker/cacheDir" // if commented, work dir is going to be used
            process{
                        containerOptions = '-v /home/:/home'
                        cpus = 1
                        executor = 'pbspro'
                        queue = 'workq'
                        perJobMemLimit = true
                withName:minimap2{
                        container = 'maestsi/nf-m6anet:latest'
                        cpus = { params.minimap2 ? 6 : 1 }
                        memory = { params.minimap2 ? 10.GB + (2.GB * (task.attempt-1)) : 1.GB }
                        errorStrategy = { task.exitStatus == 130 ? 'retry' : 'terminate' }
                        maxRetries = 3
                }
                withName:nanopolish{
                        container = 'maestsi/nf-m6anet:latest'
                        cpus = { params.nanopolish ? 6 : 1 }
                        memory = { params.nanopolish ? 10.GB + (2.GB * (task.attempt-1)) : 1.GB }
                        errorStrategy = { task.exitStatus == 130 ? 'retry' : 'terminate' }
                        maxRetries = 3
                }
                withName:m6anet1{
                        container = 'maestsi/nf-m6anet:latest'
                        cpus = { params.m6anet1 ? 6 : 1 }
                        memory = { params.m6anet1 ? 10.GB + (2.GB * (task.attempt-1)) : 1.GB }
                        errorStrategy = { task.exitStatus == 130 ? 'retry' : 'terminate' }
                        maxRetries = 3
                }
                withName:m6anet2{
                        container = 'maestsi/nf-m6anet:latest'
                        cpus = { params.m6anet2 ? 6 : 1 }
                        memory = { params.m6anet2 ? 10.GB + (2.GB * (task.attempt-1)) : 1.GB }
                        errorStrategy = { task.exitStatus == 130 ? 'retry' : 'terminate' }
                        maxRetries = 3
                }
                withName:postprocessing{
                        container = 'maestsi/nf-m6anet:latest'
                        cpus = { params.postprocessing ? 6 : 1 }
                        memory = { params.postprocessing ? 10.GB + (2.GB * (task.attempt-1)) : 1.GB }
                        errorStrategy = { task.exitStatus == 130 ? 'retry' : 'terminate' }
                        maxRetries = 3
                }

        }
    }
}

I really appreciate your help!

MaestSi commented 1 year ago

Hi, first of all, in the profiles -> singularity section, you should replace executor = 'pbspro' with executor = 'lsf', since you are using bsub to submit jobs. Moreover, keep in mind that you should always provide full paths (also for postprocessingScript, bulkLevelScript and samples), and that command-line parameters override those stored in the .conf file. So please fix these in the conf file accordingly, and then run it with:

#! /bin/bash
#BSUB -L /bin/bash
#BSUB -J nf-m6Anet
#BSUB -q large
#BSUB -o LSF/out.HEK293T-WT_KO-rep1
#BSUB -e LSF/err.HEK293T-WT_KO-rep1
#BSUB -n 20 -W 72:00
#BSUB -R span[hosts=1]
#BSUB -R rusage[mem=6000]
#BSUB -u Euijin.kwon@umassmed.edu

module load java/default
module load conda/init
source activate /home/euijin.kwon-umw/Euijin/env/nextflow
../nextflow -c nf-m6anet.conf run nf-m6anet.nf -profile singularity
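
The executor change can also be made with sed instead of by hand. This sketch works on a small throwaway file (example.conf is a made-up name) so that the real nf-m6anet.conf is not touched. Note that on the real file the same substitution would change the executor in both the singularity and the docker profile.

```shell
#!/usr/bin/env bash
# Sketch: swap the executor from 'pbspro' to 'lsf' in a copy of a config.
# example.conf is a hypothetical stand-in for nf-m6anet.conf.
tmpdir="$(mktemp -d)"
conf="$tmpdir/example.conf"

cat > "$conf" <<'EOF'
process{
        containerOptions = '--bind /home/:/home'
        cpus = 1
        executor = 'pbspro'
        queue = 'workq'
}
EOF

# Write the result to a new file rather than editing in place,
# so the original stays available for comparison.
sed "s/executor = 'pbspro'/executor = 'lsf'/" "$conf" > "$conf.new"

# Show the line that changed.
grep -n "executor" "$conf.new"
```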

Were you able to produce a samples file for your samples? Finally, you may consider increasing the memory and CPUs for each process, if you have more available. E.g. from:

withName:minimap2{
                        container = 'maestsi/nf-m6anet:latest'
                        cpus = { params.minimap2 ? 6 : 1 }
                        memory = { params.minimap2 ? 10.GB + (2.GB * (task.attempt-1)) : 1.GB }
                        errorStrategy = { task.exitStatus == 130 ? 'retry' : 'terminate' }
                        maxRetries = 3
                }

to:

withName:minimap2{
                        container = 'maestsi/nf-m6anet:latest'
                        cpus = { params.minimap2 ? 20 : 1 }
                        memory = { params.minimap2 ? 40.GB + (2.GB * (task.attempt-1)) : 1.GB }
                        errorStrategy = { task.exitStatus == 130 ? 'retry' : 'terminate' }
                        maxRetries = 3
                }
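
To see what the memory expression requests, here is a small worked calculation of base + 2 GB * (task.attempt - 1). The helper mem_for_attempt is hypothetical and only mirrors the arithmetic. With maxRetries = 3 there can be up to four attempts, and with the errorStrategy shown a retry only happens when the exit status is 130.

```shell
#!/usr/bin/env bash
# Sketch: memory requested per attempt for "base.GB + (2.GB * (task.attempt-1))".
# mem_for_attempt is a hypothetical helper that mirrors the config arithmetic.
mem_for_attempt() {
  base_gb="$1"
  attempt="$2"
  echo $(( base_gb + 2 * (attempt - 1) ))
}

# First attempt plus up to three retries (maxRetries = 3).
for attempt in 1 2 3 4; do
  echo "attempt $attempt: $(mem_for_attempt 10 "$attempt") GB"
done
```

With a base of 10 GB the requests are 10, 12, 14 and 16 GB; with a base of 40 GB they are 40, 42, 44 and 46 GB.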

Best, SM

kwonej0617 commented 1 year ago

Thank you @MaestSi for your kind support! I made changes in the configuration file as you suggested and additionally changed 'queue' from workq to long.

process{
            containerOptions = '--bind /home/:/home'
            cpus = 20
            executor = 'lsf'
            queue = 'long'

It seems to be working well now! Thank you so much!!! By the way, can I get the version information for minimap2, nanopolish, and m6anet in your pipeline? Did you choose the version of each tool when you built your Singularity image? I just wonder if there is a way to choose the version of each tool.

Thank you!

Best, EJ

MaestSi commented 1 year ago

Hi, the commands I used for installing all the software are reported in the Dockerfile. As you can see, I did not pin specific software versions in most cases, but you can check which versions were installed by running the Singularity image, which was downloaded into the Singularity cache directory (singularity.cacheDir in the .conf file, or the work dir if that line is commented out), with: singularity run /path/to/maestsi-nf-m6anet-latest.img and then running:

samtools --version | head -n2 -> samtools is v1.15.1
f5c --version -> f5c is v0.7 (f5c is a more efficient implementation of Nanopolish)
minimap2 --version -> minimap2 is v2.24-r1122

If you are not happy with those versions, you should edit the Dockerfile, rebuild the image with docker build <path to folder with Dockerfile> -t <image_ID:tag>, and edit the container in the .conf file. This requires Docker to be installed for building the image. You could probably do something similar with Singularity, but I am not familiar with that. Best, SM
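
The version checks can also be run non-interactively. This sketch uses singularity exec rather than the singularity run shown above, which is a substitution for illustration. The image path is the same placeholder and must be replaced with the real location. If Singularity or the image is missing, the script only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Sketch: query tool versions inside the image without an interactive session.
# IMG is a placeholder path; point it at the image in your cache directory.
IMG="${IMG:-/path/to/maestsi-nf-m6anet-latest.img}"

# The three version commands quoted in the comment above.
version_cmds() {
  printf '%s\n' "samtools --version" "f5c --version" "minimap2 --version"
}

version_cmds | while IFS= read -r cmd; do
  if command -v singularity >/dev/null 2>&1 && [ -e "$IMG" ]; then
    # $cmd is left unquoted on purpose so it splits into command and flag.
    # stdin is redirected so the tool cannot consume the loop's input.
    singularity exec "$IMG" $cmd </dev/null | head -n 2
  else
    echo "would run: singularity exec $IMG $cmd"
  fi
done
```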

MaestSi commented 1 year ago

Ciao, I am going to close the issue. Feel free to reopen it, in case you have any further questions! Best, SM

kwonej0617 commented 1 year ago

@MaestSi Thank you so much for your support!