fmalmeida / MpGAP

Multi-platform genome assembly pipeline for Illumina, Nanopore and PacBio reads
https://mpgap.readthedocs.io/en/latest/
GNU General Public License v3.0

NOTE: Process `HYBRID:strategy_2_pilon (aspergillus_terreus:strategy_2)` terminated with an error exit status (137) -- Execution is retried (1) #77

Open sajjadasaf opened 1 month ago

sajjadasaf commented 1 month ago
(base) omic@omic-Precision-7920-Tower:~/funig8$ nextflow run fmalmeida/mpgap   --output _ASSEMBLY   --max_cpus 5   --skip_spades   --input "samplesheet.yml"   --unicycler_additional_parameters ' --mode conservative '   -profile docker

curl: (28) SSL connection timeout

 N E X T F L O W   ~  version 24.04.2

Launching `https://github.com/fmalmeida/mpgap` [happy_meninsky] DSL2 - revision: 9f2475ff11 [master]

------------------------------------------------------
  fmalmeida/mpgap v3.2
------------------------------------------------------
Core Nextflow options
  revision                       : master
  runName                        : happy_meninsky
  containerEngine                : docker
  container                      : [.*:fmalmeida/mpgap@sha256:d0c421d2caa6bfb6fbaad36b4182746485f750c82524b7b738b0d190505c8098]
  launchDir                      : /home/omic/funig8
  workDir                        : /home/omic/funig8/work
  projectDir                     : /home/omic/.nextflow/assets/fmalmeida/mpgap
  userName                       : omic
  profile                        : docker
  configFiles                    : /home/omic/.nextflow/assets/fmalmeida/mpgap/nextflow.config

Input/output options
  input                          : samplesheet.yml
  output                         : _ASSEMBLY

Computational options
  start_asm_mem                  : 20 GB
  max_cpus                       : 5
  max_memory                     : 40 GB

Turn assemblers and modules on/off
  skip_spades                    : true

Software' additional parameters
  unicycler_additional_parameters:  --mode conservative

Generic options
  tracedir                       : _ASSEMBLY/pipeline_info

!! Only displaying parameters that differ from the pipeline defaults !!
------------------------------------------------------
If you use fmalmeida/mpgap for your analysis please cite:

* The pipeline
  https://doi.org/10.12688/f1000research.139488.1

* The nf-core framework
  https://doi.org/10.1038/s41587-020-0439-x

* Software dependencies
  https://github.com/fmalmeida/mpgap#citation
------------------------------------------------------

    Launching defined workflows!
    By default, all workflows will appear in the console "log" message.
    However, the processes of each workflow will be launched based on the inputs received.
    You can see that processes that were not launched have an empty [-       ].

executor >  local (27)
[-        ] SHORTREADS_ONLY:unicycler      -
[-        ] SHORTREADS_ONLY:shovill        -
[-        ] SHORTREADS_ONLY:megahit        -
[-        ] LONGREADS_ONLY:canu            -
[-        ] LONGREADS_ONLY:flye            -
[-        ] LONGREADS_ONLY:unicycler       -
[-        ] LONGREADS_ONLY:raven           -
[-        ] LONGREADS_ONLY:shasta          -
[-        ] LONGREADS_ONLY:wtdbg2          -
[-        ] LONGREADS_ONLY:hifiasm         -
[-        ] LONGREADS_ONLY:medaka          -
[-        ] LONGREADS_ONLY:nanopolish      -
[-        ] LONGREADS_ONLY:gcpp            -
[a2/3c8f43] HYB…gillus_terreus:strategy_1) | 0 of 1
[e7/cdbb77] HYB…gillus_terreus:strategy_1) | 1 of 1 ✔
[64/054208] HYB…gillus_terreus:strategy_2) | 0 of 1
[5f/24fa65] HYB…gillus_terreus:strategy_2) | 1 of 1 ✔
[68/c6a255] HYB…gillus_terreus:strategy_2) | 1 of 1 ✔
[3c/83e89b] HYB…gillus_terreus:strategy_2) | 1 of 1 ✔
[-        ] HYBRID:strategy_2_shasta       -
[-        ] HYBRID:strategy_2_hifiasm      -
[8b/7f6c0e] HYB…gillus_terreus:strategy_2) | 1 of 1 ✔
[44/55e4cd] HYB…gillus_terreus:strategy_2) | 4 of 8, failed: 4, retries: 4
[36/4da361] HYB…gillus_terreus:strategy_2) | 3 of 7, failed: 3, retries: 3
[29/653896] ASS…gillus_terreus:strategy_2) | 0 of 5
Plus 5 more processes waiting for tasks…
[1a/5c45fa] NOTE: Process `HYBRID:strategy_2_pilon (aspergillus_terreus:strategy_2)` terminated with an error exit status (137) -- Execution is retried (1)
fmalmeida commented 1 month ago

Hi @sajjadasaf, it seems you ran out of memory. Could you attach the .nextflow.log file and the .command.? files of this process so I can check?

Then I can propose something if that is the case.

sajjadasaf commented 1 month ago

nextflow.log

How much memory does it need? Where can I find the .command.? files?

fmalmeida commented 1 month ago

The amount of memory needed depends on the amount of input data and the genome size. For example, for a bacterium with a 5.5 Mb genome I can generally run with no more than 10 GB. But it always depends on the amount of data and the genome size.

In any case, I just noticed this string: Execution is retried (1)

Did the pipeline shut down, or is it still running? Given this message, I expect the pipeline to try once more with the maximum allowed memory.

If it is still running, let's wait to check if it works.

If the pipeline does shut down, please attach the logs I mentioned: the main pipeline .nextflow.log file and the .command.? files of the specific process that shuts the pipeline down.

Judging by the error, it seems to me that the process only needs more memory than it got on the first try.
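As background on why exit status 137 points to memory: by shell convention, a status above 128 means the process was terminated by a signal whose number is the status minus 128, so 137 corresponds to signal 9 (SIGKILL). SIGKILL is what the Linux out-of-memory killer (or a container memory limit) sends, although other causes are possible. The helper below is only an illustrative sketch; the function name `describe_exit_status` is made up for this example and is not part of the pipeline.

```shell
#!/usr/bin/env bash
# Interpret a shell-style exit status.
# Statuses above 128 conventionally mean "terminated by signal (status - 128)".
describe_exit_status() {
  status="$1"
  if [ "$status" -gt 128 ]; then
    signum=$((status - 128))
    case "$signum" in
      9)  signame="SIGKILL" ;;   # sent by the kernel OOM killer, among others
      15) signame="SIGTERM" ;;
      *)  signame="another signal" ;;
    esac
    echo "killed by ${signame} (${signum})"
  else
    echo "exited with status ${status}"
  fi
}

describe_exit_status 137   # the status reported for strategy_2_pilon
describe_exit_status 1     # an ordinary error exit
```

A status of 137 is therefore a strong hint, not proof, that the task exceeded its memory allowance.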

fmalmeida commented 1 month ago

> nextflow.log
>
> How much memory does it need? Where can I find the .command.? files?

I just checked the log. It seems the pipeline is still running. Let's first see whether it works on the automatic second try. If not, we can go through the process logs to check what the error is.

The .command.? files are placed in the working directories. You can find the working directories in the .nextflow.log. For example, here:

[image]
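A rough way to do the same lookup from the command line is sketched below. It assumes that the log records each task directory as "workDir: <path>" and that the paths contain no spaces; the function name `list_work_dirs` is hypothetical. Each task directory holds hidden files such as .command.sh, .command.out, .command.err and .command.log, which is why `ls -a` is needed to see them.

```shell
#!/usr/bin/env bash
# Print the unique task work directories mentioned in a Nextflow log.
# Assumption: the log writes them as "workDir: <path>", ended by "]" or a space.
list_work_dirs() {
  grep -o 'workDir: [^] ]*' "$1" | sed 's/^workDir: //' | sort -u
}

# Example: show the hidden .command.* files of every task directory in the log.
if [ -f .nextflow.log ]; then
  for dir in $(list_work_dirs .nextflow.log); do
    if [ -d "$dir" ]; then
      echo "== $dir"
      ls -a "$dir" | grep '^\.command\.'
    fi
  done
fi
```

When Nextflow reports a failure it also prints the directory directly under "Work dir:", so the search is only needed for tasks that did not end the run.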
sajjadasaf commented 1 month ago

Thank you for your kind reply. Yes, the pipeline is still running and retrying. It is fungal data with a 35 Mb genome.

fmalmeida commented 1 month ago

> Thank you for your kind reply. Yes, the pipeline is still running and retrying. It is fungal data with a 35 Mb genome.

Perfect. Let's wait a bit more then. If it does fail and shuts down before completing, we can collect all the logs. Then I can take a look at it!

Thanks for using it.

sajjadasaf commented 3 weeks ago

You were right, it was a memory error. I increased the memory and the process almost completed, but at the end it shows the following error.

executor >  local (2)
[-        ] SHORTREADS_ONLY:unicycler      -
[-        ] SHORTREADS_ONLY:shovill        -
[-        ] SHORTREADS_ONLY:megahit        -
[-        ] LONGREADS_ONLY:canu            -
[-        ] LONGREADS_ONLY:flye            -
[-        ] LONGREADS_ONLY:unicycler       -
[-        ] LONGREADS_ONLY:raven           -
[-        ] LONGREADS_ONLY:shasta          -
[-        ] LONGREADS_ONLY:wtdbg2          -
[-        ] LONGREADS_ONLY:hifiasm         -
[-        ] LONGREADS_ONLY:medaka          -
[-        ] LONGREADS_ONLY:nanopolish      -
[-        ] LONGREADS_ONLY:gcpp            -
[21/3016b3] HYB…gillus_terreus:strategy_1) | 1 of 1, cached: 1 ✔
[03/badcd2] HYB…gillus_terreus:strategy_1) | 1 of 1, cached: 1 ✔
[6c/0227b1] HYB…gillus_terreus:strategy_2) | 1 of 1, cached: 1 ✔
[2d/84db3d] HYB…gillus_terreus:strategy_2) | 1 of 1, cached: 1 ✔
[55/c38b23] HYB…gillus_terreus:strategy_2) | 1 of 1, cached: 1 ✔
[19/746669] HYB…gillus_terreus:strategy_2) | 1 of 1, cached: 1 ✔
[-        ] HYBRID:strategy_2_shasta       -
[b4/85bcc0] HYB…gillus_terreus:strategy_2) | 1 of 1, cached: 1 ✔
[80/2e77ed] HYB…gillus_terreus:strategy_2) | 5 of 5, cached: 5 ✔
[5e/d425c6] HYB…gillus_terreus:strategy_2) | 5 of 5, cached: 5 ✔
[34/fa7243] ASS…gillus_terreus:strategy_1) | 17 of 17, cached: 16, failed: 1 ✘
[26/9b7334] ASS…M_DUMPSOFTWAREVERSIONS (1) | 1 of 1 ✔
[-        ] ASSEMBLY_QC:multiqc            | 0 of 1
Plus 4 more processes waiting for tasks…
Execution cancelled -- Finishing pending tasks before exit
Pipeline completed at: 2024-06-07T14:08:33.269330969+04:00
Execution status: failed
Execution duration: 41m 43s
Do not give up, we can fix it!
ERROR ~ Error executing process > 'ASSEMBLY_QC:quast (aspergillus_terreus:strategy_1)'

Caused by:
  Process `ASSEMBLY_QC:quast (aspergillus_terreus:strategy_1)` terminated with an error exit status (1)

Command executed:

  # run quast
  quast.py \
      -o unicycler \
      -t 4 \
      --pacbio Genomic-DNA-8_LGE6249-PacBio.hifi_reads.fastq \
      --pe1 Genomic-DNA-8-LGE6249_L4_1.fq --pe2 Genomic-DNA-8-LGE6249_L4_2.fq \
       \
      --rna-finding \
      --min-contig 100 \
       \
      unicycler_assembly.fasta

  # run busco
  cp -r /opt/busco_db .
  busco \
    --tar \
    --download_path ./ \
    -i unicycler_assembly.fasta \
    -m genome \
    -l ./busco_db/bacteria_odb10 \
    -o unicycler/busco_stats/run_unicycler

  # change names
  for i in $( find unicycler/busco_stats/run_unicycler -name 'short*.json' ) ; do
    path=$( dirname $i ) ;
    mv $i ${path}/short_summary_unicycler.json ;
  done
  for i in $( find unicycler/busco_stats/run_unicycler -name 'short*.txt' ) ; do
    path=$( dirname $i ) ;
    mv $i ${path}/short_summary_unicycler.txt ;
  done

  # save assembly
  mkdir -p input_assembly
  cp unicycler_assembly.fasta input_assembly/unicycler_assembly.fasta

  # get version
  cat <<-END_VERSIONS > versions.yml
  "ASSEMBLY_QC:quast":
      quast: $( quast.py --version | tail -n+2 | cut -f 2 -d ' ' )
      busco: $( busco --version | cut -f 2 -d ' ' )
  END_VERSIONS

Command exit status:
  1

Command output:

  2024-06-07 10:08:19
  Running Basic statistics processor...
    Contig files: 
      unicycler_assembly
    Calculating N50 and L50...
      unicycler_assembly, N50 = 1902357, L50 = 8, auN = 1720046.8, Total length = 33445707, GC % = 52.29, # N's per 100 kbp =  0.00
    Drawing Nx plot...
      saved to unicycler/basic_stats/Nx_plot.pdf
    Drawing cumulative plot...
      saved to unicycler/basic_stats/cumulative_plot.pdf
    Drawing GC content plot...
      saved to unicycler/basic_stats/GC_content_plot.pdf
    Drawing unicycler_assembly GC content plot...
      saved to unicycler/basic_stats/unicycler_assembly_GC_content_plot.pdf
  Done.

  NOTICE: Genes are not predicted by default. Use --gene-finding or --glimmer option to enable it.

  2024-06-07 10:08:21
  Running Barrnap...
  Logging to unicycler/predicted_genes/barrnap.log...
      Ribosomal RNA genes = 25
      Predicted genes (GFF): unicycler/predicted_genes/unicycler_assembly.rna.gff
  Done.

  2024-06-07 10:08:28
  Creating large visual summaries...
  This may take a while: press Ctrl-C to skip this step..
    1 of 2: Creating PDF with all tables and plots...
    2 of 2: Creating Icarus viewers...
  Done

  2024-06-07 10:08:29
  RESULTS:
    Text versions of total report are saved to unicycler/report.txt, report.tsv, and report.tex
    Text versions of transposed total report are saved to unicycler/transposed_report.txt, transposed_report.tsv, and transposed_report.tex
    HTML version (interactive tables and plots) is saved to unicycler/report.html
    PDF version (tables and plots) is saved to unicycler/report.pdf
    Icarus (contig browser) is saved to unicycler/icarus.html
    Log is saved to unicycler/quast.log

  Finished: 2024-06-07 10:08:29
  Elapsed time: 0:41:28.089135
  NOTICEs: 5; WARNINGs: 2; non-fatal ERRORs: 0

  Thank you for using QUAST!
  2024-06-07 10:08:32 INFO: ***** Start a BUSCO v5.6.1 analysis, current time: 06/07/2024 10:08:32 *****
  2024-06-07 10:08:32 INFO: Configuring BUSCO with local environment
  2024-06-07 10:08:32 INFO: Running genome mode

Command error:
    Calculating N50 and L50...
      unicycler_assembly, N50 = 1902357, L50 = 8, auN = 1720046.8, Total length = 33445707, GC % = 52.29, # N's per 100 kbp =  0.00
    Drawing Nx plot...
      saved to unicycler/basic_stats/Nx_plot.pdf
    Drawing cumulative plot...
      saved to unicycler/basic_stats/cumulative_plot.pdf
    Drawing GC content plot...
      saved to unicycler/basic_stats/GC_content_plot.pdf
    Drawing unicycler_assembly GC content plot...
      saved to unicycler/basic_stats/unicycler_assembly_GC_content_plot.pdf
  Done.

  NOTICE: Genes are not predicted by default. Use --gene-finding or --glimmer option to enable it.

  2024-06-07 10:08:21
  Running Barrnap...
  Logging to unicycler/predicted_genes/barrnap.log...
      Ribosomal RNA genes = 25
      Predicted genes (GFF): unicycler/predicted_genes/unicycler_assembly.rna.gff
  Done.

  2024-06-07 10:08:28
  Creating large visual summaries...
  This may take a while: press Ctrl-C to skip this step..
    1 of 2: Creating PDF with all tables and plots...
    2 of 2: Creating Icarus viewers...
  Done

  2024-06-07 10:08:29
  RESULTS:
    Text versions of total report are saved to unicycler/report.txt, report.tsv, and report.tex
    Text versions of transposed total report are saved to unicycler/transposed_report.txt, transposed_report.tsv, and transposed_report.tex
    HTML version (interactive tables and plots) is saved to unicycler/report.html
    PDF version (tables and plots) is saved to unicycler/report.pdf
    Icarus (contig browser) is saved to unicycler/icarus.html
    Log is saved to unicycler/quast.log

  Finished: 2024-06-07 10:08:29
  Elapsed time: 0:41:28.089135
  NOTICEs: 5; WARNINGs: 2; non-fatal ERRORs: 0

  Thank you for using QUAST!
  2024-06-07 10:08:32 INFO: ***** Start a BUSCO v5.6.1 analysis, current time: 06/07/2024 10:08:32 *****
  2024-06-07 10:08:32 INFO: Configuring BUSCO with local environment
  2024-06-07 10:08:32 INFO: Running genome mode
  2024-06-07 10:08:32 ERROR:    A run with the name unicycler/busco_stats/run_unicycler already exists...
    If you are sure you wish to overwrite existing files, please use the -f (force) option
  2024-06-07 10:08:32 ERROR:    BUSCO analysis failed!
  2024-06-07 10:08:32 ERROR:    Check the logs, read the user guide (https://busco.ezlab.org/busco_userguide.html), and check the BUSCO issue board on https://gitlab.com/ezlab/busco/issues

Work dir:
  /home/omic/funig8/work/34/fa7243238dbef8437ae5534af56df3

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`

 -- Check '.nextflow.log' file for details
fmalmeida commented 3 weeks ago

Hi @sajjadasaf, I think that just adding a '-f' param to BUSCO and Quast might solve this issue. I am making a new branch with this feature. Can you try it and see if it works?
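To illustrate the idea only (this is not necessarily what the issue-77 branch does), the sketch below adds BUSCO's -f (force) option, the one suggested by the error message, when the output directory is already present. The helper name `busco_force_flag` is hypothetical; the other arguments are copied from the failing command in the log. The command is printed rather than executed so the sketch can be tried without BUSCO installed.

```shell
#!/usr/bin/env bash
# Return "-f" when the BUSCO output directory already exists, else nothing.
busco_force_flag() {
  if [ -d "$1" ]; then
    echo "-f"
  fi
}

out_dir="unicycler/busco_stats/run_unicycler"

# Print the command instead of running it; drop the leading "echo" to execute.
echo busco $(busco_force_flag "$out_dir") \
  -i unicycler_assembly.fasta \
  -m genome \
  -l ./busco_db/bacteria_odb10 \
  -o "$out_dir"
```

Passing -f unconditionally would be simpler; the conditional form just makes it visible when an earlier run left output behind.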

You have to include the -r issue-77, -latest and -resume params: -r issue-77 selects the correct source branch, -latest makes Nextflow download the latest version of the selected code, and -resume makes Nextflow reuse all successful processes and rerun only the failed ones.

E.g. nextflow run fmalmeida/mpgap -r issue-77 -latest -resume ... < all your other params >.

Let me know what happens. If it works, I can make a hotfix release.