BeatsonLab-MicrobialGenomics / micropipe

A pipeline for high-quality bacterial genome construction using ONT sequencing
GNU General Public License v3.0

error at assembly (flye step) #9

Open paulgarias opened 2 years ago

paulgarias commented 2 years ago

I am working with a student who is getting the following error when running the pipeline with Nextflow:


executor >  local (3)
[30/a60146] process > assembly:porechop (H37Rv.1) [100%] 1 of 1 ✔
[24/79a658] process > assembly:japsa (H37Rv.1)    [100%] 1 of 1 ✔
[dd/860979] process > assembly:flye (H37Rv.1)     [  0%] 0 of 1
[-        ] process > assembly:racon_cpu          -
[-        ] process > assembly:medaka_cpu         -
[-        ] process > assembly:nextpolish         -
[-        ] process > assembly:fixstart           -
[-        ] process > assembly:quast              -
Error executing process > 'assembly:flye (H37Rv.1)'

Caused by:
  Missing output file(s) `assembly.fasta` expected by process `assembly:flye (H37Rv.1)`

Command executed:

  set +eu
  flye --nano-raw filtered.fastq.gz --genome-size 5.0m --threads 4 --out-dir $PWD --plasmids
  flye -v 2> flye_version.txt

Command exit status:
  0

Command output:
  (empty)

Command error:
  WARNING: Skipping mount /var/singularity/mnt/session/etc/resolv.conf [files]: /etc/resolv.conf doesn't exist in container
  [2022-07-22 17:41:29] INFO: Starting Flye 2.5-release
  [2022-07-22 17:41:29] INFO: >>>STAGE: configure
  [2022-07-22 17:41:29] INFO: Configuring run
  [2022-07-22 17:43:47] INFO: Total read length: 5089510998
  [2022-07-22 17:43:47] INFO: Input genome size: 5000000
  [2022-07-22 17:43:47] INFO: Estimated coverage: 1017
  [2022-07-22 17:43:47] WARNING: Expected read coverage is 1017, the assembly is not guaranteed to be optimal in this setting. Are you sure that the genome size was entered correctly?
  [2022-07-22 17:43:47] INFO: Reads N50/N90: 9733 / 2679
  [2022-07-22 17:43:47] INFO: Minimum overlap set to 3000
  [2022-07-22 17:43:47] INFO: Selected k-mer size: 15
  [2022-07-22 17:43:47] INFO: >>>STAGE: assembly
  [2022-07-22 17:43:47] INFO: Assembling disjointigs
  [2022-07-22 17:43:47] INFO: Reading sequences
  [2022-07-22 17:45:15] INFO: Generating solid k-mer index
  [2022-07-22 17:45:32] INFO: Counting k-mers (1/2):
  0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
  [2022-07-22 17:48:26] INFO: Counting k-mers (2/2):
  0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
  [2022-07-22 17:54:34] INFO: Filling index table
  0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
  [2022-07-22 18:05:38] INFO: Extending reads
  [2022-07-22 18:24:23] INFO: Overlap-based coverage: 868
  [2022-07-22 18:24:23] INFO: Median overlap divergence: 0.0852075
  0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
  [2022-07-24 03:32:08] INFO: Assembled 0 disjointigs
  [2022-07-24 03:32:08] INFO: Generating sequence
  [2022-07-24 03:32:09] ERROR: No disjointigs were assembled - please check if the read type and genome size parameters are correct

Work dir:
  /projectsp/alland/PanGenome_Project/ReviewerResponses/testing_pipelines/work/dd/8609795cae4b8d69393b8e7daee1bf

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`

Looking for some guidance on how to proceed.

Best, Paul

vmurigneu commented 2 years ago

Hi Paul,

This is a Flye error that seems to be linked to the very high read coverage (1017x), which can confuse Flye. It is similar to this issue: https://github.com/fenderglass/Flye/issues/128
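
The coverage Flye reports is the total read length divided by the value passed to --genome-size. Here is a minimal Python sketch that recomputes it from the log lines above. The regular expressions assume the Flye 2.5 log format shown in this issue, and the helper names are hypothetical:

```python
import re

# Log lines copied from the Flye 2.5 output posted above.
FLYE_LOG = """\
[2022-07-22 17:43:47] INFO: Total read length: 5089510998
[2022-07-22 17:43:47] INFO: Input genome size: 5000000
[2022-07-22 17:43:47] INFO: Estimated coverage: 1017
"""


def log_value(log_text: str, label: str) -> int:
    """Return the integer that follows '<label>: ' in a Flye log."""
    match = re.search(re.escape(label) + r": (\d+)", log_text)
    if match is None:
        raise ValueError(f"{label!r} not found in log")
    return int(match.group(1))


def estimated_coverage(log_text: str) -> int:
    """Total read length divided by genome size, truncated to an integer."""
    total = log_value(log_text, "Total read length")
    genome = log_value(log_text, "Input genome size")
    return total // genome


if __name__ == "__main__":
    cov = estimated_coverage(FLYE_LOG)
    print(f"Estimated coverage: {cov}x")
    # Compare with the value Flye itself logged.
    print(cov == log_value(FLYE_LOG, "Estimated coverage"))
```

5,089,510,998 bp / 5,000,000 bp is about 1017.9; truncating to an integer reproduces the 1017 that Flye logged, which is far above the 50x that --asm-coverage 50 targets.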

The author of Flye suggested trying two more runs: (i) metagenome mode, and (ii) normal mode with --asm-coverage 50, which uses only the longest reads (up to 50x coverage) for disjointig assembly.
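
To make the --asm-coverage idea concrete: it is described as keeping only the longest reads, up to the requested coverage, for the disjointig step. The Python sketch below illustrates that selection rule on a list of read lengths. It is an illustration of the described behaviour, not Flye's actual code, and the function name is hypothetical:

```python
from typing import Iterable, List


def longest_reads_for_coverage(
    read_lengths: Iterable[int], genome_size: int, target_coverage: float
) -> List[int]:
    """Pick reads longest-first until they cover target_coverage * genome_size bases.

    Hypothetical helper illustrating the idea behind Flye's --asm-coverage;
    Flye's own implementation may differ in details.
    """
    budget = target_coverage * genome_size
    selected: List[int] = []
    total = 0
    for length in sorted(read_lengths, reverse=True):
        if total >= budget:
            break  # enough bases collected for the target coverage
        selected.append(length)
        total += length
    return selected


if __name__ == "__main__":
    lengths = [100, 900, 500, 300, 700]
    # Toy genome of 100 bp at 10x -> budget of 1000 bases.
    chosen = longest_reads_for_coverage(lengths, genome_size=100, target_coverage=10)
    print(chosen, sum(chosen) / 100)
```

With the numbers from this run (about 5.09 Gb of reads and a 5 Mb genome size), a 50x budget is 250 Mb, i.e. roughly 5% of the data, taken from the longest reads.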

Can you try rerunning after modifying line 90 of the nextflow.config file, either to reduce the coverage used for the initial disjointig assembly:
flye_args = "--plasmids" => flye_args = "--plasmids --asm-coverage 50"
or to use the metagenome mode:
flye_args = "--plasmids" => flye_args = "--plasmids --meta"
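
If you prefer to script the edit rather than change line 90 by hand, a minimal Python sketch follows. It assumes the setting appears as a double-quoted string of the form flye_args = "..." as quoted above; the helper name is hypothetical:

```python
import re

# Matches e.g.  flye_args = "--plasmids"  (double-quoted value assumed).
FLYE_ARGS = re.compile(r'(flye_args\s*=\s*")([^"]*)(")')


def add_flye_args(config_text: str, extra: str) -> str:
    """Append `extra` to the first flye_args string, unless it is already there."""

    def repl(match: re.Match) -> str:
        current = match.group(2)
        if extra in current:
            return match.group(0)  # already present: leave the line unchanged
        return match.group(1) + f"{current} {extra}".strip() + match.group(3)

    return FLYE_ARGS.sub(repl, config_text, count=1)


if __name__ == "__main__":
    line = '  flye_args = "--plasmids"\n'
    print(add_flye_args(line, "--asm-coverage 50"), end="")
    print(add_flye_args(line, "--meta"), end="")
```

Applied to the real file, this would be path.write_text(add_flye_args(path.read_text(), "--asm-coverage 50")) with path = pathlib.Path("nextflow.config"). Running it twice is harmless because existing options are left alone.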

Hope this helps. Valentine

maddne commented 2 years ago

Hi, I keep having a similar issue, although unlike Paul I don't get the "Expected read coverage is 1017" warning. It seems that the assembly:flye step expects assembly.fasta as an input file, but in that directory there is a draft_assembly.fasta instead. Could this be causing the problem? I am attaching the Flye log and the Nextflow log: flye.log, .nextflow.log.

vmurigneu commented 2 years ago

@maddne No, assembly.fasta is an output file of the Flye step, not an input. The error is here:
OSError: [Errno 30] Read-only file system

Work dir: /home/bio/micropipe/micropipe/work/36/fa4934df1db68825ea7799ce4a2f88

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named .command.sh
Aug-03 16:58:48.309 [Task monitor] DEBUG nextflow.Session - Session aborted -- Cause: Missing output file(s) assembly.fasta expected by process assembly:flye (Eco1948)

Can you check the content of .command.sh and .command.run inside the work dir?
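
The task files Nextflow writes into each work dir start with a dot, so a plain ls hides them (ls -a shows them). The Python sketch below lists which of them exist, checks whether the expected assembly.fasta was produced, and pulls error lines out of .command.err. The helper names and the error markers are assumptions for illustration:

```python
import sys
from pathlib import Path
from typing import Dict, List, Sequence

# Hidden files Nextflow writes into every task work directory.
TASK_FILES = (".command.sh", ".command.run", ".command.err", ".command.log", ".exitcode")


def check_work_dir(work_dir: str, expected: Sequence[str] = ("assembly.fasta",)) -> Dict[str, bool]:
    """Map each task file and expected output to whether it exists in work_dir."""
    wd = Path(work_dir)
    return {name: (wd / name).is_file() for name in (*TASK_FILES, *expected)}


def error_lines(work_dir: str, markers: Sequence[str] = ("Error", "ERROR")) -> List[str]:
    """Return lines from .command.err that contain any of the markers."""
    path = Path(work_dir) / ".command.err"
    if not path.is_file():
        return []
    return [
        f".command.err: {line.strip()}"
        for line in path.read_text(errors="replace").splitlines()
        if any(marker in line for marker in markers)
    ]


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for name, present in check_work_dir(target).items():
        print(f"{'found  ' if present else 'MISSING'} {name}")
    for hit in error_lines(target):
        print(hit)
```

Run it with the work dir as its argument, e.g. the /home/bio/micropipe/micropipe/work/36/fa49... directory from the log.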

maddne commented 2 years ago

@vmurigneu Yes, I checked their content. They are hidden files and I could not upload them directly, so I copied their content into a text file: error.txt.

vmurigneu commented 2 years ago

@maddne Have you checked that you have permission to write in the output folder? Did the previous steps of the pipeline (trimming, filtering) generate the expected output?

Can you please send the command line you used and the content of your nextflow.config?
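
A permission check that mirrors what failed here is to actually try writing a file, since "[Errno 30] Read-only file system" (EROFS on Linux) is a different failure from a missing permission bit. A minimal Python sketch, with a hypothetical helper name; note that it only tests the environment it runs in, and the Flye task wrote from inside the Singularity container, whose mounts may differ from the host's:

```python
import errno
import sys
import tempfile
from typing import Optional, Tuple


def probe_writable(directory: str) -> Tuple[bool, Optional[str]]:
    """Try to create (and auto-delete) a temporary file in `directory`.

    Returns (True, None) on success, or (False, errno name) on failure,
    e.g. "EROFS" for "[Errno 30] Read-only file system" on Linux.
    """
    try:
        with tempfile.NamedTemporaryFile(dir=directory):
            pass
    except OSError as exc:
        return False, errno.errorcode.get(exc.errno, str(exc.errno))
    return True, None


if __name__ == "__main__":
    for directory in sys.argv[1:] or ["."]:
        ok, reason = probe_writable(directory)
        print(f"{directory}: {'writable' if ok else f'NOT writable ({reason})'}")
```

Passing the output folder and the work dir as arguments checks both in one run.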

maddne commented 2 years ago

I suspected a permissions problem and changed the permissions of the output folder to drwxrwxrwx, but that wasn't the cause: the pipeline was able to write files there. The previous steps, trimming and filtering, worked like a charm and produced output and an HTML report. Here are some reports: Eco1948_porechop.log, trace.txt, nextflow_report.txt

Here is my command:
  nextflow main.nf --samplesheet /home/bio/micropipe/micropipe/samples5.csv --fastq /home/bio/micropipe/micropipe/bact5/ --outdir /home/bio/micropipe/micropipe/results123/ --datadir /home/bio/micropipe/micropipe/bact5/

The nextflow.config file is the one downloaded from your repo; I only changed the Singularity cache folder at line 3.

vmurigneu commented 2 years ago

@maddne Can you please post line 3 of the nextflow.config file?

Would you also be able to post the .command.run file from the work dir /home/bio/micropipe/micropipe/work/36/fa4934df1db68825ea7799ce4a2f88?