BeatsonLab-MicrobialGenomics / micropipe

A pipeline for high-quality bacterial genome construction using ONT sequencing
GNU General Public License v3.0

Flye not creating assembly file #5

Closed vasquini closed 2 years ago

vasquini commented 2 years ago

I am trying to run micropipe assembly-only. This is my sample sheet:

(base) [suj7@login02 ~]$ head sample0.txt
barcode_id,sample_id,long_fastq,genome_size
barcode13,barcode13,demux_guppy_fastq/barcode13/fastq_runid_5363c81269c5ff44cc45e657b79b1135be0297cb_0.fastq,5m
barcode13,barcode13,demux_guppy_fastq/barcode13/fastq_runid_5363c81269c5ff44cc45e657b79b1135be0297cb_1.fastq,5m
barcode14,barcode14,demux_guppy_fastq/barcode14/fastq_runid_5363c81269c5ff44cc45e657b79b1135be0297cb_0.fastq,5m
barcode14,barcode14,demux_guppy_fastq/barcode14/fastq_runid_5363c81269c5ff44cc45e657b79b1135be0297cb_1.fastq,5m
barcode15,barcode15,demux_guppy_fastq/barcode15/fastq_runid_5363c81269c5ff44cc45e657b79b1135be0297cb_0.fastq,5m
barcode15,barcode15,demux_guppy_fastq/barcode15/fastq_runid_5363c81269c5ff44cc45e657b79b1135be0297cb_1.fastq,5m
barcode16,barcode16,demux_guppy_fastq/barcode16/fastq_runid_5363c81269c5ff44cc45e657b79b1135be0297cb_0.fastq,5m
barcode16,barcode16,demux_guppy_fastq/barcode16/fastq_runid_5363c81269c5ff44cc45e657b79b1135be0297cb_1.fastq,5m
barcode16,barcode16,demux_guppy_fastq/barcode16/fastq_runid_5363c81269c5ff44cc45e657b79b1135be0297cb_2.fastq,5m

I have attached an error file here: myerror.txt. I know I used --nano-hq instead of the original --nano-raw, but it doesn't work any better with --nano-raw or --nano-corr.

vasquini commented 2 years ago

I am wondering if this is linked to the symbolic link that is created to the filtered.fastq.gz file? I ran Flye 2.5 successfully outside of micropipe with the filtered.fastq.gz file output by porechop and japsa (the filtering and trimming steps are successful).

vmurigneu commented 2 years ago

Hi @vasquini

Yes, the --nano-hq mode is new in Flye v2.9.

Can you share the content of the file flye_version.txt? From the error message, it looks like you might be using an earlier version rather than Flye v2.9:

flye: error: one of the arguments --pacbio-raw --pacbio-corr --nano-raw --nano-corr --subassemblies is required
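
The version check suggested above could be sketched as follows. This is a minimal sketch, not part of micropipe: the helper names are hypothetical, and it assumes flye_version.txt holds a string that starts with a major.minor number (for example 2.5, or 2.9 followed by a build suffix). The only fact taken from this thread is that --nano-hq is new in Flye v2.9.

```python
import re


def flye_version_tuple(version_text):
    """Extract (major, minor) from a version string (hypothetical helper).

    Assumes the text contains something like '2.5' or '2.9-b1768'.
    """
    match = re.search(r"(\d+)\.(\d+)", version_text)
    if match is None:
        raise ValueError(f"no version number found in {version_text!r}")
    return int(match.group(1)), int(match.group(2))


def supports_nano_hq(version_text):
    """Return True if this Flye version should accept --nano-hq.

    Based only on the statement in this thread that the mode is new in v2.9.
    """
    return flye_version_tuple(version_text) >= (2, 9)


if __name__ == "__main__":
    # Example strings; the build suffix is an assumed format.
    for text in ("2.5", "2.9-b1768"):
        print(text, supports_nano_hq(text))
```

If the check returns False, falling back to --nano-raw or --nano-corr (the modes listed in the error message) would be the expected choice.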

mmfacun commented 2 years ago

Hi. I also get an error with the Flye assembly. My Flye version is 2.5.

executor >  local (3)
[2a/f05baf] process > assembly:porechop (barcode09) [100%] 1 of 1 ✔
[24/6a3c02] process > assembly:japsa (barcode09)    [100%] 1 of 1 ✔
[e3/863f23] process > assembly:flye (barcode09)     [100%] 1 of 1, failed: 1 ✘
[-        ] process > assembly:racon_cpu            -
[-        ] process > assembly:medaka_cpu           -
[-        ] process > assembly:nextpolish           -
[-        ] process > assembly:fixstart             -
[-        ] process > assembly:quast                -
Error executing process > 'assembly:flye (barcode09)'

Caused by:
  Missing output file(s) `assembly.fasta` expected by process `assembly:flye (barcode09)`

Command executed:

  set +eu
  flye --nano-raw filtered.fastq.gz --genome-size 5.9m --threads 4 --out-dir $PWD true
  flye -v 2> flye_version.txt

Command exit status:
  0

Command output:
  (empty)

Command error:
  WARNING: Skipping mount /etc/localtime [binds]: /etc/localtime doesn't exist in container
  WARNING: Skipping mount /usr/local/var/singularity/mnt/session/etc/resolv.conf [files]: /etc/resolv.conf doesn't exist in container
  usage: flye (--pacbio-raw | --pacbio-corr | --nano-raw |
             --nano-corr | --subassemblies) file1 [file_2 ...]
             --genome-size SIZE --out-dir PATH
             [--threads int] [--iterations int] [--min-overlap int]
             [--meta] [--plasmids] [--no-trestle] [--polish-target]
             [--debug] [--version] [--help] [--resume]
             [--resume-from] [--stop-after]
  flye: error: unrecognized arguments: true

Work dir:
  /home/madel/work/e3/863f232121bc7582a38bc9cdb673eb

mmfacun commented 2 years ago

I tried removing true from .command.sh in the work dir specified above, then added -resume to my original command, but I got the same error and the work dir it specified changed.

So I manually changed to the work dir specified in the error using cd, removed true again from .command.sh, and ran bash .command.run.

Flye worked until computing the consensus, when it gave the following error.

Please also help with how I can -resume the pipeline. Thanks.

[2022-02-22 05:10:50] INFO: >>>STAGE: consensus
[2022-02-22 05:10:50] INFO: Running Minimap2
[2022-02-22 05:10:57] INFO: Computing consensus
Traceback (most recent call last):
  File "/usr/local/bin/flye", line 31, in <module>
    sys.exit(main())
  File "/usr/local/lib/python2.7/site-packages/flye/main.py", line 760, in main
    _run(args)
  File "/usr/local/lib/python2.7/site-packages/flye/main.py", line 562, in _run
    jobs[i].run()
  File "/usr/local/lib/python2.7/site-packages/flye/main.py", line 321, in run
    self.args.platform)
  File "/usr/local/lib/python2.7/site-packages/flye/polishing/consensus.py", line 58, in get_consensus
    use_secondary=True)
  File "/usr/local/lib/python2.7/site-packages/flye/polishing/alignment.py", line 143, in __init__
    self.lock = multiprocessing.Lock()
  File "/usr/local/lib/python2.7/multiprocessing/__init__.py", line 176, in Lock
    return Lock()
  File "/usr/local/lib/python2.7/multiprocessing/synchronize.py", line 147, in __init__
    SemLock.__init__(self, SEMAPHORE, 1, 1)
  File "/usr/local/lib/python2.7/multiprocessing/synchronize.py", line 75, in __init__
    sl = self._semlock = _multiprocessing.SemLock(kind, value, maxvalue)
OSError: [Errno 30] Read-only file system

vmurigneu commented 2 years ago

Hi @mmfacun

Can you please share the nextflow command line you're using and check the content of this parameter in the nextflow.config file:

flye_args = "--plasmids"

Thanks

mmfacun commented 2 years ago

Hi. flye_args = "--plasmids" is present in the nextflow.config file as is.

Below is my command:

nextflow micropipe/main.nf --samplesheet /mnt/d/minion/GHRU-K/sample.csv --fastq /mnt/d/minion/fastq_trimmed_porechop/ --datadir /. --outdir /mnt/d/minion/GHRU-K/micropipe --flye_args "--plasmids" --skip_porechop --max_memory 5.GB

vmurigneu commented 2 years ago

Can you try removing --flye_args "--plasmids" from the nextflow command line? It is redundant anyway, as it is the default. I am not sure why, but this parameter is not being interpreted correctly: its value is replaced by 'true' in the flye command line, and flye then fails because 'true' is not a recognised argument.
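
One possible explanation, offered as an assumption rather than a confirmed diagnosis: if the launcher treats any token starting with a dash as a new option, then the value "--plasmids" is never attached to --flye_args, and an option left without a value falls back to true. The toy parser below is a hypothetical model of that behaviour written for illustration; it is not Nextflow's actual code.

```python
def parse_params(argv):
    """Toy '--name value' parser (hypothetical model, not Nextflow's code).

    A token after '--name' is taken as its value only if it does not start
    with '-'. Otherwise '--name' is recorded as True and the next token is
    parsed as a separate option.
    """
    params = {}
    i = 0
    while i < len(argv):
        token = argv[i]
        i += 1
        if not token.startswith("--"):
            continue  # positional arguments are ignored in this sketch
        name = token[2:]
        if i < len(argv) and not argv[i].startswith("-"):
            params[name] = argv[i]
            i += 1
        else:
            params[name] = True
    return params


if __name__ == "__main__":
    args = ["--outdir", "/mnt/d/minion/GHRU-K/micropipe",
            "--flye_args", "--plasmids", "--skip_porechop"]
    # Under this model flye_args ends up as True, not "--plasmids".
    print(parse_params(args))
```

Under this model, a template that pastes the flye_args value into the flye command line would produce the stray true seen in the error above.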

mmfacun commented 2 years ago

I removed --flye_args "--plasmids" from my command line, but this time I got OSError: [Errno 30] Read-only file system. Please check the attached log from the work dir.

error.txt

vmurigneu commented 2 years ago

Are you resuming the pipeline or starting from scratch? I don't think nextflow can correctly handle resuming the pipeline after the flye command line has been manually modified.

mmfacun commented 2 years ago

When I did not resume the pipeline, I got the same error. I also tried changing my output dir, but still got the same OSError. Below is my nextflow log.

nextflow.log

vmurigneu commented 2 years ago

@mmfacun Were you able to run the assembly workflow with the test data without error? Could it be an issue with singularity? @thomcuddihy might be able to help.

thomcuddihy commented 2 years ago

@mmfacun Looking at the command you gave, --datadir /. would resolve to the root level of your filesystem. Singularity needs to bind mount the specified directories from your native filesystem so that it can access them inside its read-only filesystem, so trying to bind mount at the root level will be a Bad Time.

Would you mind trying again with --datadir ./ (current working directory, if that's what you intend)?
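
The difference between /. and ./ can be checked with a short sketch. The helper name is hypothetical; posixpath.normpath is the standard-library function that collapses the . component.

```python
import posixpath


def resolves_to_root(path):
    """Return True if a POSIX path normalises to the filesystem root '/'."""
    return posixpath.normpath(path) == "/"


if __name__ == "__main__":
    for candidate in ("/.", "./", "/mnt/d/minion/GHRU-K/"):
        print(repr(candidate), "->", posixpath.normpath(candidate),
              "| root:", resolves_to_root(candidate))
```

A guard like this could be run on --datadir, --fastq and --fast5 before launching, so that a path pointing at the root is caught before singularity tries to bind mount it.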

mmfacun commented 2 years ago

@vmurigneu the test data also gave the same OSError. @thomcuddihy even if I explicitly set --datadir to /mnt/d/minion/GHRU-K/, it still produced the same OSError.

thomcuddihy commented 2 years ago

@mmfacun would you be able to post the .command.run file from inside the work dir?

mmfacun commented 2 years ago

Hi. Please see here: command.run.txt

thomcuddihy commented 2 years ago

@mmfacun on line 278 of the .command.run file you uploaded, it has the singularity exec ... command that launches the container, and needs to have a bind mount (-B /path/to/whatever) to each location that needs reading/writing.

As per line 176 of the default nextflow.config, it should include the fast5, fastq, and datadir directories as determined by the mounts block. You can see from the quadruple space on line 278 of your .command.run that those mounts variables are blank. That would only occur if the default values are used (lines 56-58 of nextflow.config), meaning that they weren't overridden by the nextflow main.nf ... command (e.g. nextflow main.nf --basecalling --demultiplexing --samplesheet /path/to/samples.csv --fast5 /path/to/fast5/directory/ --datadir /path/to/datadir/ --outdir /path/to/outdir/).

Would you please ensure that your nextflow.config is the same as the repo file, especially from line 137 onwards?

After that, if you include --fastq, --fast5, and --datadir in the nextflow main.nf ... command, then those directories should be bind mounted (-B ...) in the singularity exec ... command in .command.run and therefore be writable within singularity.
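
To check which directories are actually bind mounted, the singularity exec line can be inspected for its -B arguments. The sketch below is only an illustration: the helper name, the image name and both sample command lines are hypothetical, and it assumes each mount is passed as a separate -B (or --bind) argument.

```python
import shlex


def bind_mounts(command_line):
    """List the arguments given to -B / --bind in a singularity exec line.

    Hypothetical helper; assumes one path (or src:dest pair) per flag.
    """
    tokens = shlex.split(command_line)
    mounts = []
    for index, token in enumerate(tokens):
        if token in ("-B", "--bind") and index + 1 < len(tokens):
            mounts.append(tokens[index + 1])
    return mounts


if __name__ == "__main__":
    # Hypothetical command lines, not copied from any real .command.run.
    with_mounts = ("singularity exec -B /data/fastq -B /data/out "
                   "image.sif bash .command.sh")
    without_mounts = "singularity exec    image.sif bash .command.sh"
    print(bind_mounts(with_mounts))
    print(bind_mounts(without_mounts))
```

An empty list for the line taken from .command.run would match the blank mounts variables described above.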

Finally, please try running micropipe inside the micropipe directory (i.e. nextflow main.nf ... instead of the nextflow micropipe/main.nf ... you listed in your earlier post), just in case there is an issue with hierarchy.