SamStudio8 / reticulatus

A snakemake-based pipeline for assembling and polishing long genomes from long nanopore reads
MIT License

Error when running the pipeline #39

Closed nbargues closed 4 years ago

nbargues commented 4 years ago

Hi Sam, I'm trying to test your pipeline on my 16S ONT data. I'm facing an error and I don't know why. Here is the log file:

Wildcard constraints in inputs are ignored.
Wildcard constraints in inputs are ignored.
Wildcard constraints in inputs are ignored.
Wildcard constraints in inputs are ignored.
Wildcard constraints in inputs are ignored.
Wildcard constraints in inputs are ignored.
Building DAG of jobs...
File path benchmarks//home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz.k2 contains double '/'. This is likely unintended. It can also lead to inconsistent results of the file-matching approach used by Snakemake.
File path benchmarks//home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz.k2 contains double '/'. This is likely unintended. It can also lead to inconsistent results of the file-matching approach used by Snakemake.
Using shell: /bin/bash
Provided cores: 18
Rules claiming more threads will be scaled down.
Job counts:
    count    jobs
    2        assembly_read_coverage
    1        bandage_assembly
    1        bond_summarise_kraken
    1        finish
    1        flye_assembly
    3        kraken
    1        ktkit_count
    2        ktkit_rollup
    1        link_flye_assembly
    4        minimap2_racon_sam
    1        polish_medaka
    4        polish_racon
    1        prep_flye_gfa
    1        summarise_assembly_meta
    1        summarise_assembly_stats
    1        summarise_kraken
    1        test_assembly
    27

[Fri Feb 28 10:27:02 2020]
rule kraken:
    input: /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz, /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/dbs/kraken2/k2db.ok
    output: /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz.k2, /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz.k2r
    jobid: 14
    benchmark: benchmarks//home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz.k2
    reason: Missing output files: /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz.k2
    wildcards: path=/home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz
    threads: 8
    resources: benchmark=1

[Fri Feb 28 10:27:02 2020]
rule flye_assembly:
    input: /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz, flye25.ok
    output: run2bc7.flye25/assembly.fasta, run2bc7.flye25/assembly_graph.gfa
    log: log/run2bc7.flye25_assembly.fa
    jobid: 18
    benchmark: benchmarks/run2bc7.flye25_assembly.fa
    reason: Missing output files: run2bc7.flye25/assembly.fasta, run2bc7.flye25/assembly_graph.gfa
    wildcards: uuid=run2bc7, conf=flye25
    threads: 8
    resources: benchmark=1

Activating conda environment: /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/reticulatus/working/.snakemake/conda/164a79fb
[Fri Feb 28 10:27:36 2020]
Finished job 14.
1 of 27 steps (4%) done

[Fri Feb 28 10:27:36 2020]
rule ktkit_count:
    input: /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz.k2, /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/dbs/ktkit/ktkit.ok
    output: /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz.k2kc
    jobid: 5
    reason: Missing output files: /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz.k2kc; Input files updated by another job: /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz.k2
    wildcards: path=/home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz

[Fri Feb 28 10:27:36 2020]
Error in rule ktkit_count:
    jobid: 5
    output: /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz.k2kc
    shell:
        echo -e 'ktkit_tid ktkit_name mean_seq_len n_seq prop_seq n_seq_unmasked tot_bp prop_bp prop_bp_unmasked' > /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz.k2kc; ktkit count /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz.k2 --dump /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/dbs/ktkit --rank species >> /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz.k2kc
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Removing output files of failed job ktkit_count since they might be corrupted:
/home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz.k2kc
[Fri Feb 28 10:33:02 2020]
Error in rule flye_assembly:
    jobid: 18
    output: run2bc7.flye25/assembly.fasta, run2bc7.flye25/assembly_graph.gfa
    log: log/run2bc7.flye25_assembly.fa (check log file(s) for error message)
    conda-env: /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/reticulatus/working/.snakemake/conda/164a79fb
    shell:
        git/flye25/Flye/bin/flye --nano-raw /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz --meta -g 62m -o run2bc7.flye25/ -t 8 > log/run2bc7.flye25_assembly.fa 2>&1
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/reticulatus/.snakemake/log/2020-02-28T102701.674949.snakemake.log

Thanks in advance,

Nicolas

SamStudio8 commented 4 years ago

Hi @nbargues! Thanks for trying this out. It looks like both the ktkit_count and flye_assembly rules have failed. First, can you activate the reticulatus conda environment, navigate to your reticulatus/working directory, and try running this manually:

echo -e 'ktkit_tid ktkit_name mean_seq_len n_seq prop_seq n_seq_unmasked tot_bp prop_bp prop_bp_unmasked' > /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz.k2kc; ktkit count /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz.k2 --dump /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/dbs/ktkit --rank species >> /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz.k2kc

Secondly, can you show me the flye.log, which should be stored in working/run2bc7.flye25/flye.log?
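Both Snakemake errors end with "note that snakemake uses bash strict mode!". A minimal sketch of what that means for a two-command rule like ktkit_count, assuming strict mode is equivalent to running the rule's shell command after `set -euo pipefail` (the helper name `run_strict` is hypothetical, not part of Snakemake or reticulatus):

```shell
# run_strict (hypothetical helper): run a command string under bash strict mode.
#   -e        stop at the first command that exits non-zero
#   -u        treat unset variables as errors
#   pipefail  a pipeline fails if any stage fails, not only the last one
run_strict() {
    bash -c "set -euo pipefail; $1"
}

# Example: the header line is written, then the failing second command
# ends the whole job with a non-zero status, as a failing `ktkit count` would.
out_file=$(mktemp)
if run_strict "echo header > '$out_file'; false >> '$out_file'; echo never >> '$out_file'"; then
    echo "job succeeded"
else
    echo "job failed with status $?"
fi
cat "$out_file"   # only "header" was written before the job stopped
rm -f "$out_file"
```

This is why running the echo and ktkit halves of the rule by hand is useful: under strict mode the job only reports that "one of the commands" failed, while running them separately shows which one and prints its own error message.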

SamStudio8 commented 4 years ago

Can I just check that barcode7_subtrim_01_L001_R1_001.fq.gz actually contains long reads? That looks like an Illumina-generated filename. If those are short reads meant for Pilon polishing and they are being passed to Flye, that would explain the Flye error at least. Maybe you can post your reads.cfg too?
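One quick way to check what a FASTQ actually contains is to summarise its read lengths. A minimal sketch, assuming a standard 4-line FASTQ, that `gzip`, `awk` and `sort` are available, and that the genome size is given in bases (62000000 for `-g 62m`); `read_stats` is a hypothetical helper, not part of reticulatus:

```shell
# read_stats (hypothetical helper): summarise read lengths in a gzipped FASTQ
# and estimate coverage as total bases / genome size.
# Usage: read_stats FILE.fq.gz GENOME_SIZE_BP
read_stats() {
    local fq="$1" genome_size="$2"
    # Line 2 of every 4-line FASTQ record is the sequence: print its length,
    # then sort lengths from longest to shortest for the N50 calculation.
    gzip -dc "$fq" \
        | awk 'NR % 4 == 2 { print length($0) }' \
        | sort -rn \
        | awk -v g="$genome_size" '
            { len[NR] = $1; total += $1 }
            END {
                if (NR == 0) { print "no reads"; exit 1 }
                # N50: length of the read at which the running sum
                # (longest reads first) reaches half of all bases
                run = 0
                for (i = 1; i <= NR; i++) {
                    run += len[i]
                    if (run * 2 >= total) { n50 = len[i]; break }
                }
                printf "reads=%d total_bp=%.0f mean=%.1f N50=%d coverage=%.2f\n",
                    NR, total, total / NR, n50, total / g
            }'
}
```

For the run in this thread, the Flye log already reports the two inputs to that last number: a total read length of 15104374 bp against `-g 62m`, i.e. 15104374 / 62000000 ≈ 0.24x, which Flye reports as "Estimated coverage: 0". The same log's "Reads N50/N90: 1584 / 1557" may also be relevant next to "Minimum overlap set to 2000", since two reads can only share a 2000 bp overlap if both are at least that long.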

nbargues commented 4 years ago

Thanks for the quick response.

When I type your command, I get this message:

echo -e 'ktkit_tid ktkit_name mean_seq_len n_seq prop_seq n_seq_unmasked tot_bp prop_bp prop_bp_unmasked' > /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz.k2kc; ktkit count /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz.k2 --dump /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/dbs/ktkit --rank species >> /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz.k2kc
NCBI dump not found in /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/dbs/ktkit
mkdir -p /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/dbs/ktkit; cd /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/dbs/ktkit; wget ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz; tar xvf taxdump.tar.gz
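The ktkit message above prints the commands needed to fetch the NCBI taxonomy dump. A minimal sketch wrapping those same steps so they are skipped when the dump is already unpacked; `ensure_taxdump` is a hypothetical helper, and checking for `names.dmp` and `nodes.dmp` is an assumption about which files ktkit needs, not something confirmed in this thread:

```shell
# ensure_taxdump (hypothetical helper): make sure an unpacked NCBI taxdump
# exists in DIR, downloading it only if needed.
# Assumption: ktkit needs names.dmp and nodes.dmp from taxdump.tar.gz.
# Usage: ensure_taxdump DIR [URL]
ensure_taxdump() {
    local dir="$1"
    local url="${2:-ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz}"
    if [ -s "$dir/names.dmp" ] && [ -s "$dir/nodes.dmp" ]; then
        echo "taxdump already present in $dir"
        return 0
    fi
    mkdir -p "$dir" || return 1
    # Same steps as the commands ktkit prints, without cd-ing the calling shell
    wget -O "$dir/taxdump.tar.gz" "$url" || return 1
    tar -xzf "$dir/taxdump.tar.gz" -C "$dir"
}
```

For example, `ensure_taxdump /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/dbs/ktkit` followed by rerunning the `ktkit count` command. The optional second argument overrides the download URL.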

Then the Flye log file gives me:

`[2020-02-28 10:11:43] root: INFO: Starting Flye 2.5-gaf246d6
[2020-02-28 10:11:43] root: DEBUG: Cmd: git/flye25/Flye/bin/flye --nano-raw /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz --meta -g 62m -o run2bc7.flye25/ -t 8
[2020-02-28 10:11:43] root: INFO: >>>STAGE: configure
[2020-02-28 10:11:43] root: INFO: Configuring run
[2020-02-28 10:11:44] root: INFO: Total read length: 15104374
[2020-02-28 10:11:44] root: INFO: Input genome size: 62000000
[2020-02-28 10:11:44] root: INFO: Estimated coverage: 0
[2020-02-28 10:11:44] root: WARNING: Expected read coverage is 0, the assembly is not guaranteed to be optimal in this setting. Are you sure that the genome size was entered correctly?
[2020-02-28 10:11:44] root: INFO: Reads N50/N90: 1584 / 1557
[2020-02-28 10:11:44] root: INFO: Minimum overlap set to 2000
[2020-02-28 10:11:44] root: INFO: Selected k-mer size: 17
[2020-02-28 10:11:44] root: INFO: >>>STAGE: assembly
[2020-02-28 10:11:44] root: INFO: Assembling disjointigs
[2020-02-28 10:11:44] root: DEBUG: -----Begin assembly log------
[2020-02-28 10:11:44] root: DEBUG: Running: flye-assemble --reads /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz --out-asm /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/reticulatus/working/run2bc7.flye25/00-assembly/draft_assembly.fasta --genome-size 62000000 --config /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/reticulatus/working/git/flye25/Flye/flye/config/bin_cfg/asm_raw_reads.cfg --log /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/reticulatus/working/run2bc7.flye25/flye.log --threads 8 --meta --min-ovlp 2000 --kmer 17
[2020-02-28 10:11:44] DEBUG: Build date: Feb 28 2020 10:08:24
[2020-02-28 10:11:44] DEBUG: Total RAM: 62 Gb
[2020-02-28 10:11:44] DEBUG: Available RAM: 60 Gb
[2020-02-28 10:11:44] DEBUG: Total CPUs: 24
[2020-02-28 10:11:44] DEBUG: Parameters:
[2020-02-28 10:11:44] DEBUG: big_genome_threshold=29000000
[2020-02-28 10:11:44] DEBUG: low_cutoff_warning=1
[2020-02-28 10:11:44] DEBUG: hard_min_coverage_rate=10
[2020-02-28 10:11:44] DEBUG: assemble_kmer_sample=1
[2020-02-28 10:11:44] DEBUG: repeat_graph_kmer_sample=1
[2020-02-28 10:11:44] DEBUG: read_align_kmer_sample=1
[2020-02-28 10:11:44] DEBUG: maximum_jump=1500
[2020-02-28 10:11:44] DEBUG: maximum_overhang=1500
[2020-02-28 10:11:44] DEBUG: repeat_kmer_rate=100
[2020-02-28 10:11:44] DEBUG: assemble_ovlp_relative_divergence=0.10
[2020-02-28 10:11:44] DEBUG: repeat_graph_ovlp_divergence=0.15
[2020-02-28 10:11:44] DEBUG: read_align_ovlp_divergence=0.25
[2020-02-28 10:11:44] DEBUG: max_coverage_drop_rate=5
[2020-02-28 10:11:44] DEBUG: chimera_window=100
[2020-02-28 10:11:44] DEBUG: min_reads_in_disjointig=4
[2020-02-28 10:11:44] DEBUG: max_inner_reads=10
[2020-02-28 10:11:44] DEBUG: max_inner_fraction=0.25
[2020-02-28 10:11:44] DEBUG: add_unassembled_reads=0
[2020-02-28 10:11:44] DEBUG: max_separation=500
[2020-02-28 10:11:44] DEBUG: unique_edge_length=50000
[2020-02-28 10:11:44] DEBUG: min_repeat_res_support=0.51
[2020-02-28 10:11:44] DEBUG: out_paths_ratio=5
[2020-02-28 10:11:44] DEBUG: graph_cov_drop_rate=5
[2020-02-28 10:11:44] DEBUG: coverage_estimate_window=100
[2020-02-28 10:11:44] DEBUG: extend_contigs_with_repeats=1
[2020-02-28 10:11:44] DEBUG: min_read_cov_cutoff=3
[2020-02-28 10:11:44] DEBUG: short_tip_length=10000
[2020-02-28 10:11:44] DEBUG: long_tip_length=100000
[2020-02-28 10:11:44] DEBUG: max_bubble_length=50000
[2020-02-28 10:11:44] DEBUG: Running with k-mer size: 17
[2020-02-28 10:11:44] DEBUG: Running with minimum overlap 2000
[2020-02-28 10:11:44] DEBUG: Metagenome mode: Y
[2020-02-28 10:11:44] INFO: Reading sequences
[2020-02-28 10:11:44] DEBUG: Building positional index
[2020-02-28 10:11:44] DEBUG: Total sequence: 15104374 bp
[2020-02-28 10:11:44] DEBUG: Expected read coverage: 0
[2020-02-28 10:11:44] INFO: Generating solid k-mer index
[2020-02-28 10:11:44] DEBUG: Hard threshold set to 2
[2020-02-28 10:11:44] DEBUG: Started k-mer counting
[2020-02-28 10:12:40] root: INFO: Starting Flye 2.5-gaf246d6
[2020-02-28 10:12:40] root: DEBUG: Cmd: git/flye25/Flye/bin/flye --nano-raw /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz --meta -g 62m -o run2bc7.flye25/ -t 8
[2020-02-28 10:12:40] root: INFO: >>>STAGE: configure
[2020-02-28 10:12:40] root: INFO: Configuring run
[2020-02-28 10:12:40] root: INFO: Total read length: 15104374
[2020-02-28 10:12:40] root: INFO: Input genome size: 62000000
[2020-02-28 10:12:40] root: INFO: Estimated coverage: 0
[2020-02-28 10:12:40] root: WARNING: Expected read coverage is 0, the assembly is not guaranteed to be optimal in this setting. Are you sure that the genome size was entered correctly?
[2020-02-28 10:12:40] root: INFO: Reads N50/N90: 1584 / 1557
[2020-02-28 10:12:40] root: INFO: Minimum overlap set to 2000
[2020-02-28 10:12:40] root: INFO: Selected k-mer size: 17
[2020-02-28 10:12:40] root: INFO: >>>STAGE: assembly
[2020-02-28 10:12:40] root: INFO: Assembling disjointigs
[2020-02-28 10:12:40] root: DEBUG: -----Begin assembly log------
[2020-02-28 10:12:40] root: DEBUG: Running: flye-assemble --reads /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz --out-asm /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/reticulatus/working/run2bc7.flye25/00-assembly/draft_assembly.fasta --genome-size 62000000 --config /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/reticulatus/working/git/flye25/Flye/flye/config/bin_cfg/asm_raw_reads.cfg --log /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/reticulatus/working/run2bc7.flye25/flye.log --threads 8 --meta --min-ovlp 2000 --kmer 17
[2020-02-28 10:12:40] DEBUG: Build date: Feb 28 2020 10:08:24
[2020-02-28 10:12:40] DEBUG: Total RAM: 62 Gb
[2020-02-28 10:12:40] DEBUG: Available RAM: 60 Gb
[2020-02-28 10:12:40] DEBUG: Total CPUs: 24
[2020-02-28 10:12:40] DEBUG: Parameters:
[2020-02-28 10:12:40] DEBUG: big_genome_threshold=29000000
[2020-02-28 10:12:40] DEBUG: low_cutoff_warning=1
[2020-02-28 10:12:40] DEBUG: hard_min_coverage_rate=10
[2020-02-28 10:12:40] DEBUG: assemble_kmer_sample=1
[2020-02-28 10:12:40] DEBUG: repeat_graph_kmer_sample=1
[2020-02-28 10:12:40] DEBUG: read_align_kmer_sample=1
[2020-02-28 10:12:40] DEBUG: maximum_jump=1500
[2020-02-28 10:12:40] DEBUG: maximum_overhang=1500
[2020-02-28 10:12:40] DEBUG: repeat_kmer_rate=100
[2020-02-28 10:12:40] DEBUG: assemble_ovlp_relative_divergence=0.10
[2020-02-28 10:12:40] DEBUG: repeat_graph_ovlp_divergence=0.15
[2020-02-28 10:12:40] DEBUG: read_align_ovlp_divergence=0.25
[2020-02-28 10:12:40] DEBUG: max_coverage_drop_rate=5
[2020-02-28 10:12:40] DEBUG: chimera_window=100
[2020-02-28 10:12:40] DEBUG: min_reads_in_disjointig=4
[2020-02-28 10:12:40] DEBUG: max_inner_reads=10
[2020-02-28 10:12:40] DEBUG: max_inner_fraction=0.25
[2020-02-28 10:12:40] DEBUG: add_unassembled_reads=0
[2020-02-28 10:12:40] DEBUG: max_separation=500
[2020-02-28 10:12:40] DEBUG: unique_edge_length=50000
[2020-02-28 10:12:40] DEBUG: min_repeat_res_support=0.51
[2020-02-28 10:12:40] DEBUG: out_paths_ratio=5
[2020-02-28 10:12:40] DEBUG: graph_cov_drop_rate=5
[2020-02-28 10:12:40] DEBUG: coverage_estimate_window=100
[2020-02-28 10:12:40] DEBUG: extend_contigs_with_repeats=1
[2020-02-28 10:12:40] DEBUG: min_read_cov_cutoff=3
[2020-02-28 10:12:40] DEBUG: short_tip_length=10000
[2020-02-28 10:12:40] DEBUG: long_tip_length=100000
[2020-02-28 10:12:40] DEBUG: max_bubble_length=50000
[2020-02-28 10:12:40] DEBUG: Running with k-mer size: 17
[2020-02-28 10:12:40] DEBUG: Running with minimum overlap 2000
[2020-02-28 10:12:40] DEBUG: Metagenome mode: Y
[2020-02-28 10:12:40] INFO: Reading sequences
[2020-02-28 10:12:40] DEBUG: Building positional index
[2020-02-28 10:12:40] DEBUG: Total sequence: 15104374 bp
[2020-02-28 10:12:40] DEBUG: Expected read coverage: 0
[2020-02-28 10:12:40] INFO: Generating solid k-mer index
[2020-02-28 10:12:40] DEBUG: Hard threshold set to 2
[2020-02-28 10:12:40] DEBUG: Started k-mer counting
[2020-02-28 10:18:49] INFO: Counting k-mers (1/2):
[2020-02-28 10:18:51] INFO: Counting k-mers (2/2):
[2020-02-28 10:18:53] WARNING: Unable to separate erroneous k-mers from solid k-mers.
Possible reasons:
(1) Incorrect expected assembly size parameter
(2) Highly uneven coverage of the assembly
(3) Running with error-corrected reads in raw reads mode
Assembly will continue, but results might not be optimal
[2020-02-28 10:18:53] DEBUG: Estimated minimum kmer coverage: 2
[2020-02-28 10:18:53] DEBUG: Filtered 0 erroneous k-mers
[2020-02-28 10:18:53] DEBUG: Repetitive k-mer frequency: 2446
[2020-02-28 10:18:53] DEBUG: Filtered 1491 repetitive k-mers (0.00280352)
[2020-02-28 10:18:53] INFO: Filling index table (1/2)
[2020-02-28 10:18:54] INFO: Filling index table (2/2)
[2020-02-28 10:18:55] DEBUG: Sorting k-mer index
[2020-02-28 10:18:55] DEBUG: Selected k-mers: 10991
[2020-02-28 10:18:55] DEBUG: Index size: 40759
[2020-02-28 10:18:55] DEBUG: Peak RAM usage: 16 Gb
[2020-02-28 10:18:55] DEBUG: Estimating k-mer identity bias
[2020-02-28 10:18:55] WARNING: No overlaps found - unable to estimate parameters
[2020-02-28 10:18:55] DEBUG: Median overlap divergence: 0.5
[2020-02-28 10:18:55] DEBUG: K-mer estimate bias: 0
[2020-02-28 10:18:55] DEBUG: Max divergence threshold set to 0.6
[2020-02-28 10:18:55] INFO: Extending reads
[2020-02-28 10:18:55] DEBUG: Estimating overlap coverage
[2020-02-28 10:18:56] WARNING: No overlaps found!
[2020-02-28 10:18:56] INFO: Overlap-based coverage: 0
[2020-02-28 10:18:56] INFO: Median overlap divergence: 0
[2020-02-28 10:18:56] DEBUG: Sequence divergence distribution:

|                                                                                                    
|                                                                                                    
|                                                                                                    
|                                                                                                    
|                                                                                                    
|                                                                                                    
|                                                                                                    
|                                                                                                    
|                                                                                                    
|                                                                                                    
|                                                                                                    
|                                                                                                    
|                                                                                                    
|                                                                                                    
|                                                                                                    
|                                                                                                    
|                                                                                                    
|                                                                                                    
|                                                                                                    
|                                                                                                    
----------------------------------------------------------------------------------------------------
0%        5%        10%       15%       20%       25%       30%       35%       40%       45%       

Q25 = 0, Q50 = 0, Q75 = 0

[2020-02-28 10:18:56] INFO: Assembled 0 disjointigs
[2020-02-28 10:18:56] INFO: Generating sequence
[2020-02-28 10:18:56] DEBUG: Writing FASTA
[2020-02-28 10:18:56] DEBUG: Peak RAM usage: 16 Gb
-----------End assembly log------------
[2020-02-28 10:18:56] root: ERROR: No disjointigs were assembled - please check if the read type and genome size parameters are correct
[2020-02-28 10:27:04] root: INFO: Starting Flye 2.5-gaf246d6
[2020-02-28 10:27:04] root: DEBUG: Cmd: git/flye25/Flye/bin/flye --nano-raw /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz --meta -g 62m -o run2bc7.flye25/ -t 8
[2020-02-28 10:27:04] root: INFO: >>>STAGE: configure
[2020-02-28 10:27:04] root: INFO: Configuring run
[2020-02-28 10:27:04] root: INFO: Total read length: 15104374
[2020-02-28 10:27:04] root: INFO: Input genome size: 62000000
[2020-02-28 10:27:04] root: INFO: Estimated coverage: 0
[2020-02-28 10:27:04] root: WARNING: Expected read coverage is 0, the assembly is not guaranteed to be optimal in this setting. Are you sure that the genome size was entered correctly?
[2020-02-28 10:27:04] root: INFO: Reads N50/N90: 1584 / 1557
[2020-02-28 10:27:04] root: INFO: Minimum overlap set to 2000
[2020-02-28 10:27:04] root: INFO: Selected k-mer size: 17
[2020-02-28 10:27:04] root: INFO: >>>STAGE: assembly
[2020-02-28 10:27:04] root: INFO: Assembling disjointigs
[2020-02-28 10:27:04] root: DEBUG: -----Begin assembly log------
[2020-02-28 10:27:04] root: DEBUG: Running: flye-assemble --reads /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz --out-asm /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/reticulatus/working/run2bc7.flye25/00-assembly/draft_assembly.fasta --genome-size 62000000 --config /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/reticulatus/working/git/flye25/Flye/flye/config/bin_cfg/asm_raw_reads.cfg --log /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/reticulatus/working/run2bc7.flye25/flye.log --threads 8 --meta --min-ovlp 2000 --kmer 17
[2020-02-28 10:27:04] DEBUG: Build date: Feb 28 2020 10:08:24
[2020-02-28 10:27:04] DEBUG: Total RAM: 62 Gb
[2020-02-28 10:27:04] DEBUG: Available RAM: 57 Gb
[2020-02-28 10:27:04] DEBUG: Total CPUs: 24
[2020-02-28 10:27:04] DEBUG: Parameters:
[2020-02-28 10:27:04] DEBUG: big_genome_threshold=29000000
[2020-02-28 10:27:04] DEBUG: low_cutoff_warning=1
[2020-02-28 10:27:04] DEBUG: hard_min_coverage_rate=10
[2020-02-28 10:27:04] DEBUG: assemble_kmer_sample=1
[2020-02-28 10:27:04] DEBUG: repeat_graph_kmer_sample=1
[2020-02-28 10:27:04] DEBUG: read_align_kmer_sample=1
[2020-02-28 10:27:04] DEBUG: maximum_jump=1500
[2020-02-28 10:27:04] DEBUG: maximum_overhang=1500
[2020-02-28 10:27:04] DEBUG: repeat_kmer_rate=100
[2020-02-28 10:27:04] DEBUG: assemble_ovlp_relative_divergence=0.10
[2020-02-28 10:27:04] DEBUG: repeat_graph_ovlp_divergence=0.15
[2020-02-28 10:27:04] DEBUG: read_align_ovlp_divergence=0.25
[2020-02-28 10:27:04] DEBUG: max_coverage_drop_rate=5
[2020-02-28 10:27:04] DEBUG: chimera_window=100
[2020-02-28 10:27:04] DEBUG: min_reads_in_disjointig=4
[2020-02-28 10:27:04] DEBUG: max_inner_reads=10
[2020-02-28 10:27:04] DEBUG: max_inner_fraction=0.25
[2020-02-28 10:27:04] DEBUG: add_unassembled_reads=0
[2020-02-28 10:27:04] DEBUG: max_separation=500
[2020-02-28 10:27:04] DEBUG: unique_edge_length=50000
[2020-02-28 10:27:04] DEBUG: min_repeat_res_support=0.51
[2020-02-28 10:27:04] DEBUG: out_paths_ratio=5
[2020-02-28 10:27:04] DEBUG: graph_cov_drop_rate=5
[2020-02-28 10:27:04] DEBUG: coverage_estimate_window=100
[2020-02-28 10:27:04] DEBUG: extend_contigs_with_repeats=1
[2020-02-28 10:27:04] DEBUG: min_read_cov_cutoff=3
[2020-02-28 10:27:04] DEBUG: short_tip_length=10000
[2020-02-28 10:27:04] DEBUG: long_tip_length=100000
[2020-02-28 10:27:04] DEBUG: max_bubble_length=50000
[2020-02-28 10:27:04] DEBUG: Running with k-mer size: 17
[2020-02-28 10:27:04] DEBUG: Running with minimum overlap 2000
[2020-02-28 10:27:04] DEBUG: Metagenome mode: Y
[2020-02-28 10:27:04] INFO: Reading sequences
[2020-02-28 10:27:04] DEBUG: Building positional index
[2020-02-28 10:27:04] DEBUG: Total sequence: 15104374 bp
[2020-02-28 10:27:04] DEBUG: Expected read coverage: 0
[2020-02-28 10:27:04] INFO: Generating solid k-mer index
[2020-02-28 10:27:04] DEBUG: Hard threshold set to 2
[2020-02-28 10:27:04] DEBUG: Started k-mer counting
[2020-02-28 10:32:55] INFO: Counting k-mers (1/2):
[2020-02-28 10:32:57] INFO: Counting k-mers (2/2):
[2020-02-28 10:32:59] WARNING: Unable to separate erroneous k-mers from solid k-mers.
Possible reasons:
(1) Incorrect expected assembly size parameter
(2) Highly uneven coverage of the assembly
(3) Running with error-corrected reads in raw reads mode
Assembly will continue, but results might not be optimal
[2020-02-28 10:32:59] DEBUG: Estimated minimum kmer coverage: 2
[2020-02-28 10:32:59] DEBUG: Filtered 0 erroneous k-mers
[2020-02-28 10:32:59] DEBUG: Repetitive k-mer frequency: 2446
[2020-02-28 10:32:59] DEBUG: Filtered 1491 repetitive k-mers (0.00280352)
[2020-02-28 10:32:59] INFO: Filling index table (1/2)
[2020-02-28 10:33:00] INFO: Filling index table (2/2)
[2020-02-28 10:33:01] DEBUG: Sorting k-mer index
[2020-02-28 10:33:01] DEBUG: Selected k-mers: 10991
[2020-02-28 10:33:01] DEBUG: Index size: 40759
[2020-02-28 10:33:01] DEBUG: Peak RAM usage: 16 Gb
[2020-02-28 10:33:01] DEBUG: Estimating k-mer identity bias
[2020-02-28 10:33:01] WARNING: No overlaps found - unable to estimate parameters
[2020-02-28 10:33:01] DEBUG: Median overlap divergence: 0.5
[2020-02-28 10:33:01] DEBUG: K-mer estimate bias: 0
[2020-02-28 10:33:01] DEBUG: Max divergence threshold set to 0.6
[2020-02-28 10:33:01] INFO: Extending reads
[2020-02-28 10:33:01] DEBUG: Estimating overlap coverage
[2020-02-28 10:33:02] WARNING: No overlaps found!
[2020-02-28 10:33:02] INFO: Overlap-based coverage: 0
[2020-02-28 10:33:02] INFO: Median overlap divergence: 0
[2020-02-28 10:33:02] DEBUG: Sequence divergence distribution:

|                                                                                                    
|                                                                                                    
|                                                                                                    
|                                                                                                    
|                                                                                                    
|                                                                                                    
|                                                                                                    
|                                                                                                    
|                                                                                                    
|                                                                                                    
|                                                                                                    
|                                                                                                    
|                                                                                                    
|                                                                                                    
|                                                                                                    
|                                                                                                    
|                                                                                                    
|                                                                                                    
|                                                                                                    
|                                                                                                    
----------------------------------------------------------------------------------------------------
0%        5%        10%       15%       20%       25%       30%       35%       40%       45%       

Q25 = 0, Q50 = 0, Q75 = 0

[2020-02-28 10:33:02] INFO: Assembled 0 disjointigs
[2020-02-28 10:33:02] INFO: Generating sequence
[2020-02-28 10:33:02] DEBUG: Writing FASTA
[2020-02-28 10:33:02] DEBUG: Peak RAM usage: 16 Gb
-----------End assembly log------------
[2020-02-28 10:33:02] root: ERROR: No disjointigs were assembled - please check if the read type and genome size parameters are correct
[2020-02-28 12:52:17] root: INFO: Starting Flye 2.5-gaf246d6
[2020-02-28 12:52:17] root: DEBUG: Cmd: git/flye25/Flye/bin/flye --nano-raw /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz --meta -g 62m -o run2bc7.flye25/ -t 8
[2020-02-28 12:52:17] root: INFO: >>>STAGE: configure
[2020-02-28 12:52:17] root: INFO: Configuring run
[2020-02-28 12:52:17] root: INFO: Total read length: 15104374
[2020-02-28 12:52:17] root: INFO: Input genome size: 62000000
[2020-02-28 12:52:17] root: INFO: Estimated coverage: 0
[2020-02-28 12:52:17] root: WARNING: Expected read coverage is 0, the assembly is not guaranteed to be optimal in this setting. Are you sure that the genome size was entered correctly?
[2020-02-28 12:52:17] root: INFO: Reads N50/N90: 1584 / 1557
[2020-02-28 12:52:17] root: INFO: Minimum overlap set to 2000
[2020-02-28 12:52:17] root: INFO: Selected k-mer size: 17
[2020-02-28 12:52:17] root: INFO: >>>STAGE: assembly
[2020-02-28 12:52:17] root: INFO: Assembling disjointigs
[2020-02-28 12:52:17] root: DEBUG: -----Begin assembly log------
[2020-02-28 12:52:17] root: DEBUG: Running: flye-assemble --reads /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz --out-asm /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/reticulatus/working/run2bc7.flye25/00-assembly/draft_assembly.fasta --genome-size 62000000 --config /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/reticulatus/working/git/flye25/Flye/flye/config/bin_cfg/asm_raw_reads.cfg --log /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/reticulatus/working/run2bc7.flye25/flye.log --threads 8 --meta --min-ovlp 2000 --kmer 17
[2020-02-28 12:52:17] DEBUG: Build date: Feb 28 2020 10:08:24
[2020-02-28 12:52:17] DEBUG: Total RAM: 62 Gb
[2020-02-28 12:52:17] DEBUG: Available RAM: 59 Gb
[2020-02-28 12:52:17] DEBUG: Total CPUs: 24
[2020-02-28 12:52:17] DEBUG: Parameters:
[2020-02-28 12:52:17] DEBUG: big_genome_threshold=29000000
[2020-02-28 12:52:17] DEBUG: low_cutoff_warning=1
[2020-02-28 12:52:17] DEBUG: hard_min_coverage_rate=10
[2020-02-28 12:52:17] DEBUG: assemble_kmer_sample=1
[2020-02-28 12:52:17] DEBUG: repeat_graph_kmer_sample=1
[2020-02-28 12:52:17] DEBUG: read_align_kmer_sample=1
[2020-02-28 12:52:17] DEBUG: maximum_jump=1500
[2020-02-28 12:52:17] DEBUG: maximum_overhang=1500
[2020-02-28 12:52:17] DEBUG: repeat_kmer_rate=100
[2020-02-28 12:52:17] DEBUG: assemble_ovlp_relative_divergence=0.10
[2020-02-28 12:52:17] DEBUG: repeat_graph_ovlp_divergence=0.15
[2020-02-28 12:52:17] DEBUG: read_align_ovlp_divergence=0.25
[2020-02-28 12:52:17] DEBUG: max_coverage_drop_rate=5
[2020-02-28 12:52:17] DEBUG: chimera_window=100
[2020-02-28 12:52:17] DEBUG: min_reads_in_disjointig=4
[2020-02-28 12:52:17] DEBUG: max_inner_reads=10
[2020-02-28 12:52:17] DEBUG: max_inner_fraction=0.25
[2020-02-28 12:52:17] DEBUG: add_unassembled_reads=0
[2020-02-28 12:52:17] DEBUG: max_separation=500
[2020-02-28 12:52:17] DEBUG: unique_edge_length=50000
[2020-02-28 12:52:17] DEBUG: min_repeat_res_support=0.51
[2020-02-28 12:52:17] DEBUG: out_paths_ratio=5
[2020-02-28 12:52:17] DEBUG: graph_cov_drop_rate=5
[2020-02-28 12:52:17] DEBUG: coverage_estimate_window=100
[2020-02-28 12:52:17] DEBUG: extend_contigs_with_repeats=1
[2020-02-28 12:52:17] DEBUG: min_read_cov_cutoff=3
[2020-02-28 12:52:17] DEBUG: short_tip_length=10000
[2020-02-28 12:52:17] DEBUG: long_tip_length=100000
[2020-02-28 12:52:17] DEBUG: max_bubble_length=50000
[2020-02-28 12:52:17] DEBUG: Running with k-mer size: 17
[2020-02-28 12:52:17] DEBUG: Running with minimum overlap 2000
[2020-02-28 12:52:17] DEBUG: Metagenome mode: Y
[2020-02-28 12:52:17] INFO: Reading sequences
[2020-02-28 12:52:18] DEBUG: Building positional index
[2020-02-28 12:52:18] DEBUG: Total sequence: 15104374 bp
[2020-02-28 12:52:18] DEBUG: Expected read coverage: 0
[2020-02-28 12:52:18] INFO: Generating solid k-mer index
[2020-02-28 12:52:18] DEBUG: Hard threshold set to 2
[2020-02-28 12:52:18] DEBUG: Started k-mer counting
[2020-02-28 12:58:02] INFO: Counting k-mers (1/2):
[2020-02-28 12:58:04] INFO: Counting k-mers (2/2):
[2020-02-28 12:58:06] WARNING: Unable to separate erroneous k-mers from solid k-mers.
Possible reasons:
(1) Incorrect expected assembly size parameter
(2) Highly uneven coverage of the assembly
(3) Running with error-corrected reads in raw reads mode
Assembly will continue, but results might not be optimal
[2020-02-28 12:58:06] DEBUG: Estimated minimum kmer coverage: 2
[2020-02-28 12:58:06] DEBUG: Filtered 0 erroneous k-mers
[2020-02-28 12:58:06] DEBUG: Repetitive k-mer frequency: 2446
[2020-02-28 12:58:06] DEBUG: Filtered 1491 repetitive k-mers (0.00280352)
[2020-02-28 12:58:06] INFO: Filling index table (1/2)
[2020-02-28 12:58:07] INFO: Filling index table (2/2)
[2020-02-28 12:58:07] DEBUG: Sorting k-mer index
[2020-02-28 12:58:07] DEBUG: Selected k-mers: 10991
[2020-02-28 12:58:07] DEBUG: Index size: 40759
[2020-02-28 12:58:07] DEBUG: Peak RAM usage: 16 Gb
[2020-02-28 12:58:07] DEBUG: Estimating k-mer identity bias
[2020-02-28 12:58:08] WARNING: No overlaps found - unable to estimate parameters
[2020-02-28 12:58:08] DEBUG: Median overlap divergence: 0.5
[2020-02-28 12:58:08] DEBUG: K-mer estimate bias: 0
[2020-02-28 12:58:08] DEBUG: Max divergence threshold set to 0.6
[2020-02-28 12:58:08] INFO: Extending reads
[2020-02-28 12:58:08] DEBUG: Estimating overlap coverage
[2020-02-28 12:58:08] WARNING: No overlaps found!
[2020-02-28 12:58:08] INFO: Overlap-based coverage: 0
[2020-02-28 12:58:08] INFO: Median overlap divergence: 0
[2020-02-28 12:58:08] DEBUG: Sequence divergence distribution:

|                                                                                                    
|                                                                                                    
|                                                                                                    
|                                                                                                    
|                                                                                                    
|                                                                                                    
|                                                                                                    
|                                                                                                    
|                                                                                                    
|                                                                                                    
|                                                                                                    
|                                                                                                    
|                                                                                                    
|                                                                                                    
|                                                                                                    
|                                                                                                    
|                                                                                                    
|                                                                                                    
|                                                                                                    
|                                                                                                    
----------------------------------------------------------------------------------------------------
0%        5%        10%       15%       20%       25%       30%       35%       40%       45%       

Q25 = 0, Q50 = 0, Q75 = 0

[2020-02-28 12:58:08] INFO: Assembled 0 disjointigs [2020-02-28 12:58:08] INFO: Generating sequence [2020-02-28 12:58:08] DEBUG: Writing FASTA [2020-02-28 12:58:08] DEBUG: Peak RAM usage: 16 Gb -----------End assembly log------------ [2020-02-28 12:58:08] root: ERROR: No disjointigs were assembled - please check if the read type and genome size parameters are correct [2020-02-28 12:59:34] root: INFO: Starting Flye 2.5-gaf246d6 [2020-02-28 12:59:34] root: DEBUG: Cmd: git/flye25/Flye/bin/flye --nano-raw /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz --meta -g 62m -o run2bc7.flye25/ -t 8 [2020-02-28 12:59:34] root: INFO: >>>STAGE: configure [2020-02-28 12:59:34] root: INFO: Configuring run [2020-02-28 12:59:35] root: INFO: Total read length: 15104374 [2020-02-28 12:59:35] root: INFO: Input genome size: 62000000 [2020-02-28 12:59:35] root: INFO: Estimated coverage: 0 [2020-02-28 12:59:35] root: WARNING: Expected read coverage is 0, the assembly is not guaranteed to be optimal in this setting. Are you sure that the genome size was entered correctly? 
[2020-02-28 12:59:35] root: INFO: Reads N50/N90: 1584 / 1557 [2020-02-28 12:59:35] root: INFO: Minimum overlap set to 2000 [2020-02-28 12:59:35] root: INFO: Selected k-mer size: 17 [2020-02-28 12:59:35] root: INFO: >>>STAGE: assembly [2020-02-28 12:59:35] root: INFO: Assembling disjointigs [2020-02-28 12:59:35] root: DEBUG: -----Begin assembly log------ [2020-02-28 12:59:35] root: DEBUG: Running: flye-assemble --reads /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz --out-asm /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/reticulatus/working/run2bc7.flye25/00-assembly/draft_assembly.fasta --genome-size 62000000 --config /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/reticulatus/working/git/flye25/Flye/flye/config/bin_cfg/asm_raw_reads.cfg --log /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/reticulatus/working/run2bc7.flye25/flye.log --threads 8 --meta --min-ovlp 2000 --kmer 17 [2020-02-28 12:59:35] DEBUG: Build date: Feb 28 2020 10:08:24 [2020-02-28 12:59:35] DEBUG: Total RAM: 62 Gb [2020-02-28 12:59:35] DEBUG: Available RAM: 60 Gb [2020-02-28 12:59:35] DEBUG: Total CPUs: 24 [2020-02-28 12:59:35] DEBUG: Parameters: [2020-02-28 12:59:35] DEBUG: big_genome_threshold=29000000 [2020-02-28 12:59:35] DEBUG: low_cutoff_warning=1 [2020-02-28 12:59:35] DEBUG: hard_min_coverage_rate=10 [2020-02-28 12:59:35] DEBUG: assemble_kmer_sample=1 [2020-02-28 12:59:35] DEBUG: repeat_graph_kmer_sample=1 [2020-02-28 12:59:35] DEBUG: read_align_kmer_sample=1 [2020-02-28 12:59:35] DEBUG: maximum_jump=1500 [2020-02-28 12:59:35] DEBUG: maximum_overhang=1500 [2020-02-28 12:59:35] DEBUG: repeat_kmer_rate=100 [2020-02-28 12:59:35] DEBUG: assemble_ovlp_relative_divergence=0.10 [2020-02-28 12:59:35] DEBUG: repeat_graph_ovlp_divergence=0.15 [2020-02-28 12:59:35] DEBUG: read_align_ovlp_divergence=0.25 [2020-02-28 12:59:35] DEBUG: max_coverage_drop_rate=5 [2020-02-28 12:59:35] DEBUG: chimera_window=100 [2020-02-28 12:59:35] DEBUG: 
min_reads_in_disjointig=4 [2020-02-28 12:59:35] DEBUG: max_inner_reads=10 [2020-02-28 12:59:35] DEBUG: max_inner_fraction=0.25 [2020-02-28 12:59:35] DEBUG: add_unassembled_reads=0 [2020-02-28 12:59:35] DEBUG: max_separation=500 [2020-02-28 12:59:35] DEBUG: unique_edge_length=50000 [2020-02-28 12:59:35] DEBUG: min_repeat_res_support=0.51 [2020-02-28 12:59:35] DEBUG: out_paths_ratio=5 [2020-02-28 12:59:35] DEBUG: graph_cov_drop_rate=5 [2020-02-28 12:59:35] DEBUG: coverage_estimate_window=100 [2020-02-28 12:59:35] DEBUG: extend_contigs_with_repeats=1 [2020-02-28 12:59:35] DEBUG: min_read_cov_cutoff=3 [2020-02-28 12:59:35] DEBUG: short_tip_length=10000 [2020-02-28 12:59:35] DEBUG: long_tip_length=100000 [2020-02-28 12:59:35] DEBUG: max_bubble_length=50000 [2020-02-28 12:59:35] DEBUG: Running with k-mer size: 17 [2020-02-28 12:59:35] DEBUG: Running with minimum overlap 2000 [2020-02-28 12:59:35] DEBUG: Metagenome mode: Y [2020-02-28 12:59:35] INFO: Reading sequences [2020-02-28 12:59:35] DEBUG: Building positional index [2020-02-28 12:59:35] DEBUG: Total sequence: 15104374 bp [2020-02-28 12:59:35] DEBUG: Expected read coverage: 0 [2020-02-28 12:59:35] INFO: Generating solid k-mer index [2020-02-28 12:59:35] DEBUG: Hard threshold set to 2 [2020-02-28 12:59:35] DEBUG: Started k-mer counting [2020-02-28 13:03:22] root: INFO: Starting Flye 2.5-gaf246d6 [2020-02-28 13:03:22] root: DEBUG: Cmd: git/flye25/Flye/bin/flye --nano-raw /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz --meta -g 62m -o run2bc7.flye25/ -t 8 [2020-02-28 13:03:22] root: INFO: >>>STAGE: configure [2020-02-28 13:03:22] root: INFO: Configuring run [2020-02-28 13:03:23] root: INFO: Total read length: 15104374 [2020-02-28 13:03:23] root: INFO: Input genome size: 62000000 [2020-02-28 13:03:23] root: INFO: Estimated coverage: 0 [2020-02-28 13:03:23] root: WARNING: Expected read coverage is 0, the assembly is not guaranteed to be optimal in this setting. 
Are you sure that the genome size was entered correctly? [2020-02-28 13:03:23] root: INFO: Reads N50/N90: 1584 / 1557 [2020-02-28 13:03:23] root: INFO: Minimum overlap set to 2000 [2020-02-28 13:03:23] root: INFO: Selected k-mer size: 17 [2020-02-28 13:03:23] root: INFO: >>>STAGE: assembly [2020-02-28 13:03:23] root: INFO: Assembling disjointigs [2020-02-28 13:03:23] root: DEBUG: -----Begin assembly log------ [2020-02-28 13:03:23] root: DEBUG: Running: flye-assemble --reads /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz --out-asm /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/reticulatus/working/run2bc7.flye25/00-assembly/draft_assembly.fasta --genome-size 62000000 --config /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/reticulatus/working/git/flye25/Flye/flye/config/bin_cfg/asm_raw_reads.cfg --log /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/reticulatus/working/run2bc7.flye25/flye.log --threads 8 --meta --min-ovlp 2000 --kmer 17 [2020-02-28 13:03:23] DEBUG: Build date: Feb 28 2020 10:08:24 [2020-02-28 13:03:23] DEBUG: Total RAM: 62 Gb [2020-02-28 13:03:23] DEBUG: Available RAM: 60 Gb [2020-02-28 13:03:23] DEBUG: Total CPUs: 24 [2020-02-28 13:03:23] DEBUG: Parameters: [2020-02-28 13:03:23] DEBUG: big_genome_threshold=29000000 [2020-02-28 13:03:23] DEBUG: low_cutoff_warning=1 [2020-02-28 13:03:23] DEBUG: hard_min_coverage_rate=10 [2020-02-28 13:03:23] DEBUG: assemble_kmer_sample=1 [2020-02-28 13:03:23] DEBUG: repeat_graph_kmer_sample=1 [2020-02-28 13:03:23] DEBUG: read_align_kmer_sample=1 [2020-02-28 13:03:23] DEBUG: maximum_jump=1500 [2020-02-28 13:03:23] DEBUG: maximum_overhang=1500 [2020-02-28 13:03:23] DEBUG: repeat_kmer_rate=100 [2020-02-28 13:03:23] DEBUG: assemble_ovlp_relative_divergence=0.10 [2020-02-28 13:03:23] DEBUG: repeat_graph_ovlp_divergence=0.15 [2020-02-28 13:03:23] DEBUG: read_align_ovlp_divergence=0.25 [2020-02-28 13:03:23] DEBUG: max_coverage_drop_rate=5 [2020-02-28 13:03:23] DEBUG: 
chimera_window=100 [2020-02-28 13:03:23] DEBUG: min_reads_in_disjointig=4 [2020-02-28 13:03:23] DEBUG: max_inner_reads=10 [2020-02-28 13:03:23] DEBUG: max_inner_fraction=0.25 [2020-02-28 13:03:23] DEBUG: add_unassembled_reads=0 [2020-02-28 13:03:23] DEBUG: max_separation=500 [2020-02-28 13:03:23] DEBUG: unique_edge_length=50000 [2020-02-28 13:03:23] DEBUG: min_repeat_res_support=0.51 [2020-02-28 13:03:23] DEBUG: out_paths_ratio=5 [2020-02-28 13:03:23] DEBUG: graph_cov_drop_rate=5 [2020-02-28 13:03:23] DEBUG: coverage_estimate_window=100 [2020-02-28 13:03:23] DEBUG: extend_contigs_with_repeats=1 [2020-02-28 13:03:23] DEBUG: min_read_cov_cutoff=3 [2020-02-28 13:03:23] DEBUG: short_tip_length=10000 [2020-02-28 13:03:23] DEBUG: long_tip_length=100000 [2020-02-28 13:03:23] DEBUG: max_bubble_length=50000 [2020-02-28 13:03:23] DEBUG: Running with k-mer size: 17 [2020-02-28 13:03:23] DEBUG: Running with minimum overlap 2000 [2020-02-28 13:03:23] DEBUG: Metagenome mode: Y [2020-02-28 13:03:23] INFO: Reading sequences [2020-02-28 13:03:23] DEBUG: Building positional index [2020-02-28 13:03:23] DEBUG: Total sequence: 15104374 bp [2020-02-28 13:03:23] DEBUG: Expected read coverage: 0 [2020-02-28 13:03:23] INFO: Generating solid k-mer index [2020-02-28 13:03:23] DEBUG: Hard threshold set to 2 [2020-02-28 13:03:23] DEBUG: Started k-mer counting [2020-02-28 13:03:54] root: INFO: Starting Flye 2.5-gaf246d6 [2020-02-28 13:03:54] root: DEBUG: Cmd: git/flye25/Flye/bin/flye --nano-raw /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz --meta -g 62m -o run2bc7.flye25/ -t 8 [2020-02-28 13:03:54] root: INFO: >>>STAGE: configure [2020-02-28 13:03:54] root: INFO: Configuring run [2020-02-28 13:03:55] root: INFO: Total read length: 15104374 [2020-02-28 13:03:55] root: INFO: Input genome size: 62000000 [2020-02-28 13:03:55] root: INFO: Estimated coverage: 0 [2020-02-28 13:03:55] root: WARNING: Expected read coverage is 0, the assembly is not guaranteed to 
be optimal in this setting. Are you sure that the genome size was entered correctly? [2020-02-28 13:03:55] root: INFO: Reads N50/N90: 1584 / 1557 [2020-02-28 13:03:55] root: INFO: Minimum overlap set to 2000 [2020-02-28 13:03:55] root: INFO: Selected k-mer size: 17 [2020-02-28 13:03:55] root: INFO: >>>STAGE: assembly [2020-02-28 13:03:55] root: INFO: Assembling disjointigs [2020-02-28 13:03:55] root: DEBUG: -----Begin assembly log------ [2020-02-28 13:03:55] root: DEBUG: Running: flye-assemble --reads /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz --out-asm /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/reticulatus/working/run2bc7.flye25/00-assembly/draft_assembly.fasta --genome-size 62000000 --config /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/reticulatus/working/git/flye25/Flye/flye/config/bin_cfg/asm_raw_reads.cfg --log /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/reticulatus/working/run2bc7.flye25/flye.log --threads 8 --meta --min-ovlp 2000 --kmer 17 [2020-02-28 13:03:55] DEBUG: Build date: Feb 28 2020 10:08:24 [2020-02-28 13:03:55] DEBUG: Total RAM: 62 Gb [2020-02-28 13:03:55] DEBUG: Available RAM: 60 Gb [2020-02-28 13:03:55] DEBUG: Total CPUs: 24 [2020-02-28 13:03:55] DEBUG: Parameters: [2020-02-28 13:03:55] DEBUG: big_genome_threshold=29000000 [2020-02-28 13:03:55] DEBUG: low_cutoff_warning=1 [2020-02-28 13:03:55] DEBUG: hard_min_coverage_rate=10 [2020-02-28 13:03:55] DEBUG: assemble_kmer_sample=1 [2020-02-28 13:03:55] DEBUG: repeat_graph_kmer_sample=1 [2020-02-28 13:03:55] DEBUG: read_align_kmer_sample=1 [2020-02-28 13:03:55] DEBUG: maximum_jump=1500 [2020-02-28 13:03:55] DEBUG: maximum_overhang=1500 [2020-02-28 13:03:55] DEBUG: repeat_kmer_rate=100 [2020-02-28 13:03:55] DEBUG: assemble_ovlp_relative_divergence=0.10 [2020-02-28 13:03:55] DEBUG: repeat_graph_ovlp_divergence=0.15 [2020-02-28 13:03:55] DEBUG: read_align_ovlp_divergence=0.25 [2020-02-28 13:03:55] DEBUG: max_coverage_drop_rate=5 [2020-02-28 
13:03:55] DEBUG: chimera_window=100 [2020-02-28 13:03:55] DEBUG: min_reads_in_disjointig=4 [2020-02-28 13:03:55] DEBUG: max_inner_reads=10 [2020-02-28 13:03:55] DEBUG: max_inner_fraction=0.25 [2020-02-28 13:03:55] DEBUG: add_unassembled_reads=0 [2020-02-28 13:03:55] DEBUG: max_separation=500 [2020-02-28 13:03:55] DEBUG: unique_edge_length=50000 [2020-02-28 13:03:55] DEBUG: min_repeat_res_support=0.51 [2020-02-28 13:03:55] DEBUG: out_paths_ratio=5 [2020-02-28 13:03:55] DEBUG: graph_cov_drop_rate=5 [2020-02-28 13:03:55] DEBUG: coverage_estimate_window=100 [2020-02-28 13:03:55] DEBUG: extend_contigs_with_repeats=1 [2020-02-28 13:03:55] DEBUG: min_read_cov_cutoff=3 [2020-02-28 13:03:55] DEBUG: short_tip_length=10000 [2020-02-28 13:03:55] DEBUG: long_tip_length=100000 [2020-02-28 13:03:55] DEBUG: max_bubble_length=50000 [2020-02-28 13:03:55] DEBUG: Running with k-mer size: 17 [2020-02-28 13:03:55] DEBUG: Running with minimum overlap 2000 [2020-02-28 13:03:55] DEBUG: Metagenome mode: Y [2020-02-28 13:03:55] INFO: Reading sequences [2020-02-28 13:03:55] DEBUG: Building positional index [2020-02-28 13:03:55] DEBUG: Total sequence: 15104374 bp [2020-02-28 13:03:55] DEBUG: Expected read coverage: 0 [2020-02-28 13:03:55] INFO: Generating solid k-mer index [2020-02-28 13:03:55] DEBUG: Hard threshold set to 2 [2020-02-28 13:03:55] DEBUG: Started k-mer counting `

Regarding your second post: the file name looks like Illumina output, but that's only because I renamed it that way for my pipeline that uses QIIME 2. It really is Nanopore data from my GridION.

SamStudio8 commented 4 years ago

Hi @nbargues, thanks for getting back to me with the logs so quickly.

Can you run git rev-parse HEAD in the reticulatus directory for me? I just want to check which version you have currently checked out.

It looks like the database that ktkit is supposed to download automatically was not downloaded after all. I'll have a look at why this might have happened. Can you double-check whether there is a directory at /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/dbs/ktkit and, if there is, what's inside it?

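Secondly, it looks like flye has worked "correctly". Your reads have an N90 of about 1.5 kbp, but flye chose a minimum overlap of 2000 bases. Two reads can't overlap by more than the length of the shorter one, so with reads this short flye found no overlaps at all and assembled 0 disjointigs.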

[2020-02-28 10:27:04] root: INFO: Reads N50/N90: 1584 / 1557
[2020-02-28 10:27:04] root: INFO: Minimum overlap set to 2000
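As a minimal illustration of why these two numbers clash, the sketch below computes Nx statistics from a list of read lengths and counts how many reads are at least as long as a proposed minimum overlap (an overlap can't be longer than the shorter of the two reads involved). The function names and the read lengths are hypothetical, chosen to resemble ~1.5 kbp amplicon reads; they are not taken from reticulatus or flye.

```python
def nx(lengths, x):
    """Return the Nx statistic: the length L such that reads of length >= L
    together hold at least x% of all sequenced bases."""
    if not lengths:
        raise ValueError("need at least one read length")
    if not 0 < x <= 100:
        raise ValueError("x must be in (0, 100]")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 100 >= total * x:
            return length
    return min(lengths)  # defensive; the loop always returns for 0 < x <= 100


def reads_long_enough(lengths, min_overlap):
    """Count reads that are at least min_overlap bases long; only these can
    take part in an overlap that meets the threshold."""
    return sum(1 for length in lengths if length >= min_overlap)


# Hypothetical amplicon-like read lengths
lengths = [1500, 1550, 1560, 1580, 1590, 1600, 1620]
print("N50:", nx(lengths, 50), "N90:", nx(lengths, 90))
print("reads >= 2000:", reads_long_enough(lengths, 2000))
print("reads >= 1000:", reads_long_enough(lengths, 1000))
```

With these lengths, N50 is 1580 and N90 is 1500, and no read reaches 2000 bases, so a 2000-base overlap threshold leaves nothing to assemble.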

A possible solution would be to add the following code to your spellbook.py, anywhere before the spells dictionary:

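from copy import deepcopy  # only needed if spellbook.py doesn't already import it

# New spell: a deep copy of flye25 with flye's minimum overlap (-m) set to 1000 bases
flye25_m1000 = deepcopy(flye25)
flye25_m1000.update({
    "m": 1000,
})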
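The deepcopy call matters because a spell may contain nested dictionaries: a deep copy means that editing flye25_m1000 can't change flye25 by accident. Here is a small sketch using a hypothetical stand-in for flye25 (the real keys in spellbook.py aren't shown in this thread):

```python
from copy import deepcopy

# Hypothetical stand-in for the flye25 spell; the real keys may differ.
flye25 = {
    "threads": 8,
    "extra": {"meta": True},
}

# deepcopy clones nested containers too, so changes to the new spell,
# including changes inside its nested dicts, stay out of flye25.
flye25_m1000 = deepcopy(flye25)
flye25_m1000.update({"m": 1000})
flye25_m1000["extra"]["meta"] = False

print(flye25)
print(flye25_m1000)
```

A plain `flye25_m1000 = flye25` would instead give a second name for the same dictionary, so the update would also add "m" to the original spell.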

This creates a new flye configuration that overrides the minimum overlap parameter (-m), setting it to 1000 bases. You'll then need to add a single line inside the spells dictionary to export the spell, like so:

spells = {
[...]
    "flye25-m1000" : flye25_m1000,
[...]
}

Finally, in your manifest.cfg, change flye25 to flye25-m1000. This is how new spells (configurations) are defined in reticulatus at the moment. It's a bit of a bodge and I want to overhaul this to make it much easier very soon.
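Note that the manifest refers to the dictionary key ("flye25-m1000", with a hyphen), not the Python variable name (flye25_m1000, with an underscore). The sketch below shows that kind of name-to-spell lookup with a helpful error message; resolve_spell and the stand-in spells dictionary are hypothetical, and reticulatus's own manifest handling may work differently.

```python
# Hypothetical stand-in for the spells dictionary; real entries hold full spell configs.
spells = {
    "flye25": {"tool": "flye"},
    "flye25-m1000": {"tool": "flye", "m": 1000},
}


def resolve_spell(conf_name, spells):
    """Look up a spell by the exact name written in the manifest."""
    try:
        return spells[conf_name]
    except KeyError:
        known = ", ".join(sorted(spells))
        raise KeyError(f"unknown spell {conf_name!r}; known spells: {known}") from None


print(resolve_spell("flye25-m1000", spells))
```

Writing flye25_m1000 in the manifest would raise the KeyError, and the message lists the names that would have worked.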

SamStudio8 commented 4 years ago

Oh, also: I've realised from your report that I left the expected assembly size hard-coded in spellbook.py. I'd recommend changing the genome_size parameter in the master_default dictionary at the top to roughly the size you expect to assemble. It's currently hard-coded to 62 Mbp, which I imagine is far bigger than your 16S data!

Sorry about that! See #40
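flye's "Estimated coverage" appears to be the total read length divided by the genome size passed with -g, and the numbers in this thread's logs are consistent with that: 15,104,374 bp over 62m gives about 0.24 (logged as 0), and over 14m about 1.08 (logged as 1). A minimal sketch of that arithmetic; parse_genome_size is a hypothetical helper that mirrors how the logs expand "62m" to 62000000, not flye's own parser.

```python
# Decimal multipliers, matching the log ("-g 62m" reported as 62000000).
_SUFFIXES = {"k": 10**3, "m": 10**6, "g": 10**9}


def parse_genome_size(text):
    """Convert a size such as '62m' or '5000' into a number of bases."""
    text = text.strip().lower()
    if text and text[-1] in _SUFFIXES:
        return round(float(text[:-1]) * _SUFFIXES[text[-1]])
    return int(text)


def expected_coverage(total_read_bases, genome_size):
    """Average depth if all read bases were spread evenly over the genome."""
    return total_read_bases / genome_size


total = 15104374  # "Total read length" from the flye logs
for size in ("62m", "14m"):
    print(size, round(expected_coverage(total, parse_genome_size(size)), 2))
```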

nbargues commented 4 years ago

My version is: 7628a6a4015ef84dec1be21ca11b88d0e1476556

Regarding ktkit, I downloaded the database outside of your pipeline because I can't reach the NCBI FTP server with wget. The folder contains: citations.dmp, delnodes.dmp, division.dmp, gc.prt, gencode.dmp, ktkit.ok, merged.dmp, readme.txt, taxdump.tar.gz

I created ktkit.ok myself after extracting the .tar.gz file.
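A small sketch for checking an extracted taxonomy folder before creating the ktkit.ok marker by hand. Which files ktkit actually reads is an assumption here (nodes.dmp and names.dmp, the core NCBI taxonomy tables); note that the listing above doesn't include either, although NCBI's taxdump.tar.gz normally contains both. The helper names are hypothetical.

```python
from pathlib import Path

# Assumption: these are the taxonomy files ktkit needs; adjust if not.
REQUIRED = ("nodes.dmp", "names.dmp")


def missing_taxdump_files(db_dir, required=REQUIRED):
    """Return the required files that are not present in db_dir."""
    db = Path(db_dir)
    return [name for name in required if not (db / name).is_file()]


def mark_ready(db_dir, required=REQUIRED):
    """Create the ktkit.ok marker only if every required file is present."""
    missing = missing_taxdump_files(db_dir, required)
    if missing:
        raise FileNotFoundError(f"missing from {db_dir}: {', '.join(missing)}")
    (Path(db_dir) / "ktkit.ok").touch()
```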

I made the changes you proposed; here is the log:

Wildcard constraints in inputs are ignored.
Wildcard constraints in inputs are ignored.
Wildcard constraints in inputs are ignored.
Wildcard constraints in inputs are ignored.
Wildcard constraints in inputs are ignored.
Wildcard constraints in inputs are ignored.
Building DAG of jobs...
File path benchmarks//home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz.k2 contains double '/'. This is likely unintended. It can also lead to inconsistent results of the file-matching approach used by Snakemake.
File path benchmarks//home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz.k2 contains double '/'. This is likely unintended. It can also lead to inconsistent results of the file-matching approach used by Snakemake.
Using shell: /bin/bash
Provided cores: 18
Rules claiming more threads will be scaled down.
Job counts:
    count   jobs
    2       assembly_read_coverage
    1       bandage_assembly
    1       bond_summarise_kraken
    1       finish
    1       flye_assembly
    1       install_flye_hash
    2       kraken
    2       ktkit_rollup
    1       link_flye_assembly
    4       minimap2_racon_sam
    1       polish_medaka
    4       polish_racon
    1       prep_flye_gfa
    1       summarise_assembly_meta
    1       summarise_assembly_stats
    1       summarise_kraken
    1       test_assembly
    26

[Fri Feb 28 14:13:06 2020]
rule install_flye_hash:
    output: flye25-m1000.ok
    jobid: 26
    reason: Missing output files: flye25-m1000.ok
    wildcards: conf=flye25-m1000

Touching output file flye25-m1000.ok.
[Fri Feb 28 14:15:33 2020]
Finished job 26.
1 of 26 steps (4%) done

[Fri Feb 28 14:15:33 2020]
rule flye_assembly:
    input: /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz, flye25-m1000.ok
    output: run2bc7.flye25-m1000/assembly.fasta, run2bc7.flye25-m1000/assembly_graph.gfa
    log: log/run2bc7.flye25-m1000_assembly.fa
    jobid: 18
    benchmark: benchmarks/run2bc7.flye25-m1000_assembly.fa
    reason: Missing output files: run2bc7.flye25-m1000/assembly.fasta, run2bc7.flye25-m1000/assembly_graph.gfa; Input files updated by another job: flye25-m1000.ok
    wildcards: uuid=run2bc7, conf=flye25-m1000
    threads: 8
    resources: benchmark=1

Activating conda environment: /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/reticulatus/working/.snakemake/conda/164a79fb
[Fri Feb 28 14:16:02 2020]
Error in rule flye_assembly:
    jobid: 18
    output: run2bc7.flye25-m1000/assembly.fasta, run2bc7.flye25-m1000/assembly_graph.gfa
    log: log/run2bc7.flye25-m1000_assembly.fa (check log file(s) for error message)
    conda-env: /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/reticulatus/working/.snakemake/conda/164a79fb
    shell:
        git/flye25-m1000/Flye/bin/flye --nano-raw /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz --meta -g 14m -o run2bc7.flye25-m1000/ -t 8 -m 1000 > log/run2bc7.flye25-m1000_assembly.fa 2>&1
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/reticulatus/.snakemake/log/2020-02-28T141306.159419.snakemake.log

Do you think the warning in bold (the one about the double '/') could be the problem? Could specifying the path as /home/blabla instead of home/blabla be causing it?

SamStudio8 commented 4 years ago

@nbargues Thanks! You can ignore those particular warnings; it's because of the way I currently handle operations on the read files themselves. Sorry for the confusion.
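The warning comes from joining a fixed prefix with an absolute read path. Judging by the log (benchmark: benchmarks//home/... alongside wildcards: path=/home/...), the benchmark file name seems to be built from a pattern like benchmarks/{path}.k2; that pattern is an inference from the log, not copied from reticulatus. The sketch below reproduces the double slash on a POSIX system:

```python
import os

# Hypothetical pattern inferred from the Snakemake log, not copied from reticulatus.
BENCHMARK_PATTERN = "benchmarks/{path}.k2"


def benchmark_path(read_path):
    """Fill the pattern the way a {path} wildcard would be substituted."""
    return BENCHMARK_PATTERN.format(path=read_path)


reads = "/home/user/reads.fq.gz"  # hypothetical absolute read path

raw = benchmark_path(reads)       # 'benchmarks//home/user/reads.fq.gz.k2'
cleaned = os.path.normpath(raw)   # 'benchmarks/home/user/reads.fq.gz.k2'

# os.path.join is not a drop-in fix: an absolute second argument discards
# the prefix entirely, which would place the file outside benchmarks/.
joined = os.path.join("benchmarks", reads)  # '/home/user/reads.fq.gz'
```

POSIX treats consecutive slashes in the middle of a path as a single separator, so the doubled path still points at the same place; Snakemake is only warning that its own string-based file matching could get confused.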

Thank you for clarifying the ktkit problem; I didn't realise you had to do this manually.

It seems the flye rule still fails. flye might not be suitable for this particular use case, as the overlaps might be too short: flye requires a minimum overlap of at least 1000 bases. Can you show me run2bc7.flye25-m1000/flye.log?
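Putting the two constraints from this thread together: -m has to be at least 1000, and reads shorter than the overlap threshold can't overlap anything. A minimal sketch of a sanity check for a proposed -m value; check_min_overlap is hypothetical, the 1000-base floor is taken from the comment above, and comparing against the read N90 is a rough heuristic rather than a flye rule.

```python
# Floor taken from the thread ("flye requires a minimum overlap of at least 1000 bases").
FLYE_MIN_OVERLAP_FLOOR = 1000


def check_min_overlap(m, read_n90):
    """Return a list of problems with a proposed -m value (empty if it looks usable)."""
    problems = []
    if m < FLYE_MIN_OVERLAP_FLOOR:
        problems.append(f"-m {m} is below the floor of {FLYE_MIN_OVERLAP_FLOOR}")
    if m > read_n90:
        # Heuristic: reads shorter than m cannot take part in any qualifying overlap.
        problems.append(f"-m {m} exceeds the read N90 of {read_n90}")
    return problems


for m in (500, 1000, 2000):
    print(m, check_min_overlap(m, read_n90=1557))
```

With a read N90 of 1557, only values from 1000 to 1557 pass both checks, which leaves very little room for overlaps between ~1.5 kbp reads.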

nbargues commented 4 years ago

Here is the flye log:

[2020-02-28 14:15:35] root: INFO: Starting Flye 2.5-gaf246d6 [2020-02-28 14:15:35] root: DEBUG: Cmd: git/flye25-m1000/Flye/bin/flye --nano-raw /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz --meta -g 14m -o run2bc7.flye25-m1000/ -t 8 -m 1000 [2020-02-28 14:15:35] root: INFO: >>>STAGE: configure [2020-02-28 14:15:35] root: INFO: Configuring run [2020-02-28 14:15:35] root: INFO: Total read length: 15104374 [2020-02-28 14:15:35] root: INFO: Input genome size: 14000000 [2020-02-28 14:15:35] root: INFO: Estimated coverage: 1 [2020-02-28 14:15:35] root: WARNING: Expected read coverage is 1, the assembly is not guaranteed to be optimal in this setting. Are you sure that the genome size was entered correctly? [2020-02-28 14:15:35] root: INFO: Reads N50/N90: 1584 / 1557 [2020-02-28 14:15:35] root: INFO: Selected minimum overlap: 1000 [2020-02-28 14:15:35] root: INFO: Selected k-mer size: 15 [2020-02-28 14:15:35] root: INFO: >>>STAGE: assembly [2020-02-28 14:15:35] root: INFO: Assembling disjointigs [2020-02-28 14:15:35] root: DEBUG: -----Begin assembly log------ [2020-02-28 14:15:35] root: DEBUG: Running: flye-assemble --reads /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz --out-asm /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/reticulatus/working/run2bc7.flye25-m1000/00-assembly/draft_assembly.fasta --genome-size 14000000 --config /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/reticulatus/working/git/flye25-m1000/Flye/flye/config/bin_cfg/asm_raw_reads.cfg --log /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/reticulatus/working/run2bc7.flye25-m1000/flye.log --threads 8 --meta --min-ovlp 1000 --kmer 15 [2020-02-28 14:15:35] DEBUG: Build date: Feb 28 2020 14:15:17 [2020-02-28 14:15:35] DEBUG: Total RAM: 62 Gb [2020-02-28 14:15:35] DEBUG: Available RAM: 60 Gb [2020-02-28 14:15:35] DEBUG: Total CPUs: 24 [2020-02-28 14:15:35] DEBUG: Parameters: [2020-02-28 14:15:35] DEBUG: 
big_genome_threshold=29000000 [2020-02-28 14:15:35] DEBUG: low_cutoff_warning=1 [2020-02-28 14:15:35] DEBUG: hard_min_coverage_rate=10 [2020-02-28 14:15:35] DEBUG: assemble_kmer_sample=1 [2020-02-28 14:15:35] DEBUG: repeat_graph_kmer_sample=1 [2020-02-28 14:15:35] DEBUG: read_align_kmer_sample=1 [2020-02-28 14:15:35] DEBUG: maximum_jump=1500 [2020-02-28 14:15:35] DEBUG: maximum_overhang=1500 [2020-02-28 14:15:35] DEBUG: repeat_kmer_rate=100 [2020-02-28 14:15:35] DEBUG: assemble_ovlp_relative_divergence=0.10 [2020-02-28 14:15:35] DEBUG: repeat_graph_ovlp_divergence=0.15 [2020-02-28 14:15:35] DEBUG: read_align_ovlp_divergence=0.25 [2020-02-28 14:15:35] DEBUG: max_coverage_drop_rate=5 [2020-02-28 14:15:35] DEBUG: chimera_window=100 [2020-02-28 14:15:35] DEBUG: min_reads_in_disjointig=4 [2020-02-28 14:15:35] DEBUG: max_inner_reads=10 [2020-02-28 14:15:35] DEBUG: max_inner_fraction=0.25 [2020-02-28 14:15:35] DEBUG: add_unassembled_reads=0 [2020-02-28 14:15:35] DEBUG: max_separation=500 [2020-02-28 14:15:35] DEBUG: unique_edge_length=50000 [2020-02-28 14:15:35] DEBUG: min_repeat_res_support=0.51 [2020-02-28 14:15:35] DEBUG: out_paths_ratio=5 [2020-02-28 14:15:35] DEBUG: graph_cov_drop_rate=5 [2020-02-28 14:15:35] DEBUG: coverage_estimate_window=100 [2020-02-28 14:15:35] DEBUG: extend_contigs_with_repeats=1 [2020-02-28 14:15:35] DEBUG: min_read_cov_cutoff=3 [2020-02-28 14:15:35] DEBUG: short_tip_length=10000 [2020-02-28 14:15:35] DEBUG: long_tip_length=100000 [2020-02-28 14:15:35] DEBUG: max_bubble_length=50000 [2020-02-28 14:15:35] DEBUG: Running with k-mer size: 15 [2020-02-28 14:15:35] DEBUG: Running with minimum overlap 1000 [2020-02-28 14:15:35] DEBUG: Metagenome mode: Y [2020-02-28 14:15:35] INFO: Reading sequences [2020-02-28 14:15:35] DEBUG: Building positional index [2020-02-28 14:15:35] DEBUG: Total sequence: 15104374 bp [2020-02-28 14:15:35] DEBUG: Expected read coverage: 1 [2020-02-28 14:15:35] INFO: Generating solid k-mer index [2020-02-28 14:15:35] DEBUG: 
Hard threshold set to 2 [2020-02-28 14:15:35] DEBUG: Started k-mer counting [2020-02-28 14:15:57] INFO: Counting k-mers (1/2): [2020-02-28 14:15:57] INFO: Counting k-mers (2/2): [2020-02-28 14:15:58] WARNING: Unable to separate erroneous k-mers from solid k-mers. Possible reasons: (1) Incorrect expected assembly size parameter (2) Highly uneven coverage of the assembly (3) Running with error-corrected reads in raw reads mode Assembly will continue, but results might not be optimal [2020-02-28 14:15:58] DEBUG: Estimated minimum kmer coverage: 2 [2020-02-28 14:15:58] DEBUG: Filtered 0 erroneous k-mers [2020-02-28 14:15:58] DEBUG: Repetitive k-mer frequency: 2761 [2020-02-28 14:15:58] DEBUG: Filtered 1490 repetitive k-mers (0.00306702) [2020-02-28 14:15:58] INFO: Filling index table (1/2) [2020-02-28 14:15:59] INFO: Filling index table (2/2) [2020-02-28 14:15:59] DEBUG: Sorting k-mer index [2020-02-28 14:15:59] DEBUG: Selected k-mers: 5862 [2020-02-28 14:15:59] DEBUG: Index size: 19600 [2020-02-28 14:15:59] DEBUG: Peak RAM usage: 1 Gb [2020-02-28 14:15:59] DEBUG: Estimating k-mer identity bias [2020-02-28 14:16:00] DEBUG: Median overlap divergence: 0.198153 [2020-02-28 14:16:00] DEBUG: K-mer estimate bias: -0.0834439 [2020-02-28 14:16:00] DEBUG: Max divergence threshold set to 0.298153 [2020-02-28 14:16:00] INFO: Extending reads [2020-02-28 14:16:00] DEBUG: Estimating overlap coverage [2020-02-28 14:16:01] INFO: Overlap-based coverage: 11 [2020-02-28 14:16:01] INFO: Median overlap divergence: 0.205709 [2020-02-28 14:16:01] DEBUG: Sequence divergence distribution:

|                                         *                 |                                        
|                                        **                 |                                        
|                                    *  ***                 |                                        
|                                    *  **** *              |                                        
|                                    ** **** *              |                                        
|                                    ******* *              |                                        
|                                    ******* ***            |                                        
|                                    *********** *          |                                        
|                                    *********** *          |                                        
|                                   **************          |                                        
|                                   **************          |                                        
|                                 *****************         |                                        
|                                 ******************        |                                        
|                                *******************        |                                        
|                                ********************       |                                        
|                                ********************       |                                        
|                             * *********************       |                                        
|                             ***********************       |                                        
|                           * ***********************       |                                        
|                         ***************************       |  *                                     
----------------------------------------------------------------------------------------------------
0%        5%        10%       15%       20%       25%       30%       35%       40%       45%       

Q25 = 0.19, Q50 = 0.21, Q75 = 0.23

[2020-02-28 14:16:02] INFO: Assembled 0 disjointigs
[2020-02-28 14:16:02] INFO: Generating sequence
[2020-02-28 14:16:02] DEBUG: Writing FASTA
[2020-02-28 14:16:02] DEBUG: Peak RAM usage: 1 Gb
-----------End assembly log------------
[2020-02-28 14:16:02] root: ERROR: No disjointigs were assembled - please check if the read type and genome size parameters are correct

SamStudio8 commented 4 years ago

Thanks @nbargues. It looks to me like flye isn't suitable for your application: it's working as expected, but it isn't assembling any disjointigs. We see similar behaviour on high-coverage viral data.

Reticulatus does have a rule set for wtdbg2 but I haven't tried to use it since I rebased the whole project. You'd be welcome to try it out. I've also been thinking about adding a rule for miniasm, which might be quite good for your data type. Let me know if that might be of interest to you.

nbargues commented 4 years ago

OK, thanks for the response.

But you published a paper on 16S data from "Zymo". What is the main difference between my data and the Zymo data? Thanks for clarifying.

SamStudio8 commented 4 years ago

@nbargues We used reticulatus to assemble full genomes, not 16S sequences.

nbargues commented 4 years ago

Oh, sorry, I didn't read the whole paper.

OK, so as I understand it, your pipeline isn't designed for data that consists only of full-length 16S reads.

SamStudio8 commented 4 years ago

@nbargues Sorry for the confusion; I should have realised this when I saw your read N50. Reticulatus is for long-read assembly and polishing of metagenomic shotgun sequencing data, where the goal is to assemble as much of each genome in a mixed sample as possible. Reticulatus is not a metataxonomic 16S analysis pipeline: we're interested in full genomes. I've updated the README to clarify this.

We don't actually do any 16S-based analysis in our paper.

nbargues commented 4 years ago

Thanks for the full clarification. If I work with that kind of data, I'll definitely use your pipeline 👍