Hi @nbargues! Thanks for trying this out. It looks like both the `ktcount` and `flye` rules have failed. First, can you activate the reticulatus conda environment, navigate to your reticulatus/working directory, and try manually running:
echo -e 'ktkit_tid ktkit_name mean_seq_len n_seq prop_seq n_seq_unmasked tot_bp prop_bp prop_bp_unmasked' > /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz.k2kc; ktkit count /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz.k2 --dump /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/dbs/ktkit --rank species >> /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz.k2kc
Second, can you show me the `flye.log`, which should be stored in `working/run2bc7.flye25/flye.log`?
Can I also check that the reads in `barcode7_subtrim_01_L001_R1_001.fq.gz` are actually long reads? That looks like an Illumina-generated filename. If those are short reads meant for `pilon` polishing that are being passed to `flye`, that would explain the `flye` error at least. Could you post your `reads.cfg` too?
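One quick way to answer the long-read question is to look at the read length distribution directly. The following is a minimal sketch, not part of reticulatus: the function names `read_lengths` and `n50` are made up for illustration, and it assumes a plain four-line-per-record gzipped FASTQ (no wrapped sequence lines).

```python
import gzip


def read_lengths(fastq_gz_path):
    """Return the sequence length of every record in a gzipped FASTQ file.

    Assumes exactly four lines per record (header, sequence, '+', qualities).
    """
    lengths = []
    with gzip.open(fastq_gz_path, "rt") as handle:
        for line_number, line in enumerate(handle):
            # The sequence is the second line of each four-line record.
            if line_number % 4 == 1:
                lengths.append(len(line.strip()))
    return lengths


def n50(lengths):
    """Return the N50: the length L such that reads of length >= L
    together contain at least half of all sequenced bases."""
    if not lengths:
        return 0
    half_total = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length
    return 0
```

Calling `n50(read_lengths("barcode7_subtrim_01_L001_R1_001.fq.gz"))` from the directory holding the reads would give a number to compare against what you expect from the sequencing run.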
Thanks for the quick response.
When I type your command, I get this message:
echo -e 'ktkit_tid ktkit_name mean_seq_len n_seq prop_seq n_seq_unmasked tot_bp prop_bp prop_bp_unmasked' > /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz.k2kc; ktkit count /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz.k2 --dump /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/dbs/ktkit --rank species >> /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz.k2kc
NCBI dump not found in /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/dbs/ktkit
mkdir -p /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/dbs/ktkit; cd /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/dbs/ktkit; wget ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz; tar xvf taxdump.tar.gz
Then the log file from flye gives me:
`[2020-02-28 10:11:43] root: INFO: Starting Flye 2.5-gaf246d6 [2020-02-28 10:11:43] root: DEBUG: Cmd: git/flye25/Flye/bin/flye --nano-raw /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz --meta -g 62m -o run2bc7.flye25/ -t 8 [2020-02-28 10:11:43] root: INFO: >>>STAGE: configure [2020-02-28 10:11:43] root: INFO: Configuring run [2020-02-28 10:11:44] root: INFO: Total read length: 15104374 [2020-02-28 10:11:44] root: INFO: Input genome size: 62000000 [2020-02-28 10:11:44] root: INFO: Estimated coverage: 0 [2020-02-28 10:11:44] root: WARNING: Expected read coverage is 0, the assembly is not guaranteed to be optimal in this setting. Are you sure that the genome size was entered correctly? [2020-02-28 10:11:44] root: INFO: Reads N50/N90: 1584 / 1557 [2020-02-28 10:11:44] root: INFO: Minimum overlap set to 2000 [2020-02-28 10:11:44] root: INFO: Selected k-mer size: 17 [2020-02-28 10:11:44] root: INFO: >>>STAGE: assembly [2020-02-28 10:11:44] root: INFO: Assembling disjointigs [2020-02-28 10:11:44] root: DEBUG: -----Begin assembly log------ [2020-02-28 10:11:44] root: DEBUG: Running: flye-assemble --reads /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz --out-asm /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/reticulatus/working/run2bc7.flye25/00-assembly/draft_assembly.fasta --genome-size 62000000 --config /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/reticulatus/working/git/flye25/Flye/flye/config/bin_cfg/asm_raw_reads.cfg --log /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/reticulatus/working/run2bc7.flye25/flye.log --threads 8 --meta --min-ovlp 2000 --kmer 17 [2020-02-28 10:11:44] DEBUG: Build date: Feb 28 2020 10:08:24 [2020-02-28 10:11:44] DEBUG: Total RAM: 62 Gb [2020-02-28 10:11:44] DEBUG: Available RAM: 60 Gb [2020-02-28 10:11:44] DEBUG: Total CPUs: 24 [2020-02-28 10:11:44] DEBUG: Parameters: [2020-02-28 10:11:44] DEBUG: big_genome_threshold=29000000 [2020-02-28 10:11:44] DEBUG: 
low_cutoff_warning=1 [2020-02-28 10:11:44] DEBUG: hard_min_coverage_rate=10 [2020-02-28 10:11:44] DEBUG: assemble_kmer_sample=1 [2020-02-28 10:11:44] DEBUG: repeat_graph_kmer_sample=1 [2020-02-28 10:11:44] DEBUG: read_align_kmer_sample=1 [2020-02-28 10:11:44] DEBUG: maximum_jump=1500 [2020-02-28 10:11:44] DEBUG: maximum_overhang=1500 [2020-02-28 10:11:44] DEBUG: repeat_kmer_rate=100 [2020-02-28 10:11:44] DEBUG: assemble_ovlp_relative_divergence=0.10 [2020-02-28 10:11:44] DEBUG: repeat_graph_ovlp_divergence=0.15 [2020-02-28 10:11:44] DEBUG: read_align_ovlp_divergence=0.25 [2020-02-28 10:11:44] DEBUG: max_coverage_drop_rate=5 [2020-02-28 10:11:44] DEBUG: chimera_window=100 [2020-02-28 10:11:44] DEBUG: min_reads_in_disjointig=4 [2020-02-28 10:11:44] DEBUG: max_inner_reads=10 [2020-02-28 10:11:44] DEBUG: max_inner_fraction=0.25 [2020-02-28 10:11:44] DEBUG: add_unassembled_reads=0 [2020-02-28 10:11:44] DEBUG: max_separation=500 [2020-02-28 10:11:44] DEBUG: unique_edge_length=50000 [2020-02-28 10:11:44] DEBUG: min_repeat_res_support=0.51 [2020-02-28 10:11:44] DEBUG: out_paths_ratio=5 [2020-02-28 10:11:44] DEBUG: graph_cov_drop_rate=5 [2020-02-28 10:11:44] DEBUG: coverage_estimate_window=100 [2020-02-28 10:11:44] DEBUG: extend_contigs_with_repeats=1 [2020-02-28 10:11:44] DEBUG: min_read_cov_cutoff=3 [2020-02-28 10:11:44] DEBUG: short_tip_length=10000 [2020-02-28 10:11:44] DEBUG: long_tip_length=100000 [2020-02-28 10:11:44] DEBUG: max_bubble_length=50000 [2020-02-28 10:11:44] DEBUG: Running with k-mer size: 17 [2020-02-28 10:11:44] DEBUG: Running with minimum overlap 2000 [2020-02-28 10:11:44] DEBUG: Metagenome mode: Y [2020-02-28 10:11:44] INFO: Reading sequences [2020-02-28 10:11:44] DEBUG: Building positional index [2020-02-28 10:11:44] DEBUG: Total sequence: 15104374 bp [2020-02-28 10:11:44] DEBUG: Expected read coverage: 0 [2020-02-28 10:11:44] INFO: Generating solid k-mer index [2020-02-28 10:11:44] DEBUG: Hard threshold set to 2 [2020-02-28 10:11:44] DEBUG: Started 
k-mer counting [2020-02-28 10:12:40] root: INFO: Starting Flye 2.5-gaf246d6 [2020-02-28 10:12:40] root: DEBUG: Cmd: git/flye25/Flye/bin/flye --nano-raw /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz --meta -g 62m -o run2bc7.flye25/ -t 8 [2020-02-28 10:12:40] root: INFO: >>>STAGE: configure [2020-02-28 10:12:40] root: INFO: Configuring run [2020-02-28 10:12:40] root: INFO: Total read length: 15104374 [2020-02-28 10:12:40] root: INFO: Input genome size: 62000000 [2020-02-28 10:12:40] root: INFO: Estimated coverage: 0 [2020-02-28 10:12:40] root: WARNING: Expected read coverage is 0, the assembly is not guaranteed to be optimal in this setting. Are you sure that the genome size was entered correctly? [2020-02-28 10:12:40] root: INFO: Reads N50/N90: 1584 / 1557 [2020-02-28 10:12:40] root: INFO: Minimum overlap set to 2000 [2020-02-28 10:12:40] root: INFO: Selected k-mer size: 17 [2020-02-28 10:12:40] root: INFO: >>>STAGE: assembly [2020-02-28 10:12:40] root: INFO: Assembling disjointigs [2020-02-28 10:12:40] root: DEBUG: -----Begin assembly log------ [2020-02-28 10:12:40] root: DEBUG: Running: flye-assemble --reads /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz --out-asm /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/reticulatus/working/run2bc7.flye25/00-assembly/draft_assembly.fasta --genome-size 62000000 --config /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/reticulatus/working/git/flye25/Flye/flye/config/bin_cfg/asm_raw_reads.cfg --log /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/reticulatus/working/run2bc7.flye25/flye.log --threads 8 --meta --min-ovlp 2000 --kmer 17 [2020-02-28 10:12:40] DEBUG: Build date: Feb 28 2020 10:08:24 [2020-02-28 10:12:40] DEBUG: Total RAM: 62 Gb [2020-02-28 10:12:40] DEBUG: Available RAM: 60 Gb [2020-02-28 10:12:40] DEBUG: Total CPUs: 24 [2020-02-28 10:12:40] DEBUG: Parameters: [2020-02-28 10:12:40] DEBUG: big_genome_threshold=29000000 [2020-02-28 
10:12:40] DEBUG: low_cutoff_warning=1 [2020-02-28 10:12:40] DEBUG: hard_min_coverage_rate=10 [2020-02-28 10:12:40] DEBUG: assemble_kmer_sample=1 [2020-02-28 10:12:40] DEBUG: repeat_graph_kmer_sample=1 [2020-02-28 10:12:40] DEBUG: read_align_kmer_sample=1 [2020-02-28 10:12:40] DEBUG: maximum_jump=1500 [2020-02-28 10:12:40] DEBUG: maximum_overhang=1500 [2020-02-28 10:12:40] DEBUG: repeat_kmer_rate=100 [2020-02-28 10:12:40] DEBUG: assemble_ovlp_relative_divergence=0.10 [2020-02-28 10:12:40] DEBUG: repeat_graph_ovlp_divergence=0.15 [2020-02-28 10:12:40] DEBUG: read_align_ovlp_divergence=0.25 [2020-02-28 10:12:40] DEBUG: max_coverage_drop_rate=5 [2020-02-28 10:12:40] DEBUG: chimera_window=100 [2020-02-28 10:12:40] DEBUG: min_reads_in_disjointig=4 [2020-02-28 10:12:40] DEBUG: max_inner_reads=10 [2020-02-28 10:12:40] DEBUG: max_inner_fraction=0.25 [2020-02-28 10:12:40] DEBUG: add_unassembled_reads=0 [2020-02-28 10:12:40] DEBUG: max_separation=500 [2020-02-28 10:12:40] DEBUG: unique_edge_length=50000 [2020-02-28 10:12:40] DEBUG: min_repeat_res_support=0.51 [2020-02-28 10:12:40] DEBUG: out_paths_ratio=5 [2020-02-28 10:12:40] DEBUG: graph_cov_drop_rate=5 [2020-02-28 10:12:40] DEBUG: coverage_estimate_window=100 [2020-02-28 10:12:40] DEBUG: extend_contigs_with_repeats=1 [2020-02-28 10:12:40] DEBUG: min_read_cov_cutoff=3 [2020-02-28 10:12:40] DEBUG: short_tip_length=10000 [2020-02-28 10:12:40] DEBUG: long_tip_length=100000 [2020-02-28 10:12:40] DEBUG: max_bubble_length=50000 [2020-02-28 10:12:40] DEBUG: Running with k-mer size: 17 [2020-02-28 10:12:40] DEBUG: Running with minimum overlap 2000 [2020-02-28 10:12:40] DEBUG: Metagenome mode: Y [2020-02-28 10:12:40] INFO: Reading sequences [2020-02-28 10:12:40] DEBUG: Building positional index [2020-02-28 10:12:40] DEBUG: Total sequence: 15104374 bp [2020-02-28 10:12:40] DEBUG: Expected read coverage: 0 [2020-02-28 10:12:40] INFO: Generating solid k-mer index [2020-02-28 10:12:40] DEBUG: Hard threshold set to 2 [2020-02-28 
10:12:40] DEBUG: Started k-mer counting [2020-02-28 10:18:49] INFO: Counting k-mers (1/2): [2020-02-28 10:18:51] INFO: Counting k-mers (2/2): [2020-02-28 10:18:53] WARNING: Unable to separate erroneous k-mers from solid k-mers. Possible reasons: (1) Incorrect expected assembly size parameter (2) Highly uneven coverage of the assembly (3) Running with error-corrected reads in raw reads mode Assembly will continue, but results might not be optimal [2020-02-28 10:18:53] DEBUG: Estimated minimum kmer coverage: 2 [2020-02-28 10:18:53] DEBUG: Filtered 0 erroneous k-mers [2020-02-28 10:18:53] DEBUG: Repetitive k-mer frequency: 2446 [2020-02-28 10:18:53] DEBUG: Filtered 1491 repetitive k-mers (0.00280352) [2020-02-28 10:18:53] INFO: Filling index table (1/2) [2020-02-28 10:18:54] INFO: Filling index table (2/2) [2020-02-28 10:18:55] DEBUG: Sorting k-mer index [2020-02-28 10:18:55] DEBUG: Selected k-mers: 10991 [2020-02-28 10:18:55] DEBUG: Index size: 40759 [2020-02-28 10:18:55] DEBUG: Peak RAM usage: 16 Gb [2020-02-28 10:18:55] DEBUG: Estimating k-mer identity bias [2020-02-28 10:18:55] WARNING: No overlaps found - unable to estimate parameters [2020-02-28 10:18:55] DEBUG: Median overlap divergence: 0.5 [2020-02-28 10:18:55] DEBUG: K-mer estimate bias: 0 [2020-02-28 10:18:55] DEBUG: Max divergence threshold set to 0.6 [2020-02-28 10:18:55] INFO: Extending reads [2020-02-28 10:18:55] DEBUG: Estimating overlap coverage [2020-02-28 10:18:56] WARNING: No overlaps found! [2020-02-28 10:18:56] INFO: Overlap-based coverage: 0 [2020-02-28 10:18:56] INFO: Median overlap divergence: 0 [2020-02-28 10:18:56] DEBUG: Sequence divergence distribution:
[empty histogram: no overlaps to plot; x-axis (sequence divergence) runs from 0% to 45%]
Q25 = 0, Q50 = 0, Q75 = 0
[2020-02-28 10:18:56] INFO: Assembled 0 disjointigs [2020-02-28 10:18:56] INFO: Generating sequence [2020-02-28 10:18:56] DEBUG: Writing FASTA [2020-02-28 10:18:56] DEBUG: Peak RAM usage: 16 Gb -----------End assembly log------------ [2020-02-28 10:18:56] root: ERROR: No disjointigs were assembled - please check if the read type and genome size parameters are correct [2020-02-28 10:27:04] root: INFO: Starting Flye 2.5-gaf246d6 [2020-02-28 10:27:04] root: DEBUG: Cmd: git/flye25/Flye/bin/flye --nano-raw /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz --meta -g 62m -o run2bc7.flye25/ -t 8 [2020-02-28 10:27:04] root: INFO: >>>STAGE: configure [2020-02-28 10:27:04] root: INFO: Configuring run [2020-02-28 10:27:04] root: INFO: Total read length: 15104374 [2020-02-28 10:27:04] root: INFO: Input genome size: 62000000 [2020-02-28 10:27:04] root: INFO: Estimated coverage: 0 [2020-02-28 10:27:04] root: WARNING: Expected read coverage is 0, the assembly is not guaranteed to be optimal in this setting. Are you sure that the genome size was entered correctly? 
[2020-02-28 10:27:04] root: INFO: Reads N50/N90: 1584 / 1557 [2020-02-28 10:27:04] root: INFO: Minimum overlap set to 2000 [2020-02-28 10:27:04] root: INFO: Selected k-mer size: 17 [2020-02-28 10:27:04] root: INFO: >>>STAGE: assembly [2020-02-28 10:27:04] root: INFO: Assembling disjointigs [2020-02-28 10:27:04] root: DEBUG: -----Begin assembly log------ [2020-02-28 10:27:04] root: DEBUG: Running: flye-assemble --reads /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz --out-asm /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/reticulatus/working/run2bc7.flye25/00-assembly/draft_assembly.fasta --genome-size 62000000 --config /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/reticulatus/working/git/flye25/Flye/flye/config/bin_cfg/asm_raw_reads.cfg --log /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/reticulatus/working/run2bc7.flye25/flye.log --threads 8 --meta --min-ovlp 2000 --kmer 17 [2020-02-28 10:27:04] DEBUG: Build date: Feb 28 2020 10:08:24 [2020-02-28 10:27:04] DEBUG: Total RAM: 62 Gb [2020-02-28 10:27:04] DEBUG: Available RAM: 57 Gb [2020-02-28 10:27:04] DEBUG: Total CPUs: 24 [2020-02-28 10:27:04] DEBUG: Parameters: [2020-02-28 10:27:04] DEBUG: big_genome_threshold=29000000 [2020-02-28 10:27:04] DEBUG: low_cutoff_warning=1 [2020-02-28 10:27:04] DEBUG: hard_min_coverage_rate=10 [2020-02-28 10:27:04] DEBUG: assemble_kmer_sample=1 [2020-02-28 10:27:04] DEBUG: repeat_graph_kmer_sample=1 [2020-02-28 10:27:04] DEBUG: read_align_kmer_sample=1 [2020-02-28 10:27:04] DEBUG: maximum_jump=1500 [2020-02-28 10:27:04] DEBUG: maximum_overhang=1500 [2020-02-28 10:27:04] DEBUG: repeat_kmer_rate=100 [2020-02-28 10:27:04] DEBUG: assemble_ovlp_relative_divergence=0.10 [2020-02-28 10:27:04] DEBUG: repeat_graph_ovlp_divergence=0.15 [2020-02-28 10:27:04] DEBUG: read_align_ovlp_divergence=0.25 [2020-02-28 10:27:04] DEBUG: max_coverage_drop_rate=5 [2020-02-28 10:27:04] DEBUG: chimera_window=100 [2020-02-28 10:27:04] DEBUG: 
min_reads_in_disjointig=4 [2020-02-28 10:27:04] DEBUG: max_inner_reads=10 [2020-02-28 10:27:04] DEBUG: max_inner_fraction=0.25 [2020-02-28 10:27:04] DEBUG: add_unassembled_reads=0 [2020-02-28 10:27:04] DEBUG: max_separation=500 [2020-02-28 10:27:04] DEBUG: unique_edge_length=50000 [2020-02-28 10:27:04] DEBUG: min_repeat_res_support=0.51 [2020-02-28 10:27:04] DEBUG: out_paths_ratio=5 [2020-02-28 10:27:04] DEBUG: graph_cov_drop_rate=5 [2020-02-28 10:27:04] DEBUG: coverage_estimate_window=100 [2020-02-28 10:27:04] DEBUG: extend_contigs_with_repeats=1 [2020-02-28 10:27:04] DEBUG: min_read_cov_cutoff=3 [2020-02-28 10:27:04] DEBUG: short_tip_length=10000 [2020-02-28 10:27:04] DEBUG: long_tip_length=100000 [2020-02-28 10:27:04] DEBUG: max_bubble_length=50000 [2020-02-28 10:27:04] DEBUG: Running with k-mer size: 17 [2020-02-28 10:27:04] DEBUG: Running with minimum overlap 2000 [2020-02-28 10:27:04] DEBUG: Metagenome mode: Y [2020-02-28 10:27:04] INFO: Reading sequences [2020-02-28 10:27:04] DEBUG: Building positional index [2020-02-28 10:27:04] DEBUG: Total sequence: 15104374 bp [2020-02-28 10:27:04] DEBUG: Expected read coverage: 0 [2020-02-28 10:27:04] INFO: Generating solid k-mer index [2020-02-28 10:27:04] DEBUG: Hard threshold set to 2 [2020-02-28 10:27:04] DEBUG: Started k-mer counting [2020-02-28 10:32:55] INFO: Counting k-mers (1/2): [2020-02-28 10:32:57] INFO: Counting k-mers (2/2): [2020-02-28 10:32:59] WARNING: Unable to separate erroneous k-mers from solid k-mers. 
Possible reasons: (1) Incorrect expected assembly size parameter (2) Highly uneven coverage of the assembly (3) Running with error-corrected reads in raw reads mode Assembly will continue, but results might not be optimal [2020-02-28 10:32:59] DEBUG: Estimated minimum kmer coverage: 2 [2020-02-28 10:32:59] DEBUG: Filtered 0 erroneous k-mers [2020-02-28 10:32:59] DEBUG: Repetitive k-mer frequency: 2446 [2020-02-28 10:32:59] DEBUG: Filtered 1491 repetitive k-mers (0.00280352) [2020-02-28 10:32:59] INFO: Filling index table (1/2) [2020-02-28 10:33:00] INFO: Filling index table (2/2) [2020-02-28 10:33:01] DEBUG: Sorting k-mer index [2020-02-28 10:33:01] DEBUG: Selected k-mers: 10991 [2020-02-28 10:33:01] DEBUG: Index size: 40759 [2020-02-28 10:33:01] DEBUG: Peak RAM usage: 16 Gb [2020-02-28 10:33:01] DEBUG: Estimating k-mer identity bias [2020-02-28 10:33:01] WARNING: No overlaps found - unable to estimate parameters [2020-02-28 10:33:01] DEBUG: Median overlap divergence: 0.5 [2020-02-28 10:33:01] DEBUG: K-mer estimate bias: 0 [2020-02-28 10:33:01] DEBUG: Max divergence threshold set to 0.6 [2020-02-28 10:33:01] INFO: Extending reads [2020-02-28 10:33:01] DEBUG: Estimating overlap coverage [2020-02-28 10:33:02] WARNING: No overlaps found! [2020-02-28 10:33:02] INFO: Overlap-based coverage: 0 [2020-02-28 10:33:02] INFO: Median overlap divergence: 0 [2020-02-28 10:33:02] DEBUG: Sequence divergence distribution:
[empty histogram: no overlaps to plot; x-axis (sequence divergence) runs from 0% to 45%]
Q25 = 0, Q50 = 0, Q75 = 0
[2020-02-28 10:33:02] INFO: Assembled 0 disjointigs [2020-02-28 10:33:02] INFO: Generating sequence [2020-02-28 10:33:02] DEBUG: Writing FASTA [2020-02-28 10:33:02] DEBUG: Peak RAM usage: 16 Gb -----------End assembly log------------ [2020-02-28 10:33:02] root: ERROR: No disjointigs were assembled - please check if the read type and genome size parameters are correct [2020-02-28 12:52:17] root: INFO: Starting Flye 2.5-gaf246d6 [2020-02-28 12:52:17] root: DEBUG: Cmd: git/flye25/Flye/bin/flye --nano-raw /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz --meta -g 62m -o run2bc7.flye25/ -t 8 [2020-02-28 12:52:17] root: INFO: >>>STAGE: configure [2020-02-28 12:52:17] root: INFO: Configuring run [2020-02-28 12:52:17] root: INFO: Total read length: 15104374 [2020-02-28 12:52:17] root: INFO: Input genome size: 62000000 [2020-02-28 12:52:17] root: INFO: Estimated coverage: 0 [2020-02-28 12:52:17] root: WARNING: Expected read coverage is 0, the assembly is not guaranteed to be optimal in this setting. Are you sure that the genome size was entered correctly? 
[2020-02-28 12:52:17] root: INFO: Reads N50/N90: 1584 / 1557 [2020-02-28 12:52:17] root: INFO: Minimum overlap set to 2000 [2020-02-28 12:52:17] root: INFO: Selected k-mer size: 17 [2020-02-28 12:52:17] root: INFO: >>>STAGE: assembly [2020-02-28 12:52:17] root: INFO: Assembling disjointigs [2020-02-28 12:52:17] root: DEBUG: -----Begin assembly log------ [2020-02-28 12:52:17] root: DEBUG: Running: flye-assemble --reads /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz --out-asm /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/reticulatus/working/run2bc7.flye25/00-assembly/draft_assembly.fasta --genome-size 62000000 --config /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/reticulatus/working/git/flye25/Flye/flye/config/bin_cfg/asm_raw_reads.cfg --log /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/reticulatus/working/run2bc7.flye25/flye.log --threads 8 --meta --min-ovlp 2000 --kmer 17 [2020-02-28 12:52:17] DEBUG: Build date: Feb 28 2020 10:08:24 [2020-02-28 12:52:17] DEBUG: Total RAM: 62 Gb [2020-02-28 12:52:17] DEBUG: Available RAM: 59 Gb [2020-02-28 12:52:17] DEBUG: Total CPUs: 24 [2020-02-28 12:52:17] DEBUG: Parameters: [2020-02-28 12:52:17] DEBUG: big_genome_threshold=29000000 [2020-02-28 12:52:17] DEBUG: low_cutoff_warning=1 [2020-02-28 12:52:17] DEBUG: hard_min_coverage_rate=10 [2020-02-28 12:52:17] DEBUG: assemble_kmer_sample=1 [2020-02-28 12:52:17] DEBUG: repeat_graph_kmer_sample=1 [2020-02-28 12:52:17] DEBUG: read_align_kmer_sample=1 [2020-02-28 12:52:17] DEBUG: maximum_jump=1500 [2020-02-28 12:52:17] DEBUG: maximum_overhang=1500 [2020-02-28 12:52:17] DEBUG: repeat_kmer_rate=100 [2020-02-28 12:52:17] DEBUG: assemble_ovlp_relative_divergence=0.10 [2020-02-28 12:52:17] DEBUG: repeat_graph_ovlp_divergence=0.15 [2020-02-28 12:52:17] DEBUG: read_align_ovlp_divergence=0.25 [2020-02-28 12:52:17] DEBUG: max_coverage_drop_rate=5 [2020-02-28 12:52:17] DEBUG: chimera_window=100 [2020-02-28 12:52:17] DEBUG: 
min_reads_in_disjointig=4 [2020-02-28 12:52:17] DEBUG: max_inner_reads=10 [2020-02-28 12:52:17] DEBUG: max_inner_fraction=0.25 [2020-02-28 12:52:17] DEBUG: add_unassembled_reads=0 [2020-02-28 12:52:17] DEBUG: max_separation=500 [2020-02-28 12:52:17] DEBUG: unique_edge_length=50000 [2020-02-28 12:52:17] DEBUG: min_repeat_res_support=0.51 [2020-02-28 12:52:17] DEBUG: out_paths_ratio=5 [2020-02-28 12:52:17] DEBUG: graph_cov_drop_rate=5 [2020-02-28 12:52:17] DEBUG: coverage_estimate_window=100 [2020-02-28 12:52:17] DEBUG: extend_contigs_with_repeats=1 [2020-02-28 12:52:17] DEBUG: min_read_cov_cutoff=3 [2020-02-28 12:52:17] DEBUG: short_tip_length=10000 [2020-02-28 12:52:17] DEBUG: long_tip_length=100000 [2020-02-28 12:52:17] DEBUG: max_bubble_length=50000 [2020-02-28 12:52:17] DEBUG: Running with k-mer size: 17 [2020-02-28 12:52:17] DEBUG: Running with minimum overlap 2000 [2020-02-28 12:52:17] DEBUG: Metagenome mode: Y [2020-02-28 12:52:17] INFO: Reading sequences [2020-02-28 12:52:18] DEBUG: Building positional index [2020-02-28 12:52:18] DEBUG: Total sequence: 15104374 bp [2020-02-28 12:52:18] DEBUG: Expected read coverage: 0 [2020-02-28 12:52:18] INFO: Generating solid k-mer index [2020-02-28 12:52:18] DEBUG: Hard threshold set to 2 [2020-02-28 12:52:18] DEBUG: Started k-mer counting [2020-02-28 12:58:02] INFO: Counting k-mers (1/2): [2020-02-28 12:58:04] INFO: Counting k-mers (2/2): [2020-02-28 12:58:06] WARNING: Unable to separate erroneous k-mers from solid k-mers. 
Possible reasons: (1) Incorrect expected assembly size parameter (2) Highly uneven coverage of the assembly (3) Running with error-corrected reads in raw reads mode Assembly will continue, but results might not be optimal [2020-02-28 12:58:06] DEBUG: Estimated minimum kmer coverage: 2 [2020-02-28 12:58:06] DEBUG: Filtered 0 erroneous k-mers [2020-02-28 12:58:06] DEBUG: Repetitive k-mer frequency: 2446 [2020-02-28 12:58:06] DEBUG: Filtered 1491 repetitive k-mers (0.00280352) [2020-02-28 12:58:06] INFO: Filling index table (1/2) [2020-02-28 12:58:07] INFO: Filling index table (2/2) [2020-02-28 12:58:07] DEBUG: Sorting k-mer index [2020-02-28 12:58:07] DEBUG: Selected k-mers: 10991 [2020-02-28 12:58:07] DEBUG: Index size: 40759 [2020-02-28 12:58:07] DEBUG: Peak RAM usage: 16 Gb [2020-02-28 12:58:07] DEBUG: Estimating k-mer identity bias [2020-02-28 12:58:08] WARNING: No overlaps found - unable to estimate parameters [2020-02-28 12:58:08] DEBUG: Median overlap divergence: 0.5 [2020-02-28 12:58:08] DEBUG: K-mer estimate bias: 0 [2020-02-28 12:58:08] DEBUG: Max divergence threshold set to 0.6 [2020-02-28 12:58:08] INFO: Extending reads [2020-02-28 12:58:08] DEBUG: Estimating overlap coverage [2020-02-28 12:58:08] WARNING: No overlaps found! [2020-02-28 12:58:08] INFO: Overlap-based coverage: 0 [2020-02-28 12:58:08] INFO: Median overlap divergence: 0 [2020-02-28 12:58:08] DEBUG: Sequence divergence distribution:
[empty histogram: no overlaps to plot; x-axis (sequence divergence) runs from 0% to 45%]
Q25 = 0, Q50 = 0, Q75 = 0
[2020-02-28 12:58:08] INFO: Assembled 0 disjointigs [2020-02-28 12:58:08] INFO: Generating sequence [2020-02-28 12:58:08] DEBUG: Writing FASTA [2020-02-28 12:58:08] DEBUG: Peak RAM usage: 16 Gb -----------End assembly log------------ [2020-02-28 12:58:08] root: ERROR: No disjointigs were assembled - please check if the read type and genome size parameters are correct [2020-02-28 12:59:34] root: INFO: Starting Flye 2.5-gaf246d6 [2020-02-28 12:59:34] root: DEBUG: Cmd: git/flye25/Flye/bin/flye --nano-raw /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz --meta -g 62m -o run2bc7.flye25/ -t 8 [2020-02-28 12:59:34] root: INFO: >>>STAGE: configure [2020-02-28 12:59:34] root: INFO: Configuring run [2020-02-28 12:59:35] root: INFO: Total read length: 15104374 [2020-02-28 12:59:35] root: INFO: Input genome size: 62000000 [2020-02-28 12:59:35] root: INFO: Estimated coverage: 0 [2020-02-28 12:59:35] root: WARNING: Expected read coverage is 0, the assembly is not guaranteed to be optimal in this setting. Are you sure that the genome size was entered correctly? 
[2020-02-28 12:59:35] root: INFO: Reads N50/N90: 1584 / 1557 [2020-02-28 12:59:35] root: INFO: Minimum overlap set to 2000 [2020-02-28 12:59:35] root: INFO: Selected k-mer size: 17 [2020-02-28 12:59:35] root: INFO: >>>STAGE: assembly [2020-02-28 12:59:35] root: INFO: Assembling disjointigs [2020-02-28 12:59:35] root: DEBUG: -----Begin assembly log------ [2020-02-28 12:59:35] root: DEBUG: Running: flye-assemble --reads /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz --out-asm /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/reticulatus/working/run2bc7.flye25/00-assembly/draft_assembly.fasta --genome-size 62000000 --config /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/reticulatus/working/git/flye25/Flye/flye/config/bin_cfg/asm_raw_reads.cfg --log /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/reticulatus/working/run2bc7.flye25/flye.log --threads 8 --meta --min-ovlp 2000 --kmer 17 [2020-02-28 12:59:35] DEBUG: Build date: Feb 28 2020 10:08:24 [2020-02-28 12:59:35] DEBUG: Total RAM: 62 Gb [2020-02-28 12:59:35] DEBUG: Available RAM: 60 Gb [2020-02-28 12:59:35] DEBUG: Total CPUs: 24 [2020-02-28 12:59:35] DEBUG: Parameters: [2020-02-28 12:59:35] DEBUG: big_genome_threshold=29000000 [2020-02-28 12:59:35] DEBUG: low_cutoff_warning=1 [2020-02-28 12:59:35] DEBUG: hard_min_coverage_rate=10 [2020-02-28 12:59:35] DEBUG: assemble_kmer_sample=1 [2020-02-28 12:59:35] DEBUG: repeat_graph_kmer_sample=1 [2020-02-28 12:59:35] DEBUG: read_align_kmer_sample=1 [2020-02-28 12:59:35] DEBUG: maximum_jump=1500 [2020-02-28 12:59:35] DEBUG: maximum_overhang=1500 [2020-02-28 12:59:35] DEBUG: repeat_kmer_rate=100 [2020-02-28 12:59:35] DEBUG: assemble_ovlp_relative_divergence=0.10 [2020-02-28 12:59:35] DEBUG: repeat_graph_ovlp_divergence=0.15 [2020-02-28 12:59:35] DEBUG: read_align_ovlp_divergence=0.25 [2020-02-28 12:59:35] DEBUG: max_coverage_drop_rate=5 [2020-02-28 12:59:35] DEBUG: chimera_window=100 [2020-02-28 12:59:35] DEBUG: 
min_reads_in_disjointig=4 [2020-02-28 12:59:35] DEBUG: max_inner_reads=10 [2020-02-28 12:59:35] DEBUG: max_inner_fraction=0.25 [2020-02-28 12:59:35] DEBUG: add_unassembled_reads=0 [2020-02-28 12:59:35] DEBUG: max_separation=500 [2020-02-28 12:59:35] DEBUG: unique_edge_length=50000 [2020-02-28 12:59:35] DEBUG: min_repeat_res_support=0.51 [2020-02-28 12:59:35] DEBUG: out_paths_ratio=5 [2020-02-28 12:59:35] DEBUG: graph_cov_drop_rate=5 [2020-02-28 12:59:35] DEBUG: coverage_estimate_window=100 [2020-02-28 12:59:35] DEBUG: extend_contigs_with_repeats=1 [2020-02-28 12:59:35] DEBUG: min_read_cov_cutoff=3 [2020-02-28 12:59:35] DEBUG: short_tip_length=10000 [2020-02-28 12:59:35] DEBUG: long_tip_length=100000 [2020-02-28 12:59:35] DEBUG: max_bubble_length=50000 [2020-02-28 12:59:35] DEBUG: Running with k-mer size: 17 [2020-02-28 12:59:35] DEBUG: Running with minimum overlap 2000 [2020-02-28 12:59:35] DEBUG: Metagenome mode: Y [2020-02-28 12:59:35] INFO: Reading sequences [2020-02-28 12:59:35] DEBUG: Building positional index [2020-02-28 12:59:35] DEBUG: Total sequence: 15104374 bp [2020-02-28 12:59:35] DEBUG: Expected read coverage: 0 [2020-02-28 12:59:35] INFO: Generating solid k-mer index [2020-02-28 12:59:35] DEBUG: Hard threshold set to 2 [2020-02-28 12:59:35] DEBUG: Started k-mer counting [2020-02-28 13:03:22] root: INFO: Starting Flye 2.5-gaf246d6 [2020-02-28 13:03:22] root: DEBUG: Cmd: git/flye25/Flye/bin/flye --nano-raw /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz --meta -g 62m -o run2bc7.flye25/ -t 8 [2020-02-28 13:03:22] root: INFO: >>>STAGE: configure [2020-02-28 13:03:22] root: INFO: Configuring run [2020-02-28 13:03:23] root: INFO: Total read length: 15104374 [2020-02-28 13:03:23] root: INFO: Input genome size: 62000000 [2020-02-28 13:03:23] root: INFO: Estimated coverage: 0 [2020-02-28 13:03:23] root: WARNING: Expected read coverage is 0, the assembly is not guaranteed to be optimal in this setting. 
Are you sure that the genome size was entered correctly? [2020-02-28 13:03:23] root: INFO: Reads N50/N90: 1584 / 1557 [2020-02-28 13:03:23] root: INFO: Minimum overlap set to 2000 [2020-02-28 13:03:23] root: INFO: Selected k-mer size: 17 [2020-02-28 13:03:23] root: INFO: >>>STAGE: assembly [2020-02-28 13:03:23] root: INFO: Assembling disjointigs [2020-02-28 13:03:23] root: DEBUG: -----Begin assembly log------ [2020-02-28 13:03:23] root: DEBUG: Running: flye-assemble --reads /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz --out-asm /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/reticulatus/working/run2bc7.flye25/00-assembly/draft_assembly.fasta --genome-size 62000000 --config /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/reticulatus/working/git/flye25/Flye/flye/config/bin_cfg/asm_raw_reads.cfg --log /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/reticulatus/working/run2bc7.flye25/flye.log --threads 8 --meta --min-ovlp 2000 --kmer 17 [2020-02-28 13:03:23] DEBUG: Build date: Feb 28 2020 10:08:24 [2020-02-28 13:03:23] DEBUG: Total RAM: 62 Gb [2020-02-28 13:03:23] DEBUG: Available RAM: 60 Gb [2020-02-28 13:03:23] DEBUG: Total CPUs: 24 [2020-02-28 13:03:23] DEBUG: Parameters: [2020-02-28 13:03:23] DEBUG: big_genome_threshold=29000000 [2020-02-28 13:03:23] DEBUG: low_cutoff_warning=1 [2020-02-28 13:03:23] DEBUG: hard_min_coverage_rate=10 [2020-02-28 13:03:23] DEBUG: assemble_kmer_sample=1 [2020-02-28 13:03:23] DEBUG: repeat_graph_kmer_sample=1 [2020-02-28 13:03:23] DEBUG: read_align_kmer_sample=1 [2020-02-28 13:03:23] DEBUG: maximum_jump=1500 [2020-02-28 13:03:23] DEBUG: maximum_overhang=1500 [2020-02-28 13:03:23] DEBUG: repeat_kmer_rate=100 [2020-02-28 13:03:23] DEBUG: assemble_ovlp_relative_divergence=0.10 [2020-02-28 13:03:23] DEBUG: repeat_graph_ovlp_divergence=0.15 [2020-02-28 13:03:23] DEBUG: read_align_ovlp_divergence=0.25 [2020-02-28 13:03:23] DEBUG: max_coverage_drop_rate=5 [2020-02-28 13:03:23] DEBUG: 
chimera_window=100 [2020-02-28 13:03:23] DEBUG: min_reads_in_disjointig=4 [2020-02-28 13:03:23] DEBUG: max_inner_reads=10 [2020-02-28 13:03:23] DEBUG: max_inner_fraction=0.25 [2020-02-28 13:03:23] DEBUG: add_unassembled_reads=0 [2020-02-28 13:03:23] DEBUG: max_separation=500 [2020-02-28 13:03:23] DEBUG: unique_edge_length=50000 [2020-02-28 13:03:23] DEBUG: min_repeat_res_support=0.51 [2020-02-28 13:03:23] DEBUG: out_paths_ratio=5 [2020-02-28 13:03:23] DEBUG: graph_cov_drop_rate=5 [2020-02-28 13:03:23] DEBUG: coverage_estimate_window=100 [2020-02-28 13:03:23] DEBUG: extend_contigs_with_repeats=1 [2020-02-28 13:03:23] DEBUG: min_read_cov_cutoff=3 [2020-02-28 13:03:23] DEBUG: short_tip_length=10000 [2020-02-28 13:03:23] DEBUG: long_tip_length=100000 [2020-02-28 13:03:23] DEBUG: max_bubble_length=50000 [2020-02-28 13:03:23] DEBUG: Running with k-mer size: 17 [2020-02-28 13:03:23] DEBUG: Running with minimum overlap 2000 [2020-02-28 13:03:23] DEBUG: Metagenome mode: Y [2020-02-28 13:03:23] INFO: Reading sequences [2020-02-28 13:03:23] DEBUG: Building positional index [2020-02-28 13:03:23] DEBUG: Total sequence: 15104374 bp [2020-02-28 13:03:23] DEBUG: Expected read coverage: 0 [2020-02-28 13:03:23] INFO: Generating solid k-mer index [2020-02-28 13:03:23] DEBUG: Hard threshold set to 2 [2020-02-28 13:03:23] DEBUG: Started k-mer counting [2020-02-28 13:03:54] root: INFO: Starting Flye 2.5-gaf246d6 [2020-02-28 13:03:54] root: DEBUG: Cmd: git/flye25/Flye/bin/flye --nano-raw /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz --meta -g 62m -o run2bc7.flye25/ -t 8 [2020-02-28 13:03:54] root: INFO: >>>STAGE: configure [2020-02-28 13:03:54] root: INFO: Configuring run [2020-02-28 13:03:55] root: INFO: Total read length: 15104374 [2020-02-28 13:03:55] root: INFO: Input genome size: 62000000 [2020-02-28 13:03:55] root: INFO: Estimated coverage: 0 [2020-02-28 13:03:55] root: WARNING: Expected read coverage is 0, the assembly is not guaranteed to 
be optimal in this setting. Are you sure that the genome size was entered correctly? [2020-02-28 13:03:55] root: INFO: Reads N50/N90: 1584 / 1557 [2020-02-28 13:03:55] root: INFO: Minimum overlap set to 2000 [2020-02-28 13:03:55] root: INFO: Selected k-mer size: 17 [2020-02-28 13:03:55] root: INFO: >>>STAGE: assembly [2020-02-28 13:03:55] root: INFO: Assembling disjointigs [2020-02-28 13:03:55] root: DEBUG: -----Begin assembly log------ [2020-02-28 13:03:55] root: DEBUG: Running: flye-assemble --reads /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz --out-asm /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/reticulatus/working/run2bc7.flye25/00-assembly/draft_assembly.fasta --genome-size 62000000 --config /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/reticulatus/working/git/flye25/Flye/flye/config/bin_cfg/asm_raw_reads.cfg --log /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/reticulatus/working/run2bc7.flye25/flye.log --threads 8 --meta --min-ovlp 2000 --kmer 17 [2020-02-28 13:03:55] DEBUG: Build date: Feb 28 2020 10:08:24 [2020-02-28 13:03:55] DEBUG: Total RAM: 62 Gb [2020-02-28 13:03:55] DEBUG: Available RAM: 60 Gb [2020-02-28 13:03:55] DEBUG: Total CPUs: 24 [2020-02-28 13:03:55] DEBUG: Parameters: [2020-02-28 13:03:55] DEBUG: big_genome_threshold=29000000 [2020-02-28 13:03:55] DEBUG: low_cutoff_warning=1 [2020-02-28 13:03:55] DEBUG: hard_min_coverage_rate=10 [2020-02-28 13:03:55] DEBUG: assemble_kmer_sample=1 [2020-02-28 13:03:55] DEBUG: repeat_graph_kmer_sample=1 [2020-02-28 13:03:55] DEBUG: read_align_kmer_sample=1 [2020-02-28 13:03:55] DEBUG: maximum_jump=1500 [2020-02-28 13:03:55] DEBUG: maximum_overhang=1500 [2020-02-28 13:03:55] DEBUG: repeat_kmer_rate=100 [2020-02-28 13:03:55] DEBUG: assemble_ovlp_relative_divergence=0.10 [2020-02-28 13:03:55] DEBUG: repeat_graph_ovlp_divergence=0.15 [2020-02-28 13:03:55] DEBUG: read_align_ovlp_divergence=0.25 [2020-02-28 13:03:55] DEBUG: max_coverage_drop_rate=5 [2020-02-28 
13:03:55] DEBUG: chimera_window=100 [2020-02-28 13:03:55] DEBUG: min_reads_in_disjointig=4 [2020-02-28 13:03:55] DEBUG: max_inner_reads=10 [2020-02-28 13:03:55] DEBUG: max_inner_fraction=0.25 [2020-02-28 13:03:55] DEBUG: add_unassembled_reads=0 [2020-02-28 13:03:55] DEBUG: max_separation=500 [2020-02-28 13:03:55] DEBUG: unique_edge_length=50000 [2020-02-28 13:03:55] DEBUG: min_repeat_res_support=0.51 [2020-02-28 13:03:55] DEBUG: out_paths_ratio=5 [2020-02-28 13:03:55] DEBUG: graph_cov_drop_rate=5 [2020-02-28 13:03:55] DEBUG: coverage_estimate_window=100 [2020-02-28 13:03:55] DEBUG: extend_contigs_with_repeats=1 [2020-02-28 13:03:55] DEBUG: min_read_cov_cutoff=3 [2020-02-28 13:03:55] DEBUG: short_tip_length=10000 [2020-02-28 13:03:55] DEBUG: long_tip_length=100000 [2020-02-28 13:03:55] DEBUG: max_bubble_length=50000 [2020-02-28 13:03:55] DEBUG: Running with k-mer size: 17 [2020-02-28 13:03:55] DEBUG: Running with minimum overlap 2000 [2020-02-28 13:03:55] DEBUG: Metagenome mode: Y [2020-02-28 13:03:55] INFO: Reading sequences [2020-02-28 13:03:55] DEBUG: Building positional index [2020-02-28 13:03:55] DEBUG: Total sequence: 15104374 bp [2020-02-28 13:03:55] DEBUG: Expected read coverage: 0 [2020-02-28 13:03:55] INFO: Generating solid k-mer index [2020-02-28 13:03:55] DEBUG: Hard threshold set to 2 [2020-02-28 13:03:55] DEBUG: Started k-mer counting `
Regarding your second post: the filename looks like an Illumina one, but that is only because I renamed it that way for my own pipeline, which uses qiime2. It really is Nanopore data from my GridION.
Hi @nbargues, thanks for getting back to me with the logs so quickly.
Can you run `git rev-parse HEAD` in the `reticulatus` directory for me? I just want to check which version you have currently checked out.
It looks like the database that is supposed to be downloaded automatically for `ktkit` was not downloaded after all. I'll have a look at why this might have happened. Can you just double check whether there is a directory at `/home/jftaly/Bureau/nanopore_16s/umi/reticulatus/dbs/ktkit`, and if there is, what's inside it?
Secondly, it looks like `flye` has worked "correctly". The N90 of your reads is reported to be 1.5 Kbp, but `flye` chose a minimum overlap of 2000 bases, which is why it assembled 0 disjointigs.
[2020-02-28 10:27:04] root: INFO: Reads N50/N90: 1584 / 1557
[2020-02-28 10:27:04] root: INFO: Minimum overlap set to 2000
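To make the mismatch concrete, here is a minimal sketch of the usual N50/N90 definition and a check of N90 against a requested minimum overlap. The function names and the example read lengths are hypothetical, and this is not necessarily how flye computes its own statistics. A passing check is only a sanity test; it does not guarantee that an assembler will find usable overlaps.

```python
def nx(lengths, fraction):
    """Return the Nx read length: the length L such that reads of
    length >= L together contain at least `fraction` of all bases."""
    if not lengths:
        raise ValueError("no read lengths given")
    ordered = sorted(lengths, reverse=True)
    target = fraction * sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if running >= target:
            return length
    return ordered[-1]


def reads_can_overlap(lengths, min_overlap):
    """True if the N90 read length is at least the requested minimum overlap."""
    return nx(lengths, 0.9) >= min_overlap


# Hypothetical amplicon-like read lengths clustered around 1.5 kbp.
example_lengths = [1600, 1590, 1584, 1570, 1560, 1557, 1550]
n50 = nx(example_lengths, 0.5)
n90 = nx(example_lengths, 0.9)
print(n50, n90, reads_can_overlap(example_lengths, 2000))
```

With reads this short, no pair of reads can share a 2000-base overlap, whatever the coverage.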
A possible solution would be to add the following code to your `spellbook.py`, anywhere before the `spells` dictionary:
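# Copy the stock flye25 spell and override only the minimum overlap (-m).
# Needs `from copy import deepcopy` if spellbook.py does not already import it.
flye25_m1000 = deepcopy(flye25)
flye25_m1000.update({
    "m": 1000,
})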
This will make a new configuration for flye that overrides the minimum overlap parameter (`-m`) to 1000 bases. You'll then need to add a single line inside the `spells` dictionary to export the spell, like so:
spells = {
    [...]
    "flye25-m1000" : flye25_m1000,
    [...]
}
Finally, in your `manifest.cfg`, change `flye25` to `flye25-m1000`. This is how new spells (configurations) are defined in reticulatus at the moment. It's a bit of a bodge and I want to overhaul this to make it much easier very soon.
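Put together, the pattern looks roughly like the self-contained sketch below. The contents of the `flye25` dictionary here are placeholders invented for illustration; the real spell is whatever `spellbook.py` defines. Only the `"m"` key and the `flye25-m1000` name come from the steps above.

```python
from copy import deepcopy

# Hypothetical stand-in for the stock flye25 spell; the real keys and
# values are defined in spellbook.py and will differ.
flye25 = {"assembler": "flye", "version": "2.5"}

# deepcopy gives an independent copy, so the stock spell is left untouched.
flye25_m1000 = deepcopy(flye25)
flye25_m1000.update({
    "m": 1000,  # minimum overlap override
})

# Export both spells under the names used in manifest.cfg.
spells = {
    "flye25": flye25,
    "flye25-m1000": flye25_m1000,
}

print(spells["flye25-m1000"])
```

Copying rather than editing `flye25` in place means any existing manifest entries that still point at `flye25` keep their original behaviour.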
Oh also, I've realised from your report that I've left the expected assembly size hard-coded in `spellbook.py`. I would recommend you change the `genome_size` parameter in the `master_default` dictionary at the top to be roughly what you are expecting to assemble. Currently it's hard-coded to 62 Mbp, which I imagine is far bigger than the 16S data you have!
Sorry about that! See #40
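As a rough illustration of why the genome size matters, the sketch below divides the total read length from the log by the expected assembly size. Using integer division is an assumption, chosen because it reproduces the `Estimated coverage: 0` line in the log; the 150 kbp target is a made-up placeholder, not a recommendation for this dataset.

```python
def estimated_coverage(total_read_bases, genome_size):
    """Whole-number coverage estimate: total bases divided by expected size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_read_bases // genome_size


total_bases = 15_104_374          # "Total read length" from the flye log
hard_coded_size = 62_000_000      # the 62 Mbp default in spellbook.py
placeholder_size = 150_000        # hypothetical, much smaller target

cov_default = estimated_coverage(total_bases, hard_coded_size)
cov_smaller = estimated_coverage(total_bases, placeholder_size)
print(cov_default, cov_smaller)
```

An estimate of 0 is a strong hint that the configured size is far too large for the amount of data supplied.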
My version is: 7628a6a4015ef84dec1be21ca11b88d0e1476556
Regarding ktkit, I downloaded the database outside of your pipeline because I don't have wget access to the NCBI FTP. The folder contains: citations.dmp delnodes.dmp division.dmp gc.prt gencode.dmp ktkit.ok merged.dmp readme.txt taxdump.tar.gz
I created ktkit.ok myself after extracting the tar.gz file.
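One thing worth checking after a manual extraction: the listing above does not show `names.dmp` or `nodes.dmp`, which the NCBI `taxdump.tar.gz` archive normally contains. Whether their absence is what ktkit objected to is only a guess. The sketch below is a small completeness check; the helper name and the choice of required files are assumptions, and the temporary directory just recreates the listing for demonstration.

```python
import tempfile
from pathlib import Path

# Files normally present in the NCBI taxdump archive. Assumption: these
# are the ones a taxonomy tool such as ktkit would need to read.
EXPECTED_DUMP_FILES = ("names.dmp", "nodes.dmp")


def missing_dump_files(dump_dir, expected=EXPECTED_DUMP_FILES):
    """Return the expected dump files that are absent from dump_dir."""
    present = {p.name for p in Path(dump_dir).iterdir() if p.is_file()}
    return [name for name in expected if name not in present]


# Recreate the directory listing reported above in a temporary folder.
reported = [
    "citations.dmp", "delnodes.dmp", "division.dmp", "gc.prt",
    "gencode.dmp", "ktkit.ok", "merged.dmp", "readme.txt", "taxdump.tar.gz",
]
with tempfile.TemporaryDirectory() as tmp:
    for name in reported:
        (Path(tmp) / name).touch()
    missing = missing_dump_files(tmp)

print(missing)
```

Pointing `missing_dump_files` at the real `dbs/ktkit` directory would show whether the extraction was complete before the `ktkit.ok` marker is created by hand.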
I made the change that you proposed, and here is the log:
Wildcard constraints in inputs are ignored. Wildcard constraints in inputs are ignored. Wildcard constraints in inputs are ignored. Wildcard constraints in inputs are ignored. Wildcard constraints in inputs are ignored. Wildcard constraints in inputs are ignored. Building DAG of jobs... File path benchmarks//home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz.k2 contains double '/'. This is likely unintended. It can also lead to inconsistent results of the file-matching approach used by Snakemake. File path benchmarks//home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz.k2 contains double '/'. This is likely unintended. It can also lead to inconsistent results of the file-matching approach used by Snakemake. Using shell: /bin/bash Provided cores: 18 Rules claiming more threads will be scaled down. Job counts: count jobs 2 assembly_read_coverage 1 bandage_assembly 1 bond_summarise_kraken 1 finish 1 flye_assembly 1 install_flye_hash 2 kraken 2 ktkit_rollup 1 link_flye_assembly 4 minimap2_racon_sam 1 polish_medaka 4 polish_racon 1 prep_flye_gfa 1 summarise_assembly_meta 1 summarise_assembly_stats 1 summarise_kraken 1 test_assembly 26
[Fri Feb 28 14:13:06 2020] rule install_flye_hash: output: flye25-m1000.ok jobid: 26 reason: Missing output files: flye25-m1000.ok wildcards: conf=flye25-m1000
Touching output file flye25-m1000.ok. [Fri Feb 28 14:15:33 2020] Finished job 26. 1 of 26 steps (4%) done
[Fri Feb 28 14:15:33 2020] rule flye_assembly: input: /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz, flye25-m1000.ok output: run2bc7.flye25-m1000/assembly.fasta, run2bc7.flye25-m1000/assembly_graph.gfa log: log/run2bc7.flye25-m1000_assembly.fa jobid: 18 benchmark: benchmarks/run2bc7.flye25-m1000_assembly.fa reason: Missing output files: run2bc7.flye25-m1000/assembly.fasta, run2bc7.flye25-m1000/assembly_graph.gfa; Input files updated by another job: flye25-m1000.ok wildcards: uuid=run2bc7, conf=flye25-m1000 threads: 8 resources: benchmark=1
Activating conda environment: /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/reticulatus/working/.snakemake/conda/164a79fb [Fri Feb 28 14:16:02 2020] Error in rule flye_assembly: jobid: 18 output: run2bc7.flye25-m1000/assembly.fasta, run2bc7.flye25-m1000/assembly_graph.gfa log: log/run2bc7.flye25-m1000_assembly.fa (check log file(s) for error message) conda-env: /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/reticulatus/working/.snakemake/conda/164a79fb shell: git/flye25-m1000/Flye/bin/flye --nano-raw /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz --meta -g 14m -o run2bc7.flye25-m1000/ -t 8 -m 1000 > log/run2bc7.flye25-m1000_assembly.fa 2>&1 (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
Shutting down, this might take some time. Exiting because a job execution failed. Look above for error message Complete log: /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/reticulatus/.snakemake/log/2020-02-28T141306.159419.snakemake.log
Do you think the warning in bold could be the problem? Could it be because I specified the path as /home/blabla instead of home/blabla?
@nbargues Thanks! You can ignore those particular warnings; it's because of the way I currently handle operations on the read files themselves. Sorry for the confusion.
Thank you for clarifying the `ktkit` problem; I didn't realise you had to do this manually.
It seems the `flye` rule still fails. It might be that `flye` is not suitable for this particular use-case, as the overlaps might be too short. `flye` requires a minimum overlap of 1000. Can you show me the `run2bc7.flye25-m1000/flye.log`?
Here is the flye log:
[2020-02-28 14:15:35] root: INFO: Starting Flye 2.5-gaf246d6 [2020-02-28 14:15:35] root: DEBUG: Cmd: git/flye25-m1000/Flye/bin/flye --nano-raw /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz --meta -g 14m -o run2bc7.flye25-m1000/ -t 8 -m 1000 [2020-02-28 14:15:35] root: INFO: >>>STAGE: configure [2020-02-28 14:15:35] root: INFO: Configuring run [2020-02-28 14:15:35] root: INFO: Total read length: 15104374 [2020-02-28 14:15:35] root: INFO: Input genome size: 14000000 [2020-02-28 14:15:35] root: INFO: Estimated coverage: 1 [2020-02-28 14:15:35] root: WARNING: Expected read coverage is 1, the assembly is not guaranteed to be optimal in this setting. Are you sure that the genome size was entered correctly? [2020-02-28 14:15:35] root: INFO: Reads N50/N90: 1584 / 1557 [2020-02-28 14:15:35] root: INFO: Selected minimum overlap: 1000 [2020-02-28 14:15:35] root: INFO: Selected k-mer size: 15 [2020-02-28 14:15:35] root: INFO: >>>STAGE: assembly [2020-02-28 14:15:35] root: INFO: Assembling disjointigs [2020-02-28 14:15:35] root: DEBUG: -----Begin assembly log------ [2020-02-28 14:15:35] root: DEBUG: Running: flye-assemble --reads /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz --out-asm /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/reticulatus/working/run2bc7.flye25-m1000/00-assembly/draft_assembly.fasta --genome-size 14000000 --config /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/reticulatus/working/git/flye25-m1000/Flye/flye/config/bin_cfg/asm_raw_reads.cfg --log /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/reticulatus/working/run2bc7.flye25-m1000/flye.log --threads 8 --meta --min-ovlp 1000 --kmer 15 [2020-02-28 14:15:35] DEBUG: Build date: Feb 28 2020 14:15:17 [2020-02-28 14:15:35] DEBUG: Total RAM: 62 Gb [2020-02-28 14:15:35] DEBUG: Available RAM: 60 Gb [2020-02-28 14:15:35] DEBUG: Total CPUs: 24 [2020-02-28 14:15:35] DEBUG: Parameters: [2020-02-28 14:15:35] DEBUG: 
big_genome_threshold=29000000 [2020-02-28 14:15:35] DEBUG: low_cutoff_warning=1 [2020-02-28 14:15:35] DEBUG: hard_min_coverage_rate=10 [2020-02-28 14:15:35] DEBUG: assemble_kmer_sample=1 [2020-02-28 14:15:35] DEBUG: repeat_graph_kmer_sample=1 [2020-02-28 14:15:35] DEBUG: read_align_kmer_sample=1 [2020-02-28 14:15:35] DEBUG: maximum_jump=1500 [2020-02-28 14:15:35] DEBUG: maximum_overhang=1500 [2020-02-28 14:15:35] DEBUG: repeat_kmer_rate=100 [2020-02-28 14:15:35] DEBUG: assemble_ovlp_relative_divergence=0.10 [2020-02-28 14:15:35] DEBUG: repeat_graph_ovlp_divergence=0.15 [2020-02-28 14:15:35] DEBUG: read_align_ovlp_divergence=0.25 [2020-02-28 14:15:35] DEBUG: max_coverage_drop_rate=5 [2020-02-28 14:15:35] DEBUG: chimera_window=100 [2020-02-28 14:15:35] DEBUG: min_reads_in_disjointig=4 [2020-02-28 14:15:35] DEBUG: max_inner_reads=10 [2020-02-28 14:15:35] DEBUG: max_inner_fraction=0.25 [2020-02-28 14:15:35] DEBUG: add_unassembled_reads=0 [2020-02-28 14:15:35] DEBUG: max_separation=500 [2020-02-28 14:15:35] DEBUG: unique_edge_length=50000 [2020-02-28 14:15:35] DEBUG: min_repeat_res_support=0.51 [2020-02-28 14:15:35] DEBUG: out_paths_ratio=5 [2020-02-28 14:15:35] DEBUG: graph_cov_drop_rate=5 [2020-02-28 14:15:35] DEBUG: coverage_estimate_window=100 [2020-02-28 14:15:35] DEBUG: extend_contigs_with_repeats=1 [2020-02-28 14:15:35] DEBUG: min_read_cov_cutoff=3 [2020-02-28 14:15:35] DEBUG: short_tip_length=10000 [2020-02-28 14:15:35] DEBUG: long_tip_length=100000 [2020-02-28 14:15:35] DEBUG: max_bubble_length=50000 [2020-02-28 14:15:35] DEBUG: Running with k-mer size: 15 [2020-02-28 14:15:35] DEBUG: Running with minimum overlap 1000 [2020-02-28 14:15:35] DEBUG: Metagenome mode: Y [2020-02-28 14:15:35] INFO: Reading sequences [2020-02-28 14:15:35] DEBUG: Building positional index [2020-02-28 14:15:35] DEBUG: Total sequence: 15104374 bp [2020-02-28 14:15:35] DEBUG: Expected read coverage: 1 [2020-02-28 14:15:35] INFO: Generating solid k-mer index [2020-02-28 14:15:35] DEBUG: 
Hard threshold set to 2 [2020-02-28 14:15:35] DEBUG: Started k-mer counting [2020-02-28 14:15:57] INFO: Counting k-mers (1/2): [2020-02-28 14:15:57] INFO: Counting k-mers (2/2): [2020-02-28 14:15:58] WARNING: Unable to separate erroneous k-mers from solid k-mers. Possible reasons: (1) Incorrect expected assembly size parameter (2) Highly uneven coverage of the assembly (3) Running with error-corrected reads in raw reads mode Assembly will continue, but results might not be optimal [2020-02-28 14:15:58] DEBUG: Estimated minimum kmer coverage: 2 [2020-02-28 14:15:58] DEBUG: Filtered 0 erroneous k-mers [2020-02-28 14:15:58] DEBUG: Repetitive k-mer frequency: 2761 [2020-02-28 14:15:58] DEBUG: Filtered 1490 repetitive k-mers (0.00306702) [2020-02-28 14:15:58] INFO: Filling index table (1/2) [2020-02-28 14:15:59] INFO: Filling index table (2/2) [2020-02-28 14:15:59] DEBUG: Sorting k-mer index [2020-02-28 14:15:59] DEBUG: Selected k-mers: 5862 [2020-02-28 14:15:59] DEBUG: Index size: 19600 [2020-02-28 14:15:59] DEBUG: Peak RAM usage: 1 Gb [2020-02-28 14:15:59] DEBUG: Estimating k-mer identity bias [2020-02-28 14:16:00] DEBUG: Median overlap divergence: 0.198153 [2020-02-28 14:16:00] DEBUG: K-mer estimate bias: -0.0834439 [2020-02-28 14:16:00] DEBUG: Max divergence threshold set to 0.298153 [2020-02-28 14:16:00] INFO: Extending reads [2020-02-28 14:16:00] DEBUG: Estimating overlap coverage [2020-02-28 14:16:01] INFO: Overlap-based coverage: 11 [2020-02-28 14:16:01] INFO: Median overlap divergence: 0.205709 [2020-02-28 14:16:01] DEBUG: Sequence divergence distribution:
| * |
| ** |
| * *** |
| * **** * |
| ** **** * |
| ******* * |
| ******* *** |
| *********** * |
| *********** * |
| ************** |
| ************** |
| ***************** |
| ****************** |
| ******************* |
| ******************** |
| ******************** |
| * ********************* |
| *********************** |
| * *********************** |
| *************************** | *
----------------------------------------------------------------------------------------------------
0% 5% 10% 15% 20% 25% 30% 35% 40% 45%
Q25 = 0.19, Q50 = 0.21, Q75 = 0.23
[2020-02-28 14:16:02] INFO: Assembled 0 disjointigs [2020-02-28 14:16:02] INFO: Generating sequence [2020-02-28 14:16:02] DEBUG: Writing FASTA [2020-02-28 14:16:02] DEBUG: Peak RAM usage: 1 Gb -----------End assembly log------------ [2020-02-28 14:16:02] root: ERROR: No disjointigs were assembled - please check if the read type and genome size parameters are correct
Thanks @nbargues. It looks to me like `flye` isn't suitable for your application; it's working as expected but not assembling any disjointigs. We see similar behaviour on high-coverage viral data.
Reticulatus does have a rule set for `wtdbg2`, but I haven't tried to use it since I rebased the whole project. You'd be welcome to try it out. I've also been thinking about adding a rule for `miniasm`, which might be quite good for your data type. Let me know if that might be of interest to you.
Ok thanks for the response.
But you have published a paper on 16S data from "Zymo". What is the main difference between my data and the Zymo data? Thanks for the clarification.
@nbargues We used reticulatus to assemble full genomes, not 16S sequences.
Oh sorry, I didn't go through the whole paper.
OK, so if I understand you correctly, your pipeline is not made for data that consists only of full-length 16S reads.
@nbargues Sorry for the confusion, I should have realised this when I saw your read N50. Reticulatus is for long read assembly and polishing of metagenomic shotgun sequencing, where the goal is to assemble as much of the genomes in a mixed sample as possible. Reticulatus is not a metataxonomics 16S analysis pipeline: we're interested in full genomes. I've updated the README to clarify this.
We don't actually do any 16S-based analysis in our paper.
Thanks for the full clarification. If I ever handle that kind of data, I will certainly use your pipeline 👍
Hi Sam, I'm trying to test your pipeline on my 16S ONT data. I'm facing an error and I don't know why. Here is the log file:
Wildcard constraints in inputs are ignored.
Wildcard constraints in inputs are ignored. Wildcard constraints in inputs are ignored. Wildcard constraints in inputs are ignored. Wildcard constraints in inputs are ignored. Wildcard constraints in inputs are ignored. Building DAG of jobs... File path benchmarks//home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz.k2 contains double '/'. This is likely unintended. It can also lead to inconsistent results of the file-matching approach used by Snakemake. File path benchmarks//home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz.k2 contains double '/'. This is likely unintended. It can also lead to inconsistent results of the file-matching approach used by Snakemake. Using shell: /bin/bash Provided cores: 18 Rules claiming more threads will be scaled down. Job counts: count jobs 2 assembly_read_coverage 1 bandage_assembly 1 bond_summarise_kraken 1 finish 1 flye_assembly 3 kraken 1 ktkit_count 2 ktkit_rollup 1 link_flye_assembly 4 minimap2_racon_sam 1 polish_medaka 4 polish_racon 1 prep_flye_gfa 1 summarise_assembly_meta 1 summarise_assembly_stats 1 summarise_kraken 1 test_assembly 27
[Fri Feb 28 10:27:02 2020] rule kraken: input: /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz, /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/dbs/kraken2/k2db.ok output: /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz.k2, /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz.k2r jobid: 14 benchmark: benchmarks//home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz.k2 reason: Missing output files: /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz.k2 wildcards: path=/home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz threads: 8 resources: benchmark=1
[Fri Feb 28 10:27:02 2020] rule flye_assembly: input: /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz, flye25.ok output: run2bc7.flye25/assembly.fasta, run2bc7.flye25/assembly_graph.gfa log: log/run2bc7.flye25_assembly.fa jobid: 18 benchmark: benchmarks/run2bc7.flye25_assembly.fa reason: Missing output files: run2bc7.flye25/assembly.fasta, run2bc7.flye25/assembly_graph.gfa wildcards: uuid=run2bc7, conf=flye25 threads: 8 resources: benchmark=1
Activating conda environment: /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/reticulatus/working/.snakemake/conda/164a79fb [Fri Feb 28 10:27:36 2020] Finished job 14. 1 of 27 steps (4%) done
[Fri Feb 28 10:27:36 2020] rule ktkit_count: input: /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz.k2, /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/dbs/ktkit/ktkit.ok output: /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz.k2kc jobid: 5 reason: Missing output files: /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz.k2kc; Input files updated by another job: /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz.k2 wildcards: path=/home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz
[Fri Feb 28 10:27:36 2020] Error in rule ktkit_count: jobid: 5 output: /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz.k2kc shell: echo -e 'ktkit_tid ktkit_name mean_seq_len n_seq prop_seq n_seq_unmasked tot_bp prop_bp prop_bp_unmasked' > /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz.k2kc; ktkit count /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz.k2 --dump /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/dbs/ktkit --rank species >> /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz.k2kc (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
Removing output files of failed job ktkit_count since they might be corrupted: /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz.k2kc [Fri Feb 28 10:33:02 2020] Error in rule flye_assembly: jobid: 18 output: run2bc7.flye25/assembly.fasta, run2bc7.flye25/assembly_graph.gfa log: log/run2bc7.flye25_assembly.fa (check log file(s) for error message) conda-env: /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/reticulatus/working/.snakemake/conda/164a79fb shell: git/flye25/Flye/bin/flye --nano-raw /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/barcode7_subtrim_01_L001_R1_001.fq.gz --meta -g 62m -o run2bc7.flye25/ -t 8 > log/run2bc7.flye25_assembly.fa 2>&1 (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
Shutting down, this might take some time. Exiting because a job execution failed. Look above for error message Complete log: /home/jftaly/Bureau/nanopore_16s/umi/reticulatus/reticulatus/.snakemake/log/2020-02-28T102701.674949.snakemake.log
Thanks in advance,
Nicolas