HKU-BAL / ClusterV

ClusterV: finding HIV quasispecies and drug resistance from ONT sequencing data
BSD 3-Clause "New" or "Revised" License

Flye cannot assemble a contig for high coverage amplicon data #1

Closed by mbdabrowska1 1 year ago

mbdabrowska1 commented 1 year ago

Hi, I'm running your tool on my ONT HIV amplicon data, but the pipeline fails at the Flye step:

[ ** STEP 3 ** ]get consensus and HIVDB report

/opt/bin/ClusterV/cv
NC_001802.1 1552 4810
CMD: mkdir -p /mnt/parscratch/users/md1mbdx/HIV_mehmet_13072023/clusterv_test/results/consensus
checking subtype 1
CMD: flye --nano-raw /mnt/parscratch/users/md1mbdx/HIV_mehmet_13072023/clusterv_test/results/consensus/barcode87_1/barcode87_1_ori_r.fasta --threads 8 --out-dir /mnt/parscratch/users/md1mbdx/HIV_mehmet_13072023/clusterv_test/results/consensus/barcode87_1/barcode87_1_flye -m 1000 -g 5k >/dev/null 2>&1
1
Traceback (most recent call last):
  File "/opt/bin/ClusterV/cv.py", line 68, in <module>
    main()
  File "/opt/bin/ClusterV/cv.py", line 64, in main
    submodule.main()
  File "/opt/bin/ClusterV/cv/ClusterV.py", line 107, in main
    run_get_consensus(args)
  File "/opt/bin/ClusterV/cv/get_consensus.py", line 314, in run_get_consensus
    _run_command(cmd)
  File "/opt/bin/ClusterV/shared/utils.py", line 50, in _run_command
    stderr = result.stderr

When I ran Flye on its own to check the log and find the reason for the failure, I got the following error:

[2023-11-20 15:16:39] root: INFO: >>>STAGE: contigger
[2023-11-20 15:16:39] root: INFO: Generating contigs
[2023-11-20 15:16:39] root: DEBUG: -----Begin contigger analyser log------
[2023-11-20 15:16:39] root: DEBUG: Running: flye-modules contigger --graph-edges /mnt/parscratch/users/md1mbdx/HIV_mehmet_13072023/clusterv_test/flye_test/20-repeat/repeat_graph_edges.fasta --reads /mnt/parscratch/users/md1mbdx/HIV_mehmet_13072023/clusterv_test/results/consensus/barcode87_1/barcode87_1_ori_r.fasta --out-dir /mnt/parscratch/users/md1mbdx/HIV_mehmet_13072023/clusterv_test/flye_test/30-contigger --config /opt/conda/envs/clusterV/lib/python3.7/site-packages/flye/config/bin_cfg/asm_raw_reads.cfg --repeat-graph /mnt/parscratch/users/md1mbdx/HIV_mehmet_13072023/clusterv_test/flye_test/20-repeat/repeat_graph_dump --graph-aln /mnt/parscratch/users/md1mbdx/HIV_mehmet_13072023/clusterv_test/flye_test/20-repeat/read_alignment_dump --log /mnt/parscratch/users/md1mbdx/HIV_mehmet_13072023/clusterv_test/flye_test/flye.log --threads 8 --min-ovlp 1000
[2023-11-20 15:16:39] DEBUG: Build date: Feb 22 2022 03:24:00
[2023-11-20 15:16:39] DEBUG: Total RAM: 1007 Gb
[2023-11-20 15:16:39] DEBUG: Available RAM: 900 Gb
[2023-11-20 15:16:39] DEBUG: Total CPUs: 64
[2023-11-20 15:16:39] DEBUG: Loading /opt/conda/envs/clusterV/lib/python3.7/site-packages/flye/config/bin_cfg/asm_raw_reads.cfg
[2023-11-20 15:16:39] DEBUG: Loading /opt/conda/envs/clusterV/lib/python3.7/site-packages/flye/config/bin_cfg/asm_defaults.cfg
[2023-11-20 15:16:39] DEBUG:    big_genome_threshold=29000000
[2023-11-20 15:16:39] DEBUG:    meta_read_filter_kmer_freq=100
[2023-11-20 15:16:39] DEBUG:    chain_large_gap_penalty=2
[2023-11-20 15:16:39] DEBUG:    chain_small_gap_penalty=0.5
[2023-11-20 15:16:39] DEBUG:    chain_gap_jump_threshold=100
[2023-11-20 15:16:39] DEBUG:    max_coverage_drop_rate=5
[2023-11-20 15:16:39] DEBUG:    max_extensions_drop_rate=5
[2023-11-20 15:16:39] DEBUG:    chimera_window=100
[2023-11-20 15:16:39] DEBUG:    chimera_overhang=1000
[2023-11-20 15:16:39] DEBUG:    min_reads_in_disjointig=4
[2023-11-20 15:16:39] DEBUG:    max_inner_reads=10
[2023-11-20 15:16:39] DEBUG:    max_inner_fraction=0.25
[2023-11-20 15:16:39] DEBUG:    max_separation=500
[2023-11-20 15:16:39] DEBUG:    unique_edge_length=50000
[2023-11-20 15:16:39] DEBUG:    min_repeat_res_support=0.51
[2023-11-20 15:16:39] DEBUG:    out_paths_ratio=5
[2023-11-20 15:16:39] DEBUG:    graph_cov_drop_rate=5
[2023-11-20 15:16:39] DEBUG:    coverage_estimate_window=100
[2023-11-20 15:16:39] DEBUG:    max_bubble_length=50000
[2023-11-20 15:16:39] DEBUG:    loop_coverage_rate=1.5
[2023-11-20 15:16:39] DEBUG:    repeat_edge_cov_mult=1.75
[2023-11-20 15:16:39] DEBUG:    weak_detach_rate=5
[2023-11-20 15:16:39] DEBUG:    tip_coverage_rate=2
[2023-11-20 15:16:39] DEBUG:    tip_length_rate=2
[2023-11-20 15:16:39] DEBUG:    output_gfa_before_rr=0
[2023-11-20 15:16:39] DEBUG:    low_cutoff_warning=1
[2023-11-20 15:16:39] DEBUG:    kmer_size=17
[2023-11-20 15:16:39] DEBUG:    use_minimizers=0
[2023-11-20 15:16:39] DEBUG:    reads_base_alignment=0
[2023-11-20 15:16:39] DEBUG:    meta_read_top_kmer_rate=0.40
[2023-11-20 15:16:39] DEBUG:    maximum_jump=1500
[2023-11-20 15:16:39] DEBUG:    maximum_overhang=1500
[2023-11-20 15:16:39] DEBUG:    repeat_kmer_rate=100
[2023-11-20 15:16:39] DEBUG:    assemble_ovlp_divergence=0.10
[2023-11-20 15:16:39] DEBUG:    assemble_divergence_relative=1
[2023-11-20 15:16:39] DEBUG:    repeat_graph_ovlp_divergence=0.08
[2023-11-20 15:16:39] DEBUG:    read_align_ovlp_divergence=0.25
[2023-11-20 15:16:39] DEBUG:    hpc_scoring_on=0
[2023-11-20 15:16:39] DEBUG:    add_unassembled_reads=0
[2023-11-20 15:16:39] DEBUG:    extend_contigs_with_repeats=0
[2023-11-20 15:16:39] DEBUG:    min_read_cov_cutoff=3
[2023-11-20 15:16:39] DEBUG:    short_tip_length=20000
[2023-11-20 15:16:39] DEBUG:    long_tip_length=100000
[2023-11-20 15:16:39] DEBUG: Running with k-mer size: 17
[2023-11-20 15:16:39] DEBUG: Selected minimum overlap 1000
[2023-11-20 15:16:39] INFO: Reading sequences
[2023-11-20 15:16:39] DEBUG: Building positional index
[2023-11-20 15:16:39] DEBUG: Total sequence: 1821017 bp
[2023-11-20 15:16:39] DEBUG: Flipped 0
[2023-11-20 15:16:39] DEBUG: Final graph contains 0 egdes
[2023-11-20 15:16:39] DEBUG: Extending contigs into repeats
[2023-11-20 15:16:39] DEBUG: Covered 0 repetitive contigs
[2023-11-20 15:16:39] INFO: Generated 0 contigs
[2023-11-20 15:16:39] DEBUG: Writing FASTA
[2023-11-20 15:16:39] DEBUG: Generating scaffold connections
[2023-11-20 15:16:39] INFO: Added 0 scaffold connections
[2023-11-20 15:16:39] DEBUG: Writing Dot
[2023-11-20 15:16:39] DEBUG: Writing FASTA
[2023-11-20 15:16:39] DEBUG: Writing Gfa
[2023-11-20 15:16:39] DEBUG: Peak RAM usage: 0 Gb
-----------End assembly log------------
[2023-11-20 15:16:39] root: ERROR: No contigs were assembled - pipeline stopped
[2023-11-20 15:16:39] root: ERROR: Pipeline aborted
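Since the pipeline command redirects Flye's output to /dev/null, the log file is the only place the failure reason shows up. A minimal sketch for pulling the error lines out of a log like the one above follows. The function name `flye_error_lines` is hypothetical and not part of ClusterV or Flye; it only assumes that error lines contain the literal text "ERROR:", as they do in this log.

```python
def flye_error_lines(log_text: str) -> list:
    """Return the lines of a Flye log that are marked as errors.

    Assumption: error lines contain the literal substring 'ERROR:',
    which matches the log excerpt in this thread.
    """
    return [line for line in log_text.splitlines() if "ERROR:" in line]


if __name__ == "__main__":
    # Two INFO/ERROR lines copied from the log excerpt in this thread.
    sample = (
        "[2023-11-20 15:16:39] INFO: Generated 0 contigs\n"
        "[2023-11-20 15:16:39] root: ERROR: No contigs were assembled - pipeline stopped\n"
        "[2023-11-20 15:16:39] root: ERROR: Pipeline aborted\n"
    )
    for line in flye_error_lines(sample):
        print(line)
```

To use it on a real run, read the flye.log file from the Flye output directory and pass its contents to the function.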

Attaching the full flye log: flye.log

Any help would be greatly appreciated!

sujunhao commented 1 year ago

Hi,

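By default, ClusterV sets the expected genome size for the Flye assembly to about 5k (-g 5k, with a minimum overlap of -m 1000), which exceeds the roughly 3k region in your BED setting (NC_001802.1 1552 4810).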

flye --nano-raw /mnt/parscratch/users/md1mbdx/HIV_mehmet_13072023/clusterv_test/results/consensus/barcode87_1/barcode87_1_ori_r.fasta --threads 8 --out-dir /mnt/parscratch/users/md1mbdx/HIV_mehmet_13072023/clusterv_test/results/consensus/barcode87_1/barcode87_1_flye -m 1000 -g 5k >/dev/null 2>&1
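The size mismatch can be checked with a quick calculation. The sketch below is illustrative only: `bed_region_length` and `parse_size` are hypothetical helpers, not ClusterV functions, and the size parser only handles a plain number or a "k" suffix as used in this thread.

```python
def bed_region_length(bed_line: str) -> int:
    """Length of a BED interval given as 'chrom start end' (whitespace separated)."""
    _chrom, start, end = bed_line.split()[:3]
    return int(end) - int(start)


def parse_size(size: str) -> int:
    """Convert a size such as '5k' or '2.5k' to a number of bases.

    Assumption: only a bare number or a trailing 'k' is supported here.
    """
    size = size.strip().lower()
    if size.endswith("k"):
        return int(float(size[:-1]) * 1000)
    return int(float(size))


if __name__ == "__main__":
    region = bed_region_length("NC_001802.1 1552 4810")
    default_size = parse_size("5k")
    print(f"BED region: {region} bp, default genome size: {default_size} bp")
    if region < default_size:
        print("The targeted region is shorter than the default genome size.")
```

For the BED line reported in this issue, the region spans 4810 - 1552 = 3258 bases, well below the 5000-base default.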

To analyze data with a smaller amplicon size, can you please try to run the following to test whether Flye can run without error?

flye --nano-raw /mnt/parscratch/users/md1mbdx/HIV_mehmet_13072023/clusterv_test/results/consensus/barcode87_1/barcode87_1_ori_r.fasta --threads 8 --out-dir /mnt/parscratch/users/md1mbdx/HIV_mehmet_13072023/clusterv_test/results/consensus/barcode87_1/barcode87_1_flye -m 500 -g 2.5k >/dev/null 2>&1

If this command executes without any issues, you can rerun your analysis by adding the following options to your pipeline command:

python cv.py ClusterV ... \
--flye_genome_size 2.5k --flye_genome_size_olp 500

If not, could you please share your bam with me? I will check the problem in my local environment.
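When choosing a minimum overlap, it may also help to look at the read lengths in the input FASTA, since a read shorter than the minimum overlap cannot form an overlap of that length. The following is a rough sketch in plain Python; the helper names are hypothetical, and how Flye actually treats short reads internally is not covered here.

```python
def read_lengths(fasta_text: str) -> list:
    """Return the sequence length of each record in FASTA-formatted text."""
    lengths = []
    current = None  # length of the record being read, None before the first header
    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def fraction_at_least(lengths: list, min_len: int) -> float:
    """Fraction of reads whose length is at least min_len (0.0 for no reads)."""
    if not lengths:
        return 0.0
    return sum(1 for n in lengths if n >= min_len) / len(lengths)


if __name__ == "__main__":
    # Toy input; replace with the contents of the real FASTA file.
    toy = ">read1\nACGTACGT\nACGT\n>read2\nACG\n"
    lengths = read_lengths(toy)
    print(lengths, fraction_at_least(lengths, 10))
```

Comparing the fraction of reads at or above 1000 and 500 bases would show how many reads can take part in overlaps at each setting.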

JH

mbdabrowska1 commented 1 year ago

Hi, I ran it as suggested but I still get the same issue. What email address would you like me to send the bam file to?

sujunhao commented 1 year ago

Please update to v1.2, which solves the problem.

https://github.com/HKU-BAL/ClusterV/releases/tag/v1.2