Nextomics / NextDenovo

Fast and accurate de novo assembler for long reads
GNU General Public License v3.0

segmentation fault building ctg_graph using NEXTDENOVO/2.4.0 #133

Open gitcruz opened 2 years ago

gitcruz commented 2 years ago

Describe the bug

I am running an assembly of a 1.7 Gb heterozygous genome (1.2% heterozygosity rate) on a 2 TB machine. The ONT data is 50x of the highest-quality reads (selected with Filtlong, ≥5 kb and 150 Gb).

1st config file (24 CPUs, 1 TB total RAM):

[General]
job_type = local
task = all
rewrite = yes
parallel_jobs = 4
deltmp = yes
read_type = ont
input_type = raw
workdir = /WORKDIR/
input_fofn = /WORKDIR/long_reads.fofn

[correct_option]
read_cutoff = 1k
genome_size = 1.8g
seed_depth = 45
seed_cutoff = 0
blocksize = 1g
pa_correction = 4
minimap2_options_raw = -t 6 -x ava-ont
sort_options = -m 40g -t 20
correction_options = -p 6

[assemble_option]
minimap2_options_cns = -t 6 -x ava-ont -k17 -w17
minimap2_options_map = -t 6 -x ava-ont
nextgraph_options = -a 1

2nd config file (48 CPUs, 2 TB total RAM):

[General]
job_type = local
task = all
rewrite = yes
parallel_jobs = 8
deltmp = yes
read_type = ont
input_type = raw
workdir = /WORKDIR/
input_fofn = /WORKDIR/long_reads.fofn

[correct_option]
read_cutoff = 1k
genome_size = 1.8g
seed_depth = 45
seed_cutoff = 0
blocksize = 1g
pa_correction = 4
minimap2_options_raw = -t 6 -x ava-ont
sort_options = -m 40g -t 20
correction_options = -p 6

[assemble_option]
minimap2_options_cns = -t 6 -x ava-ont -k17 -w17
minimap2_options_map = -t 6 -x ava-ont
nextgraph_options = -a 1

Error message

After 10 days the assembly failed with an I/O error at the 02.cns_align step (see the first config). I removed this folder and resubmitted the assembly with more memory (second config). It ran smoothly, but now it fails consistently at the ctg_graph step. The error is this: hostname

Genome characteristics

C-value = 1.7 Gb

GenomeScope results:

GenomeScope version 2.0
input file = jf_21mer.hist
output directory = out/21mer/
p = 2
k = 21

property                 min                  max
Homozygous (aa)          98.7068%             98.7307%
Heterozygous (ab)        1.26928%             1.29316%
Genome Haploid Length    1,208,134,973 bp     1,210,345,670 bp
Genome Repeat Length     399,334,371 bp       400,065,090 bp
Genome Unique Length     808,800,602 bp       810,280,580 bp
Model Fit                73.122%              95.132%
Read Error Rate          0.214032%            0.214032%

Input data

[Read length stat]
Types    Count (#)    Length (bp)
N10      266461       29793
N20      648378       23529
N30      1113845      19774
N40      1660889      16968
N50      2295837      14643
N60      3032994      12575
N70      3896295      10664
N80      4925021      8844
N90      6190301      7021

Types       Count (#)    Bases (bp)       Depth (X)
Raw         7860332      100000021650     55.56
Filtered    0            0                0.00
Clean       7860332      100000021650     55.56
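For readers who want to sanity-check these statistics on their own read set, here is a minimal sketch in Python. It assumes the conventional Nx definition (sort reads from longest to shortest and stop once the running total reaches x% of all bases), which may differ in detail from what NextDenovo computes internally. The function names `nx` and `depth` are hypothetical helpers, not part of NextDenovo.

```python
def nx(lengths, percent):
    """Return (count, length) for the Nx statistic.

    Assumed definition: sort reads from longest to shortest and stop at the
    first read where the running total reaches `percent` % of all bases.
    `count` is the number of reads consumed, `length` is that read's length.
    """
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= total * percent:
            return count, length
    raise ValueError("lengths must be non-empty")


def depth(total_bases, genome_size):
    """Sequencing depth as total bases divided by the assumed genome size."""
    return total_bases / genome_size


# Depth from the table above, using genome_size = 1.8g from the config.
reported_depth = round(depth(100000021650, 1.8e9), 2)
```

With the numbers in this report, `reported_depth` comes out at 55.56, so the Depth (X) column appears to be based on the `genome_size = 1.8g` setting rather than on the GenomeScope haploid length estimate.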

Config file

Last config used was:

[General]
job_type = local
task = all
rewrite = yes
parallel_jobs = 8
deltmp = yes
read_type = ont
input_type = raw
workdir = /WORKDIR/
input_fofn = /WORKDIR/long_reads.fofn

[correct_option]
read_cutoff = 1k
genome_size = 1.8g
seed_depth = 45
seed_cutoff = 0
blocksize = 1g
pa_correction = 4
minimap2_options_raw = -t 6 -x ava-ont
sort_options = -m 40g -t 40
correction_options = -p 6

[assemble_option]
minimap2_options_cns = -t 6 -x ava-ont -k17 -w17
minimap2_options_map = -t 6 -x ava-ont
nextgraph_options = -a 1
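As a rough cross-check of how this config maps onto the machine, the sketch below parses the relevant keys with Python's `configparser` and multiplies `parallel_jobs` by the minimap2 `-t` value. It assumes that each parallel job runs one minimap2 process with that many threads; that is my reading of the settings, not something confirmed in this thread. The helpers `threads_in` and `peak_threads` are hypothetical names, and `CONFIG_TEXT` is a trimmed copy of the last config above.

```python
import configparser
import re

# Trimmed copy of the last config from this report (only the keys used here).
CONFIG_TEXT = """
[General]
job_type = local
parallel_jobs = 8

[correct_option]
minimap2_options_raw = -t 6 -x ava-ont
sort_options = -m 40g -t 40
correction_options = -p 6

[assemble_option]
minimap2_options_cns = -t 6 -x ava-ont -k17 -w17
minimap2_options_map = -t 6 -x ava-ont
nextgraph_options = -a 1
"""


def threads_in(options, flag="-t"):
    """Return the integer that follows `flag` in an option string, or None."""
    match = re.search(r"(?:^|\s)" + re.escape(flag) + r"\s*(\d+)", options)
    return int(match.group(1)) if match else None


def peak_threads(cfg_text):
    """Estimate threads per minimap2 stage as parallel_jobs * (-t value).

    Assumption: every parallel job starts one minimap2 with `-t` threads.
    """
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    jobs = parser.getint("General", "parallel_jobs")
    estimate = {}
    for section in ("correct_option", "assemble_option"):
        for key, value in parser[section].items():
            if key.startswith("minimap2_options"):
                threads = threads_in(value)
                if threads is not None:
                    estimate[key] = jobs * threads
    return estimate


if __name__ == "__main__":
    for stage, threads in peak_threads(CONFIG_TEXT).items():
        print(stage, threads)
```

Under that assumption, the last config gives 8 x 6 = 48 threads per minimap2 stage, which matches the 48 CPUs requested, and the first config gives 4 x 6 = 24, which matches its 24 CPUs.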

Operating system

LSB Version: :base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch
Distributor ID: RedHatEnterpriseServer
Description: Red Hat Enterprise Linux Server release 6.7 (Santiago)
Release: 6.7
Codename: Santiago

GCC

gcc version 6.3.0 (GCC)

Python

Python 3.8.6

NextDenovo

nextDenovo v2.4.0


Additional context (Optional)

I made three attempts and the error is always:

line 5: 19296 Segmentation fault /apps/NEXTDENOVO/2.4.0/bin/nextgraph

Any idea what the problem could be? I'll be happy to check some intermediate files.

The files listed in 01.ctg_graph.input.ovls are not empty; their sizes range from 43M to 195M in the folder 02.cns_align/*.cns.filt.dovt.ovl

The input_seqs files are also there:

for i in $(cat 03.ctg_graph/01.ctg_graph.input.seqs); do ls -sh $i; done
4.3G 02.cns_align/01.seed_cns.sh.work/seed_cns0/cns.fasta
4.4G 02.cns_align/01.seed_cns.sh.work/seed_cns1/cns.fasta
4.4G 02.cns_align/01.seed_cns.sh.work/seed_cns2/cns.fasta
2.7G 02.cns_align/01.seed_cns.sh.work/seed_cns3/cns.fasta
4.4G 02.cns_align/01.seed_cns.sh.work/seed_cns4/cns.fasta
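The same check can be scripted so that missing or empty inputs stand out. The following is a minimal sketch in Python, assuming that each list file (such as 01.ctg_graph.input.seqs or 01.ctg_graph.input.ovls) holds one path per line and that relative paths resolve against a base directory you supply. `check_listed_files` and `problems` are hypothetical helper names, not NextDenovo tools.

```python
from pathlib import Path


def check_listed_files(list_file, base_dir="."):
    """Return (path, size) pairs for every path named in `list_file`.

    `size` is the file size in bytes, or None when the file does not exist.
    Assumption: one path per line; relative paths are joined to `base_dir`.
    """
    base = Path(base_dir)
    report = []
    for line in Path(list_file).read_text().splitlines():
        entry = line.strip()
        if not entry:
            continue  # skip blank lines
        path = Path(entry)
        if not path.is_absolute():
            path = base / path
        size = path.stat().st_size if path.is_file() else None
        report.append((str(path), size))
    return report


def problems(report):
    """Return the paths that are missing (None) or empty (0 bytes)."""
    return [path for path, size in report if not size]
```

For example, `problems(check_listed_files("03.ctg_graph/01.ctg_graph.input.seqs", base_dir="/WORKDIR/"))` would return an empty list if every listed file exists and is non-empty. This only rules out missing inputs; it says nothing about whether the contents are valid for nextgraph.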

Any ideas or suggestions on how to fix this problem are welcome!

Thanks

moold commented 2 years ago

Hi, see #113 to fix this error.