Closed. Johnsonzcode closed this issue 3 years ago.
I think your assembly is unfinished. Your final assembly file should be at 03.ctg_graph/nd.asm.fasta. Try to rerun ./NextDenovo/nextDenovo run.cfg.
If I rerun nextDenovo with run.cfg, it seems to restart the assembly, because it backs up the old folders and creates new ones:
[INFO] 2020-12-16 06:34:13,496 start...
[INFO] 2020-12-16 06:34:13,497 logfile: pid335182.log.info
[WARNING] 2020-12-16 06:34:13,498 Change task "all" to "assemble", becasue the input_type is "corrected"
[INFO] 2020-12-16 06:34:13,498 options:
[INFO] 2020-12-16 06:34:13,498 {'job_type': 'local', 'job_prefix': 'nextDenovo_chicken', 'task': 'assemble', 'rewrite': 0, 'deltmp': 1, 'rerun': 3, 'parallel_jobs': '20', 'workdir': '/storage-01/poultrylab1/zhaoqiangsen/GenomeAssembly/chicken/3.nextdenovo/chicken', 'input_type': 'corrected', 'read_cutoff': '1k', 'blocksize': '10g', 'pa_correction': '20', 'nodelist': '', 'cluster_options': '', 'sge_queue': '', 'seed_cutfiles': '20', 'correction_options': '-p 20 -max_lq_length 10000 -min_len_seed 17142', 'sort_options': '-m 200g -t 10 -k 40', '_random_round_with_less_accuracy': 0, 'minimap2_options_cns': '-x ava-ont -t 8 -k17 -w17 --minlen 2000 --maxhan1 5000', 'minimap2_options_map': ' -x map-ont', 'ctg_cns_options': ' -p 20', 'input_fofn': '/storage-01/poultrylab1/zhaoqiangsen/GenomeAssembly/chicken/3.nextdenovo/input.fofn', 'seed_cutoff': '34285', 'minimap2_options_raw': '-x ava-ont -t 8', 'nextgraph_options': '-a 1', 'raw_aligndir': '/storage-01/poultrylab1/zhaoqiangsen/GenomeAssembly/chicken/3.nextdenovo/chicken/01.raw_align', 'cns_aligndir': '/storage-01/poultrylab1/zhaoqiangsen/GenomeAssembly/chicken/3.nextdenovo/chicken/02.cns_align', 'ctg_graphdir': '/storage-01/poultrylab1/zhaoqiangsen/GenomeAssembly/chicken/3.nextdenovo/chicken/03.ctg_graph', 'sort_threads': 10, 'sort_mem': '200g', 'minimap2_threads': (8, 8), 'cns_threads': 20, 'map_threads': 20}
[INFO] 2020-12-16 06:34:13,498 skip mkdir: /storage-01/poultrylab1/zhaoqiangsen/GenomeAssembly/chicken/3.nextdenovo/chicken
[WARNING] 2020-12-16 06:34:13,498 backup /storage-01/poultrylab1/zhaoqiangsen/GenomeAssembly/chicken/3.nextdenovo/chicken/01.raw_align to /storage-01/poultrylab1/zhaoqiangsen/GenomeAssembly/chicken/3.nextdenovo/chicken/01.raw_align.backup.v0
[INFO] 2020-12-16 06:34:13,498 mkdir: /storage-01/poultrylab1/zhaoqiangsen/GenomeAssembly/chicken/3.nextdenovo/chicken/01.raw_align
[WARNING] 2020-12-16 06:34:13,499 backup /storage-01/poultrylab1/zhaoqiangsen/GenomeAssembly/chicken/3.nextdenovo/chicken/02.cns_align to /storage-01/poultrylab1/zhaoqiangsen/GenomeAssembly/chicken/3.nextdenovo/chicken/02.cns_align.backup.v0
[INFO] 2020-12-16 06:34:13,499 mkdir: /storage-01/poultrylab1/zhaoqiangsen/GenomeAssembly/chicken/3.nextdenovo/chicken/02.cns_align
[WARNING] 2020-12-16 06:34:13,499 backup /storage-01/poultrylab1/zhaoqiangsen/GenomeAssembly/chicken/3.nextdenovo/chicken/03.ctg_graph to /storage-01/poultrylab1/zhaoqiangsen/GenomeAssembly/chicken/3.nextdenovo/chicken/03.ctg_graph.backup.v0
[INFO] 2020-12-16 06:34:13,499 mkdir: /storage-01/poultrylab1/zhaoqiangsen/GenomeAssembly/chicken/3.nextdenovo/chicken/03.ctg_graph
[INFO] 2020-12-16 06:34:13,500 analysis tasks done
[INFO] 2020-12-16 06:34:18,508 total jobs: 1
[INFO] 2020-12-16 06:34:18,510 Throw jobID:[335288] jobCmd:[/storage-01/poultrylab1/zhaoqiangsen/GenomeAssembly/chicken/3.nextdenovo/chicken/02.cns_align/01.split_seed.sh.work/split_seed0/nextDenovo_chicken.sh] in the local_cycle.
......
Try setting rewrite = yes.
If I set rewrite = yes, there won't be a Segmentation fault, right?
I tried as you suggested. Thank you for your reply, but the same Segmentation fault appears:
[INFO] 2020-12-16 17:25:43,332 start...
[INFO] 2020-12-16 17:25:43,333 logfile: pid377773.log.info
[WARNING] 2020-12-16 17:25:43,334 Re-write workdir
[WARNING] 2020-12-16 17:25:43,334 Change task "all" to "assemble", becasue the input_type is "corrected"
[INFO] 2020-12-16 17:25:43,334 options:
[INFO] 2020-12-16 17:25:43,334 {'job_type': 'local', 'job_prefix': 'nextDenovo_chicken', 'task': 'assemble', 'rewrite': 1, 'deltmp': 1, 'rerun': 3, 'parallel_jobs': '20', 'workdir': '/storage-01/poultrylab1/zhaoqiangsen/GenomeAssembly/chicken/3.nextdenovo/chicken', 'input_type': 'corrected', 'read_cutoff': '1k', 'blocksize': '10g', 'pa_correction': '20', 'nodelist': '', 'cluster_options': '', 'sge_queue': '', 'seed_cutfiles': '20', 'correction_options': '-p 20 -max_lq_length 10000 -min_len_seed 17142', 'sort_options': '-m 200g -t 10 -k 40', '_random_round_with_less_accuracy': 0, 'minimap2_options_cns': '-x ava-ont -t 8 -k17 -w17 --minlen 2000 --maxhan1 5000', 'minimap2_options_map': ' -x map-ont', 'ctg_cns_options': ' -p 20', 'input_fofn': '/storage-01/poultrylab1/zhaoqiangsen/GenomeAssembly/chicken/3.nextdenovo/input.fofn', 'seed_cutoff': '34285', 'minimap2_options_raw': '-x ava-ont -t 8', 'nextgraph_options': '-a 1', 'raw_aligndir': '/storage-01/poultrylab1/zhaoqiangsen/GenomeAssembly/chicken/3.nextdenovo/chicken/01.raw_align', 'cns_aligndir': '/storage-01/poultrylab1/zhaoqiangsen/GenomeAssembly/chicken/3.nextdenovo/chicken/02.cns_align', 'ctg_graphdir': '/storage-01/poultrylab1/zhaoqiangsen/GenomeAssembly/chicken/3.nextdenovo/chicken/03.ctg_graph', 'sort_threads': 10, 'sort_mem': '200g', 'minimap2_threads': (8, 8), 'cns_threads': 20, 'map_threads': 20}
[INFO] 2020-12-16 17:25:43,334 skip mkdir: /storage-01/poultrylab1/zhaoqiangsen/GenomeAssembly/chicken/3.nextdenovo/chicken
[INFO] 2020-12-16 17:25:43,335 skip mkdir: /storage-01/poultrylab1/zhaoqiangsen/GenomeAssembly/chicken/3.nextdenovo/chicken/01.raw_align
[INFO] 2020-12-16 17:25:43,335 skip mkdir: /storage-01/poultrylab1/zhaoqiangsen/GenomeAssembly/chicken/3.nextdenovo/chicken/02.cns_align
[INFO] 2020-12-16 17:25:43,335 skip mkdir: /storage-01/poultrylab1/zhaoqiangsen/GenomeAssembly/chicken/3.nextdenovo/chicken/03.ctg_graph
[INFO] 2020-12-16 17:25:43,336 analysis tasks done
[INFO] 2020-12-16 17:25:43,336 skip step: split_seed
[INFO] 2020-12-16 17:25:43,338 analysis tasks done
[INFO] 2020-12-16 17:25:43,339 skip step: cns_align
[INFO] 2020-12-16 17:25:43,339 analysis tasks done
[INFO] 2020-12-16 17:25:43,339 skip step: ctg_graph
nextdenovo.sh: line 6: 377773 Segmentation fault (core dumped) ./NextDenovo/nextDenovo run.cfg
Can you check whether the output of /storage-01/poultrylab1/zhaoqiangsen/GenomeAssembly/chicken/3.nextdenovo/chicken/03.ctg_graph/01.ctg_graph.sh.work/ctg_graph0/nextDenovo_chicken.sh is correct?
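A minimal Python sketch of that check, assuming the directory layout shown in this thread (workdir/03.ctg_graph/01.ctg_graph.sh.work/ctg_graph0) and relying on the fact that the generated job script runs under `set -e`, so its `.sh.done` marker is only touched after nextgraph exits successfully. The function name `check_ctg_graph` is hypothetical and not part of NextDenovo; the final file location nd.asm.fasta is the one given at the top of this thread.

```python
from pathlib import Path
from typing import Dict


def check_ctg_graph(workdir: str) -> Dict[str, bool]:
    """Report whether the ctg_graph job finished and which assembly files exist.

    ASSUMPTION: paths follow the layout seen in this thread; adjust if yours differs.
    """
    graph_dir = Path(workdir) / "03.ctg_graph"
    job_dir = graph_dir / "01.ctg_graph.sh.work" / "ctg_graph0"
    return {
        # The job script touches *.sh.done only after nextgraph succeeds (set -e).
        "job_done": any(job_dir.glob("*.sh.done")),
        # nextgraph is run inside job_dir with -o nd.asm.p.fasta.
        "primary_fasta": (job_dir / "nd.asm.p.fasta").is_file(),
        # Final assembly location mentioned in this thread.
        "final_fasta": (graph_dir / "nd.asm.fasta").is_file(),
    }


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        for name, ok in check_ctg_graph(sys.argv[1]).items():
            print(f"{name}: {'yes' if ok else 'no'}")
```

In the situation described here, one would expect job_done and primary_fasta to be yes while final_fasta is no, which points at a failure after the ctg_graph step rather than inside nextgraph.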
nextDenovo_chicken.sh looks like:
#!/bin/bash
set -xve
hostname
cd /storage-01/poultrylab1/zhaoqiangsen/GenomeAssembly/chicken/3.nextdenovo/chicken/03.ctg_graph/01.ctg_graph.sh.work/ctg_graph0
time /storage-01/poultrylab1/zhaoqiangsen/GenomeAssembly/chicken/3.nextdenovo/NextDenovo/bin/nextgraph -a 1 -f /storage-01/poultrylab1/zhaoqiangsen/GenomeAssembly/chicken/3.nextdenovo/chicken/03.ctg_graph/01.ctg_graph.input.seqs /storage-01/poultrylab1/zhaoqiangsen/GenomeAssembly/chicken/3.nextdenovo/chicken/03.ctg_graph/01.ctg_graph.input.ovls -o nd.asm.p.fasta;
touch /storage-01/poultrylab1/zhaoqiangsen/GenomeAssembly/chicken/3.nextdenovo/chicken/03.ctg_graph/01.ctg_graph.sh.work/ctg_graph0/nextDenovo_chicken.sh.done
nextDenovo_chicken.sh.done exists, which means that nextDenovo_chicken.sh finished. But I don't know how to check whether nd.asm.p.fasta is correct:
# ls -lh
1014M Dec 16 14:34 nd.asm.p.fasta
If it is the final assembly, the size of nd.asm.p.fasta seems correct; I set 1g in run.cfg.
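One way to sanity-check nd.asm.p.fasta independently of the file size is to recompute the contig count, total length and Nx values and compare them with the "Assembly stat" table that nextgraph prints to its .sh.e log. A minimal Python sketch; the function names are our own, not part of NextDenovo. Nx here uses the usual definition (sort contigs longest first and report the length at which the running sum first reaches x% of the total); nextgraph may break ties slightly differently.

```python
import gzip
from typing import List, Tuple


def contig_lengths(path: str) -> List[int]:
    """Return the sequence length of every record in a FASTA file (plain or .gz)."""
    opener = gzip.open if path.endswith(".gz") else open
    lengths: List[int] = []
    current = None  # length of the record being read; None before the first header
    with opener(path, "rt") as fh:
        for line in fh:
            line = line.strip()
            if line.startswith(">"):
                if current is not None:
                    lengths.append(current)
                current = 0
            elif line and current is not None:
                current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def nx(lengths: List[int], percent: int) -> Tuple[int, int]:
    """Return (Nx length, number of contigs needed); percent=50 gives N50."""
    total = sum(lengths)
    running = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        # Integer comparison avoids float rounding on genome-sized totals.
        if running * 100 >= total * percent:
            return length, count
    return 0, 0


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        lens = contig_lengths(sys.argv[1])
        print("Total", sum(lens), len(lens))
        for pct in range(10, 100, 10):
            length, count = nx(lens, pct)
            print(f"N{pct}", length, count)
        if lens:
            print("Min.", min(lens), "Max.", max(lens))
```

If the recomputed Total and N50 rows agree with the log, the FASTA was at least written completely; this says nothing about base-level or structural accuracy.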
Could you paste the content of /storage-01/poultrylab1/zhaoqiangsen/GenomeAssembly/chicken/3.nextdenovo/chicken/03.ctg_graph/01.ctg_graph.sh.work/ctg_graph0/nextDenovo_chicken.sh.e here?
Of course:
hostname
+ hostname
cd /storage-01/poultrylab1/zhaoqiangsen/GenomeAssembly/chicken/3.nextdenovo/chicken/03.ctg_graph/01.ctg_graph.sh.work/ctg_graph0
+ cd /storage-01/poultrylab1/zhaoqiangsen/GenomeAssembly/chicken/3.nextdenovo/chicken/03.ctg_graph/01.ctg_graph.sh.work/ctg_graph0
time /storage-01/poultrylab1/zhaoqiangsen/GenomeAssembly/chicken/3.nextdenovo/NextDenovo/bin/nextgraph -a 1 -f /storage-01/poultrylab1/zhaoqiangsen/GenomeAssembly/chicken/3.nextdenovo/chicken/03.ctg_graph/01.ctg_graph.input.seqs /storage-01/poultrylab1/zhaoqiangsen/GenomeAssembly/chicken/3.nextdenovo/chicken/03.ctg_graph/01.ctg_graph.input.ovls -o nd.asm.p.fasta;
+ /storage-01/poultrylab1/zhaoqiangsen/GenomeAssembly/chicken/3.nextdenovo/NextDenovo/bin/nextgraph -a 1 -f /storage-01/poultrylab1/zhaoqiangsen/GenomeAssembly/chicken/3.nextdenovo/chicken/03.ctg_graph/01.ctg_graph.input.seqs /storage-01/poultrylab1/zhaoqiangsen/GenomeAssembly/chicken/3.nextdenovo/chicken/03.ctg_graph/01.ctg_graph.input.ovls -o nd.asm.p.fasta
[INFO] 2020-12-16 14:33:50 Initialize graph and reading...
[INFO] 2020-12-16 14:34:13 Initial Node(s): 261692, Edge(s): 2220758
[INFO] 2020-12-16 14:34:14 Depth stat, Mid: 57.000 Max: 114000.000 Repeat: 85.500 L:N:H: 0.182:0.800:0.018
[INFO] 2020-12-16 14:34:14 Outdegree stat, Mid: 9.000 Max: 18000.000 Repeat: 13.500 L:N:H: 0.181:0.819:0.000
[INFO] 2020-12-16 14:34:15 Chimeric node ratio: 0.199% (candidate: 0.639%)
[INFO] 2020-12-16 14:34:21 Assembly done and outputting...
[INFO] 2020-12-16 14:34:26 Assembly stat:
Type     Length (bp)   Count (#)
N10         90931881           2
N20         81289399           3
N30         76172526           4
N40         47812504           6
N50         27316320           9
N60         19614968          13
N70         14095085          19
N80          9306926          28
N90          5843368          42
Min.           58226           -
Max.        98600593           -
Ave.         4887846           -
Total     1060662579         217
[INFO] 2020-12-16 14:34:26 CMD:
/storage-01/poultrylab1/zhaoqiangsen/GenomeAssembly/chicken/3.nextdenovo/NextDenovo/bin/nextgraph -a 1 -f /storage-01/poultrylab1/zhaoqiangsen/GenomeAssembly/chicken/3.nextdenovo/chicken/03.ctg_graph/01.ctg_graph.input.seqs -o nd.asm.p.fasta /storage-01/poultrylab1/zhaoqiangsen/GenomeAssembly/chicken/3.nextdenovo/chicken/03.ctg_graph/01.ctg_graph.input.ovls
[INFO] 2020-12-16 14:34:26 Real time: 35.344 sec; CPU: 35.060 sec; Peak RSS: 0.598 GB
real 0m35.372s
user 0m32.308s
sys 0m2.753s
touch /storage-01/poultrylab1/zhaoqiangsen/GenomeAssembly/chicken/3.nextdenovo/chicken/03.ctg_graph/01.ctg_graph.sh.work/ctg_graph0/nextDenovo_chicken.sh.done
+ touch /storage-01/poultrylab1/zhaoqiangsen/GenomeAssembly/chicken/3.nextdenovo/chicken/03.ctg_graph/01.ctg_graph.sh.work/ctg_graph0/nextDenovo_chicken.sh.done
It seems to have finished correctly, but I am not sure. That is why, at the top of this topic, I was wondering whether my assembly had finished.
Everything seems OK. I cannot solve this because I can't reproduce the error, unless I can access your system to debug it.
Optionally, you can use nd.asm.p.fasta as the final result, but it may contain more errors than the final assembly (nd.asm.fasta).
Can I manually run the steps between nd.asm.p.fasta and the final result?
At the beginning, although I used CCS reads to correct the ONT reads and set input_type = corrected, I am not sure about their quality. So if there is a way to avoid errors, I would like to refine the assembly manually.
No. There must have been an error during the run, and you cannot skip it. You can polish this genome with HiFi reads or short reads to improve single-base accuracy, and use Bionano or Hi-C data to improve structural accuracy.
OK
(asm_practise) [poultrylab1@pbsnode01 3.nextdenovo]$ python NextDenovo/nextDenovo run.cfg
[INFO] 2020-12-16 19:17:20,477 start...
[INFO] 2020-12-16 19:17:20,477 logfile: pid30071.log.info
[WARNING] 2020-12-16 19:17:20,479 Re-write workdir
[INFO] 2020-12-16 19:17:20,479 options:
[INFO] 2020-12-16 19:17:20,479 {'job_type': 'local', 'job_prefix': 'nextDenovo_chicken', 'task': 'assemble', 'rewrite': 1, 'deltmp': 1, 'rerun': 3, 'parallel_jobs': '20', 'workdir': '/storage-01/poultrylab1/zhaoqiangsen/GenomeAssembly/chicken/3.nextdenovo/chicken', 'input_type': 'corrected', 'read_cutoff': '1k', 'blocksize': '10g', 'pa_correction': '20', 'nodelist': '', 'cluster_options': '', 'sge_queue': '', 'seed_cutfiles': '20', 'correction_options': '-p 20 -max_lq_length 10000 -min_len_seed 17142', 'sort_options': '-m 200g -t 10 -k 40', '_random_round_with_less_accuracy': 0, 'minimap2_options_cns': '-x ava-ont -t 8 -k17 -w17 --minlen 2000 --maxhan1 5000', 'minimap2_options_map': ' -x map-ont', 'ctg_cns_options': ' -p 20', 'input_fofn': '/storage-01/poultrylab1/zhaoqiangsen/GenomeAssembly/chicken/3.nextdenovo/input.fofn', 'seed_cutoff': '34285', 'minimap2_options_raw': '-x ava-ont -t 1', 'nextgraph_options': '-a 1', 'raw_aligndir': '/storage-01/poultrylab1/zhaoqiangsen/GenomeAssembly/chicken/3.nextdenovo/chicken/01.raw_align', 'cns_aligndir': '/storage-01/poultrylab1/zhaoqiangsen/GenomeAssembly/chicken/3.nextdenovo/chicken/02.cns_align', 'ctg_graphdir': '/storage-01/poultrylab1/zhaoqiangsen/GenomeAssembly/chicken/3.nextdenovo/chicken/03.ctg_graph', 'sort_threads': 10, 'sort_mem': '200g', 'minimap2_threads': (1, 8), 'cns_threads': 20, 'map_threads': 20}
[INFO] 2020-12-16 19:17:20,479 skip mkdir: /storage-01/poultrylab1/zhaoqiangsen/GenomeAssembly/chicken/3.nextdenovo/chicken
[INFO] 2020-12-16 19:17:20,479 skip mkdir: /storage-01/poultrylab1/zhaoqiangsen/GenomeAssembly/chicken/3.nextdenovo/chicken/01.raw_align
[INFO] 2020-12-16 19:17:20,479 skip mkdir: /storage-01/poultrylab1/zhaoqiangsen/GenomeAssembly/chicken/3.nextdenovo/chicken/02.cns_align
[INFO] 2020-12-16 19:17:20,480 skip mkdir: /storage-01/poultrylab1/zhaoqiangsen/GenomeAssembly/chicken/3.nextdenovo/chicken/03.ctg_graph
[INFO] 2020-12-16 19:17:20,481 analysis tasks done
[INFO] 2020-12-16 19:17:20,481 skip step: split_seed
[INFO] 2020-12-16 19:17:20,483 analysis tasks done
[INFO] 2020-12-16 19:17:20,484 skip step: cns_align
[INFO] 2020-12-16 19:17:20,484 analysis tasks done
[INFO] 2020-12-16 19:17:20,484 skip step: ctg_graph
Segmentation fault (core dumped)
The same as before.
We fixed it by modifying run.cfg like this:
run.cfg:
[General]
job_type = local
job_prefix = nextDenovo_chicken
task = all # 'all', 'correct', 'assemble'
rewrite = no # yes/no #
deltmp = yes
rerun = 3
parallel_jobs = 20
input_type = raw #
input_fofn = input.fofn
workdir = chicken
[correct_option]
read_cutoff = 1k
seed_cutoff = 34285 #
blocksize = 14g #
pa_correction = 20
seed_cutfiles = 20
sort_options = -m 20g -t 8 -k 40
minimap2_options_raw = -x ava-ont -t 8
correction_options = -p 8 #
[assemble_option]
minimap2_options_cns = -x ava-ont -t 8 -k17 -w17
nextgraph_options = -a 1
We think it may be a resource-allocation problem: [correct_option] and [assemble_option] each cover only part of NextDenovo, so their settings should be smaller than what we set before.
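The resource argument can be made concrete with some worst-case arithmetic. A minimal Python sketch, under a loudly stated ASSUMPTION that is our own reading and not NextDenovo's documented scheduler logic: up to parallel_jobs local jobs may run at once, and each uses the thread, process and memory values written in its option string. The helper names `flag_value` and `peak_estimate` are hypothetical.

```python
import configparser
import re
from typing import Dict, Optional


def flag_value(options: str, flag: str) -> Optional[str]:
    """Return the token after `flag` in an option string like '-m 200g -t 10', or None."""
    tokens = options.split()
    for i, token in enumerate(tokens[:-1]):
        if token == flag:
            return tokens[i + 1]
    return None


def peak_estimate(cfg_text: str) -> Dict[str, int]:
    """Worst-case totals if all parallel_jobs jobs used their configured resources at once.

    ASSUMPTION: back-of-the-envelope model only, not NextDenovo's own accounting.
    """
    # Strip trailing comments such as 'rewrite = no # yes/no #'.
    cfg = configparser.ConfigParser(inline_comment_prefixes=("#",))
    cfg.read_string(cfg_text)
    jobs = cfg.getint("General", "parallel_jobs")
    corr = cfg["correct_option"]

    sort_opts = corr.get("sort_options", fallback="")
    sort_threads = int(flag_value(sort_opts, "-t") or 1)
    # Assumes the -m value is given in gigabytes, e.g. '200g'.
    sort_mem_gb = int(re.sub(r"[gG]$", "", flag_value(sort_opts, "-m") or "0"))
    cns_procs = int(flag_value(corr.get("correction_options", fallback=""), "-p") or 1)
    mm2_threads = int(flag_value(corr.get("minimap2_options_raw", fallback=""), "-t") or 1)

    return {
        "minimap2_threads": jobs * mm2_threads,
        "sort_threads": jobs * sort_threads,
        "sort_mem_gb": jobs * sort_mem_gb,
        "cns_processes": jobs * cns_procs,
    }


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        with open(sys.argv[1]) as fh:
            for name, total in peak_estimate(fh.read()).items():
                print(f"{name}: {total}")
```

Under this model, the original settings (parallel_jobs = 20, sort_options = -m 200g -t 10, correction_options = -p 20) allow up to 4000 GB of sort memory and 400 correction processes, while the modified settings (-m 20g -t 8, -p 8) cap them at 400 GB and 160. Note that the logs show the task being changed to "assemble" when input_type = corrected, so the correction settings may not have been exercised in the failing runs.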
Question or Expected behavior
After "ctg_graph done", there is a "Segmentation fault":
Operating system
GCC gcc version 9.2.0 (GCC)
Python Python 3.6.11
NextDenovo 2.3.1
Additional context (Optional)
run.cfg:
[General]
job_type = local
job_prefix = nextDenovo_chicken
task = all # 'all', 'correct', 'assemble'
rewrite = no # yes/no #
deltmp = yes
rerun = 3
parallel_jobs = 20
input_type = corrected #
input_fofn = input.fofn
workdir = chicken
[correct_option]
read_cutoff = 1k
seed_cutoff = 34285 #
blocksize = 10g #
pa_correction = 20
seed_cutfiles = 20
sort_options = -m 200g -t 10 -k 40
minimap2_options_raw = -x ava-ont -t 8
correction_options = -p 20 #
[assemble_option]
minimap2_options_cns = -x ava-ont -t 8 -k17 -w17
nextgraph_options = -a 1