Closed HippoYI closed 1 year ago
Could you share the failed subtask log here?
I have attached the main run log and the **.e file from the "ctg_graph1" directory, which belongs to the last (and failed) subtask. I am not sure whether that is what you need; if not, please let me know. Attachments: nextDenovo.sh.e.txt, pid6864.log.txt
See the instructions below:
Error message
Paste the complete log message, include the main task log and failed subtask log.
The main task log is usually located in your working directory and is named pidXXX.log.info. Its last few lines name the failed subtask log, for example:
[ERROR] 2020-07-01 11:06:57,184 cns_align failed: please check the following logs:
[ERROR] 2020-07-01 11:06:57,185 ~/NextDenovo/test_data/01_rundir/02.cns_align/02.cns_align.sh.work/cns_align0/nextDenovo.sh.e
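The lookup described above can be sketched in a few lines of Python. This is only an illustration, not part of NextDenovo: the function name `failed_subtask_logs` is hypothetical, and the pattern assumes the two log prefixes that appear in this thread (`[ERROR] date time ...` and `[pid LEVEL] date time ...`).

```python
import re

# Assumed line shapes, taken from the examples in this thread:
#   [ERROR] 2020-07-01 11:06:57,185 <message>
#   [56473 ERROR] 2022-09-07 15:27:58 <message>
ERROR_LINE = re.compile(r"^\[(?:\d+\s+)?ERROR\]\s+\S+\s+\S+\s+(?P<msg>.*)$")


def failed_subtask_logs(main_log_lines):
    """Return the '.e' log paths reported on [ERROR] lines of the main log."""
    paths = []
    for line in main_log_lines:
        match = ERROR_LINE.match(line.strip())
        if match is None:
            continue  # not an error line
        msg = match.group("msg").strip()
        if msg.endswith(".e"):  # subtask stderr logs end in '.e'
            paths.append(msg)
    return paths


example = [
    "[ERROR] 2020-07-01 11:06:57,184 cns_align failed: please check the following logs:",
    "[ERROR] 2020-07-01 11:06:57,185 ~/NextDenovo/test_data/01_rundir/02.cns_align/02.cns_align.sh.work/cns_align0/nextDenovo.sh.e",
]
print(failed_subtask_logs(example))
```

To use it on a real run, pass the lines of the main log file, e.g. `failed_subtask_logs(open(path_to_main_log))`.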
Because I did not save the screen output last time, I reran the program over the last 2 days. As you can see in "snapshot.jpg", the subtask did not report any error message, only "Segmentation fault (core dumped)" after ctg_graph was done.
Hi, actually, you don't have to rerun the whole process; just see here to continue running unfinished tasks.
For the segmentation fault, I guess it is caused by the calgs function in the file lib/kit.py, so you can replace this function with the following Python code:
def calgs(infile):
    # Sum the lengths of all records in a FASTA file.
    from Bio import SeqIO
    gs = 0
    for seq_record in SeqIO.parse(infile, "fasta"):
        gs += len(seq_record.seq)
    return gs
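If Biopython is not installed in the Python environment that runs NextDenovo, the same total can be computed with the standard library alone. A minimal sketch, assuming an uncompressed FASTA file; the name `fasta_total_length` is hypothetical and not part of NextDenovo:

```python
import os
import tempfile


def fasta_total_length(path):
    """Sum the lengths of all sequences in an uncompressed FASTA file."""
    total = 0
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                continue  # header line, carries no sequence
            total += len(line.strip())
    return total


# Small self-contained demo on a temporary two-record FASTA file.
with tempfile.NamedTemporaryFile("w", suffix=".fasta", delete=False) as tmp:
    tmp.write(">ctg1\nACGT\nAC\n>ctg2\nGGG\n")
demo_path = tmp.name
print(fasta_total_length(demo_path))  # 4 + 2 + 3 = 9
os.remove(demo_path)
```

Running it on the same file with both implementations is a quick way to confirm that the replacement returns the expected genome size.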
Hi, I replaced the calgs function in kit.py and got this output:
[56473 INFO] 2022-09-07 15:27:58 skip step: db_split
[56473 INFO] 2022-09-07 15:27:58 skip step: raw_align
[56473 INFO] 2022-09-07 15:27:58 skip step: sort_align
[56473 INFO] 2022-09-07 15:27:58 skip step: seed_cns
[56473 INFO] 2022-09-07 15:27:58 seed_cns finished, and final corrected reads file:
[56473 INFO] 2022-09-07 15:27:58 /data/yixin/projects/JH_genome_analysis/New_genome_assembly_related/NextD-assembly/./01_rundir/02.cns_align/01.seed_cns.sh.work/seed_cns*/cns.fasta
[56473 INFO] 2022-09-07 15:27:58 skip step: cns_align
[56473 INFO] 2022-09-07 15:27:58 skip step: ctg_graph
Segmentation fault (core dumped)
OK. Next, in the file nextDenovo, try changing this line:
total_seed_len = cal_total_seed_len(get_seed_files(idx=True))
to:
total_seed_len = 1000
and this line:
minlen = cal_minlen_from_idx(part_idx_files, len(part_idx_files), gs * mindepth - total_seed_len)
to:
minlen = 2000
Wow, great! It worked after changing those two lines, and now I can finally get "nd.asm.fasta". I am just curious about the changes: will fixing the total seed length at 1000 affect the final contig correction?
For your data, it should not.
Thanks so much. I really appreciate your help in resolving this!
Describe the bug
I am assembling a genome of about 300M (0.6% het rate) on a 512GB machine. The ultralong read depth is about 27X.
Error message
The program ran well and produced nd.asm.p.fasta after running ctg_graph, but then it stopped and reported "Segmentation fault (core dumped)". This means that the program failed to run "02.ctg_align" and "03.ctg_cns". I have tried many parameters in run.cfg and even switched to a machine with 2TB of memory, but the error still occurs at the same point.
Input data
Total base count=8358015912bp, sequencing depth=27X, average/N50 read length=100709
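The reported depth can be cross-checked from the total base count and the roughly 300M genome size given in the bug description. A small sketch; the helper `parse_size` is hypothetical, and reading the `k`/`m`/`g` suffixes as powers of ten is an assumption:

```python
def parse_size(text):
    """Convert a size such as '300m' or '1k' to a number of bases."""
    units = {"k": 10**3, "m": 10**6, "g": 10**9}  # assumed decimal suffixes
    text = text.strip().lower()
    if text[-1] in units:
        return int(float(text[:-1]) * units[text[-1]])
    return int(text)


total_bases = 8358015912          # total base count reported above
genome_size = parse_size("300m")  # approximate genome size from the report
depth = total_bases / genome_size
print(f"{depth:.1f}X")  # 27.9X
```

The result, about 27.9X, agrees with the roughly 27X quoted in the report.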
Config file
[General]
job_type = local
job_prefix = nextDenovo
task = all
rewrite = yes
deltmp = yes
parallel_jobs = 2
input_type = raw
read_type = ont
input_fofn = input.fofn
workdir = 01_rundir

[correct_option]
read_cutoff = 1k
genome_size = 300m
sort_options = -m 40g -t 5
minimap2_options_raw = -t 5
pa_correction = 5
correction_options = -p 4

[assemble_option]
minimap2_options_cns = -t 5
nextgraph_options = -a 1 -q 10
Operating system CentOS Linux release 7.9.2009
GCC
Python Python 2.7.5 and Python 3.6.2
NextDenovo 2.5.0
The FAQ mentions that nd.asm.p.fasta contains more structural and base errors than nd.asm.fasta, so I really want to solve this. Any ideas or suggestions on how to fix this problem?
Thank you!