linshengnan2020 closed this issue 3 years ago.
Hi, have you tried running this script /home/linshengnan/03_work/00_dianthus_work/01_qumai345_genome/03_genome_asm/01_next/01_rundir/02.cns_align/01.seed_cns.sh.work/seed_cns01/nextDenovo.sh manually? Did it work?
I ran this script manually as you suggested, and the same error was reported. Also, I have two datasets run with the same parameters: one works well, and the other reports this error.
There are two solutions:
If the size of seed_cns01/cns.fasta is similar to that of cns.fasta in the other directories, just ignore this read and run the following commands:
cd /home/linshengnan/03_work/00_dianthus_work/01_qumai345_genome/03_genome_asm/01_next/01_rundir/02.cns_align/01.seed_cns.sh.work/seed_cns01/
samtools faidx cns.fasta
awk '{print $1"\t"$3"\t"$2}' cns.fasta.fai > cns.fasta.idx
touch nextDenovo.sh.done
and then rerun the main task.
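Before taking this route, it helps to confirm that seed_cns01/cns.fasta really is comparable in size to its siblings. The sketch below is hypothetical: the function name and the seed_cns*/cns.fasta glob are assumptions based on the directory names in this thread, not part of NextDenovo. It prints the byte size of every cns.fasta under a 01.seed_cns.sh.work directory and flags empty files.

```shell
# Hypothetical helper (not part of NextDenovo): print the size in bytes of every
# seed_cns*/cns.fasta under a work directory, flagging empty files.
report_cns_sizes() {
  local workdir=${1:-.} f size
  for f in "$workdir"/seed_cns*/cns.fasta; do
    [ -e "$f" ] || continue            # the glob matched nothing
    size=$(wc -c < "$f" | tr -d ' ')   # portable byte count (BSD wc pads with spaces)
    if [ "$size" -eq 0 ]; then
      printf '%s\t%s\tEMPTY\n' "$size" "$f"
    else
      printf '%s\t%s\n' "$size" "$f"
    fi
  done
}
```

For example, `report_cns_sizes /home/linshengnan/03_work/00_dianthus_work/01_qumai345_genome/03_genome_asm/01_next/01_rundir/02.cns_align/01.seed_cns.sh.work`. If seed_cns01 is flagged EMPTY, marking it as done would silently drop its reads, so the second solution is the safer choice.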
Otherwise, note that this error is usually caused by the minimap2-nd task running out of memory, so you need to rerun the minimap2-nd tasks related to seed_cns01/nextDenovo.sh.
The value of the -i option in seed_cns01/nextDenovo.sh is the final minimap2-nd result related to this task. So run cd /home/linshengnan/03_work/00_dianthus_work/01_qumai345_genome/03_genome_asm/01_next/01_rundir/01.raw_align/03.sort_align.sh.work/sort_align01, and you will find a file named input.fofn, which lists all the minimap2-nd results (input.seed.*.ovl) related to this task. You therefore need to regenerate these files, and then re-run sort_align01/nextDenovo.sh and seed_cns01/nextDenovo.sh.
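To see which of the files listed in input.fofn actually need regenerating, a minimal sketch like the following can help. It is a hypothetical helper (not part of NextDenovo) and assumes input.fofn holds one path per line, either absolute or relative to the directory you run it from.

```shell
# Hypothetical helper (not part of NextDenovo): read a .fofn (one path per line)
# and print every listed file that is missing or empty, i.e. a candidate for
# regeneration.
list_bad_files() {
  local fofn=$1 path
  # "|| [ -n "$path" ]" also handles a last line without a trailing newline
  while IFS= read -r path || [ -n "$path" ]; do
    [ -n "$path" ] || continue                 # skip blank lines
    if [ ! -e "$path" ]; then
      printf 'MISSING\t%s\n' "$path"
    elif [ ! -s "$path" ]; then                # exists but has size zero
      printf 'EMPTY\t%s\n' "$path"
    fi
  done < "$fofn"
}
```

Run it from the sort_align01 directory as `list_bad_files input.fofn`. It only checks for missing or zero-size files; a truncated but non-empty .ovl file would not be caught.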
To regenerate an input.seed.*.ovl file, just cd to its directory and run the nextDenovo.sh there.
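The "cd to its directory and run nextDenovo.sh" step can be looped over several files. This is a hypothetical sketch, assuming (as described above) that the nextDenovo.sh producing each file sits in the same directory as that file; the function name and the DRY_RUN variable are illustrative, not NextDenovo features.

```shell
# Hypothetical helper (not part of NextDenovo): for each file path read on stdin,
# cd into that file's directory and run the nextDenovo.sh found there.
# Set DRY_RUN=1 to only print what would be rerun.
rerun_task_dirs() {
  local path dir
  while IFS= read -r path; do
    [ -n "$path" ] || continue
    dir=$(dirname "$path")
    if [ ! -f "$dir/nextDenovo.sh" ]; then
      echo "no nextDenovo.sh in $dir" >&2
      continue
    fi
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "would rerun: $dir/nextDenovo.sh"
    else
      # </dev/null stops the script from consuming the remaining paths on stdin
      (cd "$dir" && bash nextDenovo.sh < /dev/null) || echo "failed: $dir" >&2
    fi
  done
}
```

For example, to preview reruns for every missing or empty file in input.fofn: `while IFS= read -r f; do [ -s "$f" ] || echo "$f"; done < input.fofn | DRY_RUN=1 rerun_task_dirs`. The scripts run one at a time on purpose: since the original failure is attributed to insufficient memory, running several minimap2-nd tasks in parallel could trigger it again.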
I checked my seed_cns01/cns.fasta, and it is an empty file. So I tried the second solution, and it worked well until today, when I hit an error: ctg_graph failed.
Have you tried running ctg_graph manually? BTW, please paste the full log here.
the log hostname
The solution: cd /home/linshengnan/03_work/00_dianthus_work/01_qumai345_genome/03_genome_asm/01_next/01_rundir/03.ctg_graph, where there is a file named 01.ctg_graph.input.ovls that lists all the /cns_align/cns.filt.dovt.ovl results. I checked every cns.filt.dovt.ovl in the cns_align directories, found that some of them were empty, and re-ran the corresponding cns_align*/nextDenovo.sh to regenerate them. After the cns.filt.dovt.ovl files were regenerated, I ran cd /home/linshengnan/03_work/00_dianthus_work/01_qumai345_genome/03_genome_asm/01_next/01_rundir/03.ctg_graph/01.ctg_graph.sh.work/ctg_graph0 and re-ran nextDenovo.sh there. After doing all of the above, I reran the main task.
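Instead of reading 01.ctg_graph.input.ovls line by line, the empty files can also be located with find. This is a hypothetical sketch (not part of NextDenovo): it assumes each cns.filt.dovt.ovl sits in the same cns_align* directory as the nextDenovo.sh that produces it, and it prints both paths so you know which script to rerun.

```shell
# Hypothetical helper (not part of NextDenovo): under a 02.cns_align directory,
# list every empty cns.filt.dovt.ovl together with the nextDenovo.sh in the
# same directory, i.e. the script to rerun.
find_empty_dovt() {
  local aligndir=${1:-.} f
  # -empty is supported by both GNU and BSD find
  find "$aligndir" -type f -name 'cns.filt.dovt.ovl' -empty | sort |
    while IFS= read -r f; do
      printf '%s\t%s\n' "$f" "$(dirname "$f")/nextDenovo.sh"
    done
}
```

For example, `find_empty_dovt /home/linshengnan/03_work/00_dianthus_work/01_qumai345_genome/03_genome_asm/01_next/01_rundir/02.cns_align`. Like the manual check, this only detects zero-size files.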
Hi, I have another question. I assembled a 1 Gb diploid genome with 70-80% repetitive sequences. The coverage of my PacBio data is approximately 40x, and the final assembly N50 is 157 kb. I would like to know how to adjust the parameters to improve the assembly. Could you please give me some advice? Thank you very much!
Your seed length is too short, so you can try increasing seed_cutoff and see how the result changes. BTW, it is better to sequence longer reads.
The seed length was calculated by the seq_stat script. Is there any standard for increasing seed_cutoff?
No, you should just try it, for example by setting -d 30. The default option values are not suitable for every species or sequencing depth.
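To see why a smaller -d gives a longer seed_cutoff, here is a rough sketch. It assumes (our assumption, not seq_stat's actual code) that the cutoff is the read length at which the longest reads, summed from the top down, first reach depth x genome size; the real script may apply extra filtering such as read_cutoff.

```shell
# Hypothetical illustration (not seq_stat itself): read lengths on stdin, one per
# line; print the shortest length L such that reads >= L sum to at least
# genome_size * depth bases, or NA if the data never reach that depth.
seed_cutoff_for_depth() {
  local genome_size=$1 depth=$2   # genome_size in bases, as a plain integer
  sort -nr | awk -v target="$((genome_size * depth))" '
    { sum += $1 }                                  # accumulate longest reads first
    sum >= target { print $1; found = 1; exit }    # first length reaching the target
    END { if (!found) print "NA" }'
}
```

With toy lengths 10, 20, 30 and 40 and a genome size of 10, depth 5 gives a cutoff of 30 while depth 3 gives 40: lowering the target depth raises the cutoff. On real data, after `samtools faidx reads.fasta` (a hypothetical file name), something like `cut -f2 reads.fasta.fai | seed_cutoff_for_depth 1000000000 30` would give a comparable estimate for a 1 Gb genome.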
I will close this issue. If you still have problems unrelated to this topic, please open a new issue.
Thank you very much!
The log:
[INFO] 2020-10-14 22:15:51,533 start...
[INFO] 2020-10-14 22:15:51,533 logfile: pid68284.log.info
[WARNING] 2020-10-14 22:15:51,534 Re-write workdir
[INFO] 2020-10-14 22:15:51,534 options:
[INFO] 2020-10-14 22:15:51,534 {'sort_threads': 8, 'nodelist': '', 'rewrite': 1, 'blocksize': '2g', 'job_prefix': 'nextDenovo', 'job_type': 'local', 'minimap2_options_map': '-x map-pb', 'cns_threads': 8, 'map_threads': 8, 'sort_mem': '50g', 'seed_cutoff': '11066', 'input_fofn': '/home/linshengnan/03_work/00_dianthus_work/01_qumai345_genome/03_genome_asm/01_next/input.fofn', 'read_cutoff': '1k', 'input_type': 'raw', 'sort_options': '-m 50g -t 8 -k 40', 'parallel_jobs': '4', 'cluster_options': '', 'sge_queue': '', 'ctg_graphdir': '/home/linshengnan/03_work/00_dianthus_work/01_qumai345_genome/03_genome_asm/01_next/01_rundir/03.ctg_graph', 'pa_correction': '4', 'workdir': '/home/linshengnan/03_work/00_dianthus_work/01_qumai345_genome/03_genome_asm/01_next/01_rundir', 'minimap2_threads': (8, 8), 'minimap2_options_raw': '-x ava-pb -t 8', 'minimap2_options_cns': '-x ava-pb -t 8 -k17 -w17', 'cns_aligndir': '/home/linshengnan/03_work/00_dianthus_work/01_qumai345_genome/03_genome_asm/01_next/01_rundir/02.cns_align', 'seed_cutfiles': '20', 'raw_aligndir': '/home/linshengnan/03_work/00_dianthus_work/01_qumai345_genome/03_genome_asm/01_next/01_rundir/01.raw_align', 'task': 'all', 'ctg_cns_options': ' -p 8', 'deltmp': 1, '_random_round_with_less_accuracy': 0, 'rerun': 3, 'correction_options': '-p 8 -max_lq_length 1000', 'nextgraph_options': '-a 1'}
[INFO] 2020-10-14 22:15:51,534 skip mkdir: /home/linshengnan/03_work/00_dianthus_work/01_qumai345_genome/03_genome_asm/01_next/01_rundir
[INFO] 2020-10-14 22:15:51,534 skip mkdir: /home/linshengnan/03_work/00_dianthus_work/01_qumai345_genome/03_genome_asm/01_next/01_rundir/01.raw_align
[INFO] 2020-10-14 22:15:51,534 skip mkdir: /home/linshengnan/03_work/00_dianthus_work/01_qumai345_genome/03_genome_asm/01_next/01_rundir/02.cns_align
[INFO] 2020-10-14 22:15:51,535 skip mkdir: /home/linshengnan/03_work/00_dianthus_work/01_qumai345_genome/03_genome_asm/01_next/01_rundir/03.ctg_graph
[INFO] 2020-10-14 22:15:51,566 analysis tasks done
[INFO] 2020-10-14 22:15:51,566 skip step: db_split
[INFO] 2020-10-14 22:15:51,570 analysis tasks done
[INFO] 2020-10-14 22:15:51,603 skip step: raw_align
[INFO] 2020-10-14 22:15:51,646 analysis tasks done
[INFO] 2020-10-14 22:15:51,649 skip step: sort_align
[INFO] 2020-10-14 22:15:51,653 analysis tasks done
[INFO] 2020-10-14 22:15:51,659 total jobs: 1
[INFO] 2020-10-14 22:15:51,660 Throw jobID:[68286] jobCmd:[/home/linshengnan/03_work/00_dianthus_work/01_qumai345_genome/03_genome_asm/01_next/01_rundir/02.cns_align/01.seed_cns.sh.work/seed_cns01/nextDenovo.sh] in the local_cycle.
[ERROR] 2020-10-14 22:16:55,620 seed_cns failed: please check the following logs:
[ERROR] 2020-10-14 22:16:55,620 /home/linshengnan/03_work/00_dianthus_work/01_qumai345_genome/03_genome_asm/01_next/01_rundir/02.cns_align/01.seed_cns.sh.work/seed_cns01/nextDenovo.sh.e
run.cfg:
[General]
job_type = local
job_prefix = nextDenovo
task = all # 'all', 'correct', 'assemble'
rewrite = yes # yes/no
deltmp = yes
rerun = 3
parallel_jobs = 4
input_type = raw
input_fofn = input.fofn
workdir = 01_rundir

[correct_option]
read_cutoff = 1k
seed_cutoff = 11066
blocksize = 2g
pa_correction = 20
seed_cutfiles = 20
sort_options = -m 50g -t 8 -k 40
minimap2_options_raw = -x ava-pb -t 10
correction_options = -p 8

[assemble_option]
minimap2_options_cns = -x ava-pb -t 8 -k17 -w17
nextgraph_options = -a 1
/home/linshengnan/03_work/00_dianthus_work/01_qumai345_genome/03_genome_asm/01_next/01_rundir/02.cns_align/01.seed_cns.sh.work/seed_cns01/nextDenovo.sh.e
hostname
Could you please give me some advice? Thank you very much!