Nextomics / NextDenovo

Fast and accurate de novo assembler for long reads
GNU General Public License v3.0

seed_cns failed #91

Closed linshengnan2020 closed 3 years ago

linshengnan2020 commented 3 years ago

The log:

    [INFO] 2020-10-14 22:15:51,533 start...
    [INFO] 2020-10-14 22:15:51,533 logfile: pid68284.log.info
    [WARNING] 2020-10-14 22:15:51,534 Re-write workdir
    [INFO] 2020-10-14 22:15:51,534 options:
    [INFO] 2020-10-14 22:15:51,534 {'sort_threads': 8, 'nodelist': '', 'rewrite': 1, 'blocksize': '2g', 'job_prefix': 'nextDenovo', 'job_type': 'local', 'minimap2_options_map': '-x map-pb', 'cns_threads': 8, 'map_threads': 8, 'sort_mem': '50g', 'seed_cutoff': '11066', 'input_fofn': '/home/linshengnan/03_work/00_dianthus_work/01_qumai345_genome/03_genome_asm/01_next/input.fofn', 'read_cutoff': '1k', 'input_type': 'raw', 'sort_options': '-m 50g -t 8 -k 40', 'parallel_jobs': '4', 'cluster_options': '', 'sge_queue': '', 'ctg_graphdir': '/home/linshengnan/03_work/00_dianthus_work/01_qumai345_genome/03_genome_asm/01_next/01_rundir/03.ctg_graph', 'pa_correction': '4', 'workdir': '/home/linshengnan/03_work/00_dianthus_work/01_qumai345_genome/03_genome_asm/01_next/01_rundir', 'minimap2_threads': (8, 8), 'minimap2_options_raw': '-x ava-pb -t 8', 'minimap2_options_cns': '-x ava-pb -t 8 -k17 -w17', 'cns_aligndir': '/home/linshengnan/03_work/00_dianthus_work/01_qumai345_genome/03_genome_asm/01_next/01_rundir/02.cns_align', 'seed_cutfiles': '20', 'raw_aligndir': '/home/linshengnan/03_work/00_dianthus_work/01_qumai345_genome/03_genome_asm/01_next/01_rundir/01.raw_align', 'task': 'all', 'ctg_cns_options': ' -p 8', 'deltmp': 1, '_random_round_with_less_accuracy': 0, 'rerun': 3, 'correction_options': '-p 8 -max_lq_length 1000', 'nextgraph_options': '-a 1'}
    [INFO] 2020-10-14 22:15:51,534 skip mkdir: /home/linshengnan/03_work/00_dianthus_work/01_qumai345_genome/03_genome_asm/01_next/01_rundir
    [INFO] 2020-10-14 22:15:51,534 skip mkdir: /home/linshengnan/03_work/00_dianthus_work/01_qumai345_genome/03_genome_asm/01_next/01_rundir/01.raw_align
    [INFO] 2020-10-14 22:15:51,534 skip mkdir: /home/linshengnan/03_work/00_dianthus_work/01_qumai345_genome/03_genome_asm/01_next/01_rundir/02.cns_align
    [INFO] 2020-10-14 22:15:51,535 skip mkdir: /home/linshengnan/03_work/00_dianthus_work/01_qumai345_genome/03_genome_asm/01_next/01_rundir/03.ctg_graph
    [INFO] 2020-10-14 22:15:51,566 analysis tasks done
    [INFO] 2020-10-14 22:15:51,566 skip step: db_split
    [INFO] 2020-10-14 22:15:51,570 analysis tasks done
    [INFO] 2020-10-14 22:15:51,603 skip step: raw_align
    [INFO] 2020-10-14 22:15:51,646 analysis tasks done
    [INFO] 2020-10-14 22:15:51,649 skip step: sort_align
    [INFO] 2020-10-14 22:15:51,653 analysis tasks done
    [INFO] 2020-10-14 22:15:51,659 total jobs: 1
    [INFO] 2020-10-14 22:15:51,660 Throw jobID:[68286] jobCmd:[/home/linshengnan/03_work/00_dianthus_work/01_qumai345_genome/03_genome_asm/01_next/01_rundir/02.cns_align/01.seed_cns.sh.work/seed_cns01/nextDenovo.sh] in the local_cycle.
    [ERROR] 2020-10-14 22:16:55,620 seed_cns failed: please check the following logs:
    [ERROR] 2020-10-14 22:16:55,620 /home/linshengnan/03_work/00_dianthus_work/01_qumai345_genome/03_genome_asm/01_next/01_rundir/02.cns_align/01.seed_cns.sh.work/seed_cns01/nextDenovo.sh.e

run.cfg:

    [General]
    job_type = local
    job_prefix = nextDenovo
    task = all # 'all', 'correct', 'assemble'
    rewrite = yes # yes/no
    deltmp = yes
    rerun = 3
    parallel_jobs = 4
    input_type = raw
    input_fofn = input.fofn
    workdir = 01_rundir

    [correct_option]
    read_cutoff = 1k
    seed_cutoff = 11066
    blocksize = 2g
    pa_correction = 20
    seed_cutfiles = 20
    sort_options = -m 50g -t 8 -k 40
    minimap2_options_raw = -x ava-pb -t 10
    correction_options = -p 8

    [assemble_option]
    minimap2_options_cns = -x ava-pb -t 8 -k17 -w17
    nextgraph_options = -a 1

/home/linshengnan/03_work/00_dianthus_work/01_qumai345_genome/03_genome_asm/01_next/01_rundir/02.cns_align/01.seed_cns.sh.work/seed_cns01/nextDenovo.sh.e

hostname

Could you please give me some advice? Thank you very much!

moold commented 3 years ago

Hi, have you tried running this shell script /home/linshengnan/03_work/00_dianthus_work/01_qumai345_genome/03_genome_asm/01_next/01_rundir/02.cns_align/01.seed_cns.sh.work/seed_cns01/nextDenovo.sh manually? Did it work?

linshengnan2020 commented 3 years ago

I ran this script manually as you suggested, and the same error was reported. I have two data sets that I ran with the same parameters: one works well, and the other reports this error.

moold commented 3 years ago

There are two solutions:

  1. If the size of seed_cns01/cns.fasta is similar to that of cns.fasta in the other directories, just ignore this read and run the following commands:

    cd /home/linshengnan/03_work/00_dianthus_work/01_qumai345_genome/03_genome_asm/01_next/01_rundir/02.cns_align/01.seed_cns.sh.work/seed_cns01/
    samtools faidx cns.fasta
    awk '{print $1"\t"$3"\t"$2}' cns.fasta.fai > cns.fasta.idx
    touch nextDenovo.sh.done

    and then rerun the main task.

  2. This error is usually caused by the minimap2-nd task encountering insufficient memory, so you need to rerun the minimap2-nd task related to this task seed_cns01/nextDenovo.sh.

    The value of the -i option in seed_cns01/nextDenovo.sh is the final minimap2-nd task result related to this task. So, run cd /home/linshengnan/03_work/00_dianthus_work/01_qumai345_genome/03_genome_asm/01_next/01_rundir/01.raw_align/03.sort_align.sh.work/sort_align01, and there you will find a file named input.fofn, which lists all the minimap2-nd task results input.seed.*.ovl related to this task. You need to regenerate these files, and then re-run sort_align01/nextDenovo.sh and seed_cns01/nextDenovo.sh.

    To regenerate the input.seed.*.ovl files, just cd to the directory of each file and run the nextDenovo.sh there.
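Choosing between the two solutions depends on how the size of seed_cns01/cns.fasta compares with the other directories. A minimal sketch for listing those sizes follows; it assumes the `seed_cns*/cns.fasta` layout shown in the paths above, and `list_cns_sizes` is a hypothetical helper name, not part of NextDenovo.

```shell
#!/bin/sh
# Hypothetical helper (not part of NextDenovo): print "<bytes><TAB><path>"
# for every seed_cns*/cns.fasta under the given 01.seed_cns.sh.work
# directory, smallest file first.
list_cns_sizes() {
    workdir=$1
    for f in "$workdir"/seed_cns*/cns.fasta; do
        [ -e "$f" ] || continue          # skip when the glob matched nothing
        printf '%s\t%s\n' "$(wc -c < "$f" | tr -d ' ')" "$f"
    done | sort -n
}

# Usage (directory name taken from the thread; adjust to your workdir):
# list_cns_sizes 01_rundir/02.cns_align/01.seed_cns.sh.work
```

A size of 0, or one far below the rest, suggests that the first solution does not apply and the upstream overlap files should be regenerated instead.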

linshengnan2020 commented 3 years ago

I checked my seed_cns01/cns.fasta and it is an empty file, so I tried the second solution. It worked well until today, when I hit another error: ctg_graph failed.

moold commented 3 years ago

Have you tried running ctg_graph manually? BTW, please paste the full log here.

linshengnan2020 commented 3 years ago

the log hostname

The solution: cd /home/linshengnan/03_work/00_dianthus_work/01_qumai345_genome/03_genome_asm/01_next/01_rundir/03.ctg_graph, where there is a file named 01.ctg_graph.input.ovls that lists all the /cns_align/cns.filt.dovt.ovl results. I checked every cns.filt.dovt.ovl in the cns_align directories and found that some of them were empty, so I re-ran cns_align*/nextDenovo.sh to regenerate them. After regenerating the cns.filt.dovt.ovl files, I ran cd /home/linshengnan/03_work/00_dianthus_work/01_qumai345_genome/03_genome_asm/01_next/01_rundir/03.ctg_graph/01.ctg_graph.sh.work/ctg_graph0 and re-ran nextDenovo.sh there. After that, I reran the main task.
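The check for empty overlap files can be scripted rather than done by hand. The sketch below assumes that 01.ctg_graph.input.ovls holds one file path per line and that those paths resolve from the current directory; `find_empty_listed` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper (not part of NextDenovo): read a list of file paths,
# one per line, and print every entry that is missing or has zero size.
find_empty_listed() {
    list=$1
    # "|| [ -n ... ]" also handles a last line without a trailing newline.
    while IFS= read -r path || [ -n "$path" ]; do
        [ -n "$path" ] || continue       # ignore blank lines
        [ -s "$path" ] || printf '%s\n' "$path"
    done < "$list"
}

# Usage (file name taken from the thread):
# find_empty_listed 01.ctg_graph.input.ovls
```

Each path it prints points to a cns_align task whose nextDenovo.sh would need to be re-run before ctg_graph.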

linshengnan2020 commented 3 years ago

Hi, I have another question. I have assembled a 1 Gb diploid genome with 70~80% repetitive sequences. The coverage of my PacBio data is approximately 40x, and the N50 of the final assembly is 157k. How can I adjust the parameters to improve the assembly result? Could you please give me some advice? Thank you very much!

moold commented 3 years ago

Your seed length is too short, so you can try increasing seed_cutoff and see how the result changes. BTW, it would be better to sequence longer reads.

linshengnan2020 commented 3 years ago

The seed length was generated by the seq_stat script. Is there any standard for increasing seed_cutoff?

moold commented 3 years ago

No, you have to experiment, for example by setting -d 30. The default option values are not suitable for all species and sequencing depths. I will close this issue; if you still have problems unrelated to this topic, please open a new issue.
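To illustrate how a depth target can translate into a length cutoff, here is a rough sketch. It is not the seq_stat implementation, and it assumes that -d refers to a target seed depth, which the thread does not state. `estimate_seed_cutoff` and its three arguments (a file of read lengths, a genome size in bases, a depth) are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper, NOT NextDenovo's seq_stat: given a file of read
# lengths (one integer per line), a genome size in bases and a target seed
# depth, print the largest length cutoff at which the reads at least that
# long still add up to depth * genome_size bases.
# Returns non-zero (and prints nothing) if the reads never reach the target.
estimate_seed_cutoff() {
    lengths=$1; genome=$2; depth=$3
    sort -nr "$lengths" | awk -v target="$((genome * depth))" '
        { sum += $1 }                               # accumulate, longest first
        sum >= target { print $1; found = 1; exit } # first length reaching target
        END { exit(found ? 0 : 1) }'
}

# Usage (file name is illustrative; lengths could come from column 2 of a
# samtools .fai index): estimate_seed_cutoff read_lengths.txt 1000000000 30
```

In this sketch a lower depth yields a higher cutoff, because fewer of the longest reads are needed to reach the target.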

linshengnan2020 commented 3 years ago

thank you very much!