What is your data type: CLR, HiFi, or Nanopore? How much data did you use for the assembly? Did you use NGS reads to polish the genome before evaluating the BUSCO score? BTW, please paste your config file here.
Thanks. My data are PacBio (400 G) and Nanopore (100 G) reads, and the NextDenovo assembly took about one month. The draft assembly was polished with Illumina data (1 T) using NextPolish before evaluating the BUSCO score. The NextDenovo config file is:

[general]
job_type = lsf
job_prefix = nextDenovo
task = all # 'all', 'correct', 'assemble'
rewrite = yes # yes/no
deltmp = yes
rerun = 3
parallel_jobs = 110
input_type = raw
input_fofn = input.fofn
workdir = 01_rundir
cluster_options = -n {cpu}

[correct_option]
read_cutoff = 1k
seed_cutoff = 13539
blocksize = 2g
pa_correction = 20
seed_cutfiles = 20
sort_options = -m 20g -t 10 -k 40
minimap2_options_raw = -x ava-ont -t 8
correction_options = -p 20

[assemble_option]
random_round = 20
minimap2_options_cns = -x ava-ont -t 8 -k17 -w17
nextgraph_options = -a 1
And the NextPolish config file is:

[General]
job_type = local
job_prefix = nextPolish
task = best
rewrite = yes
rerun = 3
parallel_jobs = 100
multithread_jobs = 8
genome=01_rundir/03.ctg_graph/01.ctg_graph.sh.work/ctg_graph00/nextgraph.assembly.contig.fasta
genome_size = auto
workdir = nextpolish1
polish_options = -p {multithread_jobs}

[sgs_option]
sgs_fofn = sgs.fofn
sgs_options = -max_depth 1
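As a quick arithmetic check on the data amounts quoted above, here is a minimal Python sketch of average sequencing depth (total bases divided by genome size). It assumes "G" and "T" mean gigabases and terabases of sequence. The original question gives a "7 G diploid genome size" without saying whether that is the haploid size or the two-copy total, so the sketch reports depth for both 7 G and 3.5 G.

```python
# Rough sequencing-depth estimate: total bases / genome size.
# Assumption: "G" and "T" in the thread mean gigabases and terabases.
GB = 1_000_000_000


def fold_coverage(total_bases: float, genome_size: float) -> float:
    """Return the average depth (fold coverage) for a given genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


# Amounts reported in the thread, in bases.
datasets = {
    "PacBio": 400 * GB,
    "Nanopore": 100 * GB,
    "Illumina": 1000 * GB,  # 1 T
}

long_read_bases = datasets["PacBio"] + datasets["Nanopore"]

# Whether 7 G is the haploid or the two-copy size is not stated, so show both.
for label, genome_size in [("7 G", 7 * GB), ("3.5 G", 3.5 * GB)]:
    print(
        f"genome size {label}: "
        f"long reads {fold_coverage(long_read_bases, genome_size):.1f}x, "
        f"Illumina {fold_coverage(datasets['Illumina'], genome_size):.1f}x"
    )
```

Depth is only an average; it says nothing about how evenly the reads cover repeats or both haplotypes.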
Regarding "sgs_options = -max_depth 1": this should be "sgs_options = -max_depth 100".
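To check a config for this setting, here is a minimal sketch that reads the `-max_depth` value out of the `[sgs_option]` section. It assumes the file is plain INI that Python's standard `configparser` can read. This is an illustration only, not how NextPolish itself parses its config, and `max_depth_from_config` is a hypothetical helper name.

```python
import configparser
import shlex
from typing import Optional


def max_depth_from_config(text: str) -> Optional[int]:
    """Return the -max_depth value from [sgs_option] sgs_options, or None."""
    parser = configparser.ConfigParser(
        interpolation=None,              # disable %-interpolation; read values verbatim
        inline_comment_prefixes=("#",),  # strip trailing "# ..." comments
    )
    parser.read_string(text)
    if not parser.has_option("sgs_option", "sgs_options"):
        return None
    tokens = shlex.split(parser.get("sgs_option", "sgs_options"))
    # Look for the flag and take the token that follows it.
    for flag, value in zip(tokens, tokens[1:]):
        if flag == "-max_depth":
            return int(value)
    return None


# Excerpt of the NextPolish config posted above (other keys omitted).
CONFIG = """
[General]
task = best
genome_size = auto
polish_options = -p {multithread_jobs}

[sgs_option]
sgs_fofn = sgs.fofn
sgs_options = -max_depth 1
"""

SUGGESTED = 100  # value suggested in this thread
current_depth = max_depth_from_config(CONFIG)
if current_depth is not None and current_depth < SUGGESTED:
    print(f"-max_depth is {current_depth}; the suggestion here is {SUGGESTED}")
```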
Also, you can map RNA-seq reads or short genomic reads to your assembly and calculate the mapping rate to check whether the BUSCO score is right.
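One way to get the mapping rate is from the text output of `samtools flagstat` on the resulting BAM. The sketch below assumes the default text layout of recent samtools versions, where lines look like `N + M in total ...` and `N + M mapped (...)`. It uses the QC-passed column only. Note that "in total" counts alignment records, so secondary and supplementary alignments are included in both numerator and denominator, as in flagstat's own percentage. The example input is made up for illustration.

```python
import re

# Patterns for the default text output of `samtools flagstat`.
# "^N + M mapped (" matches the overall "mapped" line but not
# "primary mapped" or "with mate mapped ..." lines.
TOTAL_RE = re.compile(r"^(\d+) \+ (\d+) in total", re.MULTILINE)
MAPPED_RE = re.compile(r"^(\d+) \+ (\d+) mapped \(", re.MULTILINE)


def mapping_rate(flagstat_text: str) -> float:
    """Percentage of QC-passed records that are mapped."""
    total = TOTAL_RE.search(flagstat_text)
    mapped = MAPPED_RE.search(flagstat_text)
    if total is None or mapped is None:
        raise ValueError("unrecognised samtools flagstat output")
    total_records = int(total.group(1))
    if total_records == 0:
        raise ValueError("no QC-passed records")
    return 100.0 * int(mapped.group(1)) / total_records


# Made-up example output (illustrative numbers only).
EXAMPLE = """\
1000 + 0 in total (QC-passed reads + QC-failed reads)
1000 + 0 primary
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
0 + 0 primary duplicates
900 + 0 mapped (90.00% : N/A)
900 + 0 primary mapped (90.00% : N/A)
"""
print(f"mapping rate: {mapping_rate(EXAMPLE):.2f}%")
```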
I am trying to assemble a diploid genome (7 G diploid genome size, 1.14% heterozygosity, and 80.4% repeat content, estimated from 17-mers) using NextDenovo, but in the end I only get a 3.3 G assembly (N50: 749 kb) with 47.4% complete BUSCOs. Can NextDenovo be applied to diploid genomes? If not, my 3.3 G assembly would just be a haploid genome, but then why is the BUSCO score so low?
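To put the numbers in the question side by side, here is a minimal sketch that (a) expresses the 3.3 G assembly as a fraction of the expected size and (b) splits a BUSCO one-line short summary (`C:..%[S:..%,D:..%],F:..%,M:..%,n:..`) into its parts. Because the question does not say whether 7 G is the haploid or the two-copy size, both 7 G and 3.5 G are shown. In diploid assemblies, a large duplicated (D) share is often read as a sign that both haplotypes were retained, so the split can help interpret the complete score. The helper names are hypothetical, and the summary line in the code is made up; substitute the line from your own BUSCO run.

```python
import re

# One-line BUSCO short summary, e.g. "C:..%[S:..%,D:..%],F:..%,M:..%,n:..".
BUSCO_RE = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco_summary(line: str) -> dict:
    """Return C/S/D/F/M percentages (floats) and n (int) from a summary line."""
    m = BUSCO_RE.search(line)
    if m is None:
        raise ValueError("not a BUSCO short-summary line")
    result = {key: float(value) for key, value in m.groupdict().items() if key != "n"}
    result["n"] = int(m.group("n"))
    return result


def assembly_fraction(assembly_bp: float, expected_bp: float) -> float:
    """Assembly span as a fraction of an expected genome size."""
    return assembly_bp / expected_bp


GB = 1_000_000_000
# Sizes from the question; whether 7 G is haploid or two-copy is not stated.
for label, expected in [("7 G", 7 * GB), ("3.5 G (7 G / 2)", 3.5 * GB)]:
    print(f"3.3 G assembly vs {label}: {assembly_fraction(3.3 * GB, expected):.1%}")

# Made-up summary line for illustration only.
stats = parse_busco_summary("C:90.0%[S:60.0%,D:30.0%],F:4.0%,M:6.0%,n:255")
print(f"complete {stats['C']}%, of which duplicated {stats['D']}%")
```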