Nextomics / NextDenovo

Fast and accurate de novo assembler for long reads
GNU General Public License v3.0

Low BUSCO completeness value #94

Open fengyuanli304 opened 3 years ago

fengyuanli304 commented 3 years ago

Question or Expected behavior

Hi, I recently assembled a genome, but BUSCO reports only 79.2% completeness, and I am confused by the result.

Parameters:

[General]
job_type = local
job_prefix = nextDenovo2.3.0
task = all # 'all', 'correct', 'assemble'
rewrite = yes # yes/no
deltmp = yes
rerun = 3
parallel_jobs = 8
input_type = raw
input_fofn = ./input.fofn
workdir = ./

[correct_option]
read_cutoff = 1k
seed_cutoff = 36k
blocksize = 3g
pa_correction = 4
seed_cutfiles = 4
sort_options = -m 80g -t 25 -k 45
minimap2_options_raw = -x ava-pb -t 12
correction_options = -p 20

[assemble_option]
minimap2_options_cns = -x ava-pb -t 12 -k17 -w17
nextgraph_options = -a 1

Results:

Type     Length (bp)    Count (#)
N10          9524294           28
N20          7278582           71
N30          5400301          128
N40          4361970          200
N50          3434731          290
N60          2824318          403
N70          2238539          540
N80          1649146          722
N90           962337          995

Min.           32875            -
Max.        18687322            -
Ave.         1966545            -
Total     3502417316         1781

Contig N50 is high, but Busco completeness value is low. Busco INFO:

|Results from dataset arachnida_odb10

|C:78.2%[S:53.9%,D:24.3%],F:0.8%,M:21.0%,n:2934
|2293 Complete BUSCOs (C)
|1581 Complete and single-copy BUSCOs (S)
|712 Complete and duplicated BUSCOs (D)
|24 Fragmented BUSCOs (F)
|617 Missing BUSCOs (M)
|2934 Total BUSCO groups searched

Next, I polished the assembly with nextPolish.

Parameters:

[General]
job_type = local
job_prefix = nextPolish
task = 555121212
rewrite = yes
rerun = 3
parallel_jobs = 6
multithread_jobs = 5
genome = ./nd.asm.fasta
genome_size = auto
workdir = ./01_rundir
polish_options = -p {multithread_jobs}

[sgs_option]
sgs_fofn = ./sgs.fofn
sgs_options = -max_depth 80 -bwa

[lgs_option]
lgs_fofn = ./lgs.fofn
lgs_options = -min_read_len 10k -max_read_len 150k -max_depth 90
lgs_minimap2_options = -x map-pb

Results:

Type     Length (bp)    Count (#)
N10          9511552           28
N20          7272313           71
N30          5396198          128
N40          4360921          200
N50          3429793          290
N60          2821796          403
N70          2237830          540
N80          1648586          722
N90           960793          995

Min.           32664            -
Max.        18658276            -
Ave.         1963950            -
Total     3497795421         1781
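For readers unfamiliar with the N10 to N90 rows above: under the usual convention, Nx is the length of the contig at which the running total of contig lengths, sorted longest first, reaches x% of the assembly, and the count is how many contigs that takes. Below is a minimal Python sketch of that definition. The function name `nx` and the toy lengths are illustrative assumptions, not data from this assembly, and the NextDenovo statistics script may handle edge cases differently.

```python
def nx(lengths, x):
    """Return (Nx length, number of contigs needed) for a list of contig lengths.

    Nx is the length of the contig at which the cumulative length of contigs,
    sorted from longest to shortest, first reaches x percent of the total.
    """
    if not lengths:
        raise ValueError("no contig lengths given")
    if not 0 < x <= 100:
        raise ValueError("x must be in (0, 100]")

    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # Integer comparison avoids floating-point rounding on large genomes.
        if running * 100 >= total * x:
            return length, count


# Toy example (hypothetical lengths): total = 30, so N50 needs >= 15 bp.
toy = [10, 8, 6, 4, 2]
n50_length, n50_count = nx(toy, 50)
print(n50_length, n50_count)
```

A high N50 only says that the contigs are long; it says nothing about whether gene content is complete or duplicated, which is what BUSCO measures.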

Busco INFO:

|Results from dataset arachnida_odb10

|C:79.2%[S:45.9%,D:33.3%],F:0.3%,M:20.5%,n:2934
|2324 Complete BUSCOs (C)
|1348 Complete and single-copy BUSCOs (S)
|976 Complete and duplicated BUSCOs (D)
|10 Fragmented BUSCOs (F)
|600 Missing BUSCOs (M)
|2934 Total BUSCO groups searched
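The two BUSCO summaries can be compared directly from their raw counts. The sketch below recomputes the C, S and D percentages from the counts reported above and prints how each category changed after polishing. The helper name `busco_summary` is a hypothetical illustration, and rounding to one decimal place is an assumption that happens to reproduce the reported C, S and D values.

```python
def busco_summary(single, duplicated, fragmented, missing):
    """Recompute BUSCO complete/single/duplicated percentages from raw counts."""
    total = single + duplicated + fragmented + missing
    complete = single + duplicated  # C is defined as S + D

    def pct(n):
        return round(100 * n / total, 1)

    return {
        "n": total,
        "C": pct(complete),
        "S": pct(single),
        "D": pct(duplicated),
    }


# Counts copied from the two BUSCO reports in this issue.
before = {"single": 1581, "duplicated": 712, "fragmented": 24, "missing": 617}
after = {"single": 1348, "duplicated": 976, "fragmented": 10, "missing": 600}

print("before polishing:", busco_summary(**before))
print("after polishing: ", busco_summary(**after))

# Change in raw counts per category (after minus before).
delta = {key: after[key] - before[key] for key in before}
print("change:", delta)
```

On these numbers the complete set grows by only 31 BUSCOs, while 233 move out of the single-copy category and the duplicated category grows by 264, so most of the change after polishing is a shift from single-copy to duplicated rather than a recovery of missing genes.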

Operating system
Which operating system and version are you using? CentOS Linux release 7.6.1810

GCC
What version of GCC are you using? 4.8.5 20150623 (Red Hat 4.8.5-36)

Python
What version of Python are you using? python2.7.18

NextDenovo
What version of NextDenovo are you using? nextdenovo2.3.0

Additional context (Optional)

Can you help me to figure out how to solve it? Thank you.

moold commented 3 years ago

First, I would map short genomic reads or transcript reads to the assembly and check what fraction of them map. The low BUSCO value may be caused by species specificity. You can also try other assemblers and compare their BUSCO values.
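One way to follow this suggestion is to align the reads to the assembly (for example with bwa or minimap2), run `samtools flagstat` on the resulting alignment file, and read off the mapped fraction. The Python sketch below parses flagstat-style text. The helper `mapped_fraction` is hypothetical, and `SAMPLE` is made-up output used only to show the layout, not a result from this assembly. Note that flagstat counts alignment records, so secondary and supplementary alignments are included in the totals.

```python
import re


def mapped_fraction(flagstat_text):
    """Return mapped / total from `samtools flagstat`-style text output."""
    total = mapped = None
    for line in flagstat_text.splitlines():
        line = line.strip()
        # e.g. "1000 + 0 in total (QC-passed reads + QC-failed reads)"
        match = re.match(r"(\d+) \+ \d+ in total", line)
        if match:
            total = int(match.group(1))
        # e.g. "950 + 0 mapped (95.00% : N/A)"; does not match "primary mapped"
        match = re.match(r"(\d+) \+ \d+ mapped", line)
        if match:
            mapped = int(match.group(1))
    if not total or mapped is None:
        raise ValueError("could not find total and mapped counts")
    return mapped / total


# Made-up example text, for illustration only.
SAMPLE = """\
1000 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
950 + 0 mapped (95.00% : N/A)
"""

print(f"{mapped_fraction(SAMPLE):.1%} of records mapped")
```

If most reads map, the sequence is probably present in the assembly and the low score is more likely related to the lineage dataset or to duplication than to missing sequence.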

tinyfallen commented 3 years ago

You could try NextPolish with clean NGS data. That improved my assembly's BUSCO ratio from 62% to 97% using the eudicots BUSCO database.