broadinstitute / pilon

Pilon is an automated genome assembly improvement and variant detection tool
GNU General Public License v2.0

About the "diploid" option #107

Open hungweichen0327 opened 4 years ago

hungweichen0327 commented 4 years ago

Dear community,

I would like to ask more about the "diploid" option. I used Illumina reads at 59.5X coverage to polish a 500 Mb plant genome assembly. After polishing, the total length of the assembly increased by about 9 Mb. Because this diploid plant is a self-pollinating species, its heterozygosity is low. Should I use the "diploid" option when running Pilon?

The command I used is below:

pilon --genome ..Crystal_flye_10kb.fasta --bam ./Crystal_flye_10kb.bam --output Crystal_flye_10kb_1t_polished --outdir ./ --threads 40 --diploid --iupac

The summary of the original assembly:

Assembly                    assembly 
# contigs (>= 0 bp)         193      
# contigs (>= 1000 bp)      190      
# contigs (>= 5000 bp)      170      
# contigs (>= 10000 bp)     135      
# contigs (>= 25000 bp)     92       
# contigs (>= 50000 bp)     79       
Total length (>= 0 bp)      497763277
Total length (>= 1000 bp)   497760798
Total length (>= 5000 bp)   497699461
Total length (>= 10000 bp)  497430037
Total length (>= 25000 bp)  496719271
Total length (>= 50000 bp)  496271379
# contigs                   193      
Largest contig              39122544 
Total length                497763277
GC (%)                      33.48    
N50                         18281884 
N75                         15763204 

The summary of the assembly after one round of polishing with Pilon:

Assembly                    Crystal_flye_10kb_1t_polished
# contigs (>= 0 bp)         193                          
# contigs (>= 1000 bp)      190                          
# contigs (>= 5000 bp)      168                          
# contigs (>= 10000 bp)     135                          
# contigs (>= 25000 bp)     92                           
# contigs (>= 50000 bp)     79                           
Total length (>= 0 bp)      506700577                    
Total length (>= 1000 bp)   506698114                    
Total length (>= 5000 bp)   506627151                    
Total length (>= 10000 bp)  506370294                    
Total length (>= 25000 bp)  505664160                    
Total length (>= 50000 bp)  505219085                    
# contigs                   193                          
Largest contig              39603641                     
Total length                506700577                    
GC (%)                      33.37                        
N50                         18962220                     
N75                         16096714                         
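
For reference, the change between the two summaries can be tallied with a few lines of Python. This is only a sketch: the numbers are copied by hand from the two tables above, and the dictionary keys and the `length_changes` helper are illustrative names, not part of Pilon or QUAST.

```python
# Values (in bp) copied by hand from the two summaries above.
before = {
    "total_length": 497_763_277,
    "largest_contig": 39_122_544,
    "n50": 18_281_884,
    "n75": 15_763_204,
}
after = {
    "total_length": 506_700_577,
    "largest_contig": 39_603_641,
    "n50": 18_962_220,
    "n75": 16_096_714,
}


def length_changes(before, after):
    """Return {metric: (absolute change in bp, percent change)}."""
    return {
        metric: (
            after[metric] - before[metric],
            100.0 * (after[metric] - before[metric]) / before[metric],
        )
        for metric in before
    }


changes = length_changes(before, after)
for metric, (delta, pct) in changes.items():
    print(f"{metric:15s} {delta:+12,d} bp  ({pct:+.2f}%)")
```

The total length grows by 8,937,300 bp, which is roughly 1.8% of the original assembly and matches the "about 9 Mb" mentioned above.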

Thank you!