Buttonwood / Bioinformatics

Useful scripts for NGS analysis
MIT License

Multiple libraries for genome assembly! #2

Open Buttonwood opened 10 years ago

Buttonwood commented 10 years ago

Multiple insert libraries are used to strike a balance between long-range and short-range information. Long-insert mate pair libraries are great at telling you that two contigs are linked, but they don't tell you much about the sequence in between. Short-insert libraries can help you determine the exact sequence between two contigs, but that information is local.

According to Illumina's tests[1], mate pair sequencing supplements paired-end data by providing sequence depth in regions that are traditionally difficult to cover, especially repeat regions. In general, libraries with larger insert sizes result in less fragmented assemblies and larger contigs, but the maximum insert size needed depends on the repeat structure of the organism being sequenced[2].

In GAGE[3], the authors tested the effect of multiple libraries on assembly. Creating long-range paired-end libraries can be very helpful for assembly, but the sequencing protocols are much more costly and technically more difficult. For five of their assemblers, the best N50 statistic was obtained with the 180-bp and 3-kb library combination.
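For anyone unfamiliar with the N50 statistic mentioned above, here is a minimal sketch of the usual definition: the largest length L such that contigs of length L or longer contain at least half of the total assembled bases. The function name `n50` and the contig lengths are made up for illustration, and the evaluation in [3] may compute its reported figures differently (for example against a known genome size or after error correction), so treat this only as the textbook version.

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths (textbook definition)."""
    if not lengths:
        raise ValueError("need at least one contig length")
    # Walk contigs from longest to shortest, accumulating bases.
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes us to half the assembly is the N50.
        if running >= half_total:
            return length


# Hypothetical contig lengths, not taken from any real assembly.
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))       # 8
print(n50([100, 200, 300, 400, 500]))      # 400
```

A less fragmented assembly puts more bases into long contigs, which is why N50 tends to rise when a longer-insert library is added.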

For the bread wheat genome, which has an extremely high repeat content (>80%), three mate pair (MP) libraries (2 kb, 3 kb, and 5 kb) were sequenced to improve shotgun assemblies of flow-sorted chromosome arms[4].

[1]. http://res.illumina.com/documents/products/technotes/technote_nextera_matepair_data_processing.pdf

[2]. http://res.illumina.com/documents/products/technotes/technote_denovo_assembly_ecoli.pdf

[3]. Steven L. Salzberg et al. GAGE: A critical evaluation of genome assemblies and assembly algorithms. Genome Res. 2012, 22:557-567

[4]. Tatiana Belova et al. Integration of mate pair sequences to improve shotgun assemblies of flow-sorted chromosome arms of hexaploid wheat. BMC Genomics 2013, 14:222

Buttonwood commented 10 years ago

You can also refer to these BioStar threads: http://www.biostars.org/p/10171/ and http://www.biostars.org/p/4184/