GAA - Graph Accordance of Assembly
Version 1.0
Copyright 2011 - Guohui Yao - Washington University in St. Louis.
gyao at wustl dot edu
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
No single assembly algorithm addresses all the known limitations of assembling short sequences. Reduced contig length is the major problem limiting the use of these assemblies. GAA is designed to take advantage of different assembly algorithms or sequencing platforms to improve the quality of next-generation sequencing (NGS) assemblies.
Perl >= 5.8.9
Sequence aligners: BLAT, GSMapper
The program has been tested on Linux systems. The following command installs the GAA package by adding it to the Perl library path:
$ export PERL5LIB=$PERL5LIB:path_to_the_gaa_package
Running gaa.pl without any parameters prints the following usage message on the screen.
gaa.pl --target|-t
The following options are optional.
--target-qual|-T
--pair_status_target|r <454PairStatus.txt> : generated from GSMapper
--pair_status_query|R <454PairStatus.txt> : generated from GSMapper
--ce <ce.psl> : ce status psl file
--output-dir|-o <output_dir=pwd> : default is merged_assembly under current directory
--tip-size|-p <tip_size=90> : tip tolerance
--score-cutoff-lower|-l <score_cutoff_lower=30> : confident unique contig, if score < 30
--score-cutoff-upper|-u <score-cutoff_upper=100> : confident match, if score > 100
--contig-size-cutoff|-s <contig_size_cutoff=100> : keep unique contigs of length > 100
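The three cutoffs above can be read as simple threshold tests. The sketch below is only an illustration of that reading, not GAA's actual code: the function names and the "uncertain" label for scores between the two cutoffs are assumptions, since this README does not say how such scores are handled.

```python
def classify_by_score(score, lower=30, upper=100):
    """Bucket an alignment score using the two documented cutoffs.

    The defaults mirror --score-cutoff-lower and --score-cutoff-upper.
    The "uncertain" bucket is a hypothetical label for the middle range.
    """
    if score < lower:
        return "unique"      # confident unique contig, score < lower
    if score > upper:
        return "match"       # confident match, score > upper
    return "uncertain"       # between the cutoffs: behavior not documented


def keep_unique_contig(length, size_cutoff=100):
    """Documented rule: keep unique contigs whose length exceeds the cutoff."""
    return length > size_cutoff
```

With the default values, a score of 10 falls in the "unique" bucket, a score of 150 in the "match" bucket, and a unique contig of exactly 100 bases is not kept.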
Using the -m option skips the alignment step if alignment.psl already exists.
$ ./gaa.pl -t newbler.fa -q cabog.fa
Merge newbler.fa and cabog.fa, and output newbler.fa_n_cabog.fa as the merged assembly to:

$ ./gaa.pl -t newbler.fa -q cabog.fa -T newbler.qual -Q cabog.qual
Also generate a quality file (newbler.qual_n_cabog.qual) for the merged assembly.

$ ./gaa.pl -t newbler.fa -q cabog.fa -t newbler.gap
Generate the gap file (g.gap) for the merged assembly.
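When GAA is driven from a script, the first two invocations above can be assembled programmatically. This is a minimal sketch that uses only the -t, -q, -T and -Q flags shown in the examples. The helper names build_gaa_command and merged_name are hypothetical, and the output naming simply follows the "<target>_n_<query>" pattern seen in the examples.

```python
def build_gaa_command(target, query, target_qual=None, query_qual=None,
                      gaa="./gaa.pl"):
    """Return the argument list for a gaa.pl run (hypothetical helper)."""
    cmd = [gaa, "-t", target, "-q", query]
    # Quality files are only passed when both are available.
    if target_qual and query_qual:
        cmd += ["-T", target_qual, "-Q", query_qual]
    return cmd


def merged_name(target, query):
    """Name of the merged file, following the pattern in the examples."""
    return f"{target}_n_{query}"
```

The returned list can be handed to subprocess.run(cmd, check=True) from the Python standard library.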
$ head target.fa
>Contig0.1
AAGgAAGCAAATtAtAAAaCtAaaGCATTAAAGATATACACATAAAAATGAAGTACTAAA
TTTATCAAGACAAAAaGCCCTGTCCAAATTGTACTATTGTCCAACTTGTACCACTTTACC
>Contig0.2
CCCTAAATTGAGAAAAaTAACTTTAGGGAATATATCAATGTACAGTGCCCGCTTCGTTAT
TCGGGAGAGAAATTCGAAAAAGTTAAAATAACATTCCGGAAATGTTAATTTTACCCTGCA
>Contig0.3
GTATTGATCCAAAATCGGTGTAAATATTACCCTTTTTAGGTGTATTAGGTGTTAAATGTA
TCCTTTTTCATGTTAATTTTACCCTTAATAAGGTGTAAAATTAACATTAAAAAATGTTGA
TATATTTTTACACCTAAAAAGTGTTAAATTTCTGAGGAAAAATGTTAATCGCACCCTCTT
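To check a target or query file before running GAA, a small reader is enough. The sketch below assumes standard FASTA layout, with each record starting on a ">" header line followed by one or more sequence lines. The function name read_fasta is illustrative and is not part of GAA.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield name, "".join(chunks)
            fields = line[1:].split()
            name, chunks = (fields[0] if fields else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)
```

For example, with open("target.fa") as fh: lengths = {n: len(s) for n, s in read_fasta(fh)} gives the length of every contig.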
$ head target.qual
>Contig0.1
64 64 64 26 64 64 64 64 64 64 50 64 26 64 16 64 64 64 22 64 39 64 39 39 64 64 55 64 64 64 64 64 64 64 64 64 64 64 64 48 64 64 64 64 64 64 64 49 45 64 64 64 64 64 64 64 64 64 64 32 64 64 64 64 64 64
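A quality file can be read the same way as a sequence file. This sketch assumes the usual layout of a ">" header line per contig followed by whitespace-separated integer scores, possibly wrapped over several lines. The name read_qual is illustrative only.

```python
def read_qual(lines):
    """Return {contig_name: [scores]} from an iterable of .qual lines."""
    quals = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""
            quals[name] = []
        elif name is not None:
            # Scores may wrap across lines, so keep extending the record.
            quals[name].extend(int(tok) for tok in line.split())
    return quals
```

A useful sanity check is that the number of scores for each contig equals the length of the same contig in the FASTA file.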
$ head gap.txt
Contig0.1 20
Contig0.2 811
Contig0.3 932
Contig0.4 876
Contig0.5 1927
Contig0.6 1730
Contig0.7 20
Contig1.1 171
Contig1.2 402
Contig1.3 1382
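The gap file is easy to load into a lookup table. The sketch below assumes one whitespace-separated pair per line, a contig name followed by an integer. Reading that integer as the size of the gap associated with the contig is an assumption, because this README does not define the column. The name read_gaps is illustrative.

```python
def read_gaps(lines):
    """Return {contig_name: gap_value} from an iterable of gap-file lines."""
    gaps = {}
    for line in lines:
        fields = line.split()
        # Skip blank or malformed lines instead of failing on them.
        if len(fields) != 2:
            continue
        gaps[fields[0]] = int(fields[1])
    return gaps
```

For the sample above, read_gaps would map "Contig0.1" to 20 and "Contig0.2" to 811.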
$ head 454PairStatus.txt
Template        Status          Distance  Left Accno   Left Pos  Left Dir  Right Accno  Right Pos  Right Dir  Left Distance  Right Distance
FC3FKPP01BUVA8  TruePair        3967      Contig18.13  24188     -         Contig18.13  20221      +
FC3FKPP01BTJNG  TruePair        3820      Contig23.11  44530     +         Contig23.11  48350      -
FC3FKPP01AFDGW  TruePair        2498      Contig41.19  5103      -         Contig41.19  2605       +
FC3FKPP01CEJ64  TruePair        2498      Contig41.19  5103      -         Contig41.19  2605       +
FC3FKPP01E3K7I  TruePair        3735      Contig117.3  98721     -         Contig117.3  94986      +
FC3FKPP01A7AS8  FalsePair       -         Contig129.5  2438      +         Contig129.6  1986       -          276            1986
FC3FKPP01BJX6Q  FalsePair       -         Contig133.3  28972     +         Contig140.4  1428       -          5803           1428
FC3FKPP01AWH9A  TruePair        2636      Contig40.10  122575    +         Contig40.10  125211     -
FC3FKPP01DCED4  MultiplyMapped  -         Contig18.9   1244      -         Repeat
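A quick summary of a pair status file helps confirm that it is the expected GSMapper output before passing it to GAA. This sketch assumes whitespace-separated columns as in the sample above, with the status in the second column and the pair distance in the third. The function name summarize_pair_status is hypothetical.

```python
from collections import Counter


def summarize_pair_status(lines):
    """Count rows per status and collect TruePair distances."""
    counts = Counter()
    distances = []
    for line in lines:
        fields = line.split()
        # Skip blank lines and the header row, which starts with "Template".
        if len(fields) < 2 or fields[0] == "Template":
            continue
        status = fields[1]
        counts[status] += 1
        # Only TruePair rows carry a numeric distance; others show "-".
        if status == "TruePair" and len(fields) > 2 and fields[2].isdigit():
            distances.append(int(fields[2]))
    return counts, distances
```

The collected distances give a rough picture of the library insert size, for example through their minimum, maximum and mean.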
Guohui Yao