ghyao / GAA

GAA is designed to take advantages of different assembly algorithms or sequencing platforms to improve the quality of next-generation sequence (NGS) assemblies.
3 stars 1 forks source link

NAME

GAA - Graph Accordance of Assemly

VERSION

Version 1.0

LICENCE

Copyright 2011 - Guohui Yao - Washington University in St. Louis.

gyao at wustl dot edu

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.

INTRODUCTION

No individual assembly algorithm addresses all the known limitations of assembling short length sequences. Overall reduced sequence contig length is the major problem that challenges the usage of these assemblies. GAA is designed to take advantages of different assembly algorithms or sequencing platforms to improve the quality of next-generation sequence (NGS) assemblies.

PREREQUISITES

Perl => 5.8.9 Sequence aligners: BLAT GSMapper

INSTALLATION

The program is tested under LINUX systems. Below command could install the GAA package. export PERL5LIB=$PERL5LIB:path_to_the_gaa_package

COMMAND LINE

Run gaa.pl without any parameter, it will print below usage on the screen.

-.pl --target|-t : target consensus file, the contig names should be in the format of >Contig1.1 --query|-q : query consensus file, the contig names should be in the format of >Contig1.1

The following options are optional. --target-qual|-T : target quality file --query-qual|-Q : query quality file --target_gap|-g : e.g. gap.txt for Newbler --query_gap|-G : e.g. gap.txt for Newbler --match|-m : blat target.fa query.fa -fastMap query_vs_target.psl

--pair_status_target|r  <454PairStatus.txt>  : generated from GSMapper
--pair_status_query|R   <454PairStatus.txt>  : generated from GSMapper
    --ce                    <ce.psl>             : ce status psl file

    --output-dir|-o     <output_dir=pwd>     : default is merged_assembly under current directory
    --tip-size|-p       <tip_size=90>        : tip tolerance 
    --score-cutoff-lower|-l <score_cutoff_lower=30>  : confident unique contig, if score < 30
    --score-cutoff-upper|-u <score-cutoff_upper=100> : confident match,         if score > 100
    --contig-size-cutoff|-s <contig_size_cutoff=100> : keep unique contigs of length > 100
Using -m option would shortcut the alignment step if alignment.psl alread exists.

EXAMPLES

$ ./gaa.pl -t newbler.fa -q cabog.fa Merge newbler.fa and cabog.fa, and output newbler.fa_n_cabog.fa as the merged assembly to:

$ ./gaa.pl -t newbler.fa -q cabog.fa -T newbler.qual -Q cabog.qual Also generate a quality file (newbler.qual_n_cabog.qual) for the merged assembly.

$ ./gaa.pl -t newbler.fa -q cabog.fa -t newbler.gap Generate the gap file (g.gap) for merged assembly.

$ head target.fa

Contig0.1 AAGgAAGCAAATtAtAAAaCtAaaGCATTAAAGATATACACATAAAAATGAAGTACTAAA TTTATCAAGACAAAAaGCCCTGTCCAAATTGTACTATTGTCCAACTTGTACCACTTTACC Contig0.2 CCCTAAATTGAGAAAAaTAACTTTAGGGAATATATCAATGTACAGTGCCCGCTTCGTTAT TCGGGAGAGAAATTCGAAAAAGTTAAAATAACATTCCGGAAATGTTAATTTTACCCTGCA Contig0.3 GTATTGATCCAAAATCGGTGTAAATATTACCCTTTTTAGGTGTATTAGGTGTTAAATGTA TCCTTTTTCATGTTAATTTTACCCTTAATAAGGTGTAAAATTAACATTAAAAAATGTTGA TATATTTTTACACCTAAAAAGTGTTAAATTTCTGAGGAAAAATGTTAATCGCACCCTCTT

$ head target.qual

Contig0.1 64 64 64 26 64 64 64 64 64 64 50 64 26 64 16 64 64 64 22 64 39 64 39 39 64 64 55 64 64 64 64 64 64 64 64 64 64 64 64 48 64 64 64 64 64 64 64 49 45 64 64 64 64 64 64 64 64 64 64 32 64 64 64 64 64 64

$ head gap.txt Contig0.1 20 Contig0.2 811 Contig0.3 932 Contig0.4 876 Contig0.5 1927 Contig0.6 1730 Contig0.7 20 Contig1.1 171 Contig1.2 402 Contig1.3 1382

$ head 454PairStatus.txt Template Status Distance Left Accno Left Pos Left Dir Right Accno Right Pos Right Dir Left Distance Right Distance FC3FKPP01BUVA8 TruePair 3967 Contig18.13 24188 - Contig18.13 20221 +
FC3FKPP01BTJNG TruePair 3820 Contig23.11 44530 + Contig23.11 48350 -
FC3FKPP01AFDGW TruePair 2498 Contig41.19 5103 - Contig41.19 2605 +
FC3FKPP01CEJ64 TruePair 2498 Contig41.19 5103 - Contig41.19 2605 +
FC3FKPP01E3K7I TruePair 3735 Contig117.3 98721 - Contig117.3 94986 +
FC3FKPP01A7AS8 FalsePair - Contig129.5 2438 + Contig129.6 1986 - 276 1986 FC3FKPP01BJX6Q FalsePair - Contig133.3 28972 + Contig140.4 1428 - 5803 1428 FC3FKPP01AWH9A TruePair 2636 Contig40.10 122575 + Contig40.10 125211 -
FC3FKPP01DCED4 MultiplyMapped - Contig18.9 1244 - Repeat

CONTACT

Guohui Yao