Goalign is a set of command line tools to manipulate multiple alignments. It is implemented in Go.
Goalign aims to handle multiple alignments in Phylip, Fasta, Nexus, and Clustal formats, through several basic commands. Each command may print its result (an alignment, for example) to standard output, so it can be piped to the standard input of the next goalign command.
Input files may be local or remote: if the file name is of the form `http(s)://<URL>`, the file is downloaded from the given URL. Gzipped input files (`.gz` extension) are supported, as well as XZ files (`.xz` extension) and BZipped files (`.bz[2]` extension).
Note: to manipulate phylogenetic trees, see also Gotree.
If you use Gotree or Goalign, please cite:
Frédéric Lemoine, Olivier Gascuel
Gotree/Goalign: toolkit and Go API to facilitate the development of phylogenetic workflows,
NAR Genomics and Bioinformatics, Volume 3, Issue 3, September 2021, lqab075, doi
You can download ready to run binaries for the latest release in the release section. Binaries are available for MacOS, Linux, and Windows (32 and 64 bits).
Once downloaded, you can just run the executable without any other downloads.
The Goalign Docker image is available on Docker Hub. You may use it as follows:
# Display goalign help
docker run -v $PWD:$PWD -w $PWD -i -t evolbioinfo/goalign:v0.2.6 -h
The Goalign Docker image can also be used with Singularity. You may use it as follows:
# Pull image from docker hub
singularity pull docker://evolbioinfo/goalign:v0.2.6
# Display goalign help
./goalign-v0.2.6.simg -h
Goalign is also available on bioconda. Just type:
conda install -c bioconda goalign
To build goalign, you must first download and install Go on your system (version 1.21.6).
Then type:
git clone git@github.com:evolbioinfo/goalign.git
cd goalign
make && make install
# or go get . && go build .
# or go get . && go install .
The goalign executable should be located in the current folder (or in $GOPATH/bin).
To test the executable:
./test.sh
goalign uses cobra, and therefore provides a command to generate auto-completion scripts:
goalign completion -h
See the doc for more detailed documentation of the commands.
goalign random | goalign stats
goalign random > align.fa
goalign trim name -n 3 -m map -i align.fa > align_rename.fa
goalign rename -i align_rename.fa -m map -r
goalign random | goalign reformat phylip
goalign random --amino-acids --clustal --nb-seqs 2 | goalign reformat fasta --clustal
goalign random -p | goalign reformat fasta -p
goalign random | goalign addid -n "Dataset1_"
goalign random | goalign addid -r -n "_Dataset1"
goalign random -n 10000 | goalign sample -n 10
goalign subset -e '^mammal.*$' -i align.fasta
goalign subset -r -e '^mammal.*$' -i align.fasta
goalign subseq -i align.fasta -s 9 -l 10
goalign compute pssm -n 4 -i align.fasta
goalign compute distance -m k2p -i align.fasta -t 5
goalign compute entropy -i align.fasta
goalign random -n 500 | goalign build seqboot -S -n 100 --gz --tar -t 5 -o boot
goalign random -n 500 | goalign build seqboot -S -n 100 --gz -t 5 -o boot
* Parse a Phylip single alignment file and export it in Fasta
```go
package main

import (
	"fmt"
	"os"

	"github.com/evolbioinfo/goalign/align"
	"github.com/evolbioinfo/goalign/io/fasta"
	"github.com/evolbioinfo/goalign/io/phylip"
)

func main() {
	var err error
	var f *os.File
	var align align.Alignment

	// Open the Phylip input file
	f, err = os.Open("f.phy")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	// Parse a single alignment and print it in Fasta format
	if align, err = phylip.NewParser(f).Parse(); err != nil {
		panic(err)
	} else {
		fmt.Println(fasta.WriteSequences(align))
	}
}
```
* Parse a Phylip multi alignments file and export it in Fasta
```go
package main

import (
	"fmt"
	"os"

	"github.com/evolbioinfo/goalign/align"
	"github.com/evolbioinfo/goalign/io/fasta"
	"github.com/evolbioinfo/goalign/io/phylip"
)

func main() {
	var f *os.File
	var aligns chan align.Alignment
	var err error

	// Open the Phylip input file
	f, err = os.Open("f.phy")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	// Parse all alignments of the file and print each one in Fasta format
	aligns = make(chan align.Alignment, 15)
	if err = phylip.NewParser(f).ParseMultiple(aligns); err != nil {
		panic(err)
	} else {
		for al := range aligns {
			fmt.Println(fasta.WriteSequences(al))
		}
	}
}
```
* Parse a Fasta file and export it in Nexus
```go
package main

import (
	"fmt"
	"os"

	"github.com/evolbioinfo/goalign/align"
	"github.com/evolbioinfo/goalign/io/fasta"
	"github.com/evolbioinfo/goalign/io/nexus"
)

func main() {
	var f *os.File
	var align align.Alignment
	var err error

	// Open the Fasta input file
	f, err = os.Open("f.fasta")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	// Parse the alignment and print it in Nexus format
	if align, err = fasta.NewParser(f).Parse(); err != nil {
		panic(err)
	} else {
		fmt.Println(nexus.WriteAlignment(align))
	}
}
```
* Parse a Fasta file and export it in Phylip
```go
package main

import (
	"fmt"
	"os"

	"github.com/evolbioinfo/goalign/align"
	"github.com/evolbioinfo/goalign/io/fasta"
	"github.com/evolbioinfo/goalign/io/phylip"
)

func main() {
	var f *os.File
	var align align.Alignment
	var err error

	// Open the Fasta input file
	f, err = os.Open("f.fasta")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	// Parse the alignment and print it in Phylip format
	if align, err = fasta.NewParser(f).Parse(); err != nil {
		panic(err)
	} else {
		fmt.Println(phylip.WriteAlignment(align, false))
	}
}
```
Iterating over alignment sequences
align.IterateChar(func(name string, sequence []uint8) {
fmt.Printf("Sequence: %s\n", name)
})
Append identifier at the beginning of all sequence names
align.AppendSeqIdentifier("IDENT", false)
Alignment statistics
var n int = align.NbSequences()
var l int = align.Length()
Extract a sub alignment
var subalign align.Alignment
var err error
subalign,err = align.SubAlign(0, 100)
Sort sequences by alphanumerical order
align.Sort()
Copy/Clone the alignment
var clonealign align.Alignment
var err error
clonealign,err = align.Clone()
Get the sequence having a specific name
var sequence string
var err error
sequence,err = align.GetSequence("nameofsequence")
Build a bootstrap replicate
var bootstrap align.Alignment
bootstrap = align.BuildBootstrap()
Randomly shuffle sequence order of alignment
align.ShuffleSequences()
Compute evolutionary distance matrix (5 threads)
import "github.com/evolbioinfo/goalign/distance"
//...
var model distance.DistModel
var distMatrix [][]float64
model = distance.Model("k2p", false)
distMatrix = distance.DistMatrix(align, nil, model, 5)
Other functions
Other functions are described in the godoc.