Description: A manually curated arsenic functional gene database
(AsgeneDB) and an accompanying R package (Asgene) were developed for
rapid and accurate metagenomic analysis.
Authors: Xinwei Song, Yongguan Zhu, Yongming Luo, Jianming Xu, Bin Ma*
Arsenic (As) is a toxic metalloid that is widely distributed around the world. To understand the microbial communities involved in arsenic metabolism in the environment, we developed a curated arsenic functional gene database (AsgeneDB) covering five arsenic metabolic pathways (transport, respiration, reduction, oxidation and methylation), 59 arsenic biotransformation functional gene families and 414,773 representative sequences. Protein sequences for the As gene families were recruited from multiple public databases, including UniProt, NCBI RefSeq, KEGG, COG, eggNOG, arCOG and KOG. AsgeneDB covers 46 phyla and 1,653 genera of bacteria, archaea and fungi. By integrating multiple orthology databases, it enables rapid analysis of the arsenic metabolism and transformation functions of microbial communities with high specificity, comprehensiveness, representativeness and accuracy. AsgeneDB and the associated R package are intended to support the study of arsenic metabolism in microbial communities across a wide range of environments.
AsgeneDB.fa: Representative sequences in FASTA format, obtained by clustering the curated sequences at 100% sequence identity. This file can be used to search for arsenic genes in shotgun metagenomes with BLAST-like tools.
asgene.map: A mapping file that links sequence IDs to gene names. Only sequences belonging to arsenic gene families are included. This file is used to generate arsenic gene profiles from BLAST-like results against the database.
id_gene_tax_pathway_total.csv: Species table of the sequences in AsgeneDB.
Columns included:
length.txt: The lengths of the amino acid sequences in AsgeneDB, used to standardize arsenic gene abundance statistics.
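As a rough illustration of how a mapping file such as asgene.map can turn BLAST-like hits into a gene profile, the sketch below joins a toy hit table to a toy ID-to-gene table and counts hits per gene family. The sequence IDs, read names and column names are hypothetical, and the real file layouts may differ; this is not the package's own implementation.

```r
# Toy stand-ins for search results and the mapping file.
# All IDs and column names here are hypothetical placeholders.
hits <- data.frame(
  query   = c("read1", "read2", "read3", "read4"),
  subject = c("seqA", "seqB", "seqA", "seqC"),
  stringsAsFactors = FALSE
)
id_to_gene <- data.frame(
  seq_id = c("seqA", "seqB", "seqC"),
  gene   = c("arsC", "arsB", "arsC"),
  stringsAsFactors = FALSE
)

# Attach a gene name to every hit via the matched database sequence ID.
annotated <- merge(hits, id_to_gene, by.x = "subject", by.y = "seq_id")

# Count the number of hits assigned to each gene family.
gene_counts <- table(annotated$gene)
print(gene_counts)
```

Hits whose subject ID is missing from the mapping table are dropped by `merge`, which mirrors the idea that only sequences belonging to arsenic gene families are counted.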
You can install the development version of Asgene from GitHub with:
install.packages("devtools")
devtools::install_github("XinweiSong/Asgene")
Description:
We provide the Asgene package for metagenomic alignment (nucleic acid
or protein sequences), subsequent gene family abundance statistics and
sample abundance standardization. The database files that users need
are built into the package. Users therefore only need to choose a
database search tool according to their needs (e.g., USEARCH, BLAST or
DIAMOND) and supply three parameters (e.g., the working path, the search
parameters of the tool and the file type); the package then runs the
analysis and writes the statistical results automatically. With gene
abundance statistics (option: abundance), read counts are normalized to
reads per kilobase per million reads (RPKM) to eliminate differences in
sequencing depth and reference sequence length between samples. With
functional species statistics (option: taxonomy), the species driving
each arsenic metabolism gene in the sample are reported automatically
at different taxonomic levels.
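The RPKM normalization mentioned above can be sketched with the standard formula, RPKM = reads mapped to the gene x 10^9 / (gene length in bases x total reads in the sample). This is a minimal sketch, not the package's internal code: the function name `rpkm`, the counts and the lengths are hypothetical, and lengths are taken in nucleotides here, whereas how Asgene converts the amino acid lengths it stores is not stated in this README.

```r
# Standard RPKM formula; `rpkm` is a hypothetical helper, not an Asgene function.
rpkm <- function(read_counts, gene_length_bp, total_reads) {
  read_counts * 1e9 / (gene_length_bp * total_reads)
}

# Made-up example values for one sample.
read_counts    <- c(arsC = 30, arsB = 10)      # reads assigned to each gene
gene_length_bp <- c(arsC = 500, arsB = 1000)   # reference lengths in bases
total_reads    <- 1e6                          # sequencing depth of the sample

result <- rpkm(read_counts, gene_length_bp, total_reads)
print(result)
```

Dividing by both length and depth is what makes values comparable between genes of different sizes and between samples sequenced to different depths.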
This is a basic example which shows you how to use the package:
library(Asgene)
#Arsenic metabolism gene abundance analysis
Asgene(analysis = "abundance", workdir = "./", method = "diamond", toolpath = "./", search_parameters = "-e 1e-4 -p 28 --query-cover 80 --id 50", seqtype = "nucl", filetype = "fasta", PE = TRUE, output = "./")
#Arsenic metabolism taxonomy analysis
Asgene(analysis = "taxonomy", workdir = "./", method = "diamond", toolpath = "./", search_parameters = "-e 1e-4 -p 28 --query-cover 80 --id 50", seqtype = "nucl", filetype = "fasta", PE = TRUE, output = "./")
#Using the example datasets
Asgene(analysis = "abundance", workdir = "./", method = "diamond", toolpath = "./", search_parameters = "-e 1e-4 -p 28 --query-cover 80 --id 50",seqtype = "prot", output = "./", test.data = TRUE)
Asgene(analysis = "taxonomy", workdir = "./", method = "diamond", toolpath = "./", search_parameters = "-e 1e-4 -p 28 --query-cover 80 --id 50",seqtype = "prot", output = "./", test.data = TRUE)