This program integrates the association of cis- and trans-components of gene expression with the phenotype or expression traits (e-traits), to prioritize causal genes and construct regulatory networks.
CisTrans-ECAS is mainly tested in R-3.5.1. It depends on the following software:
It also depends on the following R packages:
This script constructs prediction models for gene expression based on SNPs in the vicinity of genes, depending on GCTA.
$ Rscript gcta_cis.R [options]
--gffs_file=value An RData file containing gene annotation information, including a data frame named 'gffs' with columns Gene, chr, start, end, strand, Qsymbols, and Note.
--exp_file=value An RData file containing a gene expression matrix with a matrix named 'exp_m', where row names represent gene IDs and column names represent individual IDs.
--genodir=value Directory containing genotype data in plink's binary format, including .bed, .bim, and .fam files, saved per chromosome.
--gfile_prefix=value Pattern for the file names of genotype data, where chromosome number is represented by %s, exclusive of suffix.
--plinkdir=value Path to the plink program.
--gctadir=value Path to the GCTA program.
--out_dir=value Output directory.
--extend=value The range of cis-regulatory regions around genes. SNPs within a distance less than 'extend' from genes are used to build gene expression prediction models. e.g., extend=100000.
--ncor=value The number of threads to use.
--help Display this help message.
$ cd ~/test/
$ Rscript gcta_cis.R --gffs_file=./test_data/gffs.RData --exp_file=./test_data/exp_matrix.RData --genodir=./test_data/ --gfile_prefix=%s_529_test --plinkdir=plink --gctadir=gcta --out_dir=./test_res/gcta_cis --extend=1e5 --ncor=2
This script is used to merge the results generated by gcta_cis.R for different chromosomes.
$ Rscript merge_cis_res.R [options]
--file_dir=value Directory of output results for gcta_cis.R --help Display this help message.
$ Rscript merge_cis_res.R --file_dir=./test_res/gcta_cis
This script is used to separately associate phenotypes with the cis- and trans-components of gene expression to identify candidate genes that influence the phenotype.
$ Rscript cistrans_twas.R [options]
--gffs_file=value An RData file containing gene annotation information, including a data frame named 'gffs' with columns Gene, chr, start, end, strand, Qsymbols, and Note.
--K_file=value An RData file containing kinship matrix, including a matrix named 'K'.
--pheno_file=value An RData file containing phenotype data, including a data frame named 'pheno' with row names as individual IDs and column names as traits.
--vc_file=value An RData file generated by 'merge_cis_res.R' that includes the variance explained by cis-genetic variants for each gene and the corresponding p-value.
--cis_file=value An RData file generated by 'merge_cis_res.R' that includes a matrix of the cis-component of gene expression, named 'res_cis_m', with genes as rows and individuals as columns.
--trans_file=value An RData file generated by 'merge_cis_res.R' that includes a matrix of the trans-component of gene expression, named 'res_trans_m', with genes as rows and individuals as columns.
--out_dir=value Output directory.
--p_threshold=value P-value threshold for cis-genetic variance.
--help Display this help message.
$ Rscript cistrans_twas.R --gffs_file=./test_data/gffs.RData --K_file=./test_data/Kinship.RData --pheno_file=./test_data/pheno.RData --vc_file=./test_res/gcta_cis/gcta_cis_vc.RData --cis_file=./test_res/gcta_cis/gcta_cis_cis.RData --trans_file=./test_res/gcta_cis/gcta_cis_trans.RData --out_dir=./test_res/cistrans_twas --p_threshold=1.62e-6
This script is used to separately associate e-traits (gene expression levels) with the cis- and trans-components of gene expression to identify upstream regulatory factors that influence e-trait expression.
$ Rscript cistrans_etwas.R [options]
--gffs_file=value An RData file containing gene annotation information, including a data frame named 'gffs' with columns Gene, chr, start, end, strand, Qsymbols, and Note.
--K_file=value An RData file containing kinship matrix, including a matrix named 'K'.
--exp_file=value An RData file containing a gene expression matrix with a matrix named 'exp_m', where row names represent gene IDs and column names represent individual IDs.
--gene_file=value An RData file, which contains a vector named 'genes,' including e-traits used for cistrans_etwas.
--vc_file=value An RData file generated by 'merge_cis_res.R' that includes the variance explained by cis-genetic variants for each gene and the corresponding p-value.
--cis_file=value An RData file generated by 'merge_cis_res.R' that includes a matrix of the cis-component of gene expression, named 'res_cis_m', with genes as rows and individuals as columns.
--trans_file=value An RData file generated by 'merge_cis_res.R' that includes a matrix of the trans-component of gene expression, named 'res_trans_m', with genes as rows and individuals as columns.
--out_dir=value Output directory.
--p_threshold=value P-value threshold for cis-genetic variance.
--help Display this help message.
$ Rscript cistrans_etwas.R --gffs_file=./test_data/gffs.RData --K_file=./test_data/Kinship.RData --exp_file=./test_data/exp_matrix.RData --gene_file=./test_data/genes.RData --vc_file=./test_res/gcta_cis/gcta_cis_vc.RData --cis_file=./test_res/gcta_cis/gcta_cis_cis.RData --trans_file=./test_res/gcta_cis/gcta_cis_trans.RData --out_dir=./test_res/cistrans_etwas --p_threshold=1.62e-6