Transcription Factor Target Finder (TFTF) is an R package for predicting the target genes of transcription factors and the upstream transcription factors of target genes. Many online tools already predict transcription factor target genes, including hTFtarget, KnockTF, CHEA, TRRUST, GTRD, and ChIP-Atlas. They draw on ChIP-seq high-throughput data, transcriptional profiling data from knockdown/knockout experiments, or motif sequences. TFTF aims to combine the prediction results from these tools with correlation analysis to improve the reliability of the predicted transcription factor-target gene regulatory relationships.
Dataset | Link | Description |
---|---|---|
hTFtarget | https://guolab.wchscu.cn/hTFtarget/ | This database aggregates data from various transcription factor regulation studies, including experimental evidence and prediction results of transcription factor-target gene relationships. |
KnockTF | https://bio.liclab.net/KnockTF/index.php | KnockTF is based on data from transcription factor knockdown/knockout experiments. It provides information on how transcription factors regulate gene expression, which helps in understanding transcription factor functions and networks. |
ENCODE | https://www.encodeproject.org/ | The ENCODE project provides extensive genomic functional annotation data, including transcription factor binding sites, chromatin accessibility regions, and gene regulatory elements. |
CHEA | https://maayanlab.cloud/chea3/ | The CHEA database offers transcription factor-target gene relationships based on shared promoter sequences, supporting enrichment analysis and construction of transcriptional regulatory networks. |
TRRUST | https://www.grnpedia.org/trrust/ | TRRUST curates a large amount of transcription factor-target gene interaction data from literature, providing reliable information on transcription factor regulation relationships. |
GTRD | https://gtrd.biouml.org/ | GTRD collects transcription factor binding site data from ChIP-seq experiments, providing information on transcription factor binding in the genome. |
ChIP_Atlas | https://chip-atlas.org/ | ChIP-Atlas integrates a large amount of publicly available ChIP-seq data, providing information on transcription factor binding sites and regulation in different cells and tissues. |
JASPAR | https://jaspar.genereg.net | JASPAR is an authoritative transcription factor binding site database, providing binding sequence data for various species, used for predicting transcription factor DNA binding preferences and regulatory effects. |
TCGA | https://portal.gdc.cancer.gov/ | The Cancer Genome Atlas (TCGA) is a global collaborative project aimed at advancing cancer research and diagnosis by analyzing genomic, epigenomic, and clinical data from various cancer samples. |
GTEx | https://www.gtexportal.org/ | The Genotype-Tissue Expression (GTEx) project is a comprehensive research initiative aimed at exploring patterns and variations in gene expression across various human tissues to enhance our understanding of gene function. |
CCLE | https://sites.broadinstitute.org/ccle/ | The Cancer Cell Line Encyclopedia (CCLE) is a rich resource database providing molecular characterization data from various cancer cell lines, serving as a valuable reference and resource for cancer research and drug development. |
FIMO (Find Individual Motif Occurrences) is a tool used to identify individual motif occurrences within DNA sequences, enabling the recognition of transcription factor binding sites and other functional DNA elements.
PWMEnrich (Position Weight Matrix Enrichment Analysis) is an R package for analyzing the enrichment of features such as promoter regions and transcription factor binding sites in genomic sequence data, utilizing a Position Weight Matrix (PWM) model for the identification and quantification of transcription factor binding sites.
devtools::install_github("WangJin93/TFTF")
View the list of transcription factors included in this R package and their coverage across all datasets. Only transcription factors present in at least two of the nine datasets are included, for a total of 1575 transcription factors.
tf_list
# View the tissue types available for correlation analysis
tissue
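The "at least two datasets" inclusion rule described above can be illustrated with a minimal sketch. The membership table below is invented toy data with only three datasets, and `TF_A` is a hypothetical name; it does not reflect the actual contents or structure of `tf_list`.

```r
# Toy membership table: rows are TFs, columns are datasets (hypothetical values).
coverage <- matrix(
  c(TRUE,  TRUE,  FALSE,
    TRUE,  FALSE, FALSE,
    FALSE, TRUE,  TRUE),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("STAT3", "TF_A", "FOXM1"),
                  c("hTFtarget", "KnockTF", "TRRUST"))
)

# Count how many datasets cover each TF and keep those covered by at least two.
n_datasets <- rowSums(coverage)
kept_tfs <- names(n_datasets)[n_datasets >= 2]
print(kept_tfs)
```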
Predict the target genes of a transcription factor using multiple TF-target prediction databases and correlation analysis.
predict_target(
datasets = c("hTFtarget", "KnockTF", "FIMO_JASPAR", "PWMEnrich_JASPAR", "ENCODE",
"CHEA", "TRRUST", "GTRD", "ChIP_Atlas"),
tf = "STAT3",
TCGA_tissue = "COAD",
GTEx_tissue = "Colon",
cor_DB = c("TCGA", "GTEx"),
cor_cutoff = 0.3,
FIMO.score = 10,
PWMEnrich.p = 0.1,
cut.log2FC = 1,
down.only = TRUE,
app = FALSE
)
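To make the role of `cor_cutoff` concrete, here is a minimal base-R sketch of filtering candidate genes by their expression correlation with a TF. The expression values and the `GENE_*` names are invented, and filtering on the absolute Pearson coefficient is an assumption made for illustration; the package may apply the cutoff differently.

```r
# Toy expression table: one row per sample, one column per gene (hypothetical values).
expr <- data.frame(
  STAT3  = c(1, 2, 3, 4, 5, 6),
  GENE_A = c(2, 4, 5, 8, 10, 13),  # rises with STAT3
  GENE_B = c(6, 5, 4, 3, 2, 1),    # falls as STAT3 rises
  GENE_C = c(1, 3, 2, 2, 3, 1)     # no linear relationship with STAT3
)

tf <- "STAT3"
cor_cutoff <- 0.3

# Pearson correlation between the TF and every other gene.
candidates <- setdiff(names(expr), tf)
r <- sapply(candidates, function(g) cor(expr[[tf]], expr[[g]], method = "pearson"))

# Assumption: keep genes whose absolute correlation reaches the cutoff.
correlated <- names(r)[abs(r) >= cor_cutoff]
print(round(r, 3))
print(correlated)
```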
Predict the upstream transcription factors that regulate a user-supplied gene using multiple TF-target prediction databases and correlation analysis.
predict_TF(
datasets = c("hTFtarget", "KnockTF", "FIMO_JASPAR", "PWMEnrich_JASPAR", "ENCODE",
"CHEA", "TRRUST", "GTRD", "ChIP_Atlas"),
target = "GAPDH",
TCGA_tissue = "COAD",
GTEx_tissue = "Colon",
cor_DB = c("TCGA", "GTEx"),
cor_cutoff = 0.3,
FIMO.score = 10,
PWMEnrich.p = 0.1,
cut.log2FC = 1,
down.only = TRUE,
app = FALSE
)
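One way to prioritize candidate upstream TFs is to count how many datasets support each one. The sketch below does this in base R on an invented list of per-dataset hits; the list layout is a hypothetical stand-in and is not the documented return value of `predict_TF()`.

```r
# Hypothetical per-dataset candidate TFs for one target gene.
hits <- list(
  hTFtarget = c("STAT3", "MYC", "SP1"),
  TRRUST    = c("MYC", "SP1"),
  GTRD      = c("MYC", "STAT3", "E2F1")
)

# Count each TF once per dataset, then tally the supporting datasets.
support <- table(unlist(lapply(hits, unique), use.names = FALSE))
support <- sort(support, decreasing = TRUE)

# TFs supported by at least two datasets.
well_supported <- names(support)[support >= 2]
print(support)
print(well_supported)
```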
Intersection analysis and visualization of prediction results
results <- predict_target(datasets = c("hTFtarget", "KnockTF", "FIMO_JASPAR",
"PWMEnrich_JASPAR"),
cor_DB = c("TCGA", "GTEx"),
tf = "STAT3")
results_inter <- intersections(results)
plot_venn(results_inter)
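The idea behind the intersection step can be sketched in base R on plain gene-symbol vectors. The gene sets below are invented, and this sketch does not reproduce the internals or the return format of `intersections()`; it only shows the set logic that a Venn diagram summarizes.

```r
# Hypothetical predicted target sets from three datasets.
predictions <- list(
  hTFtarget   = c("MYC", "SOCS3", "BCL2", "CCND1"),
  KnockTF     = c("SOCS3", "CCND1", "VEGFA"),
  FIMO_JASPAR = c("SOCS3", "CCND1", "BCL2")
)

# Genes predicted by every dataset.
common <- Reduce(intersect, predictions)

# Size of each pairwise overlap, as shown in the pairwise regions of a Venn diagram.
pairs <- combn(names(predictions), 2, simplify = FALSE)
overlap <- sapply(pairs, function(p) {
  length(intersect(predictions[[p[1]]], predictions[[p[2]]]))
})
names(overlap) <- sapply(pairs, paste, collapse = " & ")

print(common)
print(overlap)
```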
Correlation analysis between a TF and a target gene across tissues in the "TCGA", "GTEx", or "CCLE" databases.
cor_results <- pantissue_cor_analysis(
Gene1 = "FOXM1",
Gene2 = "GAPDH",
data_source = "TCGA",
type = c("normal", "tumor"),
cor_method = "pearson"
)
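Conceptually, a pan-tissue analysis computes one correlation per tissue. The following base-R sketch shows that pattern with `cor.test()` on invented expression values; the column layout is an assumption and is not the structure returned by `pantissue_cor_analysis()`.

```r
# Toy expression values for two genes in two tissues (hypothetical values).
expr <- data.frame(
  tissue = rep(c("COAD", "LUAD"), each = 5),
  FOXM1  = c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5),
  GAPDH  = c(2, 4, 5, 8, 10, 5, 3, 4, 1, 2)
)

# Run a Pearson correlation test within each tissue and collect one row per tissue.
per_tissue <- do.call(rbind, lapply(split(expr, expr$tissue), function(d) {
  ct <- cor.test(d$FOXM1, d$GAPDH, method = "pearson")
  data.frame(tissue = d$tissue[1],
             r = unname(ct$estimate),
             p = ct$p.value,
             n = nrow(d))
}))
print(per_tissue)
```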
Visualization of pan-tissue correlation analysis using ggplot2.
viz_cor_results(cor_results,
values = c("black","red"))
The R package includes a built-in app that exposes all of the package's functions through a graphical interface. Running it requires the following packages to be installed: shiny, DT, graph, ggraph, shinyWidgets, bs4Dash, tidygraph.
TFTF_app()
# The app is also available online at: https://jingle.shinyapps.io/TF_Target_Finder/
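Because the app depends on several additional packages, it can help to check which of them are missing before launching it. The helper below is a hypothetical convenience function, not part of TFTF; it relies only on base R's `requireNamespace()`, and the package names are copied from the list above.

```r
# Packages the app is stated to need (names taken from the text above).
app_packages <- c("shiny", "DT", "graph", "ggraph",
                  "shinyWidgets", "bs4Dash", "tidygraph")

# Hypothetical helper: return the packages that cannot be loaded.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(app_packages)
if (length(to_install) > 0) {
  message("Install before running the app: ", paste(to_install, collapse = ", "))
}
```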
This systematic approach enables a thorough analysis of TF-target gene interactions, bolstering the robustness of predictions by harnessing the combined strength of multiple dataset intersections. The procedural steps are as follows:
This integrated approach, combining gene expression correlation analysis with multi-dataset intersection, was designed to ensure a comprehensive and reliable prediction of TF-target gene interactions. The operational steps are detailed below:
In this module, we used data from three publicly available databases (TCGA, GTEx, and CCLE) to analyze the expression correlation of TF-target pairs across various tissue types. Integrating these analyses allows a context-specific assessment of the expression relationship between TFs and their potential target genes. The methodological steps are detailed as follows:
The module was designed to predict the target genes of transcription factors (TFs) of interest based on gene differential expression analysis results uploaded by the user, utilizing multiple TF prediction databases, and to visualize the regulatory network. This module thus facilitates the elucidation of potential regulatory relationships by integrating user data with established TF prediction resources, supporting the discovery of novel insights into gene regulatory networks. The steps for utilizing this module are as follows:
https://github.com/WangJin93/TFTF
Wang J. TFTF: An R-Based Integrative Tool for Decoding Human Transcription Factor–Target Interactions. Biomolecules. 2024; 14(7):749. https://doi.org/10.3390/biom14070749
Email: jin.wang93@outlook.com