WangJin93 / TFTF

Transcription Factor Target Finder (TFTF) is an R package designed for predicting transcription factor target genes and predicting upstream transcription factors of target genes.
https://jingle.shinyapps.io/TF_Target_Finder/
MIT License

https://www.jingege.wang/tf-target-finder-an-r-web-application-and-package-bridging-multiple-predictive-models-for-decoding-transcription-fac-tor-target-interactions/

Transcription Factor Target Finder (TFTF) is an R package for predicting the target genes of a transcription factor and the upstream transcription factors of a target gene. Many online tools already predict transcription factor target genes, such as hTFtarget, KnockTF, CHEA, TRRUST, GTRD, and ChIP-Atlas; they are based on ChIP-seq high-throughput data, transcriptional profiling data from knockdown/knockout experiments, or motif sequences. Our aim is to make full use of the prediction results from these tools and combine them with correlation analysis to improve the reliability of the predicted transcription factor-target gene regulatory relationships.

1、 Related online tools and database information

| Dataset | Link | Description |
| --- | --- | --- |
| hTFtarget | https://guolab.wchscu.cn/hTFtarget/ | This database aggregates data from various transcription factor regulation studies, including experimental evidence and prediction results of transcription factor-target gene relationships. |
| KnockTF | https://bio.liclab.net/KnockTF/index.php | KnockTF is based on data from transcription factor knockout experiments, providing information on the regulation of gene expression by transcription factors, aiding in understanding transcription factor functions and networks. |
| ENCODE | https://www.encodeproject.org/ | The ENCODE project provides extensive genomic functional annotation data, including transcription factor binding sites, chromatin accessibility regions, and gene regulatory elements. |
| CHEA | https://maayanlab.cloud/chea3/ | The CHEA database offers transcription factor-target gene relationships based on shared promoter sequences, supporting enrichment analysis and construction of transcriptional regulatory networks. |
| TRRUST | https://www.grnpedia.org/trrust/ | TRRUST curates a large amount of transcription factor-target gene interaction data from literature, providing reliable information on transcription factor regulation relationships. |
| GTRD | https://gtrd.biouml.org/ | GTRD collects transcription factor binding site data from ChIP-seq experiments, providing information on transcription factor binding in the genome. |
| ChIP_Atlas | https://chip-atlas.org/ | ChIP-Atlas integrates a large amount of publicly available ChIP-seq data, providing information on transcription factor binding sites and regulation in different cells and tissues. |
| JASPAR | https://jaspar.genereg.net | JASPAR is an authoritative transcription factor binding site database, providing binding sequence data for various species, used for predicting transcription factor DNA binding preferences and regulatory effects. |
| TCGA | https://portal.gdc.cancer.gov/ | The Cancer Genome Atlas (TCGA) is a global collaborative project aimed at advancing cancer research and diagnosis by analyzing genomic, epigenomic, and clinical data from various cancer samples. |
| GTEx | https://www.gtexportal.org/ | The Genotype-Tissue Expression (GTEx) project is a comprehensive research initiative aimed at exploring patterns and variations in gene expression across various human tissues to enhance our understanding of gene function. |
| CCLE | https://sites.broadinstitute.org/ccle/ | The Cancer Cell Line Encyclopedia (CCLE) is a rich resource database providing molecular characterization data from various cancer cell lines, serving as a valuable reference and resource for cancer research and drug development. |

2、 Data source

| Dataset | Link |
| --- | --- |
| hTFtarget | https://guolab.wchscu.cn/hTFtarget/api/chipseq/targets/tf |
| KnockTF | https://bio.liclab.net/KnockTFv2/public/download_anno/knocktf_v2_main_human.txt |
| ENCODE | https://maayanlab.cloud/static/hdfs/harmonizome/data/encodetfppi/gene_attribute_edges.txt.gz |
| CHEA | https://maayanlab.cloud/static/hdfs/harmonizome/data/cheappi/gene_attribute_edges.txt.gz |
| TRRUST | https://www.grnpedia.org/trrust/data/trrust_rawdata.human.tsv |
| GTRD | https://gtrd20-06.biouml.org/bioumlweb/# |
| ChIP_Atlas | https://chip-atlas.dbcls.jp/data/hg38/target/(TF name).tsv |
| JASPAR motif | https://jaspar.elixir.no/downloads/#pfm_vertebrates |
| TCGA | https://toil-xena-hub.s3.us-east-1.amazonaws.com/download/tcga_rsem_isoform_tpm.gz |
| GTEx | https://toil-xena-hub.s3.us-east-1.amazonaws.com/download/gtex_rsem_isoform_tpm.gz |
| CCLE | https://data.broadinstitute.org/ccle/CCLE_DepMap_18Q2_RNAseq_RPKM_20180502.gct |

3、 Algorithms for predicting transcription factor target genes based on motifs

3.1 FIMO

FIMO (Find Individual Motif Occurrences) is a tool used to identify individual motif occurrences within DNA sequences, enabling the recognition of transcription factor binding sites and other functional DNA elements.
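To make the idea of motif scanning concrete, here is a minimal base-R sketch that scores every window of a sequence against a position weight matrix using log-odds scores. It is not FIMO itself, which additionally converts scores to p-values; the toy motif counts, the uniform background, the pseudocount, and the function name `scan_pwm` are all assumptions made for illustration.

```r
# Toy position frequency matrix (rows: A, C, G, T; columns: motif positions).
# The counts are made up for illustration and do not come from JASPAR.
pfm <- matrix(
  c(8, 0, 0, 1,   # A
    0, 9, 0, 1,   # C
    1, 0, 9, 0,   # G
    0, 0, 0, 7),  # T
  nrow = 4, byrow = TRUE,
  dimnames = list(c("A", "C", "G", "T"), NULL)
)

# Convert counts to log-odds scores against a uniform background (0.25 each),
# adding a pseudocount so that zero counts do not produce -Inf.
pseudocount <- 0.5
ppm <- sweep(pfm + pseudocount, 2, colSums(pfm + pseudocount), "/")
pwm <- log2(ppm / 0.25)

# Score every window of the sequence; returns one row per start position.
scan_pwm <- function(sequence, pwm) {
  bases  <- strsplit(toupper(sequence), "")[[1]]
  width  <- ncol(pwm)
  starts <- seq_len(length(bases) - width + 1)
  score_at <- function(s) {
    window <- bases[s:(s + width - 1)]
    # Pick the matrix entry for each observed base at each motif position.
    sum(pwm[cbind(match(window, rownames(pwm)), seq_len(width))])
  }
  site_at <- function(s) paste(bases[s:(s + width - 1)], collapse = "")
  data.frame(
    start = starts,
    site  = vapply(starts, site_at, character(1)),
    score = vapply(starts, score_at, numeric(1))
  )
}

hits <- scan_pwm("TTACGTGGACGTAA", pwm)
best <- hits[which.max(hits$score), ]
best
```

A real scan would also cover the reverse strand and apply a score or p-value cutoff to decide which windows count as putative binding sites.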

3.2 PWMEnrich

PWMEnrich (Position Weight Matrix Enrichment Analysis) is an R package for testing whether known transcription factor binding motifs are enriched in a set of DNA sequences, such as promoter regions. It uses Position Weight Matrix (PWM) models to identify and quantify transcription factor binding sites.
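As a rough illustration of what motif enrichment means, the sketch below compares how many foreground sequences (for example, promoters of candidate targets) contain a motif hit against a background set, using Fisher's exact test. This is a deliberately simplified stand-in and not the statistic PWMEnrich computes; all counts are hypothetical.

```r
# Hypothetical counts: sequences with at least one motif hit vs. the total.
fg_hit <- 30; fg_total <- 100     # foreground, e.g. promoters of interest
bg_hit <- 50; bg_total <- 1000    # background promoters

counts <- matrix(
  c(fg_hit, fg_total - fg_hit,
    bg_hit, bg_total - bg_hit),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("foreground", "background"), c("hit", "no_hit"))
)

# One-sided test: is the motif over-represented in the foreground?
enrich <- fisher.test(counts, alternative = "greater")
enrich$p.value
```

A small p-value here suggests that the motif occurs more often in the foreground than the background rate would predict.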

4、 R package installation and basic function introduction

4.1 Installing TFTF from GitHub

devtools::install_github("WangJin93/TFTF")
library(TFTF)

4.2 Introduction to Data and Basic Functions

View the list of transcription factors included in this R package and their coverage across all datasets. Only transcription factors present in at least two of the nine datasets are included, giving a total of 1575 transcription factors.

tf_list

# View the tissue types available for correlation analysis

tissue
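The inclusion rule described above (a transcription factor must appear in at least two datasets) can be sketched on a toy presence/absence matrix. The matrix below is hypothetical, uses only three datasets for brevity, and does not reflect the real structure of `tf_list`.

```r
# Toy presence/absence matrix: rows are TFs, columns are datasets.
# TF names and values are illustrative, not the package's real tf_list.
coverage <- matrix(
  c(1, 1, 0,
    1, 0, 0,
    0, 1, 1,
    0, 0, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("STAT3", "TF_B", "FOXM1", "TF_D"),
                  c("hTFtarget", "KnockTF", "TRRUST"))
)

# Keep TFs that appear in at least two datasets.
n_datasets <- rowSums(coverage)
kept <- names(n_datasets)[n_datasets >= 2]
kept
```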

Predict the target genes of a transcription factor using multiple TF-target prediction databases and correlation analysis.

predict_target(
  datasets = c("hTFtarget", "KnockTF", "FIMO_JASPAR", "PWMEnrich_JASPAR", "ENCODE",
               "CHEA", "TRRUST", "GTRD", "ChIP_Atlas"),
  tf = "STAT3",
  TCGA_tissue = "COAD",
  GTEx_tissue = "Colon",
  cor_DB = c("TCGA", "GTEx"),
  cor_cutoff = 0.3,
  FIMO.score = 10,
  PWMEnrich.p = 0.1,
  cut.log2FC = 1,
  down.only = TRUE,
  app = FALSE
)

Predict the upstream transcription factors that regulate a user-supplied gene using multiple TF-target prediction databases and correlation analysis.

predict_TF(
  datasets = c("hTFtarget", "KnockTF", "FIMO_JASPAR", "PWMEnrich_JASPAR", "ENCODE",
               "CHEA", "TRRUST", "GTRD", "ChIP_Atlas"),
  target = "GAPDH",
  TCGA_tissue = "COAD",
  GTEx_tissue = "Colon",
  cor_DB = c("TCGA", "GTEx"),
  cor_cutoff = 0.3,
  FIMO.score = 10,
  PWMEnrich.p = 0.1,
  cut.log2FC = 1,
  down.only = TRUE,
  app = FALSE
)

Intersection analysis and visualization of prediction results

# Predict STAT3 targets, then intersect the per-dataset results and plot them
results <- predict_target(
  datasets = c("hTFtarget", "KnockTF", "FIMO_JASPAR", "PWMEnrich_JASPAR"),
  cor_DB = c("TCGA", "GTEx"),
  tf = "STAT3"
)
results_inter <- intersections(results)
plot_venn(results_inter)

image
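The idea behind intersecting per-dataset predictions can be sketched in base R. This is only an illustration of set intersection on hypothetical gene lists; it makes no claim about how `intersections()` is implemented or what structure it returns.

```r
# Hypothetical target lists for one TF from three sources (toy gene sets).
targets <- list(
  hTFtarget = c("MYC", "BCL2", "SOCS3", "CCND1"),
  KnockTF   = c("SOCS3", "MYC", "VEGFA"),
  TRRUST    = c("MYC", "SOCS3", "BCL2")
)

# Genes supported by every source.
common <- Reduce(intersect, targets)

# Number of sources supporting each gene, highest first.
support <- sort(table(unlist(lapply(targets, unique))), decreasing = TRUE)

common
support
```

Ranking by the number of supporting sources is a softer alternative to a strict intersection, which shrinks quickly as more datasets are added.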

Pan-tissue correlation analysis between a TF and a target gene using the "TCGA", "GTEx", or "CCLE" database.

cor_results <- pantissue_cor_analysis(
        Gene1 = "FOXM1",
        Gene2 = "GAPDH",
        data_source = "TCGA",
        type = c("normal", "tumor"),
        cor_method = "pearson"
)

image
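For readers who want to see what a per-tissue correlation amounts to, the following base-R sketch computes a Pearson correlation and p-value within each tissue on simulated data. The expression values are synthetic, and the column names are assumptions that do not mirror the output of `pantissue_cor_analysis()`.

```r
set.seed(1)

# Simulated expression for two genes in two tissues (values are synthetic).
expr <- data.frame(
  tissue = rep(c("Colon", "Lung"), each = 50),
  gene1  = rnorm(100)
)

# gene2 tracks gene1 in Colon but is independent noise in Lung.
expr$gene2 <- ifelse(expr$tissue == "Colon",
                     expr$gene1 + rnorm(100, sd = 0.3),
                     rnorm(100))

# Pearson correlation and p-value within each tissue.
per_tissue <- do.call(rbind, lapply(split(expr, expr$tissue), function(d) {
  ct <- cor.test(d$gene1, d$gene2, method = "pearson")
  data.frame(tissue = d$tissue[1], r = unname(ct$estimate), p = ct$p.value)
}))

per_tissue
```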

Visualization of pan-tissue correlation analysis using ggplot2.

viz_cor_results(cor_results,
                values = c("black", "red"))

image

5、 Introduction to the operation of Shiny APP visualization interface

The package includes a built-in Shiny app that exposes all of its functions through a graphical interface. Running the app requires the following packages to be installed: shiny, DT, graph, ggraph, shinyWidgets, bs4Dash, tidygraph.

TFTF_app()

# The app is also available online at: https://jingle.shinyapps.io/TF_Target_Finder/
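Before launching the app, it may help to check which of the required packages are missing. The sketch below uses the package names exactly as listed above; it only reports missing packages and does not install anything.

```r
# Package names copied from the requirement list in this section.
required <- c("shiny", "DT", "graph", "ggraph", "shinyWidgets", "bs4Dash", "tidygraph")

# requireNamespace() returns FALSE (instead of raising an error) when a package is absent.
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required[!installed]

if (length(missing_pkgs) > 0) {
  message("Install before running TFTF_app(): ", paste(missing_pkgs, collapse = ", "))
}
```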

5.1 Module 1: Procedures for the prediction of the target genes of TF

This systematic approach enables a thorough analysis of TF-target gene interactions, bolstering the robustness of predictions by harnessing the combined strength of multiple dataset intersections. The procedural steps are as follows:

image

5.2 Module 2: Procedures for the prediction of upstream TFs of target genes

This integrated approach, combining gene expression correlation analysis with multi-dataset intersection, was designed to ensure a comprehensive and reliable prediction of TF-target gene interactions. The operational steps are detailed below:

image

5.3 Module 3: Pan-tissue correlation analysis of the expression of predicted TF-target pairs

This module uses data from three publicly available databases (TCGA, GTEx, and CCLE) to analyze the expression correlation of TF-target pairs across tissue types, so that the relationship between a TF and its potential target genes can be assessed in a tissue-specific context. The methodological steps are as follows:

image

5.4 Module 4: TF-targets regulation network analysis

The module was designed to predict the target genes of transcription factors (TFs) of interest based on gene differential expression analysis results uploaded by the user, utilizing multiple TF prediction databases, and to visualize the regulatory network. This module thus facilitates the elucidation of potential regulatory relationships by integrating user data with established TF prediction resources, supporting the discovery of novel insights into gene regulatory networks. The steps for utilizing this module are as follows:

image
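One way to picture the kind of result this module produces is an edge list that joins differential expression results with predicted TF-target pairs. The sketch below is an assumed workflow on hypothetical data: the column names, the cutoffs, and the rule that both the TF and its target must be significantly changed are illustrative choices, not the app's documented logic.

```r
# Hypothetical differential expression results uploaded by a user.
de <- data.frame(
  gene   = c("STAT3", "SOCS3", "MYC", "GAPDH", "FOXM1", "CCNB1"),
  log2FC = c(1.8, 2.1, 1.2, 0.1, -1.5, -1.9),
  padj   = c(0.001, 0.002, 0.03, 0.8, 0.004, 0.01)
)

# Hypothetical predicted TF-target pairs (e.g. from a database intersection).
tf_pairs <- data.frame(
  tf     = c("STAT3", "STAT3", "STAT3", "FOXM1"),
  target = c("SOCS3", "MYC", "GAPDH", "CCNB1")
)

# Keep significantly changed genes (cutoffs are arbitrary for this sketch).
sig <- de[abs(de$log2FC) >= 1 & de$padj < 0.05, ]

# An edge is kept when both the TF and its target are significantly changed.
edges <- tf_pairs[tf_pairs$tf %in% sig$gene & tf_pairs$target %in% sig$gene, ]
edges$target_log2FC <- sig$log2FC[match(edges$target, sig$gene)]

edges
```

An edge list of this shape (source, target, attribute) is the usual input for network packages such as tidygraph and ggraph, which the app lists among its requirements.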

6、 Source code

https://github.com/WangJin93/TFTF

7、 Citation

Wang J. TFTF: An R-Based Integrative Tool for Decoding Human Transcription Factor–Target Interactions. Biomolecules. 2024; 14(7):749. https://doi.org/10.3390/biom14070749

8、 Feedback and suggestions

Email: jin.wang93@outlook.com