Bacmethy, a customizable pipeline based on Docker, is designed to calculate the enrichment significance of methylated and un(der)methylated motifs in the regulatory and coding regions of the genome. It also identifies genes that are co-affected by DNA methylation and transcription factor binding, enabling the prediction of transcriptional regulation effects by DNA methylation. By using Bacmethy, researchers can gain a more comprehensive understanding of bacterial epigenomes.
We offer three ways to use Bacmethy: a web server, a Docker image, and a command-line tool.
Please cite our work as: Liu, Ji-Hong, Yizhou Zhang, Ning Zhou, Jiale He, Jing Xu, Zhao Cai, Liang Yang, and Yang Liu. 2024. “Bacmethy: A Novel and Convenient Tool for Investigating Bacterial DNA Methylation Pattern and Their Transcriptional Regulation Effects.” iMeta e186. https://doi.org/10.1002/imt2.186
Bacmethy Website
The web server provides Bacmethy's methylation analysis and TF binding prediction modules.
No installation is needed to use the Bacmethy web server.
A Docker image is provided for Windows and macOS users.
Make sure Docker is installed, then pull the image and start an interactive container:

```bash
docker pull liujihong/bacmethy:2.0
docker run -t -i liujihong/bacmethy:2.0 /bin/bash
```
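To re-enter the container later, look up its ID or name with `docker ps -a` and restart it:

```bash
docker start -ai <CONTAINER_ID>
```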
To install Bacmethy as a command-line tool on Linux, install Miniconda, create a conda environment with the dependencies, and clone the repository:

```bash
wget -c https://repo.continuum.io/miniconda/Miniconda2-latest-Linux-x86_64.sh
bash Miniconda2-latest-Linux-x86_64.sh
conda create -n Bacmethy
conda activate Bacmethy
conda install -c bioconda prokka
conda install -c bioconda bedtools
conda install -c bioconda meme
conda install conda-forge::r-base
conda install git
git clone https://github.com/LiuJih2021/Bacmethy.git
cd ./Bacmethy/script/
```
genome.fa: the reference genome sequence file (FASTA file of the complete genome sequence). Note: for step 2, "Prepare motifs files", the complete genome sequence must come from the same strain (i.e., the same reference genome) that was used for motif detection. This keeps the motif analysis accurate and consistent.
TF.meme: transcription factor (TF) binding prediction is an optional module of Bacmethy. To use it, provide TF matrix files in PPM (position probability matrix) format with a .meme suffix.
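To illustrate what a TF matrix file in MEME format contains, the sketch below writes one PPM in the MEME minimal motif format read by MEME Suite tools such as FIMO. It is not Bacmethy's own converter: the function name `ppm_to_meme`, the motif name and the uniform background frequencies are hypothetical choices for the example.

```python
"""Sketch: write a PPM as MEME minimal-format text (hypothetical helper)."""
from typing import List, Sequence


def ppm_to_meme(name: str, ppm: Sequence[Sequence[float]],
                background: Sequence[float] = (0.25, 0.25, 0.25, 0.25)) -> str:
    """Return MEME minimal motif text for one DNA motif.

    `ppm` has one row per motif position with probabilities for A, C, G, T.
    The uniform background is an assumption; replace it if known.
    """
    for row in ppm:
        if len(row) != 4:
            raise ValueError("each PPM row needs 4 values (A, C, G, T)")
        if abs(sum(row) - 1.0) > 1e-3:
            raise ValueError("each PPM row must sum to 1")
    bg = " ".join(f"{base} {p:g}" for base, p in zip("ACGT", background))
    lines: List[str] = [
        "MEME version 4",
        "",
        "ALPHABET= ACGT",
        "",
        "strands: + -",
        "",
        "Background letter frequencies",
        bg,
        "",
        f"MOTIF {name}",
        f"letter-probability matrix: alength= 4 w= {len(ppm)}",
    ]
    # One line per motif position, probabilities in A C G T order
    lines += [" ".join(f"{p:.3f}" for p in row) for row in ppm]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    demo = [[0.9, 0.05, 0.03, 0.02], [0.1, 0.1, 0.7, 0.1]]
    # The output could be saved as e.g. TF_demo.meme in the folder given to -T
    print(ppm_to_meme("TF_demo", demo), end="")
```

Each file in the -T folder could hold one such motif; files produced by Calcute_PPM_console_final.py or exported from a motif database should be preferred in practice.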
NOTE: genome.fa, motifs.gff, and motifs.csv can usually be obtained from a public database or from the SMRT-seq facility; PacBio SMRT-seq facilities typically have the SMRT Link software pre-installed.
If you need to run the SMRT Link analysis yourself, follow the instructions on the website.
PacBio provides HGAP4 (Hierarchical Genome Assembly Process) to generate high-quality de novo genome assemblies from Continuous Long Reads.
Use the Base Modification Analysis application in SMRT Link or SMRT Tools to identify bacterial base modifications and analyze methyltransferase recognition motifs. Detection can be done using an in-silico control consisting of expected kinetic signals.
When using the Base Modification Analysis application, it is recommended to enable methylation fraction calculation and motif finding (-t kineticstools_compute_methyl_fraction=true -t run_find_motifs=true) so that methylation motif and methylation fraction information are obtained accurately.
Basic usage:

```bash
bash Bacmethy.sh -m motifs.gff -s motifs.csv -g genome.fasta -p PREFIX -t m6A
```
```
Usage:
bash Bacmethy.sh -m <motifs.gff> -s <motifs.csv> -g <genome.fa> -p <prefix> -t <m6A|m4C|m5C>
                 [-T <folder>] [-d <float>] [-i <float>] [-c <float>]
                 [-b <int>] [-a <int>] [-r <file>] [-n <int>] [-G]

Requirements: locally installed Prokka, bedtools and MEME

Required:
  -m FILE    motif detection file in GFF format from the PacBio portal
  -s FILE    motif summary file
  -g FILE    complete genome sequence file
  -p STRING  output prefix (usually a strain or sample name)
  -t STRING  methylation type (one of m6A, m4C or m5C)

Options:
  -T FOLDER  folder containing TF files in MEME format (required for TF binding
             prediction; can be generated with Calcute_PPM_console_final.py)
  -d FLOAT   undermethylation threshold for methylation fraction (default: 0.75)
  -i FLOAT   unmethylation threshold for identificationQv (default: 40)
  -c FLOAT   unmethylation threshold for coverage (default: 30)
  -b INT     number of bp before the TSS (default: 500)
  -a INT     number of bp after the TSS (default: 100)
  -r FILE    FASTA or GBK file to use as first priority for annotation (default: '')
  -n INT     number of CPUs to use (default: 8)
  -G         also scan DNA methylation sites in gene coding regions
             (default: scan regulation regions only)
```
The genome file used for the Bacmethy analysis should be the same reference genome file that was used for SMRT Link methylation motif detection.
Additionally, two options are recommended:
The -r option lets users supply a reference file from a standard strain of the same species, which improves genome annotation.
The -d option sets the methylation-fraction threshold for undermethylation (default: 0.75). DNA methylation is subject to complex and dynamic epigenetic regulation; to ensure that methylation events are genuine, a sequencing coverage higher than 30 is required.
The software classifies DNA methylation events into three levels: methylation, undermethylation, and unmethylation.
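The -d, -i and -c thresholds decide which of the three levels a site falls into. The sketch below shows one plausible way they could combine; the rule order (coverage first, then identificationQv, then fraction), the handling of low-coverage sites and the function name `classify_site` are assumptions for illustration, not Bacmethy's actual implementation, so check Bacmethy.sh for the exact logic.

```python
"""Sketch: assign a methylation level to one motif site (assumed rule order)."""
from typing import Optional


def classify_site(fraction: float, identification_qv: float, coverage: float,
                  under_fraction: float = 0.75,  # mirrors the -d default
                  min_qv: float = 40,            # mirrors the -i default
                  min_coverage: float = 30       # mirrors the -c default
                  ) -> Optional[str]:
    """Return 'methylation', 'undermethylation', 'unmethylation', or None.

    Hypothetical rules:
      * coverage must be higher than min_coverage, otherwise the site
        is left unclassified (None);
      * identificationQv below min_qv  -> 'unmethylation';
      * fraction below under_fraction  -> 'undermethylation';
      * everything else                -> 'methylation'.
    """
    if coverage <= min_coverage:
        return None
    if identification_qv < min_qv:
        return "unmethylation"
    if fraction < under_fraction:
        return "undermethylation"
    return "methylation"


if __name__ == "__main__":
    print(classify_site(0.95, 120, 80))  # methylation
    print(classify_site(0.50, 120, 80))  # undermethylation
    print(classify_site(0.10, 12, 80))   # unmethylation
    print(classify_site(0.95, 120, 20))  # None (coverage too low)
```

Checking coverage first reflects the requirement that only well-covered sites are trusted; a different ordering would change how borderline sites are labeled.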
Example:

```bash
bash Bacmethy.sh -m motifs.gff -s motif_summary.csv -g genomic.fna -p K12 -t m6A -T /Your/Path/To/TF/meme
```

Bacmethy uses parallel processing to reduce running time on multicore computers. The number of CPUs can be set with -n.
Structure of output files
Bacmethy defines three gene features: promoter (default: the 500 bp upstream of the ATG initiation codon), CDS_RR (default: the 100 bp downstream of the initiation codon), and Coding Region (the whole gene coding region). The promoter and CDS_RR together form the Regulation Region (RR).
Gene transcription initiation is a complex process regulated by multiple factors, and the start of the gene body can also contribute to its regulation. The term CDS_RR therefore refers to methylation sites at the start of the coding sequence (CDS) that may be involved in regulating gene transcription.
Some users may be interested specifically in methylation sites that occur within the gene body. For this purpose, Bacmethy provides a separate folder exclusively for these methylation sites, which are classified under the category of Coding Region.
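For a gene on either strand, the promoter and CDS_RR windows can be derived from the gene coordinates, the strand and the -b/-a lengths. This sketch assumes 1-based, inclusive, GFF-style coordinates (start < end on both strands), that the initiation codon sits at `start` on the plus strand and at `end` on the minus strand, and it neither clips at the genome end nor handles circular wrap-around; the function name is hypothetical.

```python
"""Sketch: promoter and CDS_RR windows for one gene (assumed conventions)."""
from typing import Dict, Tuple


def regulation_regions(start: int, end: int, strand: str,
                       before: int = 500, after: int = 100
                       ) -> Dict[str, Tuple[int, int]]:
    """Return 1-based inclusive (from, to) windows for promoter and CDS_RR.

    `before` and `after` mirror the -b and -a options. Windows are clipped
    at position 1 only; the genome end and circularity are ignored.
    """
    if strand == "+":
        # Initiation codon at `start`; promoter lies to its left
        promoter = (max(1, start - before), start - 1)
        cds_rr = (start, start + after - 1)
    elif strand == "-":
        # Initiation codon at `end`; promoter lies to its right
        promoter = (end + 1, end + before)
        cds_rr = (max(1, end - after + 1), end)
    else:
        raise ValueError("strand must be '+' or '-'")
    return {"promoter": promoter, "CDS_RR": cds_rr}


if __name__ == "__main__":
    print(regulation_regions(1000, 2000, "+"))
    print(regulation_regions(1000, 2000, "-"))
```

The promoter and CDS_RR windows of one gene are adjacent, so together they cover the Regulation Region as a single contiguous stretch.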
motif_methylationType.motif.methylation_result.txt
column name | Description |
---|---|
Strains | Strain name |
Methylation | Methylation type |
nCDS | counts of methylated bases in CDS region |
nPROMOTER | counts of methylated bases in promoter |
nRR | counts of methylated bases in Regulation Region |
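The three counts in this file can be read as a tally over region-assigned methylated bases. The sketch assumes each base has already been labeled 'promoter' or 'CDS' (the CDS_RR window) and, following the definition of the Regulation Region as promoter plus CDS_RR, that nRR is their sum; both points and the function name are assumptions for illustration.

```python
"""Sketch: tally methylated bases per region (assumes nRR = nPROMOTER + nCDS)."""
from collections import Counter
from typing import Dict, Iterable


def summarize_regions(regions: Iterable[str]) -> Dict[str, int]:
    """Count region labels ('promoter' or 'CDS') of methylated bases."""
    counts = Counter(regions)
    unknown = set(counts) - {"promoter", "CDS"}
    if unknown:
        raise ValueError(f"unexpected region labels: {sorted(unknown)}")
    n_cds = counts["CDS"]          # Counter returns 0 for missing labels
    n_promoter = counts["promoter"]
    return {"nCDS": n_cds, "nPROMOTER": n_promoter, "nRR": n_cds + n_promoter}


if __name__ == "__main__":
    print(summarize_regions(["promoter", "CDS", "promoter"]))
```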
motif_methylationType.methylation_methylationGene.txt
column name | Description |
---|---|
Strains | Strain name |
Methylation | Methylation type |
Region | the region in which the methylation site is located (CDS or promoter) |
RRS | Regulation region start site |
RRE | Regulation region end site |
Methsite | the methylation site |
Fraction | Estimate of the fraction of molecules that carry the modification |
distance | the distance between the gene ORF start site and the methylated site |
start | gene start site |
end | gene end site |
strand | the gene transcription direction, +/- |
locus tag | the gene position ID |
gene name | gene name annotated by Prokka |
description | gene function annotation |
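The Region and distance columns relate each methylation site to a gene. The sketch below shows one way to compute them; the sign convention (negative upstream of the initiation codon, zero or positive inside the gene, measured along the gene's strand), the 1-based GFF-style coordinates and the function name `locate_site` are assumptions, and Bacmethy's own output may use a different convention.

```python
"""Sketch: place a methylation site relative to a gene (assumed conventions)."""
from typing import Optional, Tuple


def locate_site(site: int, start: int, end: int, strand: str,
                before: int = 500, after: int = 100
                ) -> Optional[Tuple[str, int]]:
    """Return (region, distance), or None if the site is outside the RR.

    distance is measured from the initiation codon along the gene's strand:
    negative upstream (promoter), zero or positive downstream (CDS).
    """
    if strand == "+":
        distance = site - start
    elif strand == "-":
        distance = end - site
    else:
        raise ValueError("strand must be '+' or '-'")
    if -before <= distance < 0:
        return "promoter", distance
    if 0 <= distance < after:
        return "CDS", distance
    return None


if __name__ == "__main__":
    print(locate_site(990, 1000, 2000, "+"))   # upstream of a + gene
    print(locate_site(1950, 1000, 2000, "-"))  # inside the start of a - gene
```

Measuring along the strand keeps promoter distances negative for genes on both strands, so sites can be compared across genes without extra bookkeeping.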
motif_methylationType.methylationLevel_TF.meme.txt
column name | Description |
---|---|
Strains | Strain name |
Methylation | Methylation type |
TF binding start | TF binding position start site |
TF binding end | TF binding position end site |
FIMO score | FIMO scan score of the TF binding to the RR or CDS region |
Region | the region in which the methylation site is located (CDS or promoter) |
RRS | Regulation region start site |
RRE | Regulation region end site |
Methsite | the methylation site |
Fraction | Estimate of the fraction of molecules that carry the modification |
distance | the distance between the gene ORF start site and the methylated site |
start | gene start site |
end | gene end site |
strand | the gene transcription direction, +/- |
locus tag | the gene position ID |
gene name | gene name annotated by Prokka |
description | gene function annotation |
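This file links TF binding predictions to methylation sites. A simple criterion for flagging a site that might affect TF binding is whether it lies inside the predicted binding interval. The sketch assumes 1-based inclusive coordinates and an optional flanking margin; the `TFHit` type, the function name and the margin are hypothetical, and Bacmethy's actual matching rule may differ.

```python
"""Sketch: find methylation sites inside predicted TF binding intervals."""
from typing import Iterable, List, NamedTuple, Tuple


class TFHit(NamedTuple):
    tf: str        # TF (motif) name
    start: int     # TF binding start (1-based, inclusive)
    end: int       # TF binding end (1-based, inclusive)
    score: float   # FIMO score


def sites_in_tf_hits(sites: Iterable[int], hits: Iterable[TFHit],
                     margin: int = 0) -> List[Tuple[int, TFHit]]:
    """Return (site, hit) pairs where the site lies within hit +/- margin."""
    hit_list = sorted(hits, key=lambda h: h.start)
    pairs: List[Tuple[int, TFHit]] = []
    for site in sorted(sites):
        for hit in hit_list:
            if hit.start - margin > site:
                break  # hits are sorted by start, so later ones lie further right
            if site <= hit.end + margin:
                pairs.append((site, hit))
    return pairs


if __name__ == "__main__":
    hits = [TFHit("TF_A", 100, 119, 15.2), TFHit("TF_B", 300, 318, 12.0)]
    for site, hit in sites_in_tf_hits([105, 200, 318], hits):
        print(site, hit.tf, hit.score)
```

A non-zero margin can be used to also catch sites just outside a predicted binding interval.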
motif_methylationType.methylation
e.g. GATC_m6A_motif.methylation
```bash
python ./script/Calcute_PPM_console_final.py TF.mat
```
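Calcute_PPM_console_final.py produces the MEME-format TF files used by the -T option. Its input format is not described here, so the sketch below only illustrates the underlying arithmetic of turning counts into a PPM: it assumes a count matrix with one row per motif position and columns A, C, G, T, and adds a pseudocount before normalizing each row. The function name and the pseudocount value are hypothetical.

```python
"""Sketch: convert a position count matrix to a PPM (assumed input layout)."""
from typing import List, Sequence


def counts_to_ppm(counts: Sequence[Sequence[float]],
                  pseudocount: float = 0.25) -> List[List[float]]:
    """Normalize each row of A, C, G, T counts to probabilities.

    p = (count + pseudocount) / (row_total + 4 * pseudocount), so every
    row sums to 1 and, with a positive pseudocount, no entry is zero.
    """
    ppm: List[List[float]] = []
    for row in counts:
        if len(row) != 4:
            raise ValueError("each row needs 4 counts (A, C, G, T)")
        total = sum(row) + 4 * pseudocount
        if total <= 0:
            raise ValueError("row has no counts")
        ppm.append([(c + pseudocount) / total for c in row])
    return ppm


if __name__ == "__main__":
    print(counts_to_ppm([[9, 0, 1, 0], [2, 2, 2, 2]]))
```

The pseudocount keeps unobserved bases from getting a probability of zero, which would otherwise make any sequence containing them score as impossible during motif scanning.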
```bash
bash /PATH/TO/Circos_data_prepare.sh PREFIX /PATH/TO/SCRIPT/FOLDER/
```
Copyright © [2024] [Ji-Hong Liu]. All rights reserved.