RitataLU / MethylC-analyzer

GNU General Public License v3.0
11 stars 4 forks source link
comparison dmr methylc-analyzer

MethylC-analyzer

MethylC-analyzer is a analyzer developing for analysing DNA methylation on WGBS and RRBS, it could utilize not only individual sample also do comparison between two groups.

Publication

Rita Jui-Hsien Lu☨, Pei-Yu Lin☨, Ming-Ren Yen☨, Bing-Heng Wu, Pao-Yang Chen (2023) MethylC-analyzer: a comprehensive downstream pipeline for the analysis of genome-wide DNA methylation Botanical Studies, https://doi.org/10.1186/s40529-022-00366-5

MethylC-analyzer will produce 7 analysis and each analysis contains CG, CHG and CHH 3 context:

MethylC-analyzer Flow

System Requirements

Installation

From Docker

:pushpin: Recommend using docker image to avoid enveioment conflict

docker pull peiyulin/methylc

From Github

  1. Obtain Python 3.9
  2. Recommend to create a conda environment somewhere on your disk, and then activate it.

    $ conda create -n methylC_analyzer_env python=3.9
    $ conda activate methylC_analyzer_env
    
  3. Download the source code and install the requirements.

    $ git clone https://github.com/RitataLU/MethylC-analyzer.git
    
  4. Install Package - Run MethylC-analyzer/requirements/base.txt

    ex: sh MethylC-analyzer/requirements/base.txt

Input files

  1. CGmap.gz file (need gzip compressed format) is the output of BS-Seeker2.(post-alignment data by utilizing Bs-seeker2 and Bs-seeker3)

    CGmap format

    chr1    G       13538   CG      CG      0.6     6       10
    chr1    G       13539   CHG     CC      0.0     0       9
    chr1    G       13541   CHH     CA      0.0     0       9
    chr1    G       13545   CHH     CA      0.0     0       8

The methylation calling files from other aligners/callers, MethylC-analyzer provides a python script (methcalls2CGmap.py) to convert them to CGmap.gz, including CX report files generated by Bismark, the methylation calls generated by methratio.py in BSMAP (v2.73), and the TSV files exported from the methylation calling status with METHimpute.

usage: methcalls2CGmap.py [-h] [-n FILENAME]
                         [-f {bismark,bsmap,methimpute}]

optional arguments:
 -h, --help            show this help message and exit

Input format:
 -n FILENAME, --filename FILENAME
                       the file name that the users want to convert to CGMap
                       format
 -f {bismark,bsmap,methimpute}, --format {bismark,bsmap,methimpute}
                       the type of file to CGmap

Example for converting methylation calls to CGmap.gz:

# bismark to CGmap.gz
python methcalls2CGmap.py -n CX_report.txt.gz -f bismark

2.Gene annotation (GTF)

gene annotation in GTF file: User can downloaded from ensemble FTP

Tutorial

Please follow the tutorial of example use case

MethylC-analyzer docker tutorial :mega:Recommend

MethylC-analyzer command line tutorial

Run MethylC-analyzer

Make a sample description file and name it as "samples_list.txt" in the location where methylc.py script. The file should be tab-delimited without a header.

Sample Description File (tab-delimited, no header in the first line) Sample list ( sample_name CGmap_location group )


wt1 wt1.CGmap.gz    WT
wt2 wt2.CGmap.gz    WT
wt3 wt3.CGmap.gz    WT
met1_1  met1_1.CGmap.gz met1
met1_2  met1_2.CGmap.gz met1
met1_3  met1_3.CGmap.gz met1

Usage:

$ python MethylC.py samples_list.txt TAR10.genes.gtf 

usage: MethylC_new.py [-h] [-a GROUP1] [-b GROUP2] [-d DEPTH] [-r REGION]
                      [-q QUALIFIED] [-context CONTEXT] [-hc HEATMAP_CUTOFF]
                      [-dmrc DMR_CUTOFF] [-test TESTMETHOD] [-pvalue PVALUE]
                      [-bs BIN_SIZE] [-p PROMOTER_SIZE]
                      samples_list input_gtf_file

positional arguments:
  samples_list        samples CGmap description
  input_gtf_file      path of gene annotation

  ## Arguments
  -h, --help          show this help message and exit
  -a GROUP1           Name of group1
  -b GROUP2           Name of group2
  -d DEPTH            Minimum depth of sites. Default=4
  -r REGION           Size of region. Default=500
  -q QUALIFIED        Minimum sites within a region. Default=4
  -context CONTEXT    Context used. Default=CG
  -hc HEATMAP_CUTOFF  Methylation cutoff of PCA & Heatmap. Default = 0.2
  -dmrc DMR_CUTOFF    Methylation cutoff of DMR. Default = 0.1
  -test TESTMETHOD    DMR testing method. 0:TTest, 1:KS, 2:MWU. Default=0
  -pvalue PVALUE      p-value cutoff for identifying DMR. Default = 0.05
  -bs BIN_SIZE        Bin size of chrView and Metaplot. Default = 1000000
  -p PROMOTER_SIZE    promoter_size

 ## activate interface (Users select analysis that want to process)

Heatmap & PCA Analysis?  (y/n): y
Identify DMR?  (y/n): y
Identify DMG?  (y/n): y
Use Fold Enrichment Analysis?  (y/n): y
Chromosome View Analysis?  (y/n): y
Metaplot Analysis?  (y/n): y
enter experimental group name analysis: met1
enter control group name analysis: WT

Output Figures

  1. The average methylation in 3 context (CG, CHG, CHH)

  2. PCA & Heatmap show variable region among samples

PCA:

Heatmap:

  1. The distribution fo DNA methylation on each chromosome
  1. The distribution fo DNA methylation difference on each chromosome

  2. Summary of dentifying Differentially Methylated Regions (DMRs) & Differentially Methylated Genes (DMGs)

  1. Genomic regions fold enrichment analysis for DMRs
  1. The distribution of DNA methylation around gene body
  1. The distribution of DNA methylation difference around gene body