PengNi / ccsmethphase

Methylatino phasing using PacBio CCS reads
MIT License
9 stars 0 forks source link

ccsmethphase

Methylation phasing using PacBio CCS reads

ccsmethphase-tubemap

Contents

Installation

Recommended Hardware requirements: 128 GB RAM, 40 CPU processors, 4 TB disk storage, >=8 GB GPU

Recommended OS: Linux (Ubuntu 16.04, CentOS 7, etc.)

# create a new environment and install nextflow in it
conda create -n nextflow -c conda-forge -c bioconda nextflow

# OR, install nextflow in an existing environment
conda install -c conda-forge -c bioconda nextflow
git clone https://github.com/PengNi/ccsmethphase.git
# e.g., install singularity using conda
conda install -c conda-forge singularity 
conda install -c conda-forge graphviz

Demo data

Check ccsmethphase/demo for demo data:

Input

ccsmethphase takes files of PacBio reads (subreads.bam or hifi.bam), and a reference genome as input.

The information of PacBio reads files should be organized into a tsv file, like input_sheet.tsv in demo data:

Group_ID Sample_ID Type Path
G1 HG002_demo hifi ./demo/hg002.chr20_demo.hifi.bam
G1 HG002_demo hifi _path of another flowcell bam file for HG002demo
G1 HG003 hifi path of a bam file for HG003

NOTE: ccsmethphase can be run with conda, docker, and singularity by setting -profile. If you are using -profile conda to run this workflow, ccsmeth models should be set as input too. Check ccsmeth to get ccsmeth 5mCpG models.

Usage

Parameters (to be completed):

--dsname:   job name
--input:    the tsv file containing information of PacBio reads files
--genome:   file path of the reference genome
--include_all_ctgs: "true" or "false". default false, means only [chr][1-22,X,Y] included.
-profile:   conda/docker/singularity, test

Example 1. Run with singularity (recommended)

If it is the first time you run with singularity (e.g. using -profile singularity), the following cmd will cache the dafault singularity image (--singularity_name and/or --clair3_singularity_name) to the --singularity_cache directory (default: local_singularity_cache) first. There will be .img file(s) in the --singularity_cache directory.

NOTE: If you are using relative paths of bam files in _inputsheet.tsv, make sure the relative paths are the right relative paths to the directory you launch the workflow.

For the example data:

# activate nextflow environment
conda activate nextflow

cd /path/to/ccsmethphase
nextflow run main.nf \
    --dsname test \
    -profile singularity,test

The above command is equal to the command following:

nextflow run /path/to/ccsmethphase \
    --dsname test \
    --genome /path/to/ccsmethphase/demo/chr20_demo.fa \
    --input /path/to/ccsmethphase/demo/input_sheet.tsv \
    --include_all_ctgs true \
    --max_cpus 8 \
    --max_memory "12.GB" \
    --max_time "6.h" \
    -profile singularity

Try the following command to enable GPU:

CUDA_VISIBLE_DEVICES=0 nextflow run /path/to/ccsmethphase \
    --dsname test \
    --genome /path/to/ccsmethphase/demo/chr20_demo.fa \
    --input /path/to/ccsmethphase/demo/input_sheet.tsv \
    --include_all_ctgs true \
    -profile singularity

The downloaded singularity .img file(s) can be re-used, without being downloaded again:

nextflow run /path/to/ccsmethphase \
    --dsname test2 \
    --genome /path/to/some/other/genome/fa \
    --input /path/to/some/other/input_sheet.tsv \
    -profile singularity \
    --singularity_cache /path/to/local_singularity_cache

Try -resume to re-run a modified/failed job to save time:

nextflow run /path/to/ccsmethphase \
    --dsname test \
    --genome /path/to/ccsmethphase/demo/chr20_demo.fa \
    --input /path/to/ccsmethphase/demo/input_sheet.tsv \
    --include_all_ctgs true \
    -profile singularity \
    -resume

Possible issues:

  1. no space left on device ERROR

This may be caused by limited space of the SINGULARITY_TMPDIR (default /tmp) dir. Try setting SINGULARITY_TMPDIR to a disk that have enough space in current environment:

export SINGULARITY_TMPDIR=/path/to/another/dir

Output

The output directory should look like the following:

ccsmethphase_results/
├── pipeline_info
└── test
    ├── bam
    │   ├── G1.HG002_demo.hifi.ccsmeth.modbam.pbmm2.merged_size1.SNV_PASS_whatshap.bam
    │   └── G1.HG002_demo.hifi.ccsmeth.modbam.pbmm2.merged_size1.SNV_PASS_whatshap.bam.bai
    ├── diff_methyl
    │   ├── G1.HG002_demo.hifi.ccsmeth.modbam.pbmm2.merged_size1.SNV_PASS_whatshap.freq.aggregate.hp_callDML.txt
    │   ├── G1.HG002_demo.hifi.ccsmeth.modbam.pbmm2.merged_size1.SNV_PASS_whatshap.freq.aggregate.hp_callDMR.autosomes_cf0.2.bed
    │   └── G1.HG002_demo.hifi.ccsmeth.modbam.pbmm2.merged_size1.SNV_PASS_whatshap.freq.aggregate.hp_callDMR.txt
    ├── mods_freq
    │   ├── G1.HG002_demo.hifi.ccsmeth.modbam.pbmm2.merged_size1.SNV_PASS_whatshap.freq.aggregate.all.bed
    │   ├── G1.HG002_demo.hifi.ccsmeth.modbam.pbmm2.merged_size1.SNV_PASS_whatshap.freq.aggregate.hp1.bed
    │   └── G1.HG002_demo.hifi.ccsmeth.modbam.pbmm2.merged_size1.SNV_PASS_whatshap.freq.aggregate.hp2.bed
    └── vcf
        ├── clair3_called
        │   ├── G1.HG002_demo.hifi.ccsmeth.modbam.pbmm2.merged_size1.clair3_merge.vcf.gz
        │   └── G1.HG002_demo.hifi.ccsmeth.modbam.pbmm2.merged_size1.clair3_merge.vcf.gz.tbi
        └── whatshap_phased
            ├── G1.HG002_demo.hifi.ccsmeth.modbam.pbmm2.merged_size1.clair3_merge.SNV_PASS_whatshap.vcf.gz
            └── G1.HG002_demo.hifi.ccsmeth.modbam.pbmm2.merged_size1.clair3_merge.SNV_PASS_whatshap.vcf.gz.tbi

Acknowledgements

TODO