pyllelic: a tool for detection of allelic-specific methylation variation in bisulfite DNA sequencing files.
Pyllelic documentation is available at https://paradoxdruid.github.io/pyllelic/; see pyllelic_notebook.ipynb
for a fully worked demonstration.
Run an interactive sample pyllelic environment in your web browser using mybinder.org.
Create a new conda environment using Python 3.8:
Easiest:
# Get environment.yml file from this repo
curl -L "https://github.com/Paradoxdruid/pyllelic/blob/master/environment.yml?raw=true" > env.yml
# Create and activate conda environment
conda env create --file=env.yml
conda activate pyllelic
# Alternatively, pull the Docker image instead of creating a conda environment
docker pull ghcr.io/paradoxdruid/pyllelic:latest
Set up files:
from pyllelic import process
from pathlib import Path
# Retrieve promoter genomic sequence of region to analyze
process.retrieve_seq("tert_genome.txt", chrom="chr5", start=1293000, end=1296000)
# Download a reference genome and bisulfite sequencing data
# Genome data from, e.g. http://hgdownload.soe.ucsc.edu/goldenPath/hg19
# Fastq data from, e.g. http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeHaibMethylRrbs/
genome = Path("/{your_directory}/{genome_file_directory}")
fastq = Path("/{your_directory}/{your_fastq_file.fastq.gz}")
# Use bismark tool to prepare bisulfite genome and align fastq to bam file
process.prepare_genome(genome) # can optionally give path to bowtie2 if not in PATH
process.bismark(genome, fastq)
# Sort and index the resultant bam file
bamfile = Path("/{your_directory}/{bam_filename}.bam")
process.sort_bam(bamfile)
process.index_bam(bamfile.parent / f"{bamfile.stem}_sorted.bam")
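The last step above indexes a file named `{stem}_sorted.bam` in the same directory as the original bam file, which suggests that `process.sort_bam` writes its output under that name. The following is a minimal sketch of that path arithmetic using only the standard library; `sorted_bam_path` and the example path are hypothetical names for illustration, not part of pyllelic.

```python
from pathlib import Path


def sorted_bam_path(bamfile: Path) -> Path:
    """Return the assumed location of the sorted copy of ``bamfile``.

    Assumption: the sorted file sits next to the input and is named
    "<stem>_sorted.bam", as implied by the index_bam call above.
    """
    # Path.stem drops only the final suffix (".bam"), so any dots earlier
    # in the file name are preserved.
    return bamfile.parent / f"{bamfile.stem}_sorted.bam"


example = Path("/data/run1/sample.bam")  # hypothetical path
print(sorted_bam_path(example))
```

Because `Path.stem` removes only the last suffix, a name such as `sample.fastq.gz_bismark_bt2.bam` keeps everything before `.bam` in the derived file name.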
Run pyllelic:
from pyllelic import pyllelic
config = pyllelic.configure( # Specify file and directory locations
base_path="/home/jovyan/assets/",
prom_file="tert_genome.txt",
prom_start=1293200,
prom_end=1296000,
chrom="5",
offset=1293000, # start position of retrieved promoter sequence
# viz_backend="plotly",
# fname_pattern=r"^[a-zA-Z]+_([a-zA-Z0-9]+)_.+bam$",
# test_dir="test",
# results_dir="results",
)
files_set = pyllelic.make_list_of_bam_files(config) # finds bam files
# Run pyllelic; may take some time depending on number of bam files
data = pyllelic.pyllelic(config=config, files_set=files_set)
positions = data.positions
cell_types = data.cell_types
means_df = data.means # mean methylation of reads
modes_df = data.modes # mode methylation of reads
diff_df = data.diffs # difference (mean minus mode) of reads
individual_data = data.individual_data # read methylation values
data.save("output.xlsx") # save methylation results
data.save_pickle("my_run.pickle") # save data object for later analysis
data.write_means_modes_diffs(filename="Run1_") # write output data files
data.histogram("CELL_LINE", "POSITION") # visualize data for a point
data.heatmap(min_values=1) # methylation level heatmap
data.reads_graph() # individual methylated / unmethylated reads graph
data.quma_results["CELL_LINE"] # see summary data for a cell line
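To make the relationship between the means, modes, and diffs outputs concrete, here is a minimal sketch of the arithmetic for a single position in a single cell line. It assumes each read contributes one methylation value between 0 and 1; the function `summarize_reads` and the sample values are hypothetical, and this is an illustration of the idea rather than pyllelic's actual implementation.

```python
from statistics import mean, mode


def summarize_reads(values):
    """Return (mean, mode, mean - mode) for per-read methylation values."""
    read_mean = mean(values)
    read_mode = mode(values)  # most common value among the reads
    return read_mean, read_mode, read_mean - read_mode


# Hypothetical per-read methylation fractions at one position
reads = [1.0, 1.0, 1.0, 0.0, 0.5]
read_mean, read_mode, diff = summarize_reads(reads)
print(f"mean={read_mean:.2f} mode={read_mode:.2f} diff={diff:.2f}")
```

When every read carries the same methylation value, the mean equals the mode and the difference is zero. A difference far from zero indicates that the reads are split between methylation states, which is the kind of position worth inspecting further with the histogram and reads graph.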
This software is developed as academic software by Dr. Andrew J. Bonham at the Metropolitan State University of Denver. It is licensed under the GPL v3.0.
This software incorporates implementation from QUMA, licensed under the GPL v3.0.