This program takes an input BAM file and masks base positions that are potentially influenced by bisulfite-seq as observed from alignments between query and reference sequences. The script requires either an input FASTA sequence of the reference genome (slow) or a BAM file with MD tags (fast) as generated by 'samtools calmd'.
The script requires Pysam which can be installed via Bioconda:
conda install -c bioconda pysam
The script can run either on an input BAM file with proper MD tags such as those generated by 'samtools calmd', or by taking the reference genome fasta directly. Note that MD tags generated directly via bisulfite-aware alignment software may be inappropriate, as they sometimes follow a non-standard definition.
# using proper MD tags
samtools calmd -b input.bam genome.fa 1> calmd.bam 2> /dev/null
./revelio.py calmd.bam masked.bam
# using reference genome directly
./revelio.py -f genome.fa input.bam masked.bam