akdess / BAFExtract

5 stars 5 forks source link

BAFExtract

BAFExtract generates the B-allele frequency shifts from mapped reads alone without the need for a heterozygous variant call set. BAFExtract a part of CaSpER (https://github.com/akdess/CaSpER)

There are two steps:

  1. Generation of the pileup.
  2. Generation of the B-allele frequency profile.

Please open a new "issue" on github or contact authors Akdes.Harmanci@uth.tmc.edu or arif.o.harmanci@uth.tmc.edu for questions.

Installation

Type make to build BAFExtract. The executable is built under bin directory. The code is tested on various Unix based systems.

Usage

Extract BAF values from RNA-Seq bam files

    samtools view <bam_file> | ./BAFExtract -generate_compressed_pileup_per_SAM stdin <genome_list> <sample_dir> 50 0; ./BAFExtract -get_SNVs_per_pileup  <genome_list> <sample_dir> <genome_fasta_pileup_dir> 20 4 0.1 <output_baf_file>
: the name of sample directory : final output You can download and unzip genome_fasta_pileup_dir files from : [for hg38](https://www.dropbox.com/s/ysrcfcnk7z8gyit/hg38.zip?dl=0) [for hg19](https://www.dropbox.com/s/a3u8f2f8ufm5wdj/hg19.zip?dl=0) Or you can create genome_fasta_pileup_dir files for other genomes using the following commands: ``` BAFExtract -preprocess_FASTA [FASTA file path] [Output directory] ``` for example: ``` wget -c http://hgdownload.soe.ucsc.edu/goldenPath/mm10/bigZips/chromFa.tar.gz tar -xvzf chromFa.tar.gz mkdir ../mm10 FILES=./*fa for f in $FILES do echo "Processing $f file..." BAFExtract -preprocess_FASTA $f ../mm10 done ``` You can download genome_list files from : [for hg38](https://www.dropbox.com/s/rq7v67tiou1qwwg/hg38.list?dl=0) generated using the following command: ``` fetchChromSizes hg38 > hg38.list ``` [for hg19](https://www.dropbox.com/s/jcmt23nmuzm6poz/hg19.list?dl=0) generated using the following command: ``` fetchChromSizes hg19 > hg19.list ``` # Example [download example bam file](https://www.dropbox.com/s/1vl6iip0b8jwu66/SRR1295366.sorted.bam?dl=0) ```{bash} mkdir test; samtools view SRR1295366.sorted.bam | ./bin/BAFExtract -generate_compressed_pileup_per_SAM stdin hg38.list test 50 0; ./bin/BAFExtract -get_SNVs_per_pileup hg38.list test ./hg38/ 20 4 0.1 test.snp ``` Note: In the example above ./BAFExtract -generate_compressed_pileup_per_SAM uses mimunum mapping quality threshold 50. Depending on the aligner you used the MAPQ-Values can differ a lot and the setting of [Minimum mapping quality] to 50 could mean that no reads are surviving the filtering, only due to the aligners implementation of MAPQ. (acknowledgements to Tobias Tekath)