ekarlins closed this issue 7 years ago
The GC model file is specific to the genome build. Generating it requires downloading a GC annotation file from the UCSC browser. Using a GC model file is optional for PennCNV. Our pipeline should allow PennCNV to be run with a GC model file, but we will rely on the user to generate it.
See the details below on how to generate this file. We may want to point users to this documentation.
cal_gc_snp.pl -h
    Usage:
        cal_gc_snp.pl [arguments]

        Optional arguments:
            -v, --verbose               use verbose output
            -h, --help                  print help message
            -m, --man                   print complete documentation
            --numwindow <int>           number of sliding window (default=100, or 500kb on each side)
            --backgroundgc <float>      backgroud GC frequency (default=0.42)
            --output <file>             write output to this file

        Function: calculate GC content surrounding each marker within specified sliding
        window, using the UCSC GC annotation file (for example,
        http://hgdownload.cse.ucsc.edu/goldenPath/hg18/database/gc5Base.txt.gz for
        human NCBI36 genome assembly) that is also sorted

        Example: cal_gc_snp.pl gc5Base.txt.sorted signalfile -output file.gcmodel

    Options:
        --help  print a brief help message and exit

        --man   print the complete manual of how to use the program

        --verbose
                use verbose output

        --numwindow
                the number of non-overlapping sliding window on each side of the
                SNP.

        --backgroundgc
                background GC level (for genomic regions without base
                information). By default it is 0.42 for human genome.

        --output
                specify the output file name
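Based on the help text above, the generation steps might be sketched as the following bash script. This is only a sketch under assumptions: the function names (`sort_gc_table`, `make_gcmodel`), the use of `curl`, the `LC_ALL=C` setting, and the sort keys (chromosome in column 2, start position in column 3 of the UCSC table) are not taken from this ticket and should be checked against the PennCNV documentation. The URL is the hg18 example from the help text, so a different genome build needs a different file.

```shell
#!/usr/bin/env bash
# Sketch: build a PennCNV GC model file for one genome build.
# On the cluster, run "module load PennCNV/2015-v1.0.3" first.
set -eu

# hg18 example URL copied from the cal_gc_snp.pl help text.
GC_URL="http://hgdownload.cse.ucsc.edu/goldenPath/hg18/database/gc5Base.txt.gz"

# Sort the UCSC gc5Base table by chromosome, then by numeric start position.
# Assumption: column 2 is the chromosome and column 3 is the start position.
sort_gc_table() {
    LC_ALL=C sort -k 2,2 -k 3,3n "$1" > "$2"
}

# Download, unpack, sort, and run cal_gc_snp.pl as in the help example.
make_gcmodel() {
    local signal_file="$1" out_file="$2"
    curl -fsSL -o gc5Base.txt.gz "$GC_URL"
    gunzip -f gc5Base.txt.gz
    sort_gc_table gc5Base.txt gc5Base.txt.sorted
    cal_gc_snp.pl gc5Base.txt.sorted "$signal_file" -output "$out_file"
}

# Only run when called as: script.sh <signalfile> <output.gcmodel>
if [ "$#" -eq 2 ]; then
    make_gcmodel "$1" "$2"
fi
```

Saved under a hypothetical name such as `scripts/make_gcmodel.sh`, it could be submitted with something like `qsub scripts/make_gcmodel.sh signalfile file.gcmodel`, although the exact qsub options depend on the cluster's scheduler.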
Write code that uses the script "cal_gc_snp.pl", which comes with PennCNV, to generate a GC model file.
PennCNV is installed on our NCI cluster (CCAD/cgemsiii), so it's probably easiest to just run these tests there. On the cluster this is how you can see the help page for this script:
module load PennCNV/2015-v1.0.3
cal_gc_snp.pl -h
Please put working code for generating a GC model file in a .sh file in the "scripts" directory in this repo. Test the code by submitting the bash script to the cluster using qsub. Point us to the .sh file and close this ticket once you are confident that the code works.
This file may be specific to the genome build, so we may omit this or just mention it as an option if it's too specific for our pipeline.
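If the GC model ends up as an optional input, one way the pipeline could pass it through is sketched below. The wrapper name `run_detect_cnv`, its argument order, and the `DETECT_CNV` override variable are hypothetical. The `-gcmodel` and `-list` options are assumed to be the ones documented for PennCNV's `detect_cnv.pl`, which should be confirmed against the installed version.

```shell
#!/usr/bin/env bash
# Sketch: call detect_cnv.pl with or without a GC model file.
set -eu

# Usage: run_detect_cnv <hmm> <pfb> <signal-list> <out> [gcmodel]
# DETECT_CNV is a hypothetical override for the program name.
run_detect_cnv() {
    local hmm="$1" pfb="$2" list="$3" out="$4"
    local gcmodel="${5:-}"
    # ${gcmodel:+...} expands to the -gcmodel option only when a file was given.
    "${DETECT_CNV:-detect_cnv.pl}" -test -hmm "$hmm" -pfb "$pfb" \
        ${gcmodel:+-gcmodel "$gcmodel"} \
        -list "$list" -out "$out"
}
```

Leaving the fifth argument off runs PennCNV without GC adjustment, which keeps the build-specific file out of the default path.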