MorrellLAB / sequence_handling

A series of scripts to automate sequence workflows

A script for breaking up WGS data into smaller intervals at N strings #46

Closed EDitt closed 4 years ago

EDitt commented 4 years ago

Uses GATK, Picard, and BEDTools to:
1.) Make a FASTA subset for regions of interest, with corresponding .dict and .fai files
2.) Identify positions of N-strings above a user-specified threshold (returns coordinates for ACGTmers and Nmers)
3.) Convert the .interval_list file to a .bed file
4.) Merge regions into windows of an approximate (user-specified) size
5.) Format for use in the pipeline as a .bed file
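To make the logic of steps 2, 4, and 5 concrete, here is a minimal pure-Python sketch. It is an illustration only, not the script in this pull request: it does not call GATK, Picard, or BEDTools, and the function names (`split_at_n_runs`, `merge_into_windows`, `to_bed`), the greedy merging rule, and the toy sequence are all assumptions made for the example. Coordinates are 0-based and half-open, as in BED.

```python
import re
from typing import List, Tuple

# (start, end, label) with 0-based start and exclusive end, as in BED
Interval = Tuple[int, int, str]


def split_at_n_runs(seq: str, min_n: int) -> List[Interval]:
    """Label stretches of seq as 'Nmer' (N-runs of at least min_n bases)
    or 'ACGTmer' (everything else). Hypothetical helper for illustration."""
    intervals: List[Interval] = []
    prev_end = 0
    for match in re.finditer(r"[Nn]+", seq):
        if match.end() - match.start() < min_n:
            continue  # short N-runs stay inside the surrounding ACGTmer
        if match.start() > prev_end:
            intervals.append((prev_end, match.start(), "ACGTmer"))
        intervals.append((match.start(), match.end(), "Nmer"))
        prev_end = match.end()
    if prev_end < len(seq):
        intervals.append((prev_end, len(seq), "ACGTmer"))
    return intervals


def merge_into_windows(intervals: List[Interval], target_size: int) -> List[Tuple[int, int]]:
    """Greedily join consecutive ACGTmers (spanning the N-runs between them)
    until a window reaches target_size, so windows only break at N-runs."""
    windows: List[Tuple[int, int]] = []
    start = end = None
    for s, e, label in intervals:
        if label != "ACGTmer":
            continue
        if start is None:
            start = s
        end = e
        if end - start >= target_size:
            windows.append((start, end))
            start = None
    if start is not None:
        windows.append((start, end))  # leftover window shorter than target_size
    return windows


def to_bed(chrom: str, windows: List[Tuple[int, int]]) -> str:
    """Format windows as tab-separated BED lines."""
    return "\n".join(f"{chrom}\t{s}\t{e}" for s, e in windows)


# Toy sequence: N-runs of length 5, 2, and 6; only runs >= 5 count as breaks.
seq = "ACGT" * 3 + "N" * 5 + "ACGT" * 2 + "NN" + "ACGT" + "N" * 6 + "AC"
intervals = split_at_n_runs(seq, min_n=5)
windows = merge_into_windows(intervals, target_size=20)
print(to_bed("chr1", windows))
```

Because windows are only closed at qualifying N-runs, their sizes are approximate: a window can overshoot the target, and the final window can be shorter than it.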