Closed durwa004 closed 4 years ago
Get_annovar_snpeff_coding_tidy.py
Old scripts
Get chrom/pos/impact of all coding variants
$ python Get_chrom_pos_impact.py -d /home/mccuem/shared/Projects/HorseGenomeProject/Data/ibio_EquCab3/ibio_output_files/joint_intersect/ -p SnpEff
$ python Get_chrom_pos_impact.py -d /home/mccuem/shared/Projects/HorseGenomeProject/Data/ibio_EquCab3/ibio_output_files/joint_intersect/ -p annovar
Output chrom/pos of union and intersect of the annotation programs
$ python Get_union_intersect_VEPs.py -d /home/mccuem/shared/Projects/HorseGenomeProject/Data/ibio_EquCab3/ibio_output_files/joint_intersect/
Next - no. of variants in intersect per individual and per breed group
$ qsub /home/mccuem/shared/Projects/HorseGenomeProject/scripts/EquCab3/s3_scripts/s3cmd_put_annovar.pbs
Extract breeds from thesis intersect vcf
$ python Generate_bcftools_view_by_breed.py -d /home/mccuem/shared/Projects/HorseGenomeProject/Data/ibio_EquCab3/ibio_output_files/joint_gvcf/joint_intersect/ -i /home/mccuem/shared/Projects/HorseGenomeProject/Data/ibio_EquCab3/ibio_output_files/horse_genomes_breeds_tidy.txt -b QH
$ python ../../variant_calling/python_generation_scripts/Generate_pbs_submission_shell.py -d ../split_by_breed/
$ sh /home/mccuem/shared/Projects/HorseGenomeProject/scripts/EquCab3/genetic_burden_pipeline/split_by_breed/pbs_shell.sh
Arabian (35) Belgian (20) Clydesdale(19) Morgan (20) STB (48) TB (53) WP (20) Icelandic (17) Shetland (55) QH (70)
Move joint intersect chromosome files to s3
$ qsub /home/mccuem/shared/Projects/HorseGenomeProject/scripts/EquCab3/s3_scripts/s3cmd_put_joint_intersect.pbs
$ python Generate_bcftools_view_by_breed.py -d /home/mccuem/shared/Projects/HorseGenomeProject/Data/ibio_EquCab3/ibio_output_files/joint_gvcf/joint_intersect/ -i /home/mccuem/shared/Projects/HorseGenomeProject/Data/ibio_EquCab3/ibio_output_files/horse_genomes_breeds_tidy.txt -b Shetland -m 3 -c 11
$ python Generate_bcftools_view_by_breed.py -d /home/mccuem/shared/Projects/HorseGenomeProject/Data/ibio_EquCab3/ibio_output_files/joint_gvcf/joint_intersect/ -i /home/mccuem/shared/Projects/HorseGenomeProject/Data/ibio_EquCab3/ibio_output_files/horse_genomes_breeds_tidy.txt -b TB -m 3 -c 10
$ python Generate_bcftools_view_by_breed.py -d /home/mccuem/shared/Projects/HorseGenomeProject/Data/ibio_EquCab3/ibio_output_files/joint_gvcf/joint_intersect/ -i /home/mccuem/shared/Projects/HorseGenomeProject/Data/ibio_EquCab3/ibio_output_files/horse_genomes_breeds_tidy.txt -b STB -m 3 -c 10
$ python Generate_bcftools_view_by_breed.py -d /home/mccuem/shared/Projects/HorseGenomeProject/Data/ibio_EquCab3/ibio_output_files/joint_gvcf/joint_intersect/ -i /home/mccuem/shared/Projects/HorseGenomeProject/Data/ibio_EquCab3/ibio_output_files/horse_genomes_breeds_tidy.txt -b Icelandic -m 1 -c 3
$ python Generate_bcftools_view_by_breed.py -d /home/mccuem/shared/Projects/HorseGenomeProject/Data/ibio_EquCab3/ibio_output_files/joint_gvcf/joint_intersect/ -i /home/mccuem/shared/Projects/HorseGenomeProject/Data/ibio_EquCab3/ibio_output_files/horse_genomes_breeds_tidy.txt -b WP -m 1 -c 4
$ python Generate_bcftools_view_by_breed.py -d /home/mccuem/shared/Projects/HorseGenomeProject/Data/ibio_EquCab3/ibio_output_files/joint_gvcf/joint_intersect/ -i /home/mccuem/shared/Projects/HorseGenomeProject/Data/ibio_EquCab3/ibio_output_files/horse_genomes_breeds_tidy.txt -b Morgan -m 1 -c 4
$ python Generate_bcftools_view_by_breed.py -d /home/mccuem/shared/Projects/HorseGenomeProject/Data/ibio_EquCab3/ibio_output_files/joint_gvcf/joint_intersect/ -i /home/mccuem/shared/Projects/HorseGenomeProject/Data/ibio_EquCab3/ibio_output_files/horse_genomes_breeds_tidy.txt -b Belgian -m 1 -c 4
$ python Generate_bcftools_view_by_breed.py -d /home/mccuem/shared/Projects/HorseGenomeProject/Data/ibio_EquCab3/ibio_output_files/joint_gvcf/joint_intersect/ -i /home/mccuem/shared/Projects/HorseGenomeProject/Data/ibio_EquCab3/ibio_output_files/horse_genomes_breeds_tidy.txt -b Clydesdale -m 1 -c 4
$ python Generate_bcftools_view_by_breed.py -d /home/mccuem/shared/Projects/HorseGenomeProject/Data/ibio_EquCab3/ibio_output_files/joint_gvcf/joint_intersect/ -i /home/mccuem/shared/Projects/HorseGenomeProject/Data/ibio_EquCab3/ibio_output_files/horse_genomes_breeds_tidy.txt -b QH -m 4 -c 14
$ python Generate_bcftools_view_by_breed.py -d /home/mccuem/shared/Projects/HorseGenomeProject/Data/ibio_EquCab3/ibio_output_files/joint_gvcf/joint_intersect/ -i /home/mccuem/shared/Projects/HorseGenomeProject/Data/ibio_EquCab3/ibio_output_files/horse_genomes_breeds_tidy.txt -b Arabian -m 2 -c 7
This creates quite a lot of files with a lot of memory. These are the file output options:
The command line was: bcftools isec -p Arabian_rare_Belgian_common thesis_intersect_Arabian_rare.vcf.gz thesis_intersect_Belgian_common.vcf.gz
Using the following file names:
Arabian_rare_Belgian_common/0000.vcf for records private to thesis_intersect_Arabian_rare.vcf.gz
Arabian_rare_Belgian_common/0001.vcf for records private to thesis_intersect_Belgian_common.vcf.gz
Arabian_rare_Belgian_common/0002.vcf for records from thesis_intersect_Arabian_rare.vcf.gz shared by both thesis_intersect_Arabian_rare.vcf.gz thesis_intersect_Belgian_common.vcf.gz
Arabian_rare_Belgian_common/0003.vcf for records from thesis_intersect_Belgian_common.vcf.gz shared by both thesis_intersect_Arabian_rare.vcf.gz thesis_intersect_Belgian_common.vcf.gz
Looks like I can delete 0000, and 0001
$ Get_rare_breed_common_breed_info.py
Can probably also find variants that have an AF <0.5% in the whole population but over 5% in a population. Can probably also find variants that have an AF <3% in the breed but over 5% in the general population. Arabian (35): AF < 3% = AC: 2, AF > 5% = 4 Belgian (20): AF < 3% = AC: 1, AF > 5% = 2 Clydesdale(19): AF < 3% = AC: 1, AF > 5% = 2 Morgan (20): AF < 3% = AC: 1, AF > 5% = 2 STB (48): AF < 3% = AC: 3, AF > 5% = 5 TB (53): AF < 3% = AC: 3, AF > 5% = 5 WP (20): AF < 3% = AC: 1, AF > 5% = 2 Icelandic (17): AF < 3% = AC: 1, AF > 5% = 2 Shetland (55): AF < 3% = AC: 3, AF > 5% = 6 QH (70): AF < 3% = AC: 4, AF > 5% = 7
$ python ../../python_scripts/Generate_bcftools_view_rare_variants_breed_common_variants_pop.py -d /home/mccuem/shared/Projects/HorseGenomeProject/Data/ibio_EquCab3/ibio_output_files/joint_gvcf/joint_intersect/ -b Arabian -m3 2 -m5 4
$ python ../../python_scripts/Generate_bcftools_view_rare_variants_breed_common_variants_pop.py -d /home/mccuem/shared/Projects/HorseGenomeProject/Data/ibio_EquCab3/ibio_output_files/joint_gvcf/joint_intersect/ -b Belgian/Clydesdale/Morgan/WP/Icelandic -m3 1 -m5 2
$ python ../../python_scripts/Generate_bcftools_view_rare_variants_breed_common_variants_pop.py -d /home/mccuem/shared/Projects/HorseGenomeProject/Data/ibio_EquCab3/ibio_output_files/joint_gvcf/joint_intersect/ -b STB/TN -m3 3 -m5 5
$ python ../../python_scripts/Generate_bcftools_view_rare_variants_breed_common_variants_pop.py -d /home/mccuem/shared/Projects/HorseGenomeProject/Data/ibio_EquCab3/ibio_output_files/joint_gvcf/joint_intersect/ -b QH -m3 4 -m5 7
$ python ../../python_scripts/Generate_bcftools_view_rare_variants_breed_common_variants_pop.py -d /home/mccuem/shared/Projects/HorseGenomeProject/Data/ibio_EquCab3/ibio_output_files/joint_gvcf/joint_intersect/ -b Shetland -m3 3 -m5 6
Analyze this data.
Had to redo this because it wouldn't pull out the common/rare population variants so will have to use gatk for that.
$ python ../../python_scripts/Generate_bcftools_view_rare_variants_breed_common_variants_pop.py -d /home/mccuem/shared/Projects/HorseGenomeProject/Data/ibio_EquCab3/ibio_output_files/joint_gvcf/joint_intersect/ -b TB -m5 5
$ Get_rare_breed_common_pop_info.py
Transfer to my laptop
$ scp durwa004@login.msi.umn.edu:/home/mccuem/shared/Projects/HorseGenomeProject/Data/ibio_EquCab3/ibio_output_files/joint_gvcf/joint_intersect/common_breed_rare_pop.txt /Users/durwa004/Documents/PhD/Projects/1000_genomes/GB_project/breed_pop_variants/
$ scp durwa004@login.msi.umn.edu:/home/mccuem/shared/Projects/HorseGenomeProject/Data/ibio_EquCab3/ibio_output_files/joint_gvcf/joint_intersect/rare_breed_common_pop.txt /Users/durwa004/Documents/PhD/Projects/1000_genomes/GB_project/breed_pop_variants/
Need to redo all of this! Without the PRZE horses!!!!!!!
ERR1527968 - horse with no variants = removed
sh breed_common_rare_variants_tidy.sh Then: Look at these variants
annovar/snpeff Then: $ Get_type_of_variant_SnpEff.py $ Get_type_of_variant_annovar.py Then: Pull out dz variants
Figure out how we add in DOC to analysis (emailed AR)
Pull out type of variant - all
Details about impact annovar: http://annovar.openbioinformatics.org/en/latest/user-guide/gene/ SnpEff: http://snpeff.sourceforge.net/SnpEff_manual.html
To do for paper: