SamahElmahdi / Glaucoma-variant-calling

Trying to split a VCF file into samples and controls #25

Open SamahElmahdi opened 4 years ago

SamahElmahdi commented 4 years ago

I tried to split the combined VCF into samples and controls to compare the counts of known and novel variants in each group.

    bcftools view -s SRR5858157_F8_III.4,SRR5858162_F8_III.3,SRR5858204_F6_II.2 raw_variants_ann.vcf > raw_variants_ann_samples.vcf
    bcftools view -s SRR5858160_F7_II.3,SRR5858161_F7_II.2 raw_variants_ann.vcf > raw_variants_ann_controls.vcf
    grep -v "^#" raw_variants_ann_controls.vcf | awk '{print $3}' | wc -l                ### total: 35472
    grep -v "^#" raw_variants_ann_controls.vcf | awk '{print $3}' | grep "^rs" | wc -l   ### known: 10900
    grep -v "^#" raw_variants_ann_samples.vcf | awk '{print $3}' | wc -l                 ### total: 35472
    grep -v "^#" raw_variants_ann_samples.vcf | awk '{print $3}' | grep "^rs" | wc -l    ### known: 10900

The numbers are equal, so I think something is wrong in my code, but I really don't have time to solve it because today is the deadline, and we can calculate the counts from the stats file created before. These links were helpful:

  1. https://bioinformatics.stackexchange.com/questions/3477/how-to-subset-samples-from-a-vcf-file
  2. https://toolshed.g2.bx.psu.edu/repository/display_tool?repository_id=f667c2ee6f2ca971&tool_config=%2Fsrv%2Ftoolshed%2Fmain%2Fvar%2Fdata%2Frepos%2F002%2Frepo_2516%2Fbcftools_view.xml&changeset_revision=cc016cb332cd
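
A possible explanation for the identical counts, offered as an assumption rather than a confirmed diagnosis: `bcftools view -s` drops the other samples' genotype columns but keeps every site by default, so both output files still hold all 35472 records. bcftools offers `--min-ac 1` (short form `-c 1`) on `view` to keep only sites where the retained samples carry at least one non-reference allele. The sketch below applies the same idea with plain awk directly to the combined file. The function name `count_carried` is hypothetical, and the code assumes GT is the first FORMAT field, which the VCF specification requires whenever GT is present.

```shell
# count_carried VCF SAMPLE[,SAMPLE...]   (hypothetical helper, not part of bcftools)
# Prints "total known": the number of sites where at least one listed sample
# has a non-reference allele in GT, and how many of those sites have an rs ID.
count_carried() {
  awk -F'\t' -v wanted="$2" '
    BEGIN {
      n = split(wanted, names, ",")
      for (i = 1; i <= n; i++) want[names[i]] = 1
    }
    /^##/ { next }                                # skip meta-information lines
    /^#CHROM/ {                                   # map sample names to columns
      for (i = 10; i <= NF; i++) if ($i in want) cols[i] = 1
      next
    }
    {
      carried = 0
      for (i in cols) {
        split($i, fields, ":")                    # assumption: GT is the first FORMAT field
        if (fields[1] ~ /[1-9]/) { carried = 1; break }   # any ALT allele index
      }
      if (carried) { total++; if ($3 ~ /^rs/) known++ }
    }
    END { print total + 0, known + 0 }
  ' "$1"
}
```

For example, `count_carried raw_variants_ann.vcf SRR5858160_F7_II.3,SRR5858161_F7_II.2` would report the control group's counts without creating an intermediate file. Missing genotypes (`./.`) and homozygous-reference calls (`0/0`) are not counted as carried.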
SamahElmahdi commented 4 years ago

Another method:

    cat raw_variants_ann.vcf | cut -f3 | grep "^rs" > raw_variants_ann_knownIDs.txt
    vcftools --vcf raw_variants_ann_samples.vcf --snps raw_variants_ann_knownIDs.txt --recode --recode-INFO-all --stdout > raw_variants_ann_samples_known.vcf
    vcftools --vcf raw_variants_ann_controls.vcf --snps raw_variants_ann_knownIDs.txt --recode --recode-INFO-all --stdout > raw_variants_ann_controls_known.vcf
    wc -l raw_variants_ann_samples_known.vcf     #### 10944
    wc -l raw_variants_ann_controls_known.vcf    #### 10944

The same number for each one! Maybe the problem is in the file-splitting step.
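
One detail worth checking in this second attempt: `wc -l` on a VCF counts the `#` header lines as well as the variant records, which would be consistent with 10944 here versus 10900 known IDs in the earlier count (the 44-line difference being header lines is an assumption, not something verified against the files). Because the ID list comes from the combined VCF and both subset files still contain every site, filtering by ID is also expected to return the same records for both groups. A minimal sketch of header-free counting, using the hypothetical helper names `count_records` and `count_known`:

```shell
# count_records VCF: number of variant records, ignoring "#" header lines.
count_records() {
  grep -vc '^#' "$1"
}

# count_known VCF: number of records whose ID column (column 3) starts with "rs".
count_known() {
  grep -v '^#' "$1" | cut -f3 | grep -c '^rs'
}
```

Running `count_records raw_variants_ann_samples_known.vcf` instead of `wc -l` keeps the totals comparable with the `grep -v "^#"` counts from the first attempt.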