Open SamahElmahdi opened 4 years ago
Another method:
cat raw_variants_ann.vcf | cut -f3 | grep "^rs" > raw_variants_ann_knownIDs.txt
vcftools --vcf raw_variants_ann_samples.vcf --snps raw_variants_ann_knownIDs.txt --recode --recode-INFO-all --stdout > raw_variants_ann_samples_known.vcf
vcftools --vcf raw_variants_ann_controls.vcf --snps raw_variants_ann_knownIDs.txt --recode --recode-INFO-all --stdout > raw_variants_ann_controls_known.vcf
wc -l raw_variants_ann_samples_known.vcf ####10944
wc -l raw_variants_ann_controls_known.vcf ####10944
Both files have the same number of lines! Maybe the problem is in the file-splitting step.
I have tried to split the combined VCF into samples and controls to compare the counts of known and novel variants in each of them.
bcftools view -s SRR5858157_F8_III.4,SRR5858162_F8_III.3,SRR5858204_F6_II.2 raw_variants_ann.vcf > raw_variants_ann_samples.vcf
bcftools view -s SRR5858160_F7_II.3,SRR5858161_F7_II.2 raw_variants_ann.vcf > raw_variants_ann_controls.vcf
grep -v "^#" raw_variants_ann_controls.vcf | awk '{print $3}' | wc -l ###total35472
grep -v "^#" raw_variants_ann_controls.vcf | awk '{print $3}' | grep "^rs" | wc -l ###known10900
grep -v "^#" raw_variants_ann_samples.vcf | awk '{print $3}' | wc -l ###total35472
grep -v "^#" raw_variants_ann_samples.vcf | awk '{print $3}' | grep "^rs" | wc -l ###Known10900
The numbers are equal, and I think there is something wrong in my code, but I really don't have time to try to solve it, as today is the deadline, and we can calculate them from the state file created before.
These links were helpful
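One possible explanation for the identical counts, offered as an assumption rather than a confirmed diagnosis: `bcftools view -s` subsets the sample columns but by default keeps every site, including sites where none of the selected samples carries a non-reference allele. Both split files would then contain the same records and the same ID column. The sketch below counts known and novel variants per group directly from the combined VCF, counting only sites where at least one listed sample has a non-reference genotype. The function name `count_group_variants` is hypothetical, and the code assumes an uncompressed, tab-delimited VCF with GT as the first FORMAT field.

```shell
#!/bin/sh
# count_group_variants VCF SAMPLE1,SAMPLE2,...
# Hypothetical helper (not part of bcftools or vcftools).
# Prints "total known novel" for the sites where at least one of the
# listed samples carries a non-reference allele.
count_group_variants() {
    awk -v samples="$2" '
        BEGIN { FS = "\t"; n = split(samples, want, ",") }
        /^##/ { next }                      # skip meta-information lines
        /^#CHROM/ {
            # remember which columns belong to the requested samples
            for (i = 10; i <= NF; i++)
                for (j = 1; j <= n; j++)
                    if ($i == want[j]) cols[i] = 1
            next
        }
        {
            carried = 0
            for (i in cols) {
                split($i, fields, ":")      # assumption: GT is the first FORMAT field
                gt = fields[1]
                gsub("[0./|]", "", gt)      # drop ref alleles, missing calls, separators
                if (gt != "") carried = 1   # anything left is a non-reference allele
            }
            if (!carried) next
            total++
            if ($3 ~ /^rs/) known++         # same "known" rule as the post: ID starts with rs
        }
        END { printf "%d %d %d\n", total, known, total - known }
    ' "$1"
}

# Example call with the sample names from the post:
# count_group_variants raw_variants_ann.vcf SRR5858160_F7_II.3,SRR5858161_F7_II.2
```

`bcftools view` also has a `-c`/`--min-ac` option that could be added to the two split commands to drop sites with no non-reference allele in the selected samples. That route is untested here.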