alekseyzimin / WheresWalker


VCF Format column specifications #1

Open michellemeier27 opened 3 weeks ago

michellemeier27 commented 3 weeks ago

Hey there, I'm trying to run your tool but I think it expects the vcf files to have a specific FORMAT column.

grep --color=auto -v '^#' $WT | \
  awk '{split($10,a,":");if(a[4]>=int("'$COV'") && a[6]>=int("'$COV'")) print $1" "$2" "$4" "$5" "a[4]" "a[6]}' | \
  $MYPATH/compute_snp_freq.pl > $WT_VCF.WT.txt.tmp &

Our variant caller gives us this format: GT:AD:DP:GQ:PL. I'm just trying to understand which of these correspond to your 4th and 6th entries so I can adjust the tool for our needs.

Hope this makes sense, happy to clarify!

Cheers, Michelle

alekseyzimin commented 2 weeks ago

Hello, the VCF format expected is GT:DP:AD:RO:QR:AO:QA:GL in field 10 of the VCF file. The numbers used are the 4th and 6th entries of that ":"-separated sequence, i.e. RO and AO.
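For illustration only, here is a minimal sketch of how the positions could be looked up by name instead of being hard-coded. It is not part of WheresWalker: the helper names `format_index` and `ad_counts` are hypothetical, and the sample string is made up. For a GT:AD:DP:GQ:PL caller, it assumes that AD holds comma-separated "ref,alt" read depths at a biallelic site, which would then play the role of RO and AO; please verify that against your caller's header before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: print the 1-based position of a key within a
# ":"-separated FORMAT string (VCF field 9). Prints nothing if absent.
format_index() {
    # $1 = FORMAT string, $2 = key to look up
    printf '%s\n' "$1" | awk -v key="$2" '{
        n = split($0, keys, ":")
        for (i = 1; i <= n; i++) if (keys[i] == key) { print i; exit }
    }'
}

# Hypothetical helper: print "ref alt" read counts taken from the AD subfield.
# ASSUMPTION: AD is "ref,alt" (biallelic site).
ad_counts() {
    # $1 = FORMAT string, $2 = sample string (VCF field 10)
    pos=$(format_index "$1" AD)
    [ -n "$pos" ] || return 1
    printf '%s\n' "$2" | awk -v pos="$pos" '{
        split($0, a, ":")        # a[1..n]: awk arrays from split() start at 1
        split(a[pos], ad, ",")   # ad[1] = ref depth, ad[2] = alt depth
        print ad[1], ad[2]
    }'
}

format_index "GT:DP:AD:RO:QR:AO:QA:GL" RO                # prints 4
format_index "GT:DP:AD:RO:QR:AO:QA:GL" AO                # prints 6
ad_counts "GT:AD:DP:GQ:PL" "0/1:12,7:19:99:200,0,300"    # prints 12 7 (made-up sample)
```

Looking the key up by name keeps a filter from silently reading the wrong subfield when a caller orders FORMAT differently.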

michellemeier27 commented 2 weeks ago

Thanks for your reply! Just to double check, ff[3] and ff[5] would then be RO and AO, respectively, as well? I'm very new to Perl...

perl -ane 'BEGIN{ print "Chr\tCoord\tRef\tAlt\tFunc.refGene\tGene.refGene\tGeneDetail.refGene\tExonicFunc.refGene\tAAChange.refGene\tmut.ref\tmut.alt\twt.ref\twt.alt\tratio\n"; open(FILE,"'$WT'"); while($line=<FILE>){ chomp($line); @f=split(/\t/,$line); @ff=split(":",$f[-1]); $ratio=$ff[3]/($ff[5]+0.00000001); $h{"$f[0] $f[1]"}="$ff[3]\t$ff[5]\t$ratio"; } }