rajwanir opened this issue 5 months ago
Yeah, ALLELE_A and ALLELE_B are integer rather than string variables for two important reasons:
bcftools norm -f REF
as this would cause a mismatch between REF/ALT and ALLELE_A/ALLELE_B if the latter were a string.
Thanks Giulio. Makes sense.
Do you have any suggestions for a straightforward way of converting the ALLELE_A/ALLELE_B INFO tags back to the REF/ALT alleles, so that the VCF is compatible with the Picard tools? Essentially, I may need to convert the VCF to adpc.bin to run contamination checks with VerifyIDintensity. I looked at bcftools +fill-tags, but it seems like it would just fill back the 0 and 1 encoding.
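One possible workaround, sketched below, is to rewrite the tags directly in an uncompressed VCF stream. It assumes (based on the reply above that a string would mismatch REF/ALT) that ALLELE_A and ALLELE_B hold allele indices, with 0 = REF and 1 = the first ALT allele. The function names are hypothetical, and whether Picard then accepts the result has not been checked.

```python
def allele_index_to_base(ref: str, alt: str, index: str) -> str:
    """Map an allele index (assumed 0 = REF, 1 = first ALT, ...) to its allele string."""
    if index == ".":
        return "."  # keep missing values missing
    alleles = [ref] + ([] if alt == "." else alt.split(","))
    return alleles[int(index)]


def decode_info(ref: str, alt: str, info: str) -> str:
    """Rewrite ALLELE_A/ALLELE_B in an INFO string from indices to allele strings."""
    fields = []
    for field in info.split(";"):
        key, sep, value = field.partition("=")
        if key in ("ALLELE_A", "ALLELE_B") and sep:
            value = allele_index_to_base(ref, alt, value)
        fields.append(key + sep + value)
    return ";".join(fields)


def decode_vcf_line(line: str) -> str:
    """Decode one VCF text line; other header lines pass through unchanged."""
    if line.startswith(("##INFO=<ID=ALLELE_A,", "##INFO=<ID=ALLELE_B,")):
        # The tags now carry allele strings, so the header type must change too.
        return line.replace("Type=Integer", "Type=String")
    if line.startswith("#"):
        return line
    cols = line.rstrip("\n").split("\t")
    cols[7] = decode_info(cols[3], cols[4], cols[7])  # columns 4, 5, 8: REF, ALT, INFO
    return "\t".join(cols) + "\n"
```

To use it, pipe `bcftools view` output into a small script that calls `decode_vcf_line` on each line of `sys.stdin` and writes the result. Because indices are resolved against the current REF/ALT, this should run after any `bcftools norm -f` step, not before.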
I see. Maybe I will rewrite VerifyIDintensity to work with VCFs, as it seems a very simple but valuable piece of software. Where do you get the ABF allele frequencies to run VerifyIDintensity?
That would be an ideal solution. I did fork VerifyIDintensity to see if I could edit it to accept text input instead of the binarized adpc.bin, but then paused. :smiley:
bcftools query -f '[%X\t%Y\t%NORMX\t%NORMY\t%GenTrain_Score\t%GT\n]'
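With the newline inside the square brackets, the `bcftools query` command above prints one line per sample per variant, in sample order, without a sample name column. The sketch below parses that output and recovers the sample as `row % n_samples`, assuming no filter drops individual sample rows; `n_samples` can be taken from `bcftools query -l`. The record type and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class IntensityRecord:
    sample_index: int  # position of the sample in the VCF header
    x: Optional[float]
    y: Optional[float]
    norm_x: Optional[float]
    norm_y: Optional[float]
    gentrain_score: Optional[float]
    gt: str


def _to_float(value: str) -> Optional[float]:
    """Convert a query field to float, mapping the VCF missing value '.' to None."""
    return None if value == "." else float(value)


def parse_query_output(lines: Iterable[str], n_samples: int) -> List[IntensityRecord]:
    """Parse X, Y, NORMX, NORMY, GenTrain_Score, GT rows into per-sample records."""
    records = []
    for row, line in enumerate(lines):
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 6:
            raise ValueError(f"expected 6 columns, got {len(fields)}: {line!r}")
        x, y, norm_x, norm_y, score, gt = fields
        records.append(IntensityRecord(
            sample_index=row % n_samples,  # rows cycle through samples per variant
            x=_to_float(x),
            y=_to_float(y),
            norm_x=_to_float(norm_x),
            norm_y=_to_float(norm_y),
            gentrain_score=_to_float(score),
            gt=gt,
        ))
    return records
```

Adding `%SAMPLE` to the format string would make the sample explicit and remove the reliance on row order.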
Needless to say, direct VCF input would be a cleaner solution.
The ABF allele frequencies are typically pulled from the 1000 Genomes Project. It's a text file that can be prepared separately.
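When preparing that file, frequencies from 1000 Genomes are usually given for the ALT allele, while intensity-based tools work in terms of the Illumina A/B alleles. The sketch below shows only that orientation step, assuming a biallelic site matched on CHROM/POS/REF/ALT and the index encoding discussed earlier in the thread (0 = REF, 1 = ALT). The exact ABF file layout VerifyIDintensity expects is not described here, and the function name is hypothetical.

```python
def b_allele_frequency(alt_af: float, allele_b: int) -> float:
    """Return the population frequency of the B allele at a biallelic site.

    alt_af: ALT allele frequency (e.g. INFO/AF from a 1000 Genomes VCF).
    allele_b: ALLELE_B index, assumed to be 0 for REF and 1 for ALT.
    """
    if not 0.0 <= alt_af <= 1.0:
        raise ValueError(f"allele frequency out of range: {alt_af}")
    if allele_b == 1:
        return alt_af  # B is the ALT allele
    if allele_b == 0:
        return 1.0 - alt_af  # B is the REF allele
    raise ValueError(f"unsupported ALLELE_B index for a biallelic site: {allele_b}")
```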
Thanks.
Hi @freeseek ,
What do 0 and 1 represent for ALLELE_A and ALLELE_B in the VCF INFO field?
It seems like the Picard tools GtcToVcf implementation writes the alleles themselves in the ALLELE_A and ALLELE_B fields? https://github.com/search?q=repo%3Abroadinstitute%2Fpicard%20getAlleleA&type=code
I wanted to convert the VCF to adpc.bin through the Picard tools to run VerifyIDintensity (https://github.com/gjun/verifyIDintensity) for contamination checks, and noticed some inconsistency/incompatibility. Do you have any thoughts?
Thanks.