Closed annaquaglieri16 closed 5 years ago
Hi again,
I guess this is the same problem as for the `AD` field above, which is parsed correctly.
head(VariantAnnotation::geno(vcf)$AD[,1])
$`chr1:36933772_C/T`
[1] 523 9
$`chr1:36935370_T/C`
[1] 482 7
$`chr1:36937059_A/G`
[1] 249 239
$`chr1:36939108_T/C`
[1] 377 6
$`chr1:36941395_G/C`
[1] 223 3
Cheers, Anna
Hi @annaquaglieri16, thanks for the report. It will be a while before I have a chance to look at this. Patches are always welcome. Valerie
I think the issue is that the `QSS` field is noted as `Number=A` in the header, but it should be `Number=R` if the number of values is equal to the total number of alleles, not the number of alt alleles.
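To make the distinction concrete, here is a minimal base R sketch of how many values each `Number` code implies for a single record. The helper `expected_n_values()` is hypothetical (it is not part of `VariantAnnotation`), and it only covers `A`, `R`, and fixed integer counts.

```r
# Hypothetical helper (not a VariantAnnotation function): how many values a
# field should carry per record, given the Number code from the header.
expected_n_values <- function(number, n_alt) {
  if (number == "A") {
    n_alt                 # one value per alternate allele
  } else if (number == "R") {
    n_alt + 1L            # one value per allele, reference included
  } else if (grepl("^[0-9]+$", number)) {
    as.integer(number)    # fixed count
  } else {
    NA_integer_           # "G" and "." are not handled in this sketch
  }
}

# A biallelic site such as chr1:36933772_C/T has one alt allele:
expected_n_values("A", n_alt = 1L)  # one value expected
expected_n_values("R", n_alt = 1L)  # two values expected
```

A field that actually holds a ref and an alt value at a biallelic site but is declared `Number=A` would therefore be expected to hold a single value, which would be consistent with only the first value being kept.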
@annaquaglieri16 have you looked into Michael's suggestion? Do you know why the `QSS` header is specified as `Number=A` instead of `Number=R`? This would indicate a problem with how the output was created, not with how the data are read in. See section 1.2.2: https://samtools.github.io/hts-specs/VCFv4.2.pdf
Hi @vobencha! Sorry, I missed @lawremi's other reply. I see: indeed, in the VCF files produced by Mutect, the `AD` field is read correctly, as it is specified as `Number=R`
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
but the `QSS` field is not.
##FORMAT=<ID=QSS,Number=A,Type=Integer,Description="Sum of base quality scores for each allele">
I interpret this description as meaning one value each for the `ref` and `alt` alleles. I will try to mention this to the `Mutect2` developers.
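As a quick check of what a header declares, the sketch below pulls the `ID` and `Number` keys out of the two `FORMAT` lines quoted above using base R regular expressions. It is only an illustration and assumes simple header lines like these, not a general VCF header parser.

```r
# The two FORMAT header lines quoted above.
header <- c(
  '##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">',
  '##FORMAT=<ID=QSS,Number=A,Type=Integer,Description="Sum of base quality scores for each allele">'
)

# Extract the Number code for each field and name it by the field ID.
format_number <- setNames(
  sub(".*[<,]Number=([^,>]+).*", "\\1", header),
  sub(".*<ID=([^,>]+).*", "\\1", header)
)

format_number  # named character vector: AD is "R", QSS is "A"
```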
Thanks,
Anna
Well, actually, I figured they won't change anything in the GATK3 version of `Mutect2`; they will ask me to try the latest version, GATK4 `Mutect2`, where all the fields are different and qualities are specified differently.
So I will keep my patch for GATK3 to read the two values correctly, and add an option for GATK4 `Mutect2` in my package.
Thanks for your help!
Anna
Sounds good.
Hi there,
I really love this package, and I wrote a function to parse several types of VCF output (from several callers) into a standardised format with standardised columns (https://annaquaglieri16.github.io/varikondo/articles/vignette.html).
However, there is a problem when parsing VCF fields that contain more than one entry separated by a comma. For example, the `QSS` field in `MuTect2` contains `base_quality_ref,base_quality_alt`. `VariantAnnotation` only reads in the first column. Below is a reproducible example comparing `VariantAnnotation` with `read.table`, which shows how the `QSS` field (second to last) is reported :/

I also tried the `SeqArray` package, and it also throws an error when trying to read this field :/

At the moment I will try to fix it by using `read.table`, but is there a chance that this can be updated?

I also noticed a similar problem when trying to read in `Freebayes` output. `Freebayes` reports several alternative alleles (if they are present) separated by commas. Only the first one is reported with `VariantAnnotation`.

Thanks!!
Anna
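For reference, a `read.table`-based workaround of the kind mentioned above could look roughly like the sketch below. The record is made up for illustration (only the `AD` pair `523,9` appears in this thread; the `QSS` values and the sample name `sample1` are invented), and `split_format_field()` is a hypothetical helper, not a function from `VariantAnnotation` or from my package.

```r
# Made-up single-sample record; the QSS values are invented for illustration.
vcf_text <- c(
  "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1",
  "chr1\t36933772\t.\tC\tT\t.\tPASS\t.\tGT:AD:QSS\t0/1:523,9:15000,250"
)

# comment.char = "" keeps the "#CHROM" line as the header; reading every
# column as character stops read.table from turning the allele "T" into TRUE.
vcf_df <- read.table(text = vcf_text, sep = "\t", header = TRUE,
                     comment.char = "", check.names = FALSE,
                     colClasses = "character")

# Hypothetical helper: find `field` in the FORMAT keys, take the matching
# entry of the sample column, and split it on commas.
split_format_field <- function(format, sample, field) {
  keys   <- strsplit(format, ":", fixed = TRUE)[[1]]
  values <- strsplit(sample, ":", fixed = TRUE)[[1]]
  as.integer(strsplit(values[match(field, keys)], ",", fixed = TRUE)[[1]])
}

qss <- split_format_field(vcf_df$FORMAT[1], vcf_df$sample1[1], "QSS")
qss  # both values (ref and alt) are kept
```

The same splitting on commas would apply to the `ALT` column when a caller reports several alternative alleles in one record.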