Closed jgrady-omico closed 5 years ago
I'll have to delay looking at this for a couple of weeks, but it's on my radar. Thanks for reporting.
No worries, thanks Brent.
Hi, this is working as expected. The 'flag' op lets you record the presence of the variant in another file; the field that you pull is just a placeholder.
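A minimal sketch of the semantics described in this reply, assuming 'flag' and 'count' look only at which annotation records overlap the query variant and ignore the named field. The function names `flag_op` and `count_op` are hypothetical and this is a toy model, not vcfanno's actual code; the INFO strings are the two KRAS records quoted in the report.

```python
# Toy model of the behaviour described in the reply; NOT vcfanno's implementation.
# Assumption: both ops ignore the requested field and see only the overlaps.

def flag_op(overlapping_records):
    """Return True when at least one annotation record overlaps the variant."""
    return len(overlapping_records) > 0


def count_op(overlapping_records):
    """Return the number of overlapping annotation records."""
    return len(overlapping_records)


# INFO columns of the two COSMIC records at 12:25398284 (neither carries SNP).
kras = [
    "GENE=KRAS_ENST00000256078;STRAND=-;CDS=c.35G>A;AA=p.G12D;CNT=1091",
    "GENE=KRAS;STRAND=-;CDS=c.35G>A;AA=p.G12D;CNT=14473",
]

print(flag_op(kras), count_op(kras))
```

Under this model the reported output (cosm_snp set, and cosm_snp=2 with 'count') follows directly from there being two overlapping records, regardless of whether SNP appears on either line.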
I've encountered an issue annotating a VCF using the Cosmic coding mutations VCF - I can't get the 'SNP' flag to annotate correctly. It's the only flag I've tried to annotate, but I can't seem to get it to work as I would expect.
Here is an extract of the cosmic file, with the header and a specific variant in KRAS - there are two entries for it. Neither of them has the SNP flag set.
##fileformat=VCFv4.1
##source=COSMICv84
##reference=GRCh37
##fileDate=20180213
##comment="Missing nucleotide details indicate ambiguity during curation process"
##comment="URL stub for COSM ID field (use numeric portion of ID)='http://grch37-cancer.sanger.ac.uk/cosmic/mutation/overview?id='"
##comment="REF and ALT sequences are both forward strand"
##INFO=
##INFO=
##INFO=
##INFO=
##INFO=
##INFO=
12 25398284 COSM1135366 C T . . GENE=KRAS_ENST00000256078;STRAND=-;CDS=c.35G>A;AA=p.G12D;CNT=1091
12 25398284 COSM521 C T . . GENE=KRAS;STRAND=-;CDS=c.35G>A;AA=p.G12D;CNT=14473
If I used the following conf file:
[[annotation]]
file="CosmicCodingMuts.vcf.gz"
fields = ["ID", "CNT", "SNP"]
ops=["concat", "max", "flag"]
names=["cosm", "cosm_cnt", "cosm_snp"]
The annotation for this variant is: cosm=COSM1135366,COSM521;cosm_cnt=14473;cosm_snp
So, although the flag is not set in the annotation file, it gets applied to the mutations. In fact, it gets applied to every line from the annotation vcf, ignoring whether the flag is actually set on the line or not.
I tried this as well (as I wasn't sure what would happen if there were two conflicting lines for the same variant, one with the flag set and one without):
[[annotation]]
file="CosmicCodingMuts.vcf.gz"
fields = ["ID", "CNT", "SNP"]
ops=["concat", "max", "count"]
names=["cosm", "cosm_cnt", "cosm_snp"]
This results in the following: cosm=COSM1135366,COSM521;cosm_cnt=14473;cosm_snp=2
Here, I would expect the answer to be cosm_snp=0, or blank. For every mutation, the cosm_snp is set to the number of lines in the annotation file for that variant, irrespective of whether the flag is set on those lines.
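For comparison, the behaviour expected here can be sketched as counting only those overlapping records whose INFO column actually contains the flag. This is a hedged illustration of the expectation, not something vcfanno provides; `info_keys` and `count_flag_set` are hypothetical helper names.

```python
# Sketch of the EXPECTED semantics: count records on which the flag is really set.
# Hypothetical helpers for illustration; not part of vcfanno.

def info_keys(info):
    """Return the set of keys present in a VCF INFO string (flags have no '=')."""
    return {entry.split("=", 1)[0] for entry in info.split(";") if entry}


def count_flag_set(info_fields, flag):
    """Count the INFO strings in which `flag` is present as a key."""
    return sum(1 for info in info_fields if flag in info_keys(info))


# INFO columns of the two KRAS records from the extract above.
kras = [
    "GENE=KRAS_ENST00000256078;STRAND=-;CDS=c.35G>A;AA=p.G12D;CNT=1091",
    "GENE=KRAS;STRAND=-;CDS=c.35G>A;AA=p.G12D;CNT=14473",
]

print(count_flag_set(kras, "SNP"))
```

With these two records the count is 0, which is the value I expected for cosm_snp.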
I've also tried uniq and self, neither of which returns different results for annotation lines with the flag set vs not set.
It feels like a bug... but I may be doing something wrong, hopefully you can help!
Also... if I apply 'count' to a flag (which feels like a meaningful thing to do, as I'm not sure how 'flag' would work for multiple lines), the VCF header type is Number=0,Type=Float. It feels like this should be changed to Number=1,Type=Float.
Thanks,
John