brentp / slivar

genetic variant expressions, annotation, and filtering for great good.
MIT License

Period (.) in INFO fields throws an error #21

Closed: nroak closed this issue 5 years ago

nroak commented 5 years ago

I'm trying to apply filters with the --info option to my ANNOVAR-annotated VCF. This expression, which filters on gnomAD_genome_ALL, runs fine:

 ~/resources/slivar expr \
 --pass-only \
 --vcf $family.annovar.hg38_multianno.vcf.gz \
 --out-vcf $family.slivar.vcf \
 --js $CURRENTDIR/functions.js \
 --info "variant.call_rate > 0.9 && INFO.gnomAD_genome_ALL < 0.05" \
 --alias $family.alias \
 --group-expr "shared_affected:finder(affecteds, unaffecteds)"

But when the expression references INFO fields whose names contain periods (full stops), such as ExonicFunc.refGene, it throws an error:

~/resources/slivar expr \
--pass-only \
--vcf $family.annovar.hg38_multianno.vcf.gz \
--out-vcf $family.slivar.noMAF.LOF.vcf \
--js $CURRENTDIR/functions.js \
--info 'variant.call_rate > 0.9 && (INFO.ExonicFunc.refGene=="stopgain" || INFO.ExonicFunc.refGene=="frameshift_insertion" || INFO.ExonicFunc.refGene=="frameshift_deletion")' \
--alias $family.alias \
--group-expr "shared_affected:finder(affecteds, unaffecteds)"

slivar version: 0.0.8

7 samples matched in VCF and PED to be evaluated
slivarpkg/duko.nim(80)   check
Error: unhandled exception: error from duktape: unknown attribute:ExonicFunc for expression:variant.call_rate > 0.9 && (INFO.ExonicFunc.refGene=="stopgain" || INFO.ExonicFunc.refGene=="frameshift_insertion" || INFO.ExonicFunc.refGene=="frameshift_deletion")
 [ValueError]

brentp commented 5 years ago

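You can use bracket notation: INFO["ExonicFunc.refGene"] == "stopgain". The expression is JavaScript, so INFO.ExonicFunc.refGene is read as the refGene attribute of INFO.ExonicFunc, which is why the error reports an unknown attribute ExonicFunc.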
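A minimal sketch of the difference between the two access styles, in plain JavaScript. The INFO object here is a hypothetical stand-in built by hand, not slivar's actual object; in particular, slivar raised an "unknown attribute" error for INFO.ExonicFunc, whereas a plain object simply yields undefined.

```javascript
// Hypothetical stand-in for the INFO object; values are made up for illustration.
const INFO = { "ExonicFunc.refGene": "stopgain", gnomAD_genome_ALL: 0.01 };

// Dot notation splits on the period: this looks up a key named "ExonicFunc",
// which does not exist on this object.
const viaDot = INFO.ExonicFunc;

// Bracket notation treats the whole string, period included, as a single key.
const viaBracket = INFO["ExonicFunc.refGene"];

// The three-way comparison from the original expression, written with brackets.
const lofTerms = ["stopgain", "frameshift_insertion", "frameshift_deletion"];
const isLof = lofTerms.indexOf(INFO["ExonicFunc.refGene"]) !== -1;

console.log(viaDot, viaBracket, isLof);
```

The same bracket form can be dropped into the --info expression in place of each INFO.ExonicFunc.refGene.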

brentp commented 5 years ago

Let me know if you run into any other problems.

nroak commented 5 years ago

@brentp On a related note, the filter INFO.gnomad_af < 0.001 does not output variants where the gnomAD AF is missing (.). Is that supposed to happen?

brentp commented 5 years ago

If you want to filter on gnomAD, use the gnotate files with the -g argument. In that case, if the allele frequency is missing, slivar sets gnomad_af to -1. If that's not working, it's a bug; let me know.

If you're relying on an expression that uses a truly missing INFO field, there's not much slivar can do to help. Right now, a missing field effectively causes the expression to evaluate to false. The only other option would be to have it evaluate to true, which I don't think is the right thing. Let me know if I've misunderstood.
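To illustrate why the -1 value matters, here is a small sketch in plain JavaScript. The objects are hand-built stand-ins for INFO and passesRareFilter is a hypothetical helper; the last case relies on plain JavaScript comparison rules (undefined compared to a number is false), which matches the outcome described above but is not necessarily how slivar gets there internally.

```javascript
// Hypothetical helper wrapping the filter expression from the thread.
function passesRareFilter(info) {
  return info.gnomad_af < 0.001;
}

const annotatedRare = { gnomad_af: 0.0002 };   // rare in gnomAD
const annotatedCommon = { gnomad_af: 0.2 };    // common in gnomAD
const absentFromGnomad = { gnomad_af: -1 };    // missing frequency set to -1, as described for -g
const fieldMissing = {};                       // no gnomad_af key at all

// -1 is below any positive cutoff, so variants absent from gnomAD are kept.
// A missing key gives undefined, and undefined < 0.001 is false in JavaScript.
const results = [annotatedRare, annotatedCommon, absentFromGnomad, fieldMissing]
  .map(passesRareFilter);

console.log(results);
```

With the -1 convention, a "less than" cutoff keeps unobserved variants without any extra check in the expression.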

nroak commented 5 years ago

That makes sense. It wasn't obvious until I started using that filter. I'm filtering on ANNOVAR-annotated INFO fields, which contain '.' wherever a value is absent. For gnomAD, I will use the gnotate files. It's very useful to have that compatible gnomAD annotation available with slivar. Thanks!

brentp commented 5 years ago

You can also use !("gnomad_af" in INFO) || INFO.gnomad_af < 0.01, which keeps a variant when the field is absent.
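A short sketch of how that guard behaves, using plain JavaScript objects as hypothetical stand-ins for INFO. The function name keepVariant is made up for illustration, and this assumes the in operator sees slivar's INFO the same way it sees an ordinary object.

```javascript
// Hypothetical wrapper around the guarded expression from the comment above.
function keepVariant(INFO) {
  // Left side is true when the key is absent, so || short-circuits and
  // the comparison on the right is never evaluated for missing fields.
  return !("gnomad_af" in INFO) || INFO.gnomad_af < 0.01;
}

const missingKept = keepVariant({});                   // field absent
const rareKept = keepVariant({ gnomad_af: 0.001 });    // below the cutoff
const commonKept = keepVariant({ gnomad_af: 0.3 });    // above the cutoff

console.log(missingKept, rareKept, commonKept);
```

The design choice is explicit here: a missing frequency is treated as "rare enough to keep", which is the opposite of what the bare comparison does.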

nroak commented 5 years ago

That's true, I think I will use that until I incorporate gnotate files. Thanks a ton, Brent!

brentp commented 5 years ago

Sure, and thank you for the feedback. A new release will be out today or tomorrow with a new tsv command to convert the filtered VCFs to a spreadsheet. It would be good to get your thoughts on that once it's out.

nroak commented 5 years ago

Oh wow! That'll be cool. Currently, I have a hardcoded script that uses GATK VariantsToTable to do that. I would LOVE to test out the slivar option. To reiterate Aaron's speculation, you do sleep, right?

brentp commented 5 years ago

Hah. I try. Here it is: https://github.com/brentp/slivar/releases/tag/v0.1.2