brentp / slivar

genetic variant expressions, annotation, and filtering for great good.
MIT License

Filter on INFO.AF_popmax, pass if missing value #153

Closed: wwgordon closed this issue 1 year ago

wwgordon commented 1 year ago

This follows #74 and #63.

I would like to filter on AF_popmax if it is present, and ignore it if it is missing:

./slivar expr \
    --vcf $vcf --ped $ped \
    --trio "${moi}:kid.alts == 2 && kid.GQ >= 20 \
                && mom.alts == 1 && mom.GQ >= 20 \
                && dad.alts == 1 && dad.GQ >= 20" \
    --info "variant.FILTER == 'PASS' \
                 && INFO.AF_popmax <= 0.005 \
                 && INFO.highest_impact_order < ImpactOrder.synonymous" \
    --pass-only > $intermediate 2> ./slivar_stderr.txt
# 3127 vars passed filtration
# 188825 warnings

However, it throws a warning every time it encounters a missing value and breaks out of the --info expression, which can be seen by switching the order of the expressions:

./slivar expr \
    --vcf $vcf --ped $ped \
    --trio "${moi}:kid.alts == 2 && kid.GQ >= 20 \
                && mom.alts == 1 && mom.GQ >= 20 \
                && dad.alts == 1 && dad.GQ >= 20" \
    --info "variant.FILTER == 'PASS' \
                 && INFO.highest_impact_order < ImpactOrder.synonymous \
             && INFO.AF_popmax <= 0.005" \
    --pass-only > $intermediate 2> ./slivar_stderr.txt
# 7 vars passed filtration
# 434 warnings
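
The change in warning counts when the clauses are swapped is what `&&` short-circuiting predicts. Each clause is evaluated only if every clause before it was true, so `INFO.AF_popmax` is read (and can warn) only for variants that already passed the earlier checks. The sketch below models this in plain JavaScript. `makeInfo`, its warning counter, `SYNONYMOUS`, and the sample records are hypothetical stand-ins, not slivar's actual objects.

```javascript
// Hypothetical model, not slivar's implementation: reading a missing INFO
// field increments a counter, standing in for slivar's warning.
let warnings = 0;

function makeInfo(fields) {
  return new Proxy(fields, {
    get(target, key) {
      if (!(key in target)) warnings += 1; // absent field -> "warning"
      return target[key];
    },
  });
}

const SYNONYMOUS = 10; // hypothetical stand-in for ImpactOrder.synonymous

// Hypothetical variants; a lower impact order means a more severe consequence.
const records = [
  { FILTER: "PASS", info: makeInfo({ highest_impact_order: 20 }) },
  { FILTER: "PASS", info: makeInfo({ highest_impact_order: 1 }) },
  { FILTER: "PASS", info: makeInfo({ highest_impact_order: 1, AF_popmax: 0.001 }) },
  { FILTER: "PASS", info: makeInfo({ highest_impact_order: 20, AF_popmax: 0.001 }) },
];

// The two clause orders from the commands above.
const afFirst = (v) =>
  v.FILTER === "PASS" && v.info.AF_popmax <= 0.005 && v.info.highest_impact_order < SYNONYMOUS;
const impactFirst = (v) =>
  v.FILTER === "PASS" && v.info.highest_impact_order < SYNONYMOUS && v.info.AF_popmax <= 0.005;

// Count how many records pass and how many missing-field reads occurred.
function run(expr) {
  warnings = 0;
  const passed = records.filter(expr).length;
  return { passed, warnings };
}

console.log(run(afFirst));     // { passed: 1, warnings: 2 }
console.log(run(impactFirst)); // { passed: 1, warnings: 1 }
```

In this model a missing value simply makes the comparison false, so both orders pass the same variants; it does not reproduce the different pass counts reported above.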

I was hoping I could use something akin to #74 but inverted:

./slivar expr \
    --vcf $vcf --ped $ped \
    --trio "${moi}:kid.alts == 2 && kid.GQ >= 20 \
                && mom.alts == 1 && mom.GQ >= 20 \
                && dad.alts == 1 && dad.GQ >= 20" \
    --info "variant.FILTER == 'PASS' \
         && INFO.highest_impact_order < ImpactOrder.synonymous \
             && (\!('AF_popmax' in INFO) || INFO.AF_popmax <= 0.005)" \
    --pass-only > $intermediate 2> ./slivar_stderr.txt
# duko.nim(76)             compile
# Error: unhandled exception: SyntaxError: invalid escape (line 1)
# expression was:'variant.FILTER == 'PASS'          && INFO.highest_impact_order < ImpactOrder.synonymous       && (\!('AF_popmax' in INFO) || INFO.AF_popmax <= 0.005)' [ValueError]

But this seems to be a syntax error. Lastly, I tried a "nullish coalescing operator" but this also resulted in a syntax error:

./slivar expr \
    --vcf $vcf --ped $ped \
    --trio "${moi}:kid.alts == 2 && kid.GQ >= 20 \
                && mom.alts == 1 && mom.GQ >= 20 \
                && dad.alts == 1 && dad.GQ >= 20" \
    --info "variant.FILTER == 'PASS' \
                 && INFO.highest_impact_order < ImpactOrder.synonymous \
             && ((INFO.AF_popmax ?? 0.00) <= 0.005)" \
    --pass-only > $intermediate 2> ./slivar_stderr.txt
# this attempts to use the "nullish coalescing operator" but results in:
# Error: unhandled exception: SyntaxError: parse error (line 1)

Is there a way I could successfully make this query? I am a JavaScript novice.

brentp commented 1 year ago

I think you can flip the quotes in your first attempt. You had:

    --info "variant.FILTER == 'PASS' \
         && INFO.highest_impact_order < ImpactOrder.synonymous \
             && (\!('AF_popmax' in INFO) || INFO.AF_popmax <= 0.005)" \

you may instead try outer single quotes:

    --info 'variant.FILTER == "PASS" \
         && INFO.highest_impact_order < ImpactOrder.synonymous \
             && (!("AF_popmax" in INFO) || INFO.AF_popmax <= 0.005)' 
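
Why flipping the quotes helps: inside double quotes, bash only strips a backslash before `$`, `` ` ``, `"`, `\` or a newline, so the backslash in `\!` is left in place and reaches slivar unchanged. The JavaScript engine then sees a stray backslash, hence `invalid escape`. Inside single quotes `!` is literal and needs no escaping. The sketch below uses JavaScript's `Function` constructor as a stand-in for slivar's compile step (an assumption, not its actual code) to show that the string with `\!` does not parse while the `!` version does; the helper name `compiles` is hypothetical.

```javascript
// Returns true if `expr` parses as a JavaScript expression.
// new Function() is used only as a stand-in for slivar's compile step.
function compiles(expr) {
  try {
    new Function("INFO", "return (" + expr + ");");
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

// What the double-quoted shell argument delivers: the backslash survives.
const fromDoubleQuotes = String.raw`(\!('AF_popmax' in INFO) || INFO.AF_popmax <= 0.005)`;
// What the single-quoted argument delivers: a plain `!`.
const fromSingleQuotes = `(!("AF_popmax" in INFO) || INFO.AF_popmax <= 0.005)`;

console.log(compiles(fromDoubleQuotes)); // false
console.log(compiles(fromSingleQuotes)); // true
```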

brentp commented 1 year ago

by the way, if you annotate with slivar (or echtvar) it will set missing values to -1 to avoid this exact difficulty.
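
With missing values stored as -1, the original single-clause filter already behaves as wanted, because -1 is below any frequency threshold and so passes the `<=` comparison. A small check of that arithmetic, with plain objects standing in for annotated INFO records (the names `MISSING` and `passesAf` are hypothetical):

```javascript
// Assumes the annotation step wrote -1 for a missing AF_popmax, as described above.
const MISSING = -1;

// The unmodified filter from the original command.
function passesAf(INFO) {
  return INFO.AF_popmax <= 0.005;
}

const annotated = [
  { AF_popmax: MISSING }, // absent from the population database
  { AF_popmax: 0.001 },   // rare
  { AF_popmax: 0.02 },    // common
];

console.log(annotated.map(passesAf)); // [ true, true, false ]
```

The flip side is that a lower-bound filter such as `INFO.AF_popmax > 0.01` will silently exclude missing values, so the -1 sentinel is only transparent for upper-bound filters.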

wwgordon commented 1 year ago

  --info 'variant.FILTER == "PASS" && INFO.highest_impact_order < ImpactOrder.synonymous && (!("AF_popmax" in INFO) || INFO.AF_popmax <= 0.005)'

This works, but with the quotes flipped the backslash line continuations must be removed (i.e. everything must be on one line). Perfectly fine for us, thanks!
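
The reason the continuations have to go: inside single quotes bash does not treat backslash-newline as a line continuation, so both characters are passed to slivar, and a bare backslash is not valid JavaScript. A plain newline is only whitespace to a JavaScript parser, so dropping just the trailing backslashes may also work, though that was not tested with slivar in this thread. The sketch again uses `new Function` as a hypothetical stand-in for slivar's compile step; `compiles` is a hypothetical helper.

```javascript
// Returns true if `expr` parses as a JavaScript expression
// (new Function() stands in for slivar's compile step).
function compiles(expr) {
  try {
    new Function("INFO", "variant", "return (" + expr + ");");
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

// Single quotes keep backslash-newline verbatim, so the engine receives this:
const withContinuations =
  'variant.FILTER == "PASS" \\\n' +
  '  && (!("AF_popmax" in INFO) || INFO.AF_popmax <= 0.005)';

// Same expression with the backslash removed (newline kept as whitespace).
const withoutContinuations =
  'variant.FILTER == "PASS"\n' +
  '  && (!("AF_popmax" in INFO) || INFO.AF_popmax <= 0.005)';

console.log(compiles(withContinuations));    // false
console.log(compiles(withoutContinuations)); // true
```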