brentp / slivar

genetic variant expressions, annotation, and filtering for great good.
MIT License
247 stars 23 forks source link

Filter on CSQ numeric, pass if missing value #158

Closed wwgordon closed 7 months ago

wwgordon commented 8 months ago

Hi Brent,

This issue closely follows #153 . I have a VEP-created CSQ field that I would like to filter on (SYSERROR_AF_lowCI). As in #153, I would like variants to pass if this field is <= 0.05 OR if the field is missing. I have confirmed that all variants have CSQ fields (though they may be empty, the "slot" is there). I have tried this a number of ways, including the method from #153, using a nullish coalescing operator, and using the unary + to coerce a missing text value to numeric 0:

CSQs(INFO.CSQ, VCF.CSQ, ['SYSERROR_AF_lowCI']).some(function(csq) {return ( !("SYSERROR_AF_LOWCI" in csq) || csq.SYSERROR_AF_LOWCI <= 0.05)}

CSQs(INFO.CSQ, VCF.CSQ, ['SYSERROR_AF_lowCI']).some(function(csq) {return ( (csq.SYSERROR_AF_LOWCI ?? 0.0) <= 0.05)}

CSQs(INFO.CSQ, VCF.CSQ, []).some(function(csq) {return ( +csq.SYSERROR_AF_LOWCI <= 0.05 )}

Regardless of my method, I get thousands of warnings:

[slivar] error evaluating info expression (this can happen if a field is missing): error from duktape: ReferenceError: identifier 'CSQs' undefined for expression:INFO.gnomad_popmax_af < 0.005 && INFO.topmed_af < 0.05 && INFO.genic && (variant.FILTER == "PASS" || variant.FILTER == "SBFilter") && CSQs(INFO.CSQ, VCF.CSQ, ["SYSERROR_AF_lowCI"]).some(function(csq) {return ( !("SYSERROR_AF_LOWCI" in csq) || csq.SYSERROR_AF_LOWCI <= 0.05)})

The number of errors is substantially smaller than my total variant count, so I believe this is only occuring when the SYSERROR_AF_lowCI "slot" is empty.

Do you have any thoughts as to how I might solve this scenario similar to the #153 solution?

Thanks for all the help! William

brentp commented 8 months ago

Hi William, can you share a VCF with header and two variants, one with this CSQ field and one without? You have mixes of case in your examples above so perhaps it's something related to that?

It seems based on the error though that slivar doesn't even know what the CSQs function is so you'll need to pass a file containing that code to --js

wwgordon commented 8 months ago

You have mixes of case in your examples above so perhaps it's something related to that?

I was under the impression that the argument passed at the end of CSQs() (["SYSERROR_AF_lowCI"]) identified the field to be converted to numeric, based on how it is cased in the VCF, while the latter instances of SYSERROR_AF_LOWCI are capitalized because they are keys in the csq object, which are described as always all caps. Running with all caps (including in CSQ()) produces the same errors.

I've attached a VCF (sorry about the huge header) with two variants; the first without and the second with SYSERROR_AF_lowCI. When I run slivar on this file, I do not see any errors, so it seems like it might be something else.

I've also attached a VCF with enough variants to throw the error twice.

slivar_testVCFs_syserror.zip

brentp commented 7 months ago

Sorry for the delay. I have just pushed a change to js/csq.js so you can do:

/slivar_dev expr --pass-only --js js/csq.js \
    --info ' CSQs(INFO.CSQ, VCF.CSQ, ["SYSERROR_AF_LOWCI"]).some(function(csq) {
    var si = csq.SYSERROR_AF_LOWCI; return !isNaN(si) && si < 0.05 }
    )'  -v bamshad_uwcrdr_noonan_wgssr_1.HF.final.VT.vcf.annotated.noStars.head5000  -o x.vcf

Not that you'll need to adjust the query to do exactly what you want. But the change is such that non numeric values will return NaN.

wwgordon commented 7 months ago

I'm having difficulty building the updated slivar from source, is there a slivar_dev binary available?

brentp commented 7 months ago

slivar is unchanged, you just need to download the js/csq.js

wwgordon commented 7 months ago

Oh yes, my apologies, I forgot multiple .js files can be provided with --js.

Thanks!

wwgordon commented 7 months ago

I spoke too soon, --js will only accept one .js file, but I can concatenate them so this isn't a problem.

wwgordon commented 7 months ago

Hi Brent,

Thanks so much for the fix, this seems to be working perfectly.

I was hoping to make a feature request to include either line number or genomic coordinates for variants that throw duktape errors/warnings. I think this could be very helpful for troubleshooting. Would this warrant a separate ticket?

Thanks again, William

brentp commented 7 months ago

yes, a separate ticket would be good though that may be a tall order