broadinstitute / seqr

web-based analysis tool for rare disease genomics
GNU Affero General Public License v3.0

Custom search across GRCh38 projects not working #3897

Closed Caustint closed 5 months ago

Caustint commented 7 months ago

Describe the bug: I use custom search to screen our cohort for interesting variants in candidate genes that come up in various settings. Attempting the search linked below (any affected, coding variants, <0.1% in gnomAD, with location set to a specific gene) results in an internal server error, even after multiple attempts.

Link to page(s) where bug is occurring: https://seqr.broadinstitute.org/report/custom_search/58311365c3844063eed681b44d19f87f?page=1&sort=position

Scope of the bug: I've tried other genes and had the same result. Searching either the GRCh38 or the GRCh37 projects failed, though GRCh37 produced a much longer error: https://seqr.broadinstitute.org/report/custom_search/3a183bdafeb519516667d47eb3859488?page=1&sort=position (screenshot below). I usually only bother with the GRCh38 projects and just tried GRCh37 to test out all the options. If I limit the search to a single project (like RGP), it does work.

Screenshots: (image)

hanars commented 7 months ago

@bpblanken the underlying error in the hail backend for this search is raised when joining several of the GRCh38 MITO project tables:

TypeError: 'or_else' requires the 'a' and 'b' arguments to have the same type
    a: type 'array<array<struct{contamination: str, DP: int32, HL: float64, mito_cn: int32, GQ: float64, GT: call, sampleId: str, sampleType: str, individualGuid: str, familyGuid: str, affected_id: int32, is_male: bool}>>'
    b: type 'array<array<struct{contamination: float64, DP: int32, HL: float64, mito_cn: int32, GQ: float64, GT: call, sampleId: str, sampleType: str, individualGuid: str, familyGuid: str, affected_id: int32, is_male: bool}>>'

Looks like some tables have contamination as str and some have it as float64. The hail backend doesn't filter or parse that field, it just passes it through, so we wouldn't throw an error on it for a single project search (or a single family search, as family tables are never joined with each other). It's only an issue when trying to do a join. It's displayed in a hover in the UI as

`Contamination (${genotype.contamination}) > 0`

so I guess we are expecting a numeric value but also would not throw any sort of error if we got a string.

Can you do a manual check of the MITO project tables to see what went wrong?
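
For a check like this, one possible approach is to compare the genotype entry schema of each project table and report any field whose type differs between tables. The sketch below is plain Python over hypothetical schema dictionaries (field name mapped to a type string); the project names and the `find_type_conflicts` helper are illustrative assumptions and are not part of seqr or hail.

```python
from collections import defaultdict


def find_type_conflicts(schemas):
    """Return {field: {type: [tables]}} for fields whose type differs across tables."""
    types_by_field = defaultdict(lambda: defaultdict(list))
    for table, fields in schemas.items():
        for field, field_type in fields.items():
            types_by_field[field][field_type].append(table)
    # Keep only fields that appear with more than one distinct type.
    return {
        field: {t: sorted(tables) for t, tables in by_type.items()}
        for field, by_type in types_by_field.items()
        if len(by_type) > 1
    }


# Hypothetical schemas mirroring the two struct types in the error above.
example_schemas = {
    "project_a": {"contamination": "str", "DP": "int32", "HL": "float64"},
    "project_b": {"contamination": "float64", "DP": "int32", "HL": "float64"},
}
conflicts = find_type_conflicts(example_schemas)
print(conflicts)
```

With real tables, the schema dictionaries would have to be built from each table's entry type first; how that is done depends on how the tables are stored and is not shown here.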

bpblanken commented 7 months ago

Yeah, I don't know what's happened here (investigating).

hanars commented 7 months ago

Okay, that bug is now fixed, but there is a second bug, a performance issue, preventing this search from completing, which I will look into.

hanars commented 7 months ago

Still requires optimizations in order to complete: 1) Do not include MITO data if no MITO genes 2) Adjust prefilter join optimization to be after inheritance filtering
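
The first optimization above can be sketched roughly as follows. This is a minimal illustration, not the seqr implementation: the dataset labels `SNV_INDEL` and `MITO`, the chromosome naming, and the `data_types_to_search` helper are all assumptions made for the example.

```python
MITO_CHROMS = {"M", "MT"}  # assumed labels for the mitochondrial chromosome


def _normalize_chrom(chrom):
    """Strip an optional 'chr' prefix so 'chrM' and 'M' compare equal."""
    return chrom[3:] if chrom.startswith("chr") else chrom


def data_types_to_search(gene_chromosomes, available_types=("SNV_INDEL", "MITO")):
    """Drop the MITO dataset when a gene-restricted search has no mitochondrial genes."""
    if not gene_chromosomes:
        # No location filter: every available dataset may contain matches.
        return list(available_types)
    has_mito_gene = any(_normalize_chrom(c) in MITO_CHROMS for c in gene_chromosomes)
    return [t for t in available_types if t != "MITO" or has_mito_gene]


print(data_types_to_search(["chr7"]))
print(data_types_to_search(["chr7", "chrM"]))
```

Skipping the MITO tables entirely for a nuclear-gene search means they never need to be loaded or joined, which is the saving the optimization is after.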

stephditroia commented 6 months ago

Came to submit a similar ticket, so I'll add here.

I'm unable to run a simple custom search (single gene dominant search, absent from gnomAD, high impact high quality variants). I tried 10 times with minor search adjustments and always got a 504 error. https://seqr.broadinstitute.org/report/custom_search/12e3b194639fe20eb2ca106dcd32afd5?page=1&sort=pathogenicity

hanars commented 5 months ago

These searches are now all working! I am so sorry for the delay on this and really appreciate everyone's patience as we worked this out. At this point we expect all the searches to work, so please submit a new ticket if you do find anything that's not working.