broadinstitute / seqr

web-based analysis tool for rare disease genomics
GNU Affero General Public License v3.0

Variant lookup not behaving as expected #4288

Closed: Caustint closed this issue 1 month ago

Caustint commented 1 month ago

Describe the bug

While reviewing seqr AC during our analyst meeting last week, we noticed a change in the variant lookup page results. We saw this for multiple variants across multiple RGP families, but one example illustrates the issue. The DOCK11 variant in RGP_2209 (Link 1 below) shows 2 homozygotes and an AC of 8 in the seqr callset. However, clicking the seqr link under the variant on that page opens a variant lookup page (Link 2 below) that shows the variant in the het state in only 2 members of Walsh family MC40900. RGP_2209 is not listed on that lookup page at all, and the lookup page reports a different seqr AC and AN than Link 1 (see the screenshots below). A few weeks ago, the variant lookup page would have shown the RGP family (and probably also its duplicate in the GREGoR callset), so this was unexpected.

Link to page(s) where bug is occurring

Link 1: https://seqr.broadinstitute.org/project/R0594_rare_genomes_project_gen/saved_variants/variant/SV0133407_x118610333_f038264_r
Link 2: https://seqr.broadinstitute.org/summary_data/variant_lookup?variantId=X-118610333-C-A&genomeVersion=38&sampleType=WGS
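Link 2 encodes the variant and callset directly in its query string. A minimal sketch of assembling such a lookup URL, assuming only the three parameters visible in Link 2 (`variantId`, `genomeVersion`, `sampleType`); the helper name `variant_lookup_url` is hypothetical, and any other parameters the page may accept are not covered:

```python
from urllib.parse import urlencode

# Base path taken from Link 2 in this issue.
LOOKUP_BASE = "https://seqr.broadinstitute.org/summary_data/variant_lookup"


def variant_lookup_url(chrom: str, pos: int, ref: str, alt: str,
                       genome_version: str = "38", sample_type: str = "WGS") -> str:
    """Assemble a variant lookup URL (hypothetical helper, not part of seqr).

    Only the query parameters seen in Link 2 are used.
    """
    # The variant ID joins chrom, position, ref and alt with hyphens,
    # e.g. "X-118610333-C-A".
    variant_id = f"{chrom}-{pos}-{ref}-{alt}"
    params = {
        "variantId": variant_id,
        "genomeVersion": genome_version,
        "sampleType": sample_type,
    }
    # urlencode leaves letters, digits and "-" unescaped, so the ID is kept as-is.
    return f"{LOOKUP_BASE}?{urlencode(params)}"


print(variant_lookup_url("X", 118610333, "C", "A"))
```

Called with the DOCK11 variant's parts, this reproduces Link 2 exactly.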

Scope of the bug

I've only looked at RGP cases so far.

Screenshots

[two screenshots, not preserved]

hanars commented 1 month ago

@bpblanken can you take a look at X-118610333-C-A in the lookup table and see what is happening there? That variant seems to have a global AC of 2, and the seqr variant lookup returns 2 alleles in family F026266_mc40900, but a family search in F038264_rgp_2209 also returns the variant with 2 alleles in that family.
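The numbers in the comment above cannot all be right at once: if two distinct families each carry 2 alt alleles, the callset-wide AC must be at least 4, not 2. A minimal sketch of that sanity check, assuming the families are disjoint sets of samples from the same callset the global AC was computed over; the function name is hypothetical:

```python
from typing import Mapping


def family_acs_exceed_global(global_ac: int, family_acs: Mapping[str, int]) -> bool:
    """Return True if per-family allele counts add up to more than the global AC.

    Assumes the families are disjoint and drawn from the same callset as the
    global AC, so their alt alleles are a subset of the global count and the
    sum can never legitimately exceed it.
    """
    return sum(family_acs.values()) > global_ac


# Numbers from the comment above: global AC of 2, but 2 alleles in each family.
observed = {"F026266_mc40900": 2, "F038264_rgp_2209": 2}
print(family_acs_exceed_global(2, observed))  # True: 2 + 2 = 4 > 2
```

A check like this only flags that the global stats and the per-family data disagree; it does not say which side is stale.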

bpblanken commented 1 month ago

Did some investigation! I think the bug was an `all` instead of an `any` in this migration that was run last week. The consistency of the projects and families seems OK. I'm going to go back in time a week and make sure that resolves it!
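The migration code itself isn't shown in this thread, so the following is only a hypothetical illustration of why an `all`/`any` swap is easy to miss: when deciding whether a family carries a variant, `any` keeps a family if at least one member has an alt allele, while `all` silently drops every family that includes a non-carrier. The family names and genotypes are made up:

```python
from typing import Dict, List

# Hypothetical alt allele counts per sample (0 = hom ref, 1 = het, 2 = hom alt)
# for one variant, grouped by family.
FAMILIES: Dict[str, List[int]] = {
    "family_a": [1, 1],        # every member carries the variant
    "family_b": [2, 2, 0, 1],  # two homs, one het, one non-carrier
}


def families_with_variant(families: Dict[str, List[int]], require_all: bool) -> List[str]:
    """List families in which the variant is considered present.

    require_all=False keeps a family if ANY sample has an alt allele;
    require_all=True keeps it only if ALL samples do.
    """
    check = all if require_all else any
    return sorted(f for f, calls in families.items() if check(c > 0 for c in calls))


print(families_with_variant(FAMILIES, require_all=False))  # ['family_a', 'family_b']
print(families_with_variant(FAMILIES, require_all=True))   # ['family_a']
```

Both versions run without error and return plausible-looking results, which is why this kind of bug tends to surface as missing families rather than as a crash.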

bpblanken commented 1 month ago

(this ticket that I made looks important: https://github.com/broadinstitute/seqr-loading-pipelines/issues/862)

bpblanken commented 1 month ago

I've fixed this (🤞)!

```
In [4]: hl.query_table('gs://seqr-hail-search-data/v03/GRCh38/SNV_INDEL/annotations.ht', hl.Locus('chrX', 118610333, 'GRCh38')).gt_stats.collect()
Out[4]: [[Struct(AC=8, AN=44660, AF=0.00017913120973389596, hom=2)]]
```

Process was: