Clinical-Genomics / scout

VCF visualization interface
https://clinical-genomics.github.io/scout
BSD 3-Clause "New" or "Revised" License

Gene variants page (Search SNVs and INDELs) is not usable for cancer variants #3253

Open northwestwitch opened 2 years ago

northwestwitch commented 2 years ago

I've started looking into this one. The problem is that the existing database index currently used by the query is this:

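from pymongo import ASCENDING, DESCENDING, IndexModel

# Compound index used by the gene variants query. Because of the partial
# filter, only SNVs ("category": "snv") with rank_score > 5 are indexed.
IndexModel(
    [
        ("hgnc_symbols", ASCENDING),
        ("rank_score", DESCENDING),
        ("category", ASCENDING),
        ("variant_type", ASCENDING),
    ],
    name="hgncsymbol_rankscore_category_varianttype",
    background=True,
    partialFilterExpression={"rank_score": {"$gt": 5}, "category": "snv"},
)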

The query on that Scout page collects SNVs with a rank score >= 5, which excludes all cancer variants because they currently have no rank score at all.
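To make the exclusion concrete, here is a minimal pure-Python sketch of how a partial index's filter decides which documents get indexed. It is a simplification, not MongoDB's matcher: it only handles the two kinds of condition used in the filter above (plain equality and `$gt`), and the two variant documents are hypothetical, reduced to the fields the filter looks at.

```python
from typing import Any, Mapping

# Same filter as the index above.
PARTIAL_FILTER = {"rank_score": {"$gt": 5}, "category": "snv"}


def matches_partial_filter(doc: Mapping[str, Any], flt: Mapping[str, Any]) -> bool:
    """Return True if `doc` satisfies `flt`.

    Simplified matcher supporting plain equality and the `$gt` operator only.
    A missing field never satisfies a condition, which mirrors MongoDB's
    behaviour for `$gt` and for equality against a non-null value.
    """
    for field, condition in flt.items():
        if field not in doc:
            return False
        value = doc[field]
        if isinstance(condition, Mapping):
            for op, operand in condition.items():
                if op != "$gt":
                    raise ValueError(f"unsupported operator: {op}")
                if not value > operand:
                    return False
        elif value != condition:
            return False
    return True


# Hypothetical variant documents.
rare_snv = {"hgnc_symbols": ["BRCA1"], "category": "snv", "rank_score": 12}
cancer_snv = {"hgnc_symbols": ["TP53"], "category": "cancer"}  # no rank_score

print(matches_partial_filter(rare_snv, PARTIAL_FILTER))    # True
print(matches_partial_filter(cancer_snv, PARTIAL_FILTER))  # False
```

The cancer document fails on two counts: it has no `rank_score`, and its category is not `"snv"`. MongoDB also only considers a partial index for a query whose predicate is guaranteed to fall inside the filter expression, so a search over cancer variants cannot use this index at all and has to fall back to a slower plan.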

How should we fix this?

One question: wouldn't the search be super slow and eventually time out? 🤔

dnil commented 2 years ago

Right. Ideally, of course, cancer would actually start using the rank model, but I'm guessing that won't happen tomorrow.

The idea would be to create a separate index for category: cancer. Exactly what it should look like is the open question, and, much as when we set up the rare-gene-variant index, it may take a bit of research into how many variants would fall in the sparse area. Many cancer cases have fewer variants, so they could possibly all be indexed, though that is not always true. If needed, we could use a lowest-common-denominator combination of allele frequency, tumor variant allele frequency, and functional annotation from the cancer filters to establish a baseline of what should clearly be significant enough to return in a search.
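As a rough illustration of that approach, the sketch below builds a candidate entry for a cancer-only partial index (in the `key` / `name` / `partialFilterExpression` shape that MongoDB's `createIndexes` command accepts) and counts, per case, how many variants would land in it. Everything specific here is an assumption for illustration: the field names `tumor_vaf`, `population_af` and `case_id`, the thresholds, the index name and the sample documents are hypothetical, not Scout's actual schema or cut-offs.

```python
from collections import Counter
from typing import Any, Iterable, Mapping

# HYPOTHETICAL thresholds, for illustration only; real cut-offs would come
# from the cancer filters and the research described above.
MAX_POPULATION_AF = 0.01  # keep rare variants only
MIN_TUMOR_VAF = 0.05      # minimum tumor variant allele frequency


def cancer_index_spec() -> dict:
    """Build a candidate createIndexes entry for a cancer-only partial index."""
    return {
        # Key order matters for compound indexes; Python dicts keep insertion order.
        "key": {"hgnc_symbols": 1, "category": 1, "variant_type": 1},
        "name": "hgncsymbol_category_varianttype_cancer",  # hypothetical name
        "partialFilterExpression": {
            "category": "cancer",
            "tumor_vaf": {"$gte": MIN_TUMOR_VAF},         # hypothetical field
            "population_af": {"$lt": MAX_POPULATION_AF},  # hypothetical field
        },
    }


def passes_baseline(variant: Mapping[str, Any]) -> bool:
    """Python mirror of the partial filter above, for sizing the index offline.

    Missing fields get defaults that fail the check, matching MongoDB, where
    a document without the field does not satisfy `$gte` or `$lt`.
    """
    return (
        variant.get("category") == "cancer"
        and variant.get("tumor_vaf", 0.0) >= MIN_TUMOR_VAF
        and variant.get("population_af", 1.0) < MAX_POPULATION_AF
    )


def indexed_per_case(variants: Iterable[Mapping[str, Any]]) -> Counter:
    """Count, per case_id, how many variants would land in the partial index."""
    counts: Counter = Counter()
    for variant in variants:
        if passes_baseline(variant):
            counts[variant["case_id"]] += 1
    return counts


# Hypothetical sample documents.
sample = [
    {"case_id": "case1", "category": "cancer", "tumor_vaf": 0.30, "population_af": 0.0001},
    {"case_id": "case1", "category": "cancer", "tumor_vaf": 0.02, "population_af": 0.0001},
    {"case_id": "case2", "category": "cancer", "tumor_vaf": 0.12, "population_af": 0.2},
    {"case_id": "case2", "category": "snv", "rank_score": 14},
]
print(indexed_per_case(sample))  # Counter({'case1': 1})
```

Running such a count over real cases would show whether the filtered set stays small enough per case. As written, a variant with no population frequency at all would not match `$lt` and so would be left out; whether that is acceptable is part of the research mentioned above.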