Clinical-Genomics / scout

VCF visualization interface
https://clinical-genomics.github.io/scout
BSD 3-Clause "New" or "Revised" License

Compound top 20 does not show correct order, and shows "not loaded" #4823

Closed Jakob37 closed 1 week ago

Jakob37 commented 1 week ago

Describe the bug

Our geneticists asked whether the compound top 20 on the SNV/indel page works as intended.

Here is one example:

[image: compound_page]

Several of the variants are shown as "not loaded".

They mentioned that they could see higher ranked variants not included in the list.

The variants are not sorted according to their ranking.

Do you guys see this as well? Let me know if not and I'll see if I can help out with pinpointing the issue.

northwestwitch commented 1 week ago

Hi! We also see a lot of "not loaded". Thanks for reporting, we'll investigate!

northwestwitch commented 1 week ago

Ah but you have 0 as combined score for all compounds! Are you not ranking all your variants?

northwestwitch commented 1 week ago

We see a situation like this, with the highest combined scores on top:

[image]

northwestwitch commented 1 week ago

> They mentioned that they could see higher ranked variants not included in the list.

And these variants are following the right inheritance model of the variant in question?

Jakob37 commented 1 week ago

> Ah but you have 0 as combined score for all compounds! Are you not ranking all your variants?

Aha, interesting. We have "RankScore" assigned to all. Do you have a separate field for the compound score? (Maybe the Scout docs describe how it is supposed to look somewhere, I'll check ...)

Edit: I see now that it seems to be summing up the scores. All variants have rank scores, so it seems something else is going wrong somewhere. I'll investigate.

We currently penalize compounds through the rank model rather than summing up their scores (summing being my interpretation of how the compound score is calculated)...

> And these variants are following the right inheritance model of the variant in question?

I'll ask!

northwestwitch commented 1 week ago

Combined score should be the variant's score + compound variant's score. I'm not super familiar with this part of the pipeline but we are using genmod to annotate the compounds.
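To illustrate the summing described above, here is a minimal sketch, assuming the combined score really is a plain sum of the two rank scores and that the top 20 list is ordered by it, highest first. The function name `rank_compounds`, the variant ids, and the scores are all hypothetical; this is not Scout's or genmod's actual code.

```python
from typing import Dict, List, Tuple


def rank_compounds(
    variant_score: float,
    partner_scores: Dict[str, float],
    top_n: int = 20,
) -> List[Tuple[str, float]]:
    """Return (partner id, combined score) pairs, highest combined score first.

    Assumption: combined score = variant's score + compound variant's score.
    """
    combined = [
        (partner_id, variant_score + partner_score)
        for partner_id, partner_score in partner_scores.items()
    ]
    # Sort descending on the combined score and keep only the top entries.
    combined.sort(key=lambda item: item[1], reverse=True)
    return combined[:top_n]


# Illustrative values only, not taken from any real case.
example = rank_compounds(10, {"var_a": 5, "var_b": 12, "var_c": -3})
print(example)
```

If every partner lacks a score and falls back to 0, all combined scores tie, which would match the unsorted-looking list reported at the top of the thread.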

Jakob37 commented 1 week ago

> Combined score should be the variant's score + compound variant's score. I'm not super familiar with this part of the pipeline but we are using genmod to annotate the compounds.

OK, I'll investigate. I suspect the other issues we have here are follow-up issues of us not adding the combined score in the same way ...

Jakob37 commented 1 week ago

OK, I'll see if that is the case in the example above.

dnil commented 1 week ago

I believe you are using a local scoring tool instead of genmod? Perhaps it has been updated and lost the combined compound score?

northwestwitch commented 1 week ago

Check the score of the compounds as well, because it looks like combined_score is set to the compound score when loading the variant, see this code

Jakob37 commented 1 week ago

Hmm. We run genmod compound as well, in addition to our local tool which updates the RankScore based on compounds.

Looking at the VCFs, the outputs look a bit different.

Here is our production pipeline:

Compounds=hg002:1_14699_C_G|1_16949_A_C|1_17385_G_A|1_14653_C_T

Here is the output I have from a previous run from the raredisease pipeline:

Compounds=giab_full:chr1_12719_G_C>7|chr1_14354_C_A>8|chr1_17385_G_A>8

Looks like we are missing the >7 info, which I guess is the scores? Maybe it is just us lagging behind on updating genmod.

Edit: Or we do some downstream parsing removing it. I'll dig.

dnil commented 1 week ago

This is what it looks like over here:

[image: Screenshot 2024-09-04 at 15 21 00]

dnil commented 1 week ago

You are so right, both of you. Here https://github.com/Clinical-Genomics/scout/blob/64a94958dfa1522bebeba5e5553c583e7d01cefb/scout/parse/variant/compound.py#L27 is where the Compounds field gets parsed and split on ">" before going to that compound_score build routine @northwestwitch pointed to! It remains to be seen why the score is removed, or not set in the first place, in your prod pipe (right now, hopefully).
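As a rough illustration of that split on ">", here is a minimal sketch that parses a Compounds value of the shape shown earlier in the thread. It assumes a single `family:entry|entry` value per field and treats an entry without ">" as having no score; `parse_compounds` here is a hypothetical stand-in, not the linked Scout function.

```python
from typing import Dict, List, Optional, Union

Compound = Dict[str, Union[str, Optional[float]]]


def parse_compounds(compound_info: str) -> List[Compound]:
    """Parse 'family:var1>score|var2>score' into a list of dicts.

    Assumption: one family per value; entries without '>' get score None.
    """
    family_id, _, raw_entries = compound_info.partition(":")
    compounds: List[Compound] = []
    for entry in raw_entries.split("|"):
        if not entry:
            continue  # skip empty entries, e.g. from a trailing '|'
        variant_id, separator, raw_score = entry.partition(">")
        score = float(raw_score) if separator else None
        compounds.append(
            {"family": family_id, "variant": variant_id, "score": score}
        )
    return compounds


# The two Compounds values quoted earlier in this thread.
without_scores = parse_compounds(
    "hg002:1_14699_C_G|1_16949_A_C|1_17385_G_A|1_14653_C_T"
)
with_scores = parse_compounds(
    "giab_full:chr1_12719_G_C>7|chr1_14354_C_A>8|chr1_17385_G_A>8"
)
print([c["score"] for c in without_scores])
print([c["score"] for c in with_scores])
```

With the first value every score comes back as `None`, so any loader that defaults a missing score to 0 would end up with identical combined scores for all compounds.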

dnil commented 1 week ago

And I'm very glad I don't have to dig into this every day. It has a legacy touch to it. 😅

Jakob37 commented 1 week ago

> And I'm very glad I don't have to dig into this every day. It has a legacy touch to it. 😅

Relatable 😃

It actually seems that our trios have the ">4" compound scores, and their score table looks like the ones you show above.

Seems like the difference is that for singles we aren't running genmod compound, which I suspect is what adds the compound scores. Looks like the solution is to run it for singles as well.

After some pondering, I also suspect that "not loaded" means the variants were present in the initial VCF, just not loaded into Scout.

I think this resolves the mystery on our part. Thanks a lot for the help debugging 🐛🙏