Closed Jakob37 closed 1 week ago
Hi! We also see a lot of not loaded. Thanks for reporting, we'll investigate!
Ah but you have 0 as combined score for all compounds! Are you not ranking all your variants?
We see a situation like this, with the highest combined scores on top:
> They mentioned that they could see higher ranked variants not included in the list.
And these variants are following the right inheritance model of the variant in question?
> Ah but you have 0 as combined score for all compounds! Are you not ranking all your variants?
Aha, interesting. We have "RankScore" assigned to all. Do you have a separate field for the compound score? (Maybe this is in the Scout docs somewhere how it is supposed to look, I'll check ...)
Edit: I see now that it seems to be summing up the scores. All variants have rank scores, so it seems something else is going wrong somewhere. I'll investigate.
We are currently penalizing compounds through the rank model, not summing up their score (which is my interpretation of how the compound score is calculated)...
> And these variants are following the right inheritance model of the variant in question?
I'll ask!
Combined score should be the variant's score + compound variant's score. I'm not super familiar with this part of the pipeline but we are using genmod to annotate the compounds.
> Combined score should be the variant's score + compound variant's score. I'm not super familiar with this part of the pipeline but we are using genmod to annotate the compounds.
OK, I'll investigate. I suspect the other issues we have here are follow-up issues of us not adding the combined score in the same way ...
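To make the summation described above concrete, here is a minimal sketch. It is not Scout's or genmod's actual code: the function name, the dictionary layout, and the variant ids and scores are all made up for illustration, and it assumes each partner's own rank score is known.

```python
def rank_compounds(variant_score, partner_scores):
    """Return (variant_id, combined_score) pairs, highest combined score first.

    Hypothetical helper: `partner_scores` maps a compound partner's id to
    that partner's own rank score. The combined score is taken to be the
    variant's score plus the partner's score, as described in this thread.
    """
    combined = [
        (variant_id, variant_score + partner_score)
        for variant_id, partner_score in partner_scores.items()
    ]
    # Sort on the combined score, largest first.
    return sorted(combined, key=lambda pair: pair[1], reverse=True)


# Illustrative values only, not taken from any real case.
partners = {"1_100_A_T": 3, "1_200_G_C": 9, "1_300_C_G": 5}
ranked = rank_compounds(10, partners)
print(ranked)
```

With these made-up numbers the partner scoring 9 ends up first, with a combined score of 19.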
OK, I'll see if that is the case in the example above.
I believe you are using a local scoring tool instead of genmod? Perhaps it has been updated and lost the combined compound score?
Also check the score of the compounds, because it looks like combined_score is set as the compound score when loading the variant, see this code
Hmm. We run genmod compound as well, in addition to our local tool which updates the RankScore based on compounds.
Looking at the VCFs, the outputs look a bit different.
Here is our production pipeline:
Compounds=hg002:1_14699_C_G|1_16949_A_C|1_17385_G_A|1_14653_C_T
Here is the output I have from a previous run from the raredisease pipeline:
Compounds=giab_full:chr1_12719_G_C>7|chr1_14354_C_A>8|chr1_17385_G_A>8
Looks like we are missing the ">7" info, which is the scores I guess? Maybe it is just us who have lagged with updating genmod.
Edit: Or we do some downstream parsing removing it. I'll dig.
This is what it looks like over here:
You are so right, both of you. Here https://github.com/Clinical-Genomics/scout/blob/64a94958dfa1522bebeba5e5553c583e7d01cefb/scout/parse/variant/compound.py#L27 is where the Compounds field gets parsed, and split on ">" before going to that compound_score build routine @northwestwitch pointed to! Remains to be seen why it is removed or not set in the first place in your prod pipe (right now, hopefully).
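For anyone comparing their own VCFs, the difference between the two Compounds values quoted earlier in the thread can be checked with a small standalone parser. This is only a sketch and not the Scout code linked above: `parse_compounds` is a hypothetical name, and it assumes the value holds a single family in the form `family:variant>score|variant>score`, with the `>score` part optional.

```python
def parse_compounds(info_value):
    """Split a Compounds INFO value into (family_id, [(variant_id, score)]).

    Hypothetical helper. The score is None when an entry has no ">score"
    suffix, which is what the production example in this thread looks like.
    """
    family_id, _, entries = info_value.partition(":")
    parsed = []
    for entry in entries.split("|"):
        variant_id, separator, score = entry.partition(">")
        parsed.append((variant_id, float(score) if separator else None))
    return family_id, parsed


# The two values quoted in this thread (without the "Compounds=" prefix).
prod = "hg002:1_14699_C_G|1_16949_A_C|1_17385_G_A|1_14653_C_T"
raredisease = "giab_full:chr1_12719_G_C>7|chr1_14354_C_A>8|chr1_17385_G_A>8"

for value in (prod, raredisease):
    family, compounds = parse_compounds(value)
    missing = [variant_id for variant_id, score in compounds if score is None]
    print(family, "entries without a score:", len(missing))
```

Run on the two strings, it reports four entries without a score for the production value and none for the raredisease value, which matches the observation that the ">7" style suffix is missing in the former.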
And I'm very glad I don't have to dig into this every day. It has a legacy touch to it. 😅
> And I'm very glad I don't have to dig into this every day. It has a legacy touch to it. 😅
Relatable 😃
It seems actually our trios have the ">4" compound scores, and their score table looks like the ones you show above.
Seems like the difference is that for singles we aren't running genmod compound, which I suspect is what adds the compound scores. Looks like the solution is to run this for singles as well.
Also, after pondering, I suspect the "not loaded" means that they were present in the initial VCF, just not loaded into Scout.
I think this resolves the mystery on our part. Thanks a lot for the help debugging 🐛🙏
Describe the bug
Our geneticists asked whether the compound top 20 on the SNV/indel page works as intended.
Here is one example:
Several of the variants are shown as "not loaded".
They mentioned that they could see higher ranked variants not included in the list.
The variants are not sorted according to their ranking.
Do you guys see this as well? Let me know if not and I'll see if I can help out with pinpointing the issue.