Clinical-Genomics / BALSAMIC

Bioinformatic Analysis pipeLine for SomAtic Mutations In Cancer
https://balsamic.readthedocs.io/
MIT License
44 stars 17 forks

Balsamic: List of all uploaded variants in a gene when looking at one variant #596

Open KickiLagerstedt opened 3 years ago

KickiLagerstedt commented 3 years ago

Request to have the same feature as in MIP/SCOUT where all possibly compound variants are uploaded when looking at one variant. Including VAF-data....

dnil commented 3 years ago

Absolutely; I think we actually do show them in Scout already, if they are available. I will forward this to Balsamic. Compounds are calculated by Genmod from MIP, and Genmod has recently been added to Balsamic as well, so should be somewhat straightforward.

While we are at it, is there any change you would like in the mode of operation later? As a first step, we will assume you are happy with linking any second event as a compound, regardless of whether it is inherited or somatic, but it might be possible to e.g. consider VAF thresholds or tumor-normal pairing to limit the actual compound call. In that case we will need to update Genmod (@moonso) a bit as well in a second step.

hassanfa commented 3 years ago

Good suggestion. I think getting a score for two-hit might be possible. Currently there is no process in BALSAMIC that looks into both BAM and VCF files to check whether a tumor-suppressor gene fits a two-hit criterion. I don't think genmod supports it either, as it is agnostic to whether a gene is a tumor suppressor or not.

Question: What does a compound call represent? And how is it calculated/marked to be shown in Scout? I imagine it is a product of the pedigree, right?

dnil commented 3 years ago

Right, but it is also effectively implemented permissively, so that all possible models are kept for a variant unless there are observations against it. So if there is only one proband in the pedigree / one individual in the VCF (say tumor-only samples, or subtracted somatic samples) and you run genmod on it, any variant in a gene that has another variant in it will have that other variant as a compound, and will keep the AR_comp model. So basically I'd say you're good to go if you just run genmod models, right @moonso? And if you do that before scoring, we could even use that in the rank model.

dnil commented 3 years ago

And for the first part of the question, a compound represents another variant in the same region (defaults to gene) as the one of interest.
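
As a rough illustration of that permissive behaviour for a single-sample VCF, the pairing could be sketched as below. This is a toy model under stated assumptions, not Genmod's implementation: the `Variant` record, the `pair_compounds` helper and the gene names are hypothetical and exist only for this example.

```python
from collections import defaultdict
from typing import NamedTuple


class Variant(NamedTuple):
    """Hypothetical minimal variant record, for illustration only."""
    variant_id: str  # e.g. chrom_pos_ref_alt
    gene: str        # region used for grouping (gene by default)


def pair_compounds(variants):
    """Map each variant id to the other variant ids found in the same gene.

    With no relatives whose genotypes could contradict the model, every
    other variant in the gene is kept as a potential compound partner.
    """
    by_gene = defaultdict(list)
    for variant in variants:
        by_gene[variant.gene].append(variant.variant_id)

    compounds = {}
    for ids in by_gene.values():
        for variant_id in ids:
            partners = [other for other in ids if other != variant_id]
            if partners:  # a lone variant in a gene has no compound
                compounds[variant_id] = partners
    return compounds


# Made-up input: two variants share GENE_A, one sits alone in GENE_B.
variants = [
    Variant("1_156126553_G_T", "GENE_A"),
    Variant("1_156131050_A_T", "GENE_A"),
    Variant("2_1000_C_G", "GENE_B"),
]
result = pair_compounds(variants)
```

Here the two `GENE_A` variants list each other as compounds, while the single `GENE_B` variant gets none. Phasing is ignored entirely, which is why variants in cis would still be paired.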

dnil commented 3 years ago

..and to be really detailed on the second part (genmod would do it for you), Scout reads the Compounds key from the vcf,

##INFO=<ID=Compounds,Number=.,Type=String,Description="List of compound pairs for this variant.The list is splitted on ',' family id is separated with compoundswith ':'. Compounds are separated with '|'.">

which then on the variant looks like e.g.

Compounds=internal_id:1_156126553_G_T>-8|1_156131050_A_T>-18;
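
For anyone wanting to inspect that key outside Scout, a minimal parsing sketch following the header description could look like this. It is not Scout's or Genmod's code; `parse_compounds` is a hypothetical helper, and reading the number after `>` as a per-compound score is an assumption based on the example line rather than on the header text.

```python
def parse_compounds(value):
    """Parse a Compounds INFO value into {family_id: [(variant_id, score), ...]}.

    Families are split on ',', the family id is split from its compounds
    on ':', and compounds are split on '|', as the VCF header describes.
    """
    families = {}
    for family_entry in value.strip().rstrip(";").split(","):
        if not family_entry:
            continue
        family_id, _, compound_part = family_entry.partition(":")
        pairs = []
        for compound in compound_part.split("|"):
            # Assumption: an optional '>' separates the variant id from a score.
            variant_id, sep, score = compound.partition(">")
            pairs.append((variant_id, float(score) if sep else None))
        families[family_id] = pairs
    return families


example = "internal_id:1_156126553_G_T>-8|1_156131050_A_T>-18;"
parsed = parse_compounds(example)
```

The scores are read as floats so that both integer and decimal values parse; a compound without a `>` part gets `None` as its score.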

hassanfa commented 3 years ago

I see this as two tracks.

So the first part of this is already done, and we can upload research variants to Scout (the most recent PR has it implemented).

Rationale: to keep clinical SNVs and indels focused on actionable and known targets. If we see that we continuously use compound variants, then we'll move them to clinical.

dnil commented 3 years ago

Super, that's a start. It seems highly relevant for the clinical track, but you know best how to get there. Let me know if/when you have some pilot samples with this enabled and I (and very likely some of the others at KG) can have a look at it!

For the record, I didn't get the part about third party tools? Genmod is in the family and easy to develop or request development for as needed. I also didn't get the part about not using "inheritance" models - is that out of concern for different levels of mosaicism, calls of compoundness without clear evidence of phasing / cis-trans-status or co-cellularity, or something else? But we will probably discuss that more productively in another medium.. 😅

pbiology commented 1 year ago

Refinement 2022-12-13:

pbiology commented 9 months ago

@khurrammaqbool will talk to Daniel and update this issue, including effort, gain and urgency.

dnil commented 9 months ago

I believe you must weigh this against other issues to prioritise your process. I know of more pressing issues, like fixing the broken ranking (adding a family name before the rank numbers) or fully using the loqusdb data for filtering and annotation.

As to why you would add this one additional GenMod command to your pipeline: it would enable a useful feature for Scout users.

It helps when analysing any recessive cancer gene, of which there can be surprisingly many, especially for inherited cancer risk syndromes where a constitutional first hit and a tumor second hit can occur. Solid tumors with a slightly higher mutation count may also carry two hits in a gene.

Compound annotation with singletons will not work as well as for RD trio cases. Genmod inheritance models are conservatively given, so that all models that could be true are followed. Hence, many variants that are actually in cis will still get a "compound" label since we lack proper phasing for singleton samples. The Scout operator would still be able to more quickly screen the potential high-scoring compounds for a given variant in a recessive gene this way. Especially combined with a good scoring/ranking, this is efficient enough even for singletons.

There are as usual many, many things one could do to improve this and adapt it for somatic mutations. The most obvious is allowing tracking of constitutional vs somatic compound pairs, e.g. by directly including normal sample variants, or by respecting some variant tag to that effect, like SOMATIC_SCORE or whatnot. This would require some further development of GenMod. That development is out of the scope of this issue; we were just curious to hear back.

Don't let any thought process about that discourage you from adding presumably one command to a tool you already have, that would add a couple of tags to the vcf that enable a nice shortcut for Scout users to look at compounds from variantS and variant pages, without having to go do additional searches in the gene.