Clinical-Genomics / BALSAMIC

Bioinformatic Analysis pipeLine for SomAtic Mutations In Cancer
https://balsamic.readthedocs.io/
MIT License

Customer request to modify the --distance param in VEP from 5Kb to 500Kb #1004

Open northwestwitch opened 2 years ago

northwestwitch commented 2 years ago

I'm dropping here a user question/request from the customer support ticket (#372436):

Pertaining to the issue of SVs and specifically BND, we are trying to look for the following recurrent translocation in 3 of our samples analysed in the same ticket #698630. As seen below, the breakpoints can occur ~500kb up or downstream of the genes.

image

From Scout, we are unable to call these BNDs using the “BCL2” gene symbol, mainly because the default setting of the VEP annotation in the pipeline is to annotate ±5kb.

Below is how the BND looks in Scout and therefore was missed because it wasn’t annotated with BCL2.

image

Is it possible to change the default setting of VEP? @hassanfa can maybe explain this better….

northwestwitch commented 2 years ago

5 Kb to 500 Kb is a lot, considering that the distance is applied on both sides of a gene. Wouldn't you end up with a lot of overlapping calls (ones that seem to be in genes but aren't)?

hassanfa commented 2 years ago

This is happening for the particular fusion posted above. Zahra and I looked into a couple of patients and we see the event, but we don't see the genes being annotated. IGH is luckily included, but BCL2 will always be missed unless the breakpoint falls within the gene itself.

I know it is going to be tricky, and it may add a lot of clutter to the CSQ field, so I suggest adding VEP --distance only to SVs from Manta that are BND.

Here is my suggested solution: in https://github.com/Clinical-Genomics/BALSAMIC/blob/c2fc1cd958e5e5403b08f2bbd98f46b8fcaa3e87/BALSAMIC/constants/workflow_params.py#L141-L143 add fusion_param, or whatever name you prefer. Then split the SVDB VCF by SV category and, for the BND records, add fusion_param to the vep_somatic_sv rule.
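A rough sketch of that split, in case it helps the discussion. It assumes each record carries an SVTYPE key in the INFO column (as Manta records do) and that VEP's default --distance is 5000 bp. FUSION_DISTANCE, split_by_bnd and vep_distance_arg are hypothetical names for illustration, not anything that exists in BALSAMIC today:

```python
import re
from typing import Iterable, List, Optional, Tuple

DEFAULT_DISTANCE = 5_000    # VEP's default --distance, in bp
FUSION_DISTANCE = 500_000   # hypothetical value behind a "fusion_param" setting

# Match SVTYPE=<value> as a whole INFO key, not as part of another key.
SVTYPE_RE = re.compile(r"(?:^|;)SVTYPE=([^;]+)")


def svtype(vcf_line: str) -> Optional[str]:
    """Return the SVTYPE value from the INFO column of a VCF record, if present."""
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 8:
        return None
    match = SVTYPE_RE.search(fields[7])
    return match.group(1) if match else None


def split_by_bnd(vcf_lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split VCF lines into (bnd, other). Header lines are copied to both."""
    bnd: List[str] = []
    other: List[str] = []
    for line in vcf_lines:
        if line.startswith("#"):
            bnd.append(line)
            other.append(line)
        elif svtype(line) == "BND":
            bnd.append(line)
        else:
            other.append(line)
    return bnd, other


def vep_distance_arg(is_bnd: bool) -> str:
    """Build the --distance argument for the VEP call on one of the two subsets."""
    distance = FUSION_DISTANCE if is_bnd else DEFAULT_DISTANCE
    return f"--distance {distance}"
```

The BND subset would then go through VEP with vep_distance_arg(True) and everything else with the default, so the wider window only clutters the CSQ field for breakends. The two annotated VCFs would still need to be merged afterwards, which this sketch does not cover.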

hassanfa commented 2 years ago

fyi, in some other fusion events we will see this as well, example: RUNX1-ETV6: image

So it will be beneficial for other customers who are curious to see these types of events. 😃

northwestwitch commented 2 years ago

Another possible way to solve this would be having Scout panels that contain not only genes but also chromosomal intervals (we've been discussing this since forever but haven't found a solution yet). It's always in the back of our mind tho (https://github.com/Clinical-Genomics/scout/issues/1907).

Time to start thinking about this a bit more seriously?

hassanfa commented 2 years ago

I think so, fusions need to be taken a bit more seriously. Looking at VCFs, they are easy to identify for a trained eye but tricky in Scout without gene annotation. Intervals can work, or predefined genic regions might work as well (if the annotation is done before uploading to Scout).
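To make the "predefined genic regions" idea a bit more concrete, here is a minimal sketch, assuming each gene of interest is stored as a (chrom, start, end) interval that gets padded before lookup. pad_regions and genes_near are hypothetical helpers, and the real coordinates would have to come from whatever annotation the pipeline already uses:

```python
from typing import Dict, List, Tuple

Region = Tuple[str, int, int]  # (chrom, start, end)


def pad_regions(genes: Dict[str, Region], padding: int) -> Dict[str, Region]:
    """Extend every gene interval by `padding` bp on both sides, clamping at 0."""
    return {
        name: (chrom, max(0, start - padding), end + padding)
        for name, (chrom, start, end) in genes.items()
    }


def genes_near(chrom: str, pos: int, padded: Dict[str, Region]) -> List[str]:
    """Return the names of genes whose padded interval contains the breakpoint."""
    return sorted(
        name
        for name, (region_chrom, start, end) in padded.items()
        if region_chrom == chrom and start <= pos <= end
    )
```

Each BND breakpoint could be looked up this way and the resulting gene names attached to the record before upload. Because the padding is chosen per region set, only the known fusion partners would need the wide window, which keeps the clutter concern above in check.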

dnil commented 1 year ago

Another option might be a fusion annotation on the DNA side, noting if conceptual gene adjacency changed with the event.

dnil commented 1 year ago

E.g. SnpEff has some support for fusion gene annotation. I haven't used it for a looong time. It still seems maintained though: https://github.com/pcingola/SnpEff. Some other customer seemed to like it for this purpose: https://github.com/Clinical-Genomics/BALSAMIC/issues/771.