Ensembl / ensembl-vep

The Ensembl Variant Effect Predictor predicts the functional effects of genomic variants
https://www.ensembl.org/vep
Apache License 2.0

Additional Filtering Using Patient-Matched RNA BAM File #685

Open DarioS opened 4 years ago

DarioS commented 4 years ago

Labelling a mutation as a missense mutation implies that it is expressed as a protein, so it should be detectable at the RNA level as a prerequisite. Could the software allow the user to input an RNA-seq BAM file from the same patient sample for which the VCF file was created (from normal and tumour DNA-seq data), and filter out SNVs and indels in genes that aren't even transcribed in that patient? Nonsense mutations would need to be treated as a special case, because their transcripts are often degraded via nonsense-mediated decay, so the variant wouldn't be seen in the RNA-seq data for biological reasons. Such an integration of DNA and RNA data could also let VEP output the variant's effect for the most highly expressed transcript in the sample, rather than for all isoforms of a gene (many or all of which might not be expressed in that sample) or the most serious effect (which could be for an unexpressed isoform), as is presently done.
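To make the request concrete, here is a rough sketch of the selection logic I have in mind. It is not VEP code and every name in it is hypothetical. It assumes per-transcript read counts have already been computed from the patient's RNA-seq BAM, uses an arbitrary cutoff of 10 reads to call a transcript "expressed", and works at the transcript level only, as a stand-in for checking whether the gene is transcribed or whether the variant allele itself appears in the reads.

```python
from dataclasses import dataclass

# Hypothetical illustration only; none of these names are part of VEP.
MIN_READS = 10  # assumed cutoff for calling a transcript "expressed"


@dataclass
class Annotation:
    variant: str      # e.g. "chr1:12345 A>T"
    transcript: str   # transcript the consequence was predicted for
    consequence: str  # e.g. "missense_variant", "stop_gained"


def filter_by_expression(annotations, transcript_reads, min_reads=MIN_READS):
    """Keep one annotation per variant, on its most highly expressed transcript.

    transcript_reads maps transcript ID -> RNA-seq read count for this sample
    (assumed to be derived from the patient-matched BAM beforehand).
    """
    best = {}
    for ann in annotations:
        reads = transcript_reads.get(ann.transcript, 0)
        # Nonsense variants are exempt from the cutoff: their transcripts may
        # be removed by nonsense-mediated decay, so low coverage is expected.
        if reads < min_reads and ann.consequence != "stop_gained":
            continue
        current = best.get(ann.variant)
        if current is None or reads > transcript_reads.get(current.transcript, 0):
            best[ann.variant] = ann
    return list(best.values())
```

With this, a missense variant annotated on two isoforms would be reported once, on the isoform with more reads. A missense variant whose only transcript falls below the cutoff would be dropped, and a `stop_gained` variant would be kept regardless of coverage.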

aparton commented 4 years ago

Hi Dario,

Thank you for the suggestion. I’ll chat to the team about it later this week and get back to you with our thoughts.

Kind Regards, Andrew

DarioS commented 4 years ago

I'm looking forward to it. I know it's an ambitious feature request.