epi2me-labs / wf-human-variation


Annotating common variants with SnpSift annotate within wf-human-variation workflow #157

Closed: litte024 closed this issue 2 weeks ago

litte024 commented 3 months ago

Is your feature related to a problem?

We are trying to use wf-human-variation to find pathogenic variants in nanopore sequencing data from patients who have already had whole exome sequencing but did not receive a genetic diagnosis. Because these patients' exomes have already been examined, many of the remaining candidate pathogenic variants are likely to be in introns. These variants are either not reported in ClinVar or have no ClinVar entry at all, so they don't appear in the ClinVar variant annotations section of the output.

The ClinVar annotations are also sometimes wrong or incomplete. Several entries lead to a "404 Not Found" error on the ClinVar site (ClinVar entries 1712957, 1712964 and 1713037). Others have a ClinVar Significance in the final report that differs from the classification on the ClinVar site: for example, the SNP report lists ClinVar entry 14666 as "Pathogenic, protective", whereas the ClinVar site classifies it as benign.

We also have a case where we know the patient carries a single pathogenic variant, NM_000016.6(ACADM):c.985A>G (p.Lys329Glu), which is classified as Pathogenic/Likely pathogenic in ClinVar. The variant is correctly called in the .vcf, but it was not annotated and does not appear in the ClinVar variant annotations section.

All of this has made identifying potential pathogenic variants quite difficult, when we had hoped this workflow would spare us from manually searching the genome for them.

Describe the solution you'd like

Since all of our candidate pathogenic variants will be rare or uncommon, we'd like to annotate common variants so that we can filter them out. Because the workflow already annotates variants with SnpEff, we'd like an option within it to annotate variants by population allele frequency. The SnpEff documentation (https://pcingola.github.io/SnpEff/snpeff/introduction/) lists "Common variants (dbSnp)" as a feature and says: "Annotating "common" variants from dbSnp and 1,000 Genomes can be easily done (see SnpSift annotate)."
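For context, SnpSift annotate copies fields from a database VCF (such as dbSNP) onto matching records in the input VCF. The Python below is a simplified sketch of that idea, not SnpSift's implementation: it matches records on CHROM, POS, REF and ALT, compares the ALT column as a whole string (so multi-allelic records are not split), and does not add an `##INFO` header line. The source key `AF` and the output key `DB_AF` are hypothetical choices for illustration.

```python
"""Simplified sketch of annotating a VCF from a database VCF by exact allele match.

Assumptions (hypothetical, for illustration only): the database stores a
frequency under INFO key "AF", and matches are written to a new key "DB_AF".
No ##INFO header line is added, unlike a real annotation tool.
"""


def parse_info(info: str) -> dict:
    # INFO is a semicolon-separated list of KEY=VALUE pairs or bare flags.
    fields = {}
    if info == ".":
        return fields
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def build_index(db_lines, key):
    # Map (CHROM, POS, REF, ALT) to the value of the chosen INFO key.
    index = {}
    for line in db_lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        chrom, pos, ref, alt, info = cols[0], cols[1], cols[3], cols[4], cols[7]
        value = parse_info(info).get(key)
        if value is not None:
            index[(chrom, pos, ref, alt)] = value
    return index


def annotate(sample_lines, db_lines, key="AF", new_key="DB_AF"):
    # Append new_key=value to INFO for every sample record found in the database.
    index = build_index(db_lines, key)
    out = []
    for line in sample_lines:
        line = line.rstrip("\n")
        if line.startswith("#"):
            out.append(line)
            continue
        cols = line.split("\t")
        match = index.get((cols[0], cols[1], cols[3], cols[4]))
        if match is not None:
            extra = f"{new_key}={match}"
            cols[7] = extra if cols[7] == "." else f"{cols[7]};{extra}"
        out.append("\t".join(cols))
    return out
```

Records absent from the database are passed through unchanged, which matters here: absence from dbSNP says nothing about pathogenicity, only that no frequency is available.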

Describe alternatives you've considered

We could run SnpSift annotate on the output .vcf file ourselves to mark common variants; however, since SnpEff is already part of this workflow, it would be more convenient if the workflow could do this directly.
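Once a population frequency has been added to INFO, filtering out common variants is a threshold test. A minimal Python sketch, assuming a hypothetical `DB_AF` key holding one frequency per ALT allele and an illustrative 1% cutoff. Some sources store frequencies differently (dbSNP's `CAF`, for instance, lists the reference allele first), and those would need different handling.

```python
"""Sketch: keep only rare variants from an annotated VCF.

Assumptions (hypothetical): frequencies are under INFO key "DB_AF",
one value per ALT allele, and "common" means any ALT frequency >= 1%.
"""


def info_value(info, key):
    # Return the value for key in a semicolon-separated INFO string, or None.
    for item in info.split(";"):
        k, sep, v = item.partition("=")
        if k == key and sep:
            return v
    return None


def is_common(af_field, threshold):
    # A record counts as common if any numeric ALT frequency meets the threshold;
    # missing values such as "." are ignored.
    values = []
    for v in af_field.split(","):
        try:
            values.append(float(v))
        except ValueError:
            continue
    return any(v >= threshold for v in values)


def keep_rare(vcf_lines, key="DB_AF", threshold=0.01):
    kept = []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if line.startswith("#"):
            kept.append(line)  # keep all header lines
            continue
        af = info_value(line.split("\t")[7], key)
        # Records with no frequency annotation are kept: unknown is not common.
        if af is None or not is_common(af, threshold):
            kept.append(line)
    return kept
```

Keeping unannotated records is the deliberate design choice here, since rare intronic variants are exactly the ones most likely to be missing from population databases.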

Additional context

No response

vlshesketh commented 3 months ago

Hi @litte024, thank you for reporting this. The version of the ClinVar VCF used in the workflow will be updated shortly, which should give you access to up-to-date annotations. Regarding your question about annotating common variation using dbSNP, I have created a ticket internally to review how we manage annotation sources within the workflow and will look into adding this functionality.

litte024 commented 3 months ago

@vlshesketh Hi, thank you for looking into this. Has there been any update to the ClinVar VCF version yet? And is there any word as to if/when the dbSNP annotation will be added? Thanks again!

vlshesketh commented 3 months ago

Hi @litte024 - the ClinVar update has now been released (https://github.com/epi2me-labs/wf-human-variation/releases/tag/v2.1.0). We are continuing to review annotations and I'll update here once we have more information on whether this is something we can implement.

vlshesketh commented 2 weeks ago

Hi @litte024, sorry for the delay in responding. As we have some upcoming integrations with tertiary analysis partners planned for this workflow via the EPI2ME desktop application, we currently have no plans to incorporate additional annotation resources directly into the workflow.