In silico prediction tools to bioinformatics pipelines and Scout

fulyataylan commented 6 hours ago

Hi!

Our analysis and the design of Scout, for historical reasons, heavily rely on CADD. However, there are now several new in silico prediction tools available. The advantage of CADD is that it can provide scores for all positions. Yet, there are also tools tailored for specific variant types that perform much better than CADD. I believe we should leverage these new tools. REVEL is one that I particularly like for evaluating missense variants. But I use more than just REVEL for missense variants, so why not include them in the pipeline and add these features to Scout?

Regarding variant annotations, it would be great to have AlphaMissense scores, PrimateAI-3D scores, and UTRAnnotator for rare variants in our bioinformatics pipeline and on Scout.

You can download the PrimateAI-3D scores from the following link, as they're freely available for academic, non-profit research. They're very useful for interpreting missense variants, and I use this tool frequently. https://primateai3d.basespace.illumina.com/download

I use AlphaMissense whenever I have a candidate missense variant, and it would be beneficial to have it available in our pipelines and on Scout.

AlphaMissense predictions are available here: https://alphamissense.hegelab.org/download AlphaMissense (both in hg19 and GRCh38) is also available as VEP plugins: https://github.com/Ensembl/VEP_plugins/blob/release/112/AlphaMissense.pm https://www.ensembl.org/info/docs/tools/vep/script/vep_plugins.html

As for UTRAnnotator, which I've previously requested for interpreting UTR variants, it's now available as a VEP plugin: https://www.ensembl.org/info/docs/tools/vep/script/vep_plugins.html https://github.com/Ensembl/VEP_plugins/blob/release/112/UTRAnnotator.pm
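For illustration, here is a minimal Python sketch of how plugin output could be read back out of a VEP-annotated VCF. VEP writes its annotations into the `CSQ` INFO entry, with the field order declared in the header. The field names `am_class` and `am_pathogenicity` are what I understand the AlphaMissense plugin to emit, but treat them as an assumption and check the header of your own file. The header line, the variant, and the gene name below are made up.

```python
def csq_fields(header_line):
    """Return the CSQ field names declared after 'Format: ' in the header."""
    fmt = header_line.strip().split("Format: ", 1)[1]
    return fmt.rstrip('">').split("|")


def parse_csq(info, fields):
    """Yield one dict per transcript annotation found in the CSQ INFO entry."""
    for entry in info.split(";"):
        if entry.startswith("CSQ="):
            for annotation in entry[len("CSQ="):].split(","):
                yield dict(zip(fields, annotation.split("|")))


# Made-up header and INFO column; the am_* field names are an assumption.
csq_header = (
    '##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations '
    'from Ensembl VEP. Format: Allele|Consequence|SYMBOL|am_class|am_pathogenicity">'
)
info_column = (
    "AC=1;CSQ=T|missense_variant|GENE1|likely_pathogenic|0.9,"
    "T|intron_variant|GENE1||"
)

fields = csq_fields(csq_header)
rows = list(parse_csq(info_column, fields))
for row in rows:
    # Empty strings mean the plugin had no score for that transcript.
    print(row["Consequence"], row["am_class"], row["am_pathogenicity"])
```

Values come back as strings, so a score would need converting with `float()` before any thresholding, and empty entries need to be skipped.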

I hope adding these tools to the annotation pipeline and Scout will be a straightforward and quick process.

Kind regards, Fulya

dnil commented 6 hours ago

Indeed! It is also true that none of the newer tools has had a disruptive impact, though each arguably adds a slight improvement in its respective category.

If you have a pipeline ready that annotates with these, please list the INFO keys used, or arrange a small sample VCF and we can figure it out and add to Scout.
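If it helps, the INFO keys can be pulled straight from the VCF header. A minimal Python sketch, assuming an uncompressed header and standard `##INFO=<ID=...,Description="...">` lines; the example header and the `CADD` key shown here are made up for illustration:

```python
import re

# Matches the ID and Description of a standard VCF ##INFO header line.
INFO_PATTERN = re.compile(r'^##INFO=<ID=([^,]+),.*Description="([^"]*)"')


def info_keys(header_lines):
    """Return {ID: Description} for every ##INFO line in a VCF header."""
    keys = {}
    for line in header_lines:
        match = INFO_PATTERN.match(line)
        if match:
            keys[match.group(1)] = match.group(2)
    return keys


# Made-up header lines standing in for a real annotated VCF.
example_header = [
    "##fileformat=VCFv4.2",
    '##INFO=<ID=CADD,Number=1,Type=Float,Description="CADD phred score">',
    '##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations '
    'from Ensembl VEP. Format: Allele|Consequence|SYMBOL">',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
]

found_keys = info_keys(example_header)
for key, description in found_keys.items():
    print(f"{key}: {description}")
```

For a real file, the same function can be fed the lines starting with `##` read from the VCF (via `gzip.open` if it is compressed).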

As for work on common analysis pipelines, please see (and perhaps ping a little at) respective issues: https://github.com/nf-core/raredisease/issues/427, https://github.com/Clinical-Genomics/raredisease/issues/3, https://github.com/Clinical-Genomics/raredisease/issues/9, and wherever the primateAI-one was. 😸

fulyataylan commented 6 hours ago

That's true! The tools are becoming more specialized for their respective variant types, which is excellent. It seems we've learned that a one-size-fits-all approach is no longer sufficient. I know I contacted you a few times about UTRAnnotator, but the good news is that it's now integrated into VEP :)

PrimateAI-3D outperforms its predecessor, PrimateAI. I heard about this at ASHG, where they also mentioned that it surpasses AlphaMissense, which is now the runner-up.

Currently, I don't have a pipeline using these tools. Instead, I analyze the variants on Scout and then manually check them on the respective tools' websites. It's a bit of hands-on work, but it gets the job done!

dnil commented 5 hours ago

Yes, having a VEP plugin is very convenient; it makes implementation a lot easier. It's been around since at least 2019 (e.g. https://github.com/Clinical-Genomics/raredisease/issues/7) - nudge, nudge pipeline devs. 😄

I see, that is interesting about PrimateAI-3D! I hadn't heard; I guess it was this or a similar one? https://www.medrxiv.org/content/10.1101/2024.01.12.24301193v1. When it made the rounds last summer it was a bit underwhelming, as was AlphaMissense. But again, they all add a little bit, so we should definitely add them. Adding them to the ranking etc. will take some balancing, though, which is something that is already catered to with CADD scores. Did you contrast to the CADD 1.7 release btw? It has a bit more protein modelling in the mix.

fulyataylan commented 5 hours ago

I know it's much easier with VEP plugins :)

That's right! I checked that publication yesterday as I've been eagerly awaiting its release in some journals since they presented the results at ASHG in November 2023. It's not surprising that it's currently under review somewhere, and they've put it on medRxiv in the meantime.

Their first publication came out before AlphaMissense, so they didn't have that comparison. At ASHG, they presented their expanded cohort size for primate genomes and shared the results from the article you linked.

I still use CADD extensively, but I also incorporate REVEL, PrimateAI-3D, and AM in my interpretation. I even double-check CADD scores on their websites to ensure I have the latest score :)

At this point, I'm not concerned about the ranking and score. I understand that balancing these is challenging and requires significant effort. These tools could initially be available as additional annotations without impacting the ranking.

Clinical-Genomics / scout

In silico prediction tools to bioinformatics pipelines and Scout #4896