Illumina / Nirvana

The nimble & robust variant annotator
https://illumina.github.io/NirvanaDocumentation/
GNU General Public License v3.0

Plans for upcoming cache data updates? #38

WestleyASherman closed 3 years ago

WestleyASherman commented 3 years ago

It's been excellent to have frequent public updates to Nirvana this year, particularly the official 3.11.1 release. Thank you!

The automated download, which is very convenient, includes Ensembl VEP v91 from 2017. Since the Nirvana source code is available, it's also possible to use more recent versions of VEP, which is now up to v101. But since Nirvana has such excellent testing and verification, it's also good to be able to use Nirvana directly with minimal modifications.

I don't suppose there are any upcoming plans for Nirvana 3.11+ to support more recent versions of Ensembl VEP directly without modifications, perhaps even included in the automated download cache data?

MichaelStromberg commented 3 years ago

Westley! Man do we miss you here at Illumina.

We're in the process of decoupling our data import from VEP. This way we can update our gene models much more frequently. Eventually the goal will be to transition to monthly updates of gene models and gene symbols. In the future, we'd like to have gene symbols updated daily.

WestleyASherman commented 3 years ago

It's very good to have those details, thank you! Such frequent updates will be really excellent. The possibility of daily updates in the future hadn't even occurred to me but, now that I think about it, with the right automation it should be possible. The Nirvana team at Illumina is doing extraordinarily useful and exciting work!

smrgit commented 3 years ago

If you don't mind, can I piggyback on this issue and ask whether it is possible to update to a new version of VEP manually (while sticking with the older GRCh37, though)? @MichaelStromberg, above you mentioned decoupling from VEP. Does that mean you will stop using VEP in Nirvana, or that updates to the dataSources will be decoupled from software updates?

MichaelStromberg commented 3 years ago

Nirvana doesn't use VEP per se, but we do import gene models from the VEP data files. I.e. we import the RefSeq and Ensembl transcripts from VEP, but not external data sources like dbSNP, gnomAD, etc. Moving forward we will grab RefSeq and Ensembl content directly from the source.

WestleyASherman commented 3 years ago

Informally, the code changes needed to use Nirvana with VEP data up to v99 were relatively minor, just a few dozen lines, but using v100+ VEP data would require more significant code changes. (Nirvana officially uses v91, and the latest VEP release is v102.)

smrgit commented 3 years ago

Thank you very much for that additional information, very helpful. Any rough idea of when you will shift to grabbing RefSeq and Ensembl content directly from those sources? And will you jump to the most current data at that time?

MichaelStromberg commented 3 years ago

Since there's only a short amount of time before the Winter break, it looks unlikely that we will release it in December. My thinking is that we're going to roll out support for bringing your own reference + GFF file in January. Other users have been asking to use Nirvana with non-human species, and that will enable them to do so.

We're going to refine that work in February to allow a user to bring in transcript alignments from BAM files to handle RefSeq transcripts that differ from the reference genome.

When we do this, we will grab the latest transcripts from both RefSeq and Ensembl (no point in using something older).

olingerc commented 2 years ago

@MichaelStromberg sorry for continuing on a closed ticket. I've read through all the tickets on GitHub, and database updates are a recurring question. The answers are not quite clear to me yet, so please forgive me for asking the question from this issue again: I'm on 3.16 and also realized that VEP is quite old. You were talking about changing the transcript annotation mechanism. Has this been done in a more recent version? Also, I can't find the differences between the 3.1x and 3.2 branches anywhere. I see you are working on a 3.18 release; will this replace 3.2.6?