arq5x / gemini

A lightweight database framework for exploring genetic variation.
http://gemini.readthedocs.org
MIT License

Integrate predictions from dbNSFP. #244

arq5x opened this issue 10 years ago (status: open)

arq5x commented 10 years ago

"It compiles prediction scores from four new and popular algorithms (SIFT, Polyphen2, LRT, and MutationTaster)"
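The quote names four predictors. As a rough illustration only, the sketch below pulls those four scores out of a single tab-delimited dbNSFP row. The column names (`SIFT_score`, `Polyphen2_HDIV_score`, `LRT_score`, `MutationTaster_score`) are assumptions modeled on dbNSFP 2.x headers and should be checked against the release you actually download; `parse_scores` is a hypothetical helper, not part of GEMINI.

```python
# Assumed dbNSFP 2.x-style column names; verify them against the header
# line of the release you use.
SCORE_COLUMNS = {
    "SIFT": "SIFT_score",
    "Polyphen2": "Polyphen2_HDIV_score",
    "LRT": "LRT_score",
    "MutationTaster": "MutationTaster_score",
}


def parse_scores(header_line, data_line):
    """Map each predictor to its score(s) from one tab-delimited dbNSFP row.

    Missing values ('.') become None. A cell may hold several
    ';'-separated scores (e.g. one per transcript), so every value
    is returned as a list.
    """
    header = header_line.lstrip("#").rstrip("\n").split("\t")
    fields = data_line.rstrip("\n").split("\t")
    row = dict(zip(header, fields))
    scores = {}
    for tool, column in SCORE_COLUMNS.items():
        raw = row.get(column, ".")  # an absent column is treated as missing
        scores[tool] = [None if v == "." else float(v) for v in raw.split(";")]
    return scores
```

Keeping every per-transcript value as a list, rather than picking one, leaves the choice of how to summarize them to the caller.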

lbeltrame commented 10 years ago

As someone who uses dbNSFP through snpEff, I would really welcome this: snpEff works on the entire database uncompressed (!), which makes annotation a nightmare.

This would also be useful for chapmanb/bcbio-nextgen#146.

dgaston commented 10 years ago

With the current GitHub version you can at least keep the dbNSFP annotations that snpSift has already added, because GEMINI can load (and retrieve) the whole INFO field. If you use GEMINI through its API, this is even more convenient, because the INFO field is returned as an OrderedDict.
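GEMINI's own INFO handling is not shown in this thread. As a standard-library sketch of the idea only, the function below turns a raw VCF INFO string into an `OrderedDict`, so fields keep the order in which they appear in the VCF. `parse_info` is hypothetical, and the `dbNSFP_`-prefixed keys in the example assume SnpSift-style naming; check them against your VCF header.

```python
from collections import OrderedDict


def parse_info(info):
    """Parse a VCF INFO string into an OrderedDict, preserving field order.

    Key=value entries keep their value as a string; flag entries
    (no '=') map to True; '.' or '' means the INFO column is empty.
    """
    fields = OrderedDict()
    if info in ("", "."):
        return fields
    for entry in info.split(";"):
        if not entry:
            continue  # tolerate stray or trailing semicolons
        key, sep, value = entry.partition("=")
        fields[key] = value if sep else True
    return fields


if __name__ == "__main__":
    # Illustrative INFO string; the dbNSFP_* key assumes SnpSift-style naming.
    info = parse_info("DP=14;dbNSFP_SIFT_score=0.02;DB")
    print(info["dbNSFP_SIFT_score"])  # prints 0.02
```

Values are left as strings, so converting scores to numbers (and splitting any comma-separated multi-values) is up to the caller.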

pcingola commented 10 years ago

Just a comment: the upcoming SnpEff 3.5 annotates using a tabix-indexed dbNSFP. You can try the development version I've just uploaded:

http://sourceforge.net/projects/snpeff/files/snpEff_development.zip

Make sure you get the tabix-indexed dbNSFP files:

http://sourceforge.net/projects/snpeff/files/databases/dbNSFP2.3.txt.gz

http://sourceforge.net/projects/snpeff/files/databases/dbNSFP2.3.txt.gz.tbi

I'm uploading now, so it may take a few hours for SourceForge to sync.

Pablo
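The comment does not show how the index is used outside SnpEff. As a rough sketch, assuming the third-party pysam library is installed and that chromosome names in the file match your query (e.g. `1` rather than `chr1`, which should be checked), a single position can be looked up through the `.tbi` index without decompressing the whole file. Both function names are hypothetical.

```python
def tabix_region(chrom, pos):
    """Build a 1-based, inclusive region string such as '1:12345-12345',
    the form the tabix command-line tool accepts."""
    return f"{chrom}:{pos}-{pos}"


def fetch_dbnsfp_rows(path, chrom, pos):
    """Return the raw dbNSFP lines covering one 1-based position.

    Assumes pysam is installed (it is not in the standard library), that the
    .tbi index sits next to the bgzipped file, and that `chrom` uses the same
    naming as the file (e.g. '1' rather than 'chr1'; check your release).
    """
    import pysam  # imported here so tabix_region() works without pysam

    tbx = pysam.TabixFile(path)
    try:
        # pysam uses 0-based, half-open coordinates, so [pos - 1, pos)
        # covers exactly the 1-based position `pos`.
        return list(tbx.fetch(chrom, pos - 1, pos))
    finally:
        tbx.close()
```

For example, `fetch_dbnsfp_rows("dbNSFP2.3.txt.gz", "1", 12345)` seeks via the index instead of scanning the file; the equivalent command-line query is `tabix dbNSFP2.3.txt.gz 1:12345-12345`.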

arq5x commented 10 years ago

Thanks much for the update, Pablo!

lbeltrame commented 10 years ago

Thanks a lot, Pablo (downloading and testing now)!

dgaston commented 10 years ago

Excellent! I'll grab the development version and start testing it in my pipeline. Thanks for putting in the work to tabix-index those files.

pcingola commented 10 years ago

Hi, I've fixed a couple of bugs and uploaded a new version to the same URL.