chapmanb / cloudbiolinux

CloudBioLinux: configure virtual (or real) machines with tools for biological analyses
http://cloudbiolinux.org
MIT License

Makes sure that gnomad genome and exome vcfs are normalized for hg19, hg38, and grch37 #280

Closed naumenko-sa closed 5 years ago

naumenko-sa commented 5 years ago

When VCFs from gnomAD exome/genome are installed, they need to be normalized. Otherwise vcfanno is later unable to match some indels, and indels that are high-frequency in the general population end up annotated as low-frequency. Solves bcbio/bcbio-nextgen#2503
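To illustrate why representation matters here, the following is a minimal Python sketch, not the code from this PR. It shows the same deletion written two ways and how trimming shared bases makes the records comparable. The function name `trim_alleles` and the example coordinates are hypothetical. Real normalizers such as `bcftools norm` or `vt normalize` also left-align indels against the reference sequence, which this sketch does not attempt.

```python
def trim_alleles(pos, ref, alt):
    """Trim bases shared by REF and ALT so equivalent indels compare equal.

    Simplified illustration only: it performs parsimony trimming and does
    NOT left-align against a reference genome as a full normalizer would.
    """
    # Drop shared trailing bases, keeping at least one base per allele.
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Drop shared leading bases, shifting the position accordingly.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


# The same 2 bp deletion, once with extra trailing context and once minimal.
padded = (100, "CAGT", "CT")
minimal = (100, "CAG", "C")

# An exact-match annotator treats these as different records ...
print(padded == minimal)
# ... but after trimming they have identical POS/REF/ALT.
print(trim_alleles(*padded) == trim_alleles(*minimal))
```

An annotator that matches on exact position, REF and ALT will miss the padded record unless both the query VCF and the annotation source use the same normalized form.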

chapmanb commented 5 years ago

Sergey -- thanks much for this improvement. I appreciate you resolving this issue and also continuing to make gnomad useful in bcbio.

pfpjs commented 5 years ago

Hi @naumenko-sa, @chapmanb,

Thank you both, this is great for removing unannotated (due to different representations) high-frequency variants!

How can I trigger the download of these updated resources? Right now, I deleted gnomad,2.0.1 and gnomad_exome:2.0.1 from each genome's versions.csv file, but I think this is a bit obscure and there might be a better way to update existing installations.

Thank you again for this improvement!

chapmanb commented 5 years ago

Paulo; deleting it from versions.csv is the right thing to do for this change. In practice we normally update the version to force a new install, but I agree with Sergey's approach of not doing that here, since the gnomad re-installs are so big. I don't want to automatically trigger this on everyone's system if they're happy with their previous setups. Thanks again for the feedback and discussion.
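For anyone who wants to script the manual step discussed above, here is a rough Python sketch. It assumes each row of versions.csv starts with the resource name followed by its version (as in `gnomad,2.0.1`). The helper name `drop_resources` and the path in the usage note are hypothetical. Back up the file first.

```python
import csv
from pathlib import Path


def drop_resources(versions_csv, names):
    """Remove rows whose first column is in `names`; return how many were dropped.

    Assumes a plain CSV with the resource name in the first column.
    """
    path = Path(versions_csv)
    with path.open(newline="") as handle:
        rows = list(csv.reader(handle))
    # Keep blank rows and every resource that is not being reset.
    kept = [row for row in rows if not row or row[0] not in names]
    with path.open("w", newline="") as handle:
        csv.writer(handle, lineterminator="\n").writerows(kept)
    return len(rows) - len(kept)
```

Usage would look like `drop_resources("/path/to/genome/versions.csv", {"gnomad", "gnomad_exome"})`, run once per genome build, followed by the usual data upgrade so the removed resources are installed again.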