bcbio / bcbio-nextgen

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis
https://bcbio-nextgen.readthedocs.io
MIT License

snpEff database download url for BDGP6 genome has changed #3664

Open mjsduncan opened 2 years ago

mjsduncan commented 2 years ago

Installing the fruit fly genome BDGP6 fails with a missing snpEff download on bcbio_nextgen version 1.2.9. This URL works: https://snpeff.blob.core.windows.net/databases/v5_0/snpEff_v5_0_BDGP6.28.99.zip

mjsduncan commented 2 years ago

A hack to keep data upgrades from crashing is to unzip the download and rename the extracted folder to BDGP6.86 inside the BDGP6/snpeff folder.

mjsduncan commented 1 year ago

The above hack fails to fool snpEff, which crashes when it figures out the folder is an imposter:

subprocess.CalledProcessError: Command 'set -o pipefail; unset JAVA_HOME && export PATH=/home/mozi/biosrc/bcbio/anaconda/bin:"$PATH" &&  \
/home/mozi/biosrc/bcbio/anaconda/bin/snpEff -Xms3g -Xmx29g -Djava.io.tmpdir=/mnt/dm/tmp/tmpr5ibsuso/tmp eff  \
-dataDir /home/mozi/biosrc/bcbio/genomes/Dmelanogaster/BDGP6/snpeff -hgvs -noLog -i vcf -o vcf  \
-csvStats /mnt/dm/flies/work/call1/work/joint/gatk-haplotype-joint/3x5/3x5-joint-effects-stats.csv  \
-s /mnt/dm/flies/work/call1/work/joint/gatk-haplotype-joint/3x5/3x5-joint-effects-stats.html  \
BDGP6.86 /mnt/dm/flies/work/call1/work/joint/gatk-haplotype-joint/3x5/3x5-joint.vcf.gz |  \
/home/mozi/biosrc/bcbio/anaconda/bin/bgzip --threads 11 -c > /mnt/dm/tmp/tmpr5ibsuso/3x5-joint-effects.vcf.gz
java.lang.RuntimeException: Property: 'BDGP6.86.genome' not found
    at org.snpeff.interval.Genome.<init>(Genome.java:104)
    at org.snpeff.snpEffect.Config.readGenomeConfig(Config.java:693)
    at org.snpeff.snpEffect.Config.readConfig(Config.java:661)
    at org.snpeff.snpEffect.Config.init(Config.java:487)
    at org.snpeff.snpEffect.Config.<init>(Config.java:121)
    at org.snpeff.SnpEff.loadConfig(SnpEff.java:449)
    at org.snpeff.snpEffect.commandLine.SnpEffCmdEff.run(SnpEffCmdEff.java:939)
    at org.snpeff.snpEffect.commandLine.SnpEffCmdEff.run(SnpEffCmdEff.java:923)
    at org.snpeff.SnpEff.run(SnpEff.java:1188)
    at org.snpeff.SnpEff.main(SnpEff.java:168)
' returned non-zero exit status 255.

mwhamgenomics commented 1 year ago

I experienced similar snpEff download problems for GRCh38. I tried manually downloading the right version at first, but then found that there's a metadata file at, in my case, genomes/Hsapiens/hg38/seq/hg38-resources.yaml:

version: 47
aliases:
  human: true
  snpeff: GRCh38.99
  ensembl: homo_sapiens_merged_vep_100_GRCh38

...

I was able to change the '99' to a version that works, e.g. 92, and the install ran successfully. Maybe this'll work for other genomes/species too?
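For anyone who wants to script that edit, here is a minimal Python sketch of the same change. It works on the text of the resources file and swaps only the value of the `snpeff:` alias. The helper name `set_snpeff_alias` is made up for this example and is not part of bcbio, and whether `GRCh38.92` is actually available depends on the installed snpEff.

```python
import re


def set_snpeff_alias(resources_text: str, new_db: str) -> str:
    """Return the resources YAML text with the `snpeff:` alias value replaced.

    Hypothetical helper for illustration only; it edits the text with a
    regular expression so every other line is left exactly as it was.
    """
    # Match an indented "snpeff:" key and capture everything up to its value.
    pattern = re.compile(r"^([ \t]*snpeff:[ \t]*)\S+", flags=re.MULTILINE)
    updated, count = pattern.subn(
        lambda m: m.group(1) + new_db, resources_text, count=1
    )
    if count == 0:
        raise ValueError("no 'snpeff:' alias found in resources text")
    return updated


# Example input copied from the hg38-resources.yaml excerpt quoted above.
example = """version: 47
aliases:
  human: true
  snpeff: GRCh38.99
  ensembl: homo_sapiens_merged_vep_100_GRCh38
"""

print(set_snpeff_alias(example, "GRCh38.92"))
```

To apply it to a real install, read the file into a string, pass it through the function, and write the result back, ideally after making a backup copy of the original.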

naumenko-sa commented 1 year ago

Hi Murray @mwhamgenomics !

Your snpEff might not be the latest one.

snpEff --version

snpEff 5.0 has GRCh38.99 as a database, and that is what we are using in bcbio 1.2.9.

snpEff databases | grep sapiens

You likely need to update to bcbio 1.2.9 / snpEff 5.0; otherwise, please describe your issue in more detail.
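The check above can also be scripted. This is a minimal Python sketch that looks for an exact database name in the text printed by `snpEff databases`. It assumes the genome name is the first whitespace-separated field on each line, which may not hold for every snpEff release. The function name `database_listed` and the sample listing are made up for this example.

```python
def database_listed(listing: str, name: str) -> bool:
    """Return True if `name` is the first field of any line in `listing`.

    Assumes (not verified against every snpEff version) that each line of
    the listing starts with the database name.
    """
    for line in listing.splitlines():
        fields = line.split()
        if fields and fields[0] == name:
            return True
    return False


# Made-up sample listing for illustration; not real snpEff output.
sample = (
    "Genome\tOrganism\n"
    "GRCh38.99\tHomo_sapiens\n"
    "BDGP6.28.99\tDrosophila_melanogaster\n"
)

print(database_listed(sample, "GRCh38.99"))
print(database_listed(sample, "BDGP6.86"))
```

In practice the listing text could come from `subprocess.run(["snpEff", "databases"], capture_output=True, text=True).stdout`. An exact match on the first field avoids the false positives a plain `grep` can give when one database name is a prefix of another.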

SN

naumenko-sa commented 1 year ago

As to the original issue, @mjsduncan !

Thanks for catching - sorry I missed it then. I've fixed the resource file: https://github.com/bcbio/bcbio-nextgen/blob/master/config/genomes/BDGP6-resources.yaml#L5

For it to work, a bcbio devel update is needed first, since it deploys the resource file.

SN