bcbio / bcbio-nextgen

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis
https://bcbio-nextgen.readthedocs.io
MIT License

bcbio_nextgen data upgrade fails when trying to download snpEff_v4_3_GRCh38.99.zip #3603

Closed pvanheus closed 2 years ago

pvanheus commented 2 years ago

Version info

To Reproduce Exact bcbio command you have used:

bcbio_nextgen.py upgrade -u stable --data

This was after already running bcbio_nextgen.py upgrade -u stable --tools. I am trying to follow the instructions here.

Your yaml configuration file:

Since this is an upgrade run, I'm listing install-params.yaml

aligners:
- bwa
- hisat2
- minimap2
- rtg
datatarget:
- variation
- rnaseq
- smallrna
- gemini
genomes:
- hg38
- hg19
isolate: true
tooldir: /tools/software/bcbio_tools
toolplus: []

Log files (could be found in work/log) Please attach (10MB max): bcbio-nextgen-commands.log, and bcbio-nextgen-debug.log.

TOOL UPGRADE COMPLETED
--2022-01-18 09:08:27--  https://raw.githubusercontent.com/bcbio/bcbio-nextgen/master/requirements-conda.txt
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.108.133, 185.199.109.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 20 [text/plain]
Saving to: ‘bcbio-update-requirements.txt’

     0K                                                       100%  812K=0s

2022-01-18 09:08:29 (812 KB/s) - ‘bcbio-update-requirements.txt’ saved [20/20]

# All requested packages already installed.

--2022-01-18 09:10:16--  https://snpeff.blob.core.windows.net/databases/v4_3/snpEff_v4_3_GRCh38.99.zip
Resolving snpeff.blob.core.windows.net (snpeff.blob.core.windows.net)... 52.239.234.228
Connecting to snpeff.blob.core.windows.net (snpeff.blob.core.windows.net)|52.239.234.228|:443... connected.
HTTP request sent, awaiting response... 404 The specified blob does not exist.
2022-01-18 09:10:19 ERROR 404: The specified blob does not exist..

Upgrading bcbio
Upgrade of bcbio-nextgen code complete.
Upgrading bcbio-nextgen data files
List of genomes to get (from the config file at '{'genomes': [{'dbkey': 'hg19', 'name': 'Human (hg19)', 'indexes': ['seq', 'twobit'], 'annotations': ['GA4GH_problem_regions', 'capture_regions', 'MIG', 'prioritize', 'dbsnp', 'hapmap', '1000g_omni_snps', 'ACMG56_genes', '1000g_snps', 'mills_indels', 'clinvar', 'cosmic', 'ancestral', 'qsignature', 'genesplicer', 'effects_transcripts', 'varpon', 'vcfanno', 'viral', 'purecn_mappability', 'simple_repeat', 'af_only_gnomad', 'transcripts', 'RADAR', 'rmsk', 'fusion-blacklist', 'mirbase', 'esp', 'exac', 'gnomad_exome', '1000g'], 'validation': ['giab-NA12878', 'platinum-genome-NA12878', 'giab-NA24385', 'giab-NA24631', 'giab-NA24143', 'giab-NA24149']}, {'dbkey': 'hg38', 'name': 'Human (hg38) full', 'indexes': ['seq', 'twobit', 'bwa', 'hisat2'], 'annotations': ['ccds', 'capture_regions', 'coverage', 'prioritize', 'dbsnp', 'hapmap_snps', '1000g_omni_snps', 'ACMG56_genes', '1000g_snps', 'mills_indels', '1000g_indels', 'clinvar', 'qsignature', 'genesplicer', 'effects_transcripts', 'varpon', 'vcfanno', 'viral', 'purecn_mappability', 'simple_repeat', 'af_only_gnomad', 'transcripts', 'RADAR', 'rmsk', 'salmon-decoys', 'fusion-blacklist', 'mirbase', 'esp', 'exac', 'gnomad_exome'], 'validation': ['giab-NA12878', 'giab-NA24385', 'giab-NA24631', 'platinum-genome-NA12878', 'giab-NA12878-remap', 'giab-NA12878-crossmap', 'dream-syn4-crossmap', 'dream-syn3-crossmap', 'giab-NA12878-NA24385-somatic', 'giab-NA24143', 'giab-NA24149', 'giab-NA24694', 'giab-NA24695']}], 'genome_indexes': ['bwa', 'hisat2', 'minimap2', 'rtg'], 'install_liftover': False, 'install_uniref': False}'): Human (hg19), Human (hg38) full
Installing snpEff database GRCh38.99 in /tools/software/bcbio/genomes/Hsapiens/hg38/snpeff
Traceback (most recent call last):
  File "/tools/software/bcbio/anaconda/bin/bcbio_nextgen.py", line 228, in <module>
    install.upgrade_bcbio(kwargs["args"])
  File "/tools/software/bcbio/anaconda/lib/python3.6/site-packages/bcbio/install.py", line 109, in upgrade_bcbio
    upgrade_bcbio_data(args, REMOTES)
  File "/tools/software/bcbio/anaconda/lib/python3.6/site-packages/bcbio/install.py", line 363, in upgrade_bcbio_data
    _upgrade_snpeff_data(galaxy_home, args, remotes)
  File "/tools/software/bcbio/anaconda/lib/python3.6/site-packages/bcbio/install.py", line 440, in _upgrade_snpeff_data
    subprocess.check_call(["wget", "--no-check-certificate", "-c", "-O", dl_file, dl_url])
  File "/tools/software/bcbio/anaconda/lib/python3.6/subprocess.py", line 291, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['wget', '--no-check-certificate', '-c', '-O', 'snpEff_v4_3_GRCh38.99.zip', 'https://snpeff.blob.core.windows.net/databases/v4_3/snpEff_v4_3_GRCh38.99.zip']' returned non-zero exit status 8.
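
For readers decoding the traceback: the GNU wget manual documents exit status 8 as "Server issued an error response", which matches the HTTP 404 in the log. The snippet below is only a minimal sketch of how a download step could report that more readably. `describe_wget_exit` and `download` are hypothetical helper names, not part of bcbio; only the wget command line is copied from the traceback.

```python
import subprocess

# Exit statuses as documented in the GNU wget manual.
WGET_EXIT_STATUS = {
    0: "No problems occurred",
    1: "Generic error code",
    2: "Parse error",
    3: "File I/O error",
    4: "Network failure",
    5: "SSL verification failure",
    6: "Username/password authentication failure",
    7: "Protocol errors",
    8: "Server issued an error response",
}


def describe_wget_exit(returncode):
    """Return the documented meaning of a wget exit status."""
    return WGET_EXIT_STATUS.get(returncode, "Unknown wget exit status")


def download(dl_url, dl_file):
    """Hypothetical wrapper (not bcbio code) around the wget call in the traceback."""
    cmd = ["wget", "--no-check-certificate", "-c", "-O", dl_file, dl_url]
    try:
        subprocess.check_call(cmd)
    except subprocess.CalledProcessError as err:
        # Re-raise with the meaning of the exit status spelled out.
        raise RuntimeError(
            "Download of %s failed: %s (wget exit status %d)"
            % (dl_url, describe_wget_exit(err.returncode), err.returncode)
        ) from err
```

With this wrapper, the failure above would surface as a message naming the server error rather than a bare "non-zero exit status 8".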

naumenko-sa commented 2 years ago

Hi @pvanheus !

Thanks for reporting and sorry about the issue!

It looks like you have been upgrading from an older bcbio version. bcbio 1.2.9 is a major upgrade: Python 3.7 and a lot of changes in the conda environments. Could you please try to install the bcbio code and tools from scratch and re-use the genomes folder?

Since this is a common issue, I've put a note in the docs: https://bcbio-nextgen.readthedocs.io/en/latest/contents/installation.html#upgrade

See how to install a second bcbio and link the reference data: https://bcbio-nextgen.readthedocs.io/en/latest/contents/development.html#bcbio-dev-installation

Sergey
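
As a rough illustration of the "re-use the genomes folder" suggestion, the sketch below symlinks the `genomes` directory of an existing install into a fresh one. It assumes the old install lives at `/tools/software/bcbio` (the prefix seen in the log); the new install path and the `link_genomes` helper are hypothetical. The linked documentation remains the authority on which directories need to be linked.

```python
import os


def link_genomes(old_install, new_install):
    """Symlink <old_install>/genomes to <new_install>/genomes and return the link path."""
    source = os.path.abspath(os.path.join(old_install, "genomes"))
    target = os.path.join(new_install, "genomes")
    if not os.path.isdir(source):
        raise FileNotFoundError("No genomes folder found at %s" % source)
    if os.path.lexists(target):
        # Refuse to overwrite anything already present in the new install.
        raise FileExistsError("%s already exists" % target)
    os.makedirs(new_install, exist_ok=True)
    os.symlink(source, target)
    return target


# Example call; the second path is a placeholder for the fresh install:
# link_genomes("/tools/software/bcbio", "/tools/software/bcbio_new")
```

The helper deliberately refuses to replace an existing `genomes` entry, so a partially populated new install is never overwritten silently.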