bcbio / bcbio-nextgen

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis
https://bcbio-nextgen.readthedocs.io
MIT License

[404 error] Running GGD recipe: hg38 varpon 20181105 #2923

LimWChing closed this issue 5 years ago.

LimWChing commented 5 years ago

Hi, I have installed bcbio using the commands below, but the installation terminates when it reaches the genome data. At first, the error was in connecting to the Broad Institute's server to download the genome file (the server was overloaded with users), but this was resolved after a few tries. Thank you for your previous suggestion to retry! The current problem is that the hg38 varpon download returns a 404 error. I went to the link for the file and it is apparently unavailable: https://nc.hartwigmedicalfoundation.nl/index.php/s/a8lgLsUrZI5gndd/download?path=%2FHMF-Pipeline-Resources&files=GermlineHetPon.hg38.bed.gz. Is there an alternative link to retrieve this file?

The error I received is below:

[wanching@nscc04 wc]$ bcbio_nextgen.py upgrade --data
Upgrading bcbio
Data not installed, no genomes provided with '--genomes' flag
Upgrade completed successfully.
[wanching@nscc04 wc]$ bcbio_nextgen.py upgrade --data --genomes hg38
Upgrading bcbio
Upgrading bcbio-nextgen data files
List of genomes to get (from the config file at '{'genomes': [{'dbkey': 'hg38', 'name': 'Human (hg38) full', 'indexes': ['seq', 'twobit', 'bwa', 'hisat2'], 'annotations': ['ccds', 'capture_regions', 'coverage', 'prioritize', 'dbsnp', 'hapmap_snps', '1000g_omni_snps', 'ACMG56_genes', '1000g_snps', 'mills_indels', '1000g_indels', 'clinvar', 'qsignature', 'genesplicer', 'effects_transcripts', 'varpon', 'vcfanno', 'viral', 'transcripts', 'RADAR', 'rmsk', 'salmon-decoys', 'mirbase'], 'validation': ['giab-NA12878', 'giab-NA24385', 'giab-NA24631', 'platinum-genome-NA12878', 'giab-NA12878-remap', 'giab-NA12878-crossmap', 'dream-syn4-crossmap', 'dream-syn3-crossmap', 'giab-NA12878-NA24385-somatic', 'giab-NA24143', 'giab-NA24149']}], 'genome_indexes': ['rtg'], 'install_liftover': False, 'install_uniref': False}'): Human (hg38) full
Running GGD recipe: hg38 varpon 20181105
--2019-08-28 11:34:07-- https://nc.hartwigmedicalfoundation.nl/index.php/s/a8lgLsUrZI5gndd/download?path=%2FHMF-Pipeline-Resources&files=GermlineHetPon.hg38.bed.gz
Resolving nc.hartwigmedicalfoundation.nl (nc.hartwigmedicalfoundation.nl)... > gsort version 0.0.6
89.255.203.115
Connecting to nc.hartwigmedicalfoundation.nl (nc.hartwigmedicalfoundation.nl)|89.255.203.115|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2019-08-28 11:34:08 ERROR 404: Not Found.

gzip: stdin: unexpected end of file
2019/08/28 11:34:08 EOF
Traceback (most recent call last):
  File "/home/users/industry/perdana/wanching/scratch/wc/bcbio/bcbio_tooldir/bin/bcbio_nextgen.py", line 221, in <module>
    install.upgrade_bcbio(kwargs["args"])
  File "/home/users/industry/perdana/wanching/scratch/wc/bcbio/anaconda/lib/python3.6/site-packages/bcbio/install.py", line 106, in upgrade_bcbio
    upgrade_bcbio_data(args, REMOTES)
  File "/home/users/industry/perdana/wanching/scratch/wc/bcbio/anaconda/lib/python3.6/site-packages/bcbio/install.py", line 348, in upgrade_bcbio_data
    args.cores, ["ggd", "s3", "raw"])
  File "/scratch/users/industry/perdana/wanching/wc/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/genomes.py", line 354, in install_data_local
    _prep_genomes(env, genomes, genome_indexes, ready_approaches, data_filedir)
  File "/scratch/users/industry/perdana/wanching/wc/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/genomes.py", line 480, in _prep_genomes
    retrieve_fn(env, manager, gid, idx)
  File "/scratch/users/industry/perdana/wanching/wc/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/genomes.py", line 856, in _install_with_ggd
    ggd.install_recipe(os.getcwd(), env.system_install, recipe_file, gid)
  File "/scratch/users/industry/perdana/wanching/wc/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/ggd.py", line 30, in install_recipe
    recipe["recipe"]["full"]["recipe_type"], system_install)
  File "/scratch/users/industry/perdana/wanching/wc/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/ggd.py", line 62, in _run_recipe
    subprocess.check_output(["bash", run_file])
  File "/home/users/industry/perdana/wanching/scratch/wc/bcbio/anaconda/lib/python3.6/subprocess.py", line 336, in check_output
    **kwargs).stdout
  File "/home/users/industry/perdana/wanching/scratch/wc/bcbio/anaconda/lib/python3.6/subprocess.py", line 418, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['bash', '/scratch/users/industry/perdana/wanching/wc/bcbio/genomes/Hsapiens/hg38/txtmp/ggd-run.sh']' returned non-zero exit status 1.
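When a data install fails partway through, the actual 404 is easy to miss in a long log like the one above. As a minimal sketch (the helper name `failed_downloads` is hypothetical and not part of bcbio or cloudbiolinux), the Python below scans wget output for download start lines (`--<date> <time>-- <url>`) and pairs each `ERROR <code>: <reason>` message with the URL that was being fetched at the time. It assumes wget's English-language messages; a wget running under another locale prints different text and would need different patterns.

```python
import re

# wget prints "--2019-08-28 11:34:07--  <url>" when a download starts
# and "ERROR 404: Not Found." when the server rejects the request.
START_RE = re.compile(r"--(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})--\s+(\S+)")
ERROR_RE = re.compile(r"ERROR (\d{3}): ([^.\n]+)")


def failed_downloads(log_text):
    """Return (url, status_code, reason) for each wget download that failed.

    Works on both line-broken and flattened logs, because it matches
    by position in the text rather than line by line.
    """
    events = []
    for m in START_RE.finditer(log_text):
        events.append((m.start(), "start", m.group(2)))
    for m in ERROR_RE.finditer(log_text):
        events.append((m.start(), "error", (int(m.group(1)), m.group(2).strip())))
    # Process start and error messages in the order they appear in the log.
    events.sort(key=lambda e: e[0])

    failures = []
    current_url = None
    for _, kind, payload in events:
        if kind == "start":
            current_url = payload
        elif current_url is not None:
            code, reason = payload
            failures.append((current_url, code, reason))
            current_url = None  # each download reports at most one error
    return failures
```

Run over the log above, this would report the `GermlineHetPon.hg38.bed.gz` URL with status 404, which points straight at the recipe whose source moved.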

roryk commented 5 years ago

Sorry @LimWChing,

I think https://nc.hartwigmedicalfoundation.nl/ suddenly stopped hosting all of the files it had up there, or else the files moved. I'm not sure who to contact to find out whether they moved somewhere else. We have a bucket with some of these files on it; let me see if these ones are there.

roryk commented 5 years ago

Ok, I have these files locally. I'll update our bucket and the cloudbiolinux recipe to grab them from the bucket instead of from here. It will take a little while since I have to run out, but I'll do it by tonight.

roryk commented 5 years ago

Thank you, I fixed this; let me know if it doesn't work for you. If you try to reinstall, there might be a temporary directory called tmpbcbio-install in the directory where you ran the install command. You might need to remove that directory too, since it may contain a cloudbiolinux download with the older, broken ggd recipe in it.
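The cleanup described above can be scripted. A minimal sketch in Python, assuming `bcbio_nextgen.py` is on your `PATH` and that you pass the directory where the original install command was run; `clean_and_upgrade` is a hypothetical helper, not part of bcbio. It deletes any leftover `tmpbcbio-install` directory and then reruns the same `bcbio_nextgen.py upgrade --data --genomes hg38` command used in this thread.

```python
import shutil
import subprocess
from pathlib import Path


def clean_and_upgrade(install_dir, genome="hg38", run=True):
    """Remove a stale tmpbcbio-install directory, then rerun the data upgrade.

    Hypothetical helper (not part of bcbio). Returns the command that was,
    or with run=False would be, executed.
    """
    stale = Path(install_dir) / "tmpbcbio-install"
    # The leftover directory may hold an old cloudbiolinux checkout whose
    # ggd recipes still point at the broken download URL.
    if stale.is_dir():
        shutil.rmtree(stale)
    cmd = ["bcbio_nextgen.py", "upgrade", "--data", "--genomes", genome]
    if run:
        # Raises subprocess.CalledProcessError if the upgrade fails again.
        subprocess.run(cmd, check=True, cwd=install_dir)
    return cmd
```

Passing `run=False` only removes the stale directory and returns the command, which is a quick way to check what would be executed before starting a long download.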

pdiakumis commented 5 years ago

Rory, you're right: they've moved them to https://nc.hartwigmedicalfoundation.nl/index.php/s/a8lgLsUrZI5gndd?path=%2FHMFTools-Resources for now, but I'm not sure how long they will stay there. The GermlineHetPon.hg38.bed.gz file is under Amber, FYI. Good to know that you're hosting them in a separate bucket.

roryk commented 5 years ago

Thanks Peter! How did you figure out where they were? I searched around and couldn't find them.

pdiakumis commented 5 years ago

Slack ;-)

roryk commented 5 years ago

haha

LimWChing commented 5 years ago

Thanks a lot, Rory and Peter! I removed the tmpbcbio-install directory and ran bcbio_nextgen.py upgrade --data --genomes hg38. It works fine now.

roryk commented 5 years ago

Great! Thanks for following up.

alasfar-lina commented 1 year ago

Hello everyone. The same problem is happening again, specifically when trying to install the hg38 genome:


https://ftp.ncbi.nih.gov/snp/archive/b154/VCF/GCF_000001405.38.gz
Resolving ftp.ncbi.nih.gov (ftp.ncbi.nih.gov)... 130.14.250.11, 130.14.250.10, 2607:f220:41e:250::7, ...
Connecting to ftp.ncbi.nih.gov (ftp.ncbi.nih.gov)|130.14.250.11|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2023-02-09 16:41:02 ERROR 404: Not Found.


The genome files are no longer there.

tdelisper commented 1 year ago

Hello,

As @alasfar-lina already mentioned, the file at https://ftp.ncbi.nih.gov/snp/archive/b154/VCF/GCF_000001405.38.gz has been removed. The error occurs when trying to install the hg38 genome, while running cloudbiolinux/ggd-recipes/hg38/dbsnp.yaml.