chapmanb / cloudbiolinux

CloudBioLinux: configure virtual (or real) machines with tools for biological analyses
http://cloudbiolinux.org
MIT License

Broad FTP login error is causing bcbio-nextgen data upgrade to fail #329

Closed: mjsteinbaugh closed this issue 4 years ago

mjsteinbaugh commented 4 years ago

See related issue filed over at bcbio-nextgen: https://github.com/bcbio/bcbio-nextgen/issues/3021

Looks like the gsapubftp login might need to be updated?

Here's the relevant code: https://github.com/chapmanb/cloudbiolinux/search?q=gsapubftp&unscoped_q=gsapubftp

mjsteinbaugh commented 4 years ago

See related GATK thread: https://gatkforums.broadinstitute.org/gatk/discussion/1215/how-can-i-access-the-gsa-public-ftp-server

chapmanb commented 4 years ago

Mike; This is the error message that the Broad FTP server reports when it's overloaded. I don't believe anything has changed; it just doesn't handle high load very well. We don't have a good fix for this short of hosting the files ourselves, so the best recommendation is to retry. Hopefully the server will be less stressed and the download will work cleanly. Hope this helps.
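
For anyone scripting around this, the "retry and hope the server is less stressed" advice can be sketched as a retry loop with exponential backoff. This is only an illustration, not something bcbio or cloudbiolinux ships: `retry_with_backoff`, `fetch`, and the delay values are all hypothetical names and numbers, and `fetch` stands in for whatever step performs the download.

```python
import random
import time


def retry_with_backoff(fetch, attempts=5, base_delay=30.0, sleep=time.sleep):
    """Call fetch() until it succeeds or the attempts run out.

    fetch is any zero-argument callable that raises on failure; it is a
    hypothetical stand-in for the real download step.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except Exception as exc:
            if attempt == attempts:
                # Out of attempts: surface the last error to the caller.
                raise
            # Double the wait each time and add jitter so that many
            # clients do not all hit the overloaded server at once.
            delay = base_delay * 2 ** (attempt - 1) + random.uniform(0, base_delay)
            print(f"attempt {attempt} failed ({exc}); retrying in {delay:.0f}s")
            sleep(delay)
```

The `sleep` parameter is injectable only so the loop can be exercised without actually waiting; in normal use the default `time.sleep` applies.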

mjsteinbaugh commented 4 years ago

Thanks Brad, I thought that might be what was going on. I wasn't sure though because I could log into the FTP server with an external client and list the files; I was only seeing errors with the Python script.

mjsteinbaugh commented 4 years ago

A couple of days later, I'm still seeing this error:

Running GGD recipe: hg38 1000g_snps 20160105
--2019-11-26 08:44:02--  ftp://gsapubftp-anonymous:*password*@ftp.broadinstitute.org/bundle/hg38//1000G_phase1.snps.high_confidence.hg38.vcf.gz
           => ‘variation/1000G_phase1.snps.high_confidence.vcf.gz’
Resolving ftp.broadinstitute.org (ftp.broadinstitute.org)... 69.173.70.223
Connecting to ftp.broadinstitute.org (ftp.broadinstitute.org)|69.173.70.223|:21... connected.
Logging in as gsapubftp-anonymous ...
Login incorrect.

roryk commented 4 years ago

Hi Mike,

Sorry about that. Unfortunately, there isn't much we can do about it. It should eventually work once the servers stop being overwhelmed.

mjsteinbaugh commented 4 years ago

Following up, on a fresh bcbio-nextgen 1.1.9 install, I'm still seeing this:

--2019-12-13 09:40:53--  ftp://gsapubftp-anonymous:*password*@ftp.broadinstitute.org/bundle/hg38//Mills_and_1000G_gold_standard.indels.hg38.vcf.gz.tbi
           => ‘variation/Mills_and_1000G_gold_standard.indels.vcf.gz.tbi’
Resolving ftp.broadinstitute.org (ftp.broadinstitute.org)... 69.173.70.223
Connecting to ftp.broadinstitute.org (ftp.broadinstitute.org)|69.173.70.223|:21... connected.
Logging in as gsapubftp-anonymous ...
Login incorrect.
Upgrading bcbio
Upgrading bcbio-nextgen data files
List of genomes to get (from the config file at '{'genomes': [{'dbkey': 'hg38', 'name': 'Human (hg38) full', 'indexes': ['seq', 'twobit', 'bwa', 'hisat2'], 'annotations': ['transcripts', 'RADAR', 'rmsk', 'salmon-decoys', 'fusion-blacklist', 'ccds', 'capture_regions', 'coverage', 'prioritize', 'dbsnp', 'hapmap_snps', '1000g_omni_snps', 'ACMG56_genes', '1000g_snps', 'mills_indels', '1000g_indels', 'clinvar', 'qsignature', 'genesplicer', 'effects_transcripts', 'varpon', 'vcfanno', 'viral'], 'validation': ['giab-NA12878', 'giab-NA24385', 'giab-NA24631', 'platinum-genome-NA12878', 'giab-NA12878-remap', 'giab-NA12878-crossmap', 'dream-syn4-crossmap', 'dream-syn3-crossmap', 'giab-NA12878-NA24385-somatic', 'giab-NA24143', 'giab-NA24149']}, {'dbkey': 'mm10', 'name': 'Mouse (mm10)', 'indexes': ['seq', 'twobit'], 'annotations': ['transcripts', 'rmsk', 'problem_regions', 'dbsnp', 'vcfanno']}, {'dbkey': 'rn6', 'name': 'Rat (rn6)', 'indexes': ['seq', 'twobit'], 'annotations': ['transcripts']}, {'dbkey': 'canFam3', 'name': 'Dog (canFam3)', 'indexes': ['twobit'], 'annotations': ['transcripts', 'dbsnp']}, {'dbkey': 'BDGP6', 'name': 'D melangogaster (BDGP6)', 'indexes': ['seq'], 'annotations': ['transcripts']}], 'genome_indexes': ['bowtie2', 'rtg', 'star'], 'install_liftover': False, 'install_uniref': False}'): Human (hg38) full, Mouse (mm10), Rat (rn6), Dog (canFam3), D melangogaster (BDGP6)
Running GGD recipe: hg38 seq 1000g-20150219_1
Moving on to next genome prep method after trying ggd
GGD recipe not available for hg38 bowtie2
Downloading genome from s3: hg38 bowtie2
Moving on to next genome prep method after trying s3
No pre-computed indices for hg38 bowtie2
Preparing genome hg38 with index bowtie2
Moving on to next genome prep method after trying ggd
GGD recipe not available for hg38 rtg
Downloading genome from s3: hg38 rtg
Moving on to next genome prep method after trying s3
No pre-computed indices for hg38 rtg
Preparing genome hg38 with index rtg
Moving on to next genome prep method after trying ggd
GGD recipe not available for hg38 star
Downloading genome from s3: hg38 star
Moving on to next genome prep method after trying s3
No pre-computed indices for hg38 star
Preparing genome hg38 with index star
hg38 detected, building a simple reference with no alts, decoys or HLA from /data00/n/app/bcbio/1.1.9/install/genomes/Hsapiens/hg38/seq/hg38.fa to /data00/n/app/bcbio/1.1.9/install/genomes/Hsapiens/hg38/seq/hg38-simple.fa.
Preparing STAR index from /data00/n/app/bcbio/1.1.9/install/genomes/Hsapiens/hg38/seq/hg38-simple.fa.
Removing /data00/n/app/bcbio/1.1.9/install/genomes/Hsapiens/hg38/seq/hg38-simple.fa.
Running GGD recipe: hg38 transcripts 2018-10-10_92
Running GGD recipe: hg38 RADAR v2-20180202
Running GGD recipe: hg38 rmsk 20180319
Running GGD recipe: hg38 salmon-decoys 94
Running GGD recipe: hg38 fusion-blacklist 2
Running GGD recipe: hg38 ccds r20
Running GGD recipe: hg38 capture_regions 20161202
Running GGD recipe: hg38 coverage 2018-10-16
Running GGD recipe: hg38 prioritize 20181227
Running GGD recipe: hg38 dbsnp 151-20180418
Running GGD recipe: hg38 hapmap_snps 20160105
Running GGD recipe: hg38 1000g_omni_snps 20160105
Running GGD recipe: hg38 ACMG56_genes 20160726
Running GGD recipe: hg38 1000g_snps 20160105
Running GGD recipe: hg38 mills_indels 20160105
Traceback (most recent call last):
  File "/n/app/bcbio/stable/tools/bin/bcbio_nextgen.py", line 228, in <module>
    install.upgrade_bcbio(kwargs["args"])
  File "/n/app/bcbio/1.1.9/install/anaconda/lib/python3.6/site-packages/bcbio/install.py", line 106, in upgrade_bcbio
    upgrade_bcbio_data(args, REMOTES)
  File "/n/app/bcbio/1.1.9/install/anaconda/lib/python3.6/site-packages/bcbio/install.py", line 369, in upgrade_bcbio_data
    args.cores, ["ggd", "s3", "raw"])
  File "/mnt/resource/tmp.1YauZ8ZyEP/bcbio/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/genomes.py", line 354, in install_data_local
    _prep_genomes(env, genomes, genome_indexes, ready_approaches, data_filedir)
  File "/mnt/resource/tmp.1YauZ8ZyEP/bcbio/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/genomes.py", line 480, in _prep_genomes
    retrieve_fn(env, manager, gid, idx)
  File "/mnt/resource/tmp.1YauZ8ZyEP/bcbio/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/genomes.py", line 872, in _install_with_ggd
    ggd.install_recipe(os.getcwd(), env.system_install, recipe_file, gid)
  File "/mnt/resource/tmp.1YauZ8ZyEP/bcbio/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/ggd.py", line 30, in install_recipe
    recipe["recipe"]["full"]["recipe_type"], system_install)
  File "/mnt/resource/tmp.1YauZ8ZyEP/bcbio/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/ggd.py", line 62, in _run_recipe
    subprocess.check_output(["bash", run_file])
  File "/n/app/bcbio/1.1.9/install/anaconda/lib/python3.6/subprocess.py", line 336, in check_output
    **kwargs).stdout
  File "/n/app/bcbio/1.1.9/install/anaconda/lib/python3.6/subprocess.py", line 418, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['bash', '/data00/n/app/bcbio/1.1.9/install/genomes/Hsapiens/hg38/txtmp/ggd-run.sh']' returned non-zero exit status 6.

roryk commented 4 years ago

Thanks. Unfortunately, this failure happens at random.
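
A note on the `exit status 6` in the traceback above: in wget's documented exit codes, 6 means "Username/password authentication failure", which matches the `Login incorrect.` line in the log. The sketch below shows one way a wrapper could turn that status into a readable message. It assumes the recipe script passes wget's exit status through unchanged, which the thread does not confirm, and `describe_failure` and `run_and_explain` are hypothetical helpers, not part of cloudbiolinux.

```python
import subprocess
import sys

# Exit codes as listed in the GNU Wget manual ("Exit Status").
WGET_EXIT_CODES = {
    1: "generic error",
    2: "parse error",
    3: "file I/O error",
    4: "network failure",
    5: "SSL verification failure",
    6: "username/password authentication failure",
    7: "protocol error",
    8: "server issued an error response",
}


def describe_failure(exc):
    """Summarize a CalledProcessError, assuming the status came from wget."""
    meaning = WGET_EXIT_CODES.get(exc.returncode, "unknown (not a wget code)")
    return f"{exc.cmd} exited with status {exc.returncode}: {meaning}"


def run_and_explain(cmd):
    """Run cmd; return None on success or a readable message on failure."""
    try:
        subprocess.check_output(cmd)
    except subprocess.CalledProcessError as exc:
        return describe_failure(exc)
    return None


if __name__ == "__main__":
    # Simulate a script that exits with status 6, as ggd-run.sh did.
    print(run_and_explain([sys.executable, "-c", "import sys; sys.exit(6)"]))
```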