Closed jefarrar closed 8 years ago
Hi @jefarrar
Sorry about the issue. I am trying to replicate it. Meanwhile, can you look inside /opt/bcbio/genomes/Hsapiens/GRCh37/txtmp
and see if there is any file with a name similar to that? Maybe there was a problem downloading the file.
I will let you know if I can reproduce the problem.
you guys are fast!
There is a summary counts zipped file with similar name, but it (along with another .zip archive) looks empty:
ls -l /opt/bcbio/genomes/Hsapiens/hg19/txtmp
total 8
-rw-rw-r-- 1 xxx xxx 2711 Mar 10 19:19 ggd-run.sh
drwxrwxr-x 2 xxx xxx 4096 Mar 10 19:19 srnaseq
ls -l /opt/bcbio/genomes/Hsapiens/hg19/txtmp/srnaseq
total 150876
-rw-rw-r-- 1 xxx xxx 1207144 Mar 10 19:16 hairpin.fa.gz
-rw-rw-r-- 1 xxx xxx 519390 Mar 10 19:15 hsa.gff3
-rw-rw-r-- 1 xxx xxx 590888 Mar 10 19:16 mature.fa.gz
-rw-rw-r-- 1 xxx xxx 0 Mar 10 19:16 miR_Family_Info.txt.zip
-rw-rw-rw- 1 xxx xxx 363739 Jun 25 2014 mirna_mature.txt.gz
-rw-rw-r-- 1 xxx xxx 2640341 Mar 10 19:16 miRNA.str.gz
-rw-rw-r-- 1 xxx xxx 5541832 Mar 6 08:57 refGene.txt.gz
-rw-rw-r-- 1 xxx xxx 143401637 Apr 27 2009 rmsk.txt.gz
-rw-rw-r-- 1 xxx xxx 0 Mar 10 19:16 Summary_Counts.txt.zip
-rw-rw-r-- 1 xxx xxx 16473 Dec 21 06:01 tRNAs.txt.gz
-rw-rw-r-- 1 xxx xxx 19933 Oct 3 2010 wgRna.txt.gz
This is on a new CentOS install. I upgraded data on an established Ubuntu install a few minutes ago and this doesn't seem to be a problem there:
Running GGD recipe: srnaseq
Running GGD recipe: prioritize
--2016-03-10 18:24:58-- https://s3.amazonaws.com/biodata/coverage/prioritize/prioritize-cancer-hg19-20160215.tar.gz
It seems the connection dropped. I would remove that zip file, restart, and cross fingers :) It is not failing for me on my computer.
Let me know if restarting helps.
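The zero-byte archives in the listing above are easy to miss by eye. A minimal sketch for spotting them before a restart, assuming only that the download directory is readable; `find_empty_files` is a hypothetical helper name, not part of bcbio or cloudbiolinux:

```python
import os


def find_empty_files(directory):
    """Return sorted paths of zero-byte regular files under `directory`."""
    empty = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            if os.path.isfile(path) and os.path.getsize(path) == 0:
                empty.append(path)
    return sorted(empty)


if __name__ == "__main__":
    # Path taken from the listing in this thread; adjust for your install.
    for path in find_empty_files("/opt/bcbio/genomes/Hsapiens/hg19/txtmp/srnaseq"):
        print("empty download:", path)
```

Any file it reports is a candidate for deletion before re-running the upgrade.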
Pretty sure that transient connection issues weren't my problem; I've been contorting with this for the past couple of days.
My first thought after finding the empty zip archives was that this might be a proxy issue in front of the new install. But I was able to manually download the two .zip targets from targetscan.org from behind the proxy without issues. However, the issue persisted even when I manually placed and unpacked these in the txtmp/srnaseq folder. In any event, I think I've managed to get around this by copying newly updated srnaseq folders in from another system.
thanks!
It seems the problem is still there for the GGD srnaseq recipe. The invalid targetscan version 70 data is causing the problem. I manually downloaded the version 71 file Summary_Counts.all_predictions.txt into the srnaseq folder, and then it was OK. I'm not sure which website hosts the ggd.sh code, but updating that code should fix the issue.
Thanks for the report and sorry about the download issues. Could you provide the error message you're seeing? I ran an update and the srnaseq recipe worked cleanly for me. Manually checking the targetscan files, they do seem to be present:
wget http://www.targetscan.org/vert_70/vert_70_data_download/Summary_Counts.all_predictions.txt.zip
Is it possible this was a transient error and re-running fixes the issue?
I'll wait to update to targetscan 71 until @lpantano has a chance to validate those files work correctly with the pipeline.
Thanks for helping to debug.
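One way to tell a dropped download from a bad upstream file is to test the archive itself after fetching it. A minimal sketch using only the Python standard library; `zip_is_usable` is a hypothetical helper, and it assumes the archive has already been saved locally (for example by the wget command above):

```python
import os
import zipfile


def zip_is_usable(path):
    """True if `path` is a non-empty zip archive whose members pass CRC checks."""
    if not os.path.isfile(path) or os.path.getsize(path) == 0:
        return False
    if not zipfile.is_zipfile(path):
        return False
    try:
        with zipfile.ZipFile(path) as archive:
            # testzip() returns the name of the first corrupt member, or None.
            return archive.testzip() is None
    except zipfile.BadZipFile:
        return False


if __name__ == "__main__":
    print(zip_is_usable("Summary_Counts.all_predictions.txt.zip"))
```

A zero-byte or truncated file fails this check, which would point to the transfer rather than to the targetscan data.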
Thanks for the fast response, Brad!
I got stuck at the GGD step several times during installation:
subprocess.CalledProcessError: Command '['bash', '~/local/bcbio/share/bcbio-nextgen/genomes/Hsapiens/GRCh37/txtmp/ggd-run.sh']' returned non-zero exit status 2
with an empty Summary_Counts.all_predictions.txt. It seems to be a web stability issue. I tried again just now and version 70 worked as well, though painfully slowly.
This big file, Summary_Counts.all_predictions.txt from targetscan, seems to cause network issues for many people. (Probably some problem with the targetscan host website, I would say.)
Based on the installation guide, I was using "bcbio_nextgen.py upgrade --tooldir=~/local/bcbio --isolate --genomes GRCh37 --aligners bwa --data" to continue the installation whenever it quit during a large data file download due to network issues.
Is it possible to separate the tool installation and data installation into two clean parts and provide an easier way to update genomic data for each major component? (Maybe there is already an easy way to install data only and I just don't know it.)
Thanks!
Thanks for the feedback, I'm glad to hear that it worked in the end. I agree that the download here is pretty slow due to server speed; we'll look at caching a version in s3 to avoid this and any downtimes at targetscan or mirbase.
For upgrades, you can definitely do tools and data separately. You'd want to leave the `--tooldir` and `--isolate` arguments out of your command and then it will skip right to the data installation. In general neither argument should be needed after a successful install since bcbio caches them for future use. You can then upgrade just tools with `--tools` or only data with `--data`. Hope this helps.
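The split described above can be sketched as a small command builder. It uses only flags that appear in this thread (`--tools`, `--data`, `--genomes`, `--aligners`). The helper name `upgrade_command` is hypothetical, and passing `--genomes` or `--aligners` more than once is an assumption rather than something confirmed here:

```python
def upgrade_command(tools=False, data=False, genomes=(), aligners=()):
    """Build an argv list for `bcbio_nextgen.py upgrade` (hypothetical helper)."""
    cmd = ["bcbio_nextgen.py", "upgrade"]
    if tools:
        cmd.append("--tools")
    if data:
        cmd.append("--data")
    # Assumption: each genome/aligner is passed with its own flag.
    for genome in genomes:
        cmd += ["--genomes", genome]
    for aligner in aligners:
        cmd += ["--aligners", aligner]
    return cmd


if __name__ == "__main__":
    # Data-only upgrade, without --tooldir or --isolate.
    print(" ".join(upgrade_command(data=True, genomes=["GRCh37"], aligners=["bwa"])))
    # Tools-only upgrade.
    print(" ".join(upgrade_command(tools=True)))
```

The resulting list can be handed to `subprocess.check_call` to run the upgrade, or simply printed and pasted into a shell.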
Sorry about this. I updated the new version and I will work to create a tar file in s3 to avoid this problem in the future as @chapmanb suggested.
cheers
Thanks a lot for the advice and fast update!
Creating manifest of installed packages in /opt/bcbio/manifest
Third party tools upgrade complete.
Installing additional tools
Upgrading bcbio-nextgen data files
Initialized empty Git repository in /opt/tmpbcbio-install/cloudbiolinux/.git/
remote: Counting objects: 12300, done.
remote: Compressing objects: 100% (6/6), done.
remote: Total 12300 (delta 0), reused 0 (delta 0), pack-reused 12294
Receiving objects: 100% (12300/12300), 7.95 MiB | 3.22 MiB/s, done.
Resolving deltas: 100% (7026/7026), done.
Setting up virtual machine
INFO: <cloudbio.flavor.Flavor instance at 0x7f68f182b170>
INFO: <cloudbio.flavor.Flavor instance at 0x7f68f182b170>
INFO: This is a ngs_pipeline_minimal flavor
INFO: This is a ngs_pipeline_minimal flavor
INFO: Distribution auto
INFO: Distribution auto
INFO: Get local environment
INFO: Get local environment
INFO: CentOS setup
INFO: CentOS setup
WARN [distribution.py(216)]: NixPkgs are currently not supported for centos
WARN [distribution.py(216)]: NixPkgs are currently not supported for centos
DBG [distribution.py]: NixPkgs: Ignored
DBG [distribution.py]: NixPkgs: Ignored
[localhost] local: echo $HOME
[localhost] local: uname -m
INFO: Now, testing connection to host...
INFO: Now, testing connection to host...
INFO: Connection to host appears to work!
INFO: Connection to host appears to work!
DBG [utils.py]: Expand paths
DBG [utils.py]: Expand paths
INFO: List of genomes to get (from the config file at '{'install_liftover': False, 'annotation_groups': {'rnaseq': ['transcripts', 'RADAR'], 'smallrna': ['mirbase'], 'variation': ['problem_regions', 'GA4GH_problem_regions', 'MIG', 'prioritize', 'dbsnp', 'hapmap', '1000g_omni_snps', '1000g_snps', 'mills_indels', '1000g_indels', 'clinvar', 'cosmic', 'ancestral', 'qsignature']}, 'genome_indexes': ['rtg'], 'genomes': [{'annotations': ['mirbase', 'GA4GH_problem_regions', 'MIG', 'prioritize', 'dbsnp', 'hapmap', '1000g_omni_snps', '1000g_snps', 'mills_indels', 'cosmic', 'ancestral', 'qsignature', 'transcripts', 'RADAR'], 'validation': ['giab-NA12878', 'dream-syn3', 'dream-syn4'], 'name': 'Human (GRCh37)', 'dbkey': 'GRCh37', 'annotations_available': ['battenberg', 'dbnsfp']}], 'install_uniref': False}'): Human (GRCh37)
INFO: List of genomes to get (from the config file at '{'install_liftover': False, 'annotation_groups': {'rnaseq': ['transcripts', 'RADAR'], 'smallrna': ['mirbase'], 'variation': ['problem_regions', 'GA4GH_problem_regions', 'MIG', 'prioritize', 'dbsnp', 'hapmap', '1000g_omni_snps', '1000g_snps', 'mills_indels', '1000g_indels', 'clinvar', 'cosmic', 'ancestral', 'qsignature']}, 'genome_indexes': ['rtg'], 'genomes': [{'annotations': ['mirbase', 'GA4GH_problem_regions', 'MIG', 'prioritize', 'dbsnp', 'hapmap', '1000g_omni_snps', '1000g_snps', 'mills_indels', 'cosmic', 'ancestral', 'qsignature', 'transcripts', 'RADAR'], 'validation': ['giab-NA12878', 'dream-syn3', 'dream-syn4'], 'name': 'Human (GRCh37)', 'dbkey': 'GRCh37', 'annotations_available': ['battenberg', 'dbnsfp']}], 'install_uniref': False}'): Human (GRCh37)
Running GGD recipe: srnaseq
Traceback (most recent call last):
File "/opt/bcbio/bin/bcbio_nextgen.py", line 4, in <module>
__import__('pkg_resources').run_script('bcbio-nextgen==0.9.6', 'bcbio_nextgen.py')
File "/opt/bcbio/anaconda/lib/python2.7/site-packages/setuptools-20.2.2-py2.7.egg/pkg_resources/__init__.py", line 726, in run_script
File "/opt/bcbio/anaconda/lib/python2.7/site-packages/setuptools-20.2.2-py2.7.egg/pkg_resources/__init__.py", line 1484, in run_script
File "/opt/bcbio/anaconda/lib/python2.7/site-packages/bcbio_nextgen-0.9.6-py2.7.egg-info/scripts/bcbio_nextgen.py", line 207, in <module>
install.upgrade_bcbio(kwargs["args"])
File "/opt/bcbio/anaconda/lib/python2.7/site-packages/bcbio/install.py", line 89, in upgrade_bcbio
upgrade_bcbio_data(args, REMOTES)
File "/opt/bcbio/anaconda/lib/python2.7/site-packages/bcbio/install.py", line 257, in upgrade_bcbio_data
cbl_deploy.deploy(s)
File "/opt/tmpbcbio-install/cloudbiolinux/cloudbio/deploy/__init__.py", line 65, in deploy
_setup_vm(options, vm_launcher, actions)
File "/opt/tmpbcbio-install/cloudbiolinux/cloudbio/deploy/__init__.py", line 110, in _setup_vm
configure_instance(options, actions)
File "/opt/tmpbcbio-install/cloudbiolinux/cloudbio/deploy/__init__.py", line 268, in configure_instance
setup_biodata(options)
File "/opt/tmpbcbio-install/cloudbiolinux/cloudbio/deploy/__init__.py", line 250, in setup_biodata
install_proc(options["genomes"], ["ggd", "s3", "raw"])
File "/opt/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/genomes.py", line 345, in install_data
_prep_genomes(env, genomes, genome_indexes, ready_approaches)
File "/opt/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/genomes.py", line 474, in _prep_genomes
retrieve_fn(env, manager, gid, idx)
File "/opt/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/genomes.py", line 796, in _install_with_ggd
ggd.install_recipe(env.cwd, recipe_file)
File "/opt/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/ggd.py", line 31, in install_recipe
_move_files(tmpdir, base_dir, recipe["recipe"]["full"]["recipe_outfiles"])
File "/opt/tmpbcbio-install/cloudbiolinux/cloudbio/biodata/ggd.py", line 74, in _move_files
(out_file, tmp_dir))
AssertionError: Did not find expected output file srnaseq/Summary_Counts.all_predictions.txt in /opt/bcbio/genomes/Hsapiens/GRCh37/txtmp
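The assertion above fires when a file the recipe should have produced is not in the temporary directory. The same kind of check can be run by hand before retrying. This is a minimal sketch and not the cloudbiolinux implementation; `check_outputs` and its return format are hypothetical, and the expected file list has to be supplied by the caller:

```python
import os


def check_outputs(tmp_dir, expected_files):
    """Classify expected recipe outputs under tmp_dir as missing or empty."""
    report = {"missing": [], "empty": []}
    for rel_path in expected_files:
        path = os.path.join(tmp_dir, rel_path)
        if not os.path.isfile(path):
            report["missing"].append(rel_path)
        elif os.path.getsize(path) == 0:
            report["empty"].append(rel_path)
    return report


if __name__ == "__main__":
    # Directory and file name are taken from the AssertionError in this thread.
    print(check_outputs(
        "/opt/bcbio/genomes/Hsapiens/GRCh37/txtmp",
        ["srnaseq/Summary_Counts.all_predictions.txt"],
    ))
```

A "missing" entry suggests the archive was never unpacked, while an "empty" entry matches the zero-byte downloads reported earlier in the thread.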