MelbourneGenomics / cpipe

The open source version of the Melbourne Genomics Health Alliance Exome Sequencing Pipeline

numerous problems with installation #210

Open biocyberman opened 7 years ago

biocyberman commented 7 years ago

I am running the installation at tag v2.5.1 from the dev branch. There are many problems that can cause the installation to quit. I am fixing some of them; shall I submit a PR? I can also see that many of the tools can be installed with conda and bioconda, which I find very stable. It would be a good idea to move the installation process to conda and, even better, to make a bioconda package for it.

More generally, installing cpipe and its dependencies should be separated from preparing the reference data.

multimeric commented 7 years ago

Can you please post an error log from the install process? If the fix is straightforward I'd be happy to accept a PR.

Conda is also a good method of installing these tools, but we would prefer to avoid depending on external services wherever possible, so that the installation remains identical from one run to the next.

biocyberman commented 7 years ago

@TMiguelT Thanks for taking a look at this. I am sending a PR. Some more information:

Error from verbatim git clone

After a fresh clone I ran git checkout v2.5.1, then the installation command ./install.sh -s -c10 (without swift, running with 10 cores).

Error:

########################################
TaskError - taskid:download_zlib
PythonAction Error
Traceback (most recent call last):
  File "/bioinfo/cpipe/tools/python/lib/python3.6/site-packages/doit/action.py", line 403, in execute
    returned_value = self.py_callable(*self.args, **kwargs)
  File "/bioinfo/cpipe/tasks/download/download_tools.py", line 13, in action
    download_zip(url, temp_dir, type=type)
  File "/bioinfo/cpipe/tasks/common.py", line 221, in download_zip
    unzip_todir(input, directory, type)
  File "/bioinfo/cpipe/tasks/common.py", line 166, in unzip_todir
    raise ValueError('Can only download .tar.gz, .tar.bz2, or .zip file')
ValueError: Can only download .tar.gz, .tar.bz2, or .zip file
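
The traceback only shows that unzip_todir rejected the archive type; it does not show why the type was not recognised. As a rough illustration of how such a check could be made more tolerant, here is a minimal sketch that guesses the type from the file name and falls back to inspecting the file contents. The function names and the type labels below are hypothetical and are not taken from cpipe's tasks/common.py.

```python
import tarfile
import zipfile

# Hypothetical mapping, for illustration only; cpipe's real type labels may differ.
SUFFIX_TO_TYPE = {
    ".tar.gz": "tgz",
    ".tgz": "tgz",
    ".tar.bz2": "tbz2",
    ".zip": "zip",
}


def guess_archive_type(url_or_path):
    """Return a type label based on the file name suffix, or None if unknown."""
    # Ignore any query string or fragment that a download URL may carry.
    name = url_or_path.split("?", 1)[0].split("#", 1)[0].lower()
    for suffix, kind in SUFFIX_TO_TYPE.items():
        if name.endswith(suffix):
            return kind
    return None


def sniff_archive_type(path):
    """Fallback: inspect the downloaded file itself instead of its name."""
    if zipfile.is_zipfile(path):
        return "zip"
    if tarfile.is_tarfile(path):
        return "tar"
    return None
```

With a helper like this, a download URL that carries no recognisable suffix could still be unpacked after the file is on disk, instead of raising the ValueError above.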

Error after bug fixes

I fixed several errors similar to the one above. The resulting code is in the PR. However, with the modified code, I am still stuck at this error:

./install.sh -s -c 10          
-- generate_pipeline_id
-- copy_main_config
-- copy_config
-- install_bwa
-- install_bzip2
-- install_pcre
-- download_mills_and_1000g
-- download_dbsnp
-- download_chromosome_sizes
-- install_cpanm
-- download_ucsc
-- bwa_index_ucsc_reference
-- download_trio_refinement
-- download_refinement_liftover
Cloning into '/bioinfo/cpipe/tmpdata/tmp8qghs0hp'...
Initialized empty Git repository in /bioinfo/cpipe/tmpdata/tmp_zsmgu0i/.git/
.  download_vcfanno_data
.  download_perl
.  download_vep
.  download_groovy
.  download_fastqc
.  download_picard
.  download_bpipe
.  download_vep_plugins
.  download_java_libs
.  download_xz
.  download_libcurl
.  download_zlib
.  download_maven
.  download_vcfanno
> Loadingremote: Counting objects: 12184, done.
.  install_zlibceiving objects:   1% (122/12184)   
remote: Counting objects: 1103, done.
.  install_groovy:  52% (574/1103), 396.01 KiB | 731.00 KiB/s   8% (975/12184)   
remote: Total 1103 (delta 0), reused 0 (delta 0), pack-reused 1103
Receiving objects: 100% (1103/1103), 667.34 KiB | 1.16 MiB/s, done.
Resolving deltas: 100% (687/687), done.
From https://github.com/Ensembl/VEP_plugins
 * [new branch]      dev          -> origin/dev
 * [new branch]      helptip-typo -> origin/helptip-typo
 * [new branch]      master       -> origin/master
 * [new branch]      release/84   -> origin/release/84
 * [new branch]      release/85   -> origin/release/85
 * [new branch]      release/86   -> origin/release/86
 * [new branch]      release/87   -> origin/release/87
 * [new branch]      release/88   -> origin/release/88
 * [new branch]      release/89   -> origin/release/89
 * [new branch]      release/90   -> origin/release/90
Branch master set up to track remote branch master from origin.
Already on 'master'
HEAD is now at 4b5d3b1 update fields for dbNSFP3.4a
.  install_vep_plugins
.  install_xzects:  13% (1696/12184), 7.63 MiB | 7.41 MiB/s   
:copyDeps
> Building 0%.  install_libcurl

BUILD SUCCESSFUL

Total time: 3.318 secs

This build could be faster, please consider using the Gradle Daemon: https://docs.gradle.org/2.13/userguide/gradle_daemon.html
configure: WARNING: Continuing even with errors mentioned immediately above this line.
.  install_java_libs96% (11697/12184), 34.82 MiB | 13.72 MiB/s   
remote: Total 12184 (delta 0), reused 0 (delta 0), pack-reused 12184
error: index-pack died of signal 7
fatal: index-pack failed
error: fetch-pack died of signal 11
.  install_picard
.  install_vcfanno
configure: WARNING: Cannot find libraries for LDAP support: LDAP disabled
bash: line 4: 40046 Segmentation fault      (core dumped) git clone https://github.com/ssadedin/bpipe /bioinfo/cpipe/tmpdata/tmp8qghs0hp
configure: WARNING: libpsl was not found
ar: `u' modifier ignored since `D' is the default (see `U')
/usr/bin/ar: `u' modifier ignored since `D' is the default (see `U')

Regarding conda

If by 'remains identical' you mean reproducible: all of the following tasks can be done reproducibly with conda+bioconda, where you can specify the exact versions to install: 'download_perl', 'download_r', 'download_bwa', 'download_htslib', 'download_samtools', 'download_bcftools', 'download_bedtools', 'download_vep', 'download_fastqc', 'download_bpipe', 'download_picard', 'download_perl_libs', 'download_vcfanno'.

The following tasks require some coding or a manual download (e.g. gatk, though that will change with gatk4): 'download_groovy', 'download_gatk', 'download_vep_libs', 'download_vep_plugins', 'download_java_libs'.
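
To make the point about exact versions concrete, here is a minimal sketch that builds a `conda create` command with pinned package versions. The environment name and the version numbers are placeholders chosen for illustration and are not the versions cpipe requires; the helper function is hypothetical too.

```python
import shlex


def conda_create_command(env_name, pinned, channels=("conda-forge", "bioconda")):
    """Build the argument list for `conda create` with exact version pins."""
    cmd = ["conda", "create", "--yes", "--name", env_name]
    for channel in channels:
        cmd += ["--channel", channel]
    # name=version asks conda for that specific version of the package.
    cmd += ["{}={}".format(name, version) for name, version in sorted(pinned.items())]
    return cmd


# Placeholder pins, for illustration only.
PINNED = {"bwa": "0.7.17", "samtools": "1.9", "bcftools": "1.9"}

if __name__ == "__main__":
    command = conda_create_command("cpipe-tools", PINNED)
    print(" ".join(shlex.quote(arg) for arg in command))
```

The printed command can be reviewed and then run by hand, or the list can be passed to subprocess.run. Because every package carries an explicit version, repeated installs request the same tool versions.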

I've used conda for a couple of years for both development and production. I find it very reliable.

biocyberman commented 7 years ago

@TMiguelT Michael, I've updated the previous comment to make it more readable and give more information.