Closed serge2016 closed 5 years ago
Hi,
the more complex installation instructions for a local command-line VEP version are in the installation script, which is run with perl INSTALL.pl.
But according to this line in the Dockerfile, both installations use INSTALL.pl, and the Dockerfile installation is much more complex than the regular one.
Hi,
The Dockerfile looks more complex for several reasons:
We install some libraries before running the INSTALL.pl script in order to remove the unused parts of those libraries. As you may know, each layer of the Docker image adds to the final size of the image. That's why we install HTSlib and other libraries earlier in the Dockerfile.
Furthermore, we also use the Docker multi-stage strategy for the same size-reduction aim.
To summarise, we made the VEP Dockerfile installation more complex than the traditional VEP installation in order to reduce the Docker image size.
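The multi-stage idea mentioned above can be sketched as follows. This is an untested illustration, not the official VEP Dockerfile: the stage name builder, the /opt/htslib prefix, the ubuntu:18.04 base and the htslib-demo tag are all assumptions chosen for the example. The script only writes a two-stage Dockerfile to a scratch directory; the compilers stay in the first stage and the final image receives only the installed files.

```shell
#!/bin/sh
# Write a minimal two-stage Dockerfile to a scratch directory (illustration only).
workdir="$(mktemp -d)"
dockerfile="$workdir/Dockerfile"

# The quoted 'EOF' keeps $PATH and friends literal inside the Dockerfile.
cat > "$dockerfile" <<'EOF'
# Stage 1: compilers and headers exist only in this stage.
FROM ubuntu:18.04 AS builder
RUN apt-get update && apt-get install -y --no-install-recommends \
      build-essential ca-certificates wget bzip2 \
      zlib1g-dev libbz2-dev liblzma-dev && \
    wget -q https://github.com/samtools/htslib/releases/download/1.9/htslib-1.9.tar.bz2 && \
    tar xjf htslib-1.9.tar.bz2 && \
    make -C htslib-1.9 prefix=/opt/htslib install

# Stage 2: the final image copies only the installed files from stage 1.
FROM ubuntu:18.04
COPY --from=builder /opt/htslib /opt/htslib
ENV PATH="/opt/htslib/bin:$PATH"
EOF

echo "Wrote $dockerfile"
echo "Build it with: docker build -t htslib-demo $workdir"
```

Because the build tools and source tree never reach the second stage, their layers do not count towards the final image size.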
Best regards, Laurent
@ens-lgil, thank you! So if I don't care about the image size, I can use the simple installation instructions?
Dear @serge2016,
Yes, you can use the basic installation instructions from the INSTALL.pl script.
I forgot to mention that the Docker image installation is also more complex because it installs some extra libraries (ensembl-xs and bioperl-ext, for instance) which make VEP run a bit faster: https://www.ensembl.org/info/docs/tools/vep/script/vep_other.html#faster. However, you can install them after the VEP installation if you want.
Best regards, Laurent
I am closing this issue, but if you have any more questions please feel free to reopen it.
Best regards, Laurent
@ens-lgil
We were trying to incorporate all 'speed ups' (but not space savings) and have created this Dockerfile:
FROM ubuntu:14.04
LABEL maintainer="serge2016"
# Author: Serge I. Mitrofanov.
# LastUpdate: 21.06.2019 22:20.
# Tool type: mutation annotator
# Contents:
# VEP 96.3 - Apache License, Version 2.0 + Wildtype plugin from pVACtools.
# ensembl-xs v2.3.2 - Apache License, Version 2.0.
# samtools - MIT/Expat License.
# picard - MIT License.
# Input:
# VCF file
# Output:
# VEP-annotated input VCF-file
ENV DEBIAN_FRONTEND="noninteractive"
RUN apt-get update && apt-get --yes --force-yes --no-install-recommends install \
build-essential \
pkg-config \
software-properties-common \
ncurses-dev \
curl \
wget \
nano \
time \
tcsh \
gawk \
bzip2 \
pigz \
zip \
unzip \
xz-utils \
mc \
htop \
iotop \
git-core \
subversion \
python \
python-tk \
python-dev \
python-setuptools \
openssh-client \
openssl \
libssl-dev \
libyaml-dev \
zlib1g-dev \
libbz2-dev \
liblzma-dev \
libffi-dev \
libxml2-dev \
libxslt1-dev \
libpq-dev \
realpath
ENV TZ="Europe/Moscow"
RUN echo $TZ > /etc/timezone \
&& dpkg-reconfigure tzdata
ENV TMPDIR="/tmp"
RUN mkdir -p "$TMPDIR"
ENV SOFT="/soft"
RUN mkdir -p "$SOFT"
# memUsage (both python 2 & 3) (Olga)
# psutil >= 2.2.1 (Tested with 5.6.1 - ok; 1.2.1 - err) - additional python package required for memUsage. That's why apt install python-psutil doesn't fit on Ubuntu 14.04
RUN cd "$SOFT" \
&& git clone https://github.com/giampaolo/psutil.git \
&& cd "$SOFT/psutil" \
&& python setup.py install \
&& cd "$SOFT" \
&& rm -r "$SOFT/psutil" \
&& mkdir -p "$SOFT/memusage/bin" \
&& wget -q "https://raw.githubusercontent.com/ozolotareva/housekeeping-scr/master/memUsage.py" -O - | tr -d '\r' > "$SOFT/memusage/bin/memUsage.py" \
&& chmod +x "$SOFT/memusage/bin/memUsage.py"
ENV MEMUSAGE="$SOFT/memusage/bin/memUsage.py" \
PATH="$SOFT/memusage/bin:$PATH"
# cmake 3.14.5
RUN cd $SOFT \
&& wget -q "https://cmake.org/files/v3.14/cmake-3.14.5-Linux-x86_64.sh" -O "$SOFT/cmake-3.14.5-Linux-x86_64.sh" \
&& sh $SOFT/cmake-3.14.5-Linux-x86_64.sh --prefix="$SOFT" --include-subdir --skip-license
ENV PATH="$SOFT/cmake-3.14.5-Linux-x86_64/bin:$PATH"
# java8
RUN add-apt-repository -y ppa:openjdk-r/ppa \
&& apt-get update \
&& apt-get --yes --force-yes --no-install-recommends install \
openjdk-8-jre
ENV _JAVA_OPTIONS="-Djava.io.tmpdir=$TMPDIR"
# FastQC v0.11.8
RUN cd "$SOFT" \
&& wget -q "http://www.bioinformatics.babraham.ac.uk/projects/fastqc/fastqc_v0.11.8.zip" -O "$SOFT/fastqc_v0.11.8.zip" \
&& unzip -q "$SOFT/fastqc_v0.11.8.zip" \
&& mv "$SOFT/FastQC" "$SOFT/FastQC_v0.11.8" \
&& chmod +x "$SOFT/FastQC_v0.11.8/fastqc" \
&& rm "$SOFT/fastqc_v0.11.8.zip"
ENV FASTQC="$SOFT/FastQC_v0.11.8/fastqc"
# samtools 1.9
RUN cd "$SOFT" \
&& wget -q "https://github.com/samtools/samtools/releases/download/1.9/samtools-1.9.tar.bz2" -O "$SOFT/samtools-1.9.tar.bz2" \
&& tar xjf "$SOFT/samtools-1.9.tar.bz2" \
&& mv "$SOFT/samtools-1.9" "$SOFT/samtools-1.9-src" \
&& cd "$SOFT/samtools-1.9-src" \
&& make -j"$(($(nproc)+1))" prefix="$SOFT/samtools-1.9" install \
&& cd "$SOFT/samtools-1.9-src/htslib-1.9" \
&& make -j"$(($(nproc)+1))" prefix="$SOFT/htslib-1.9" install \
&& cd "$SOFT" \
&& rm -r "$SOFT/samtools-1.9-src" \
&& rm "$SOFT/samtools-1.9.tar.bz2"
ENV SAMTOOLS="$SOFT/samtools-1.9/bin/samtools" \
BGZIP="$SOFT/htslib-1.9/bin/bgzip" \
TABIX="$SOFT/htslib-1.9/bin/tabix" \
PATH="$SOFT/samtools-1.9/bin:$SOFT/htslib-1.9/bin:$PATH" \
LD_LIBRARY_PATH="$SOFT/htslib-1.9/lib:$LD_LIBRARY_PATH"
# picard 2.19.2
# TODO: remove '-Dpicard.useLegacyParser=false' option from all picard commands after full transition to new syntax: https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)
RUN cd "$SOFT" \
&& wget -q "https://github.com/broadinstitute/picard/releases/download/2.19.2/picard.jar" -O "$SOFT/picard-2.19.2.jar"
ENV PICARD="$SOFT/picard-2.19.2.jar"
# perl & mysql
RUN apt-get --yes --force-yes --no-install-recommends install \
libdbd-mysql-perl \
libmysqlclient-dev \
libpng-dev \
uuid-dev
RUN apt-get clean \
&& rm -rf /var/lib/apt/lists
ENV BRANCH="release/96" \
MACHTYPE="x86_64"
# https://metacpan.org/pod/App::cpanminus#INSTALLATION
RUN cd "$SOFT" \
&& mkdir -p "$SOFT/cpanm/bin" \
&& curl -s -S --insecure -L "https://cpanmin.us/" -o "$SOFT/cpanm/bin/cpanm" \
&& chmod +x "$SOFT/cpanm/bin/cpanm"
ENV PATH="$SOFT/cpanm/bin:$PATH"
# htslib-1.9 SOURCES + COMPILED
RUN cd "$SOFT" && \
wget -q "https://github.com/samtools/htslib/archive/1.9.zip" -O "$SOFT/htslib-1.9.zip" && \
unzip -q "$SOFT/htslib-1.9.zip" && \
mv "$SOFT/htslib-1.9" "$SOFT/htslib-1.9-src" && \
cd "$SOFT/htslib-1.9-src" && \
make -j"$(($(nproc)+1))" && \
rm "$SOFT/htslib-1.9.zip"
ENV HTSLIB_DIR="$SOFT/htslib-1.9-src"
# bioperl-live
# ensembl-vep/travisci/get_dependencies.sh: 1/4. BioPerl-live. Only keep the bioperl-live "Bio" library, that is temporary needed for faster installation of Bio-DB-HTS and cpanm packages.
RUN cd "$SOFT" && \
wget -q "https://github.com/bioperl/bioperl-live/archive/release-1-7-2.zip" -O "$SOFT/bioperl-live-release-1-7-2.zip" && \
unzip -q "$SOFT/bioperl-live-release-1-7-2.zip" && \
mkdir -p "$SOFT/bioperl-live" && \
mv "$SOFT/bioperl-live-release-1-7-2/Bio" "$SOFT/bioperl-live/" && \
rm -r "$SOFT/bioperl-live-release-1-7-2" && \
rm "$SOFT/bioperl-live-release-1-7-2.zip"
# ATTENTION!!! Here should be 'ENV PERL5LIB="$PERL5LIB:$SOFT/bioperl-live"' command, but if it is here we are not able to remove this path from it later!
# That's why using export in every RUN later...
# Bio::DB::HTS
# ensembl-vep/travisci/get_dependencies.sh: 3/4. Bio::DB::HTS
# ensembl-vep/travisci/build_c.sh (Install/compile more libraries): 2/3. Bio::DB::HTS
RUN cd "$SOFT" && \
export PERL5LIB="$PERL5LIB:$SOFT/bioperl-live" && \
# git clone --branch master --depth 1 https://github.com/Ensembl/Bio-DB-HTS.git && \
wget -q "https://github.com/Ensembl/Bio-DB-HTS/archive/3.01.zip" -O "$SOFT/Bio-DB-HTS-3.01.zip" && \
unzip -q "$SOFT/Bio-DB-HTS-3.01.zip" && \
mv "$SOFT/Bio-DB-HTS-3.01" "$SOFT/Bio-DB-HTS-3.01-src" && \
cd "$SOFT/Bio-DB-HTS-3.01-src" && \
# By default (without --prefix) installs '/usr/local/lib/perl/5.18.2/Bio/DB/HTS.pm'
# 'HTSLIB_DIR' env variable is used
perl Build.PL --prefix "$SOFT/Bio-DB-HTS-3.01" && \
./Build && \
#./Build test
./Build install && \
cd "$SOFT" && \
rm -r "$SOFT/Bio-DB-HTS-3.01-src" && \
rm "$SOFT/Bio-DB-HTS-3.01.zip"
# https://www.ensembl.org/info/docs/api/api_installation.html
ENV PERL5LIB="$PERL5LIB:$SOFT/Bio-DB-HTS-3.01/lib/perl/5.18.2"
# ensembl-variation C code: Compile Variation LD C scripts
RUN cd "$SOFT" && \
export PERL5LIB="$PERL5LIB:$SOFT/bioperl-live" && \
git clone --branch $BRANCH --depth 1 https://github.com/Ensembl/ensembl-variation.git && \
cd ensembl-variation/C_code && \
# 'HTSLIB_DIR' env variable shoud be defined and should point to a directory with htslib sources + compiled, where an 'htslib` subdir with *.h files is and where hts.* compiled files are.
make -j"$(($(nproc)+1))" && \
# Copy 2 binaries to ../bin (this path is hardcoded into makefile)
make install
ENV PATH="$SOFT/ensembl-variation/bin:$PATH" \
PERL5LIB="$PERL5LIB:$SOFT/ensembl-variation/modules"
# ensembl-io
RUN cd "$SOFT" && \
git clone --branch $BRANCH --depth 1 "https://github.com/Ensembl/ensembl-io.git"
ENV PERL5LIB="$PERL5LIB:$SOFT/ensembl-io/modules"
# bioperl-ext, faster alignments for haplo (XS-based BioPerl extensions to C libraries)
RUN cd "$SOFT" && \
export PERL5LIB="$PERL5LIB:$SOFT/bioperl-live" && \
git clone "https://github.com/bioperl/bioperl-ext.git" && \
cd bioperl-ext && \
git checkout -b branch180924 73138e9f26b9cb6321288bff4fe2516e862aa975 && \
cd Bio/Ext/Align && \
# Update bioperl-ext Makefile.PL with '-fPIC'
perl -pi -e"s|(cd libs.+)CFLAGS=\\\'|\$1CFLAGS=\\\'-fPIC |" Makefile.PL && \
# Installing a folder 'lib/perl/5.18.2/auto/Bio/Ext/' with files 'Align/Align.so' & 'Align.pm' into PREFIX
perl Makefile.PL PREFIX="$SOFT/bioperl-ext_Bio-Ext-Align_180924" && \
make -j"$(($(nproc)+1))" && \
make -j"$(($(nproc)+1))" install && \
cd "$SOFT" && \
rm -r "$SOFT/bioperl-ext"
ENV PERL5LIB="$PERL5LIB:$SOFT/bioperl-ext_Bio-Ext-Align_180924/lib/perl/5.18.2"
# ensembl-xs, faster run using re-implementation in C of some of the Perl subroutines
RUN cd "$SOFT" && \
export PERL5LIB="$PERL5LIB:$SOFT/bioperl-live" && \
wget -q "https://github.com/Ensembl/ensembl-xs/archive/2.3.2.zip" -O "$SOFT/ensembl-xs-2.3.2.zip" && \
unzip -q "$SOFT/ensembl-xs-2.3.2.zip" && \
mv "$SOFT/ensembl-xs-2.3.2" "$SOFT/ensembl-xs-2.3.2-src" && \
cd "$SOFT/ensembl-xs-2.3.2-src" && \
perl Makefile.PL PREFIX="$SOFT/ensembl-xs-2.3.2" && \
make -j"$(($(nproc)+1))" && \
make -j"$(($(nproc)+1))" install && \
cd $SOFT && \
rm -r "$SOFT/ensembl-xs-2.3.2-src" && \
rm "$SOFT/ensembl-xs-2.3.2.zip"
ENV PERL5LIB="$PERL5LIB:$SOFT/ensembl-xs-2.3.2/lib/perl/5.18.2"
# kent-383_base (jksrc)
# ensembl-vep/travisci/get_dependencies.sh: 4/4. jksrc
# ensembl-vep/travisci/build_c.sh (Install/compile more libraries): 3/3. kent src & Build
ENV KENT_SRC="$SOFT/kent-383_base/src"
ENV MACHTYPE="x86_64"
RUN cd "$SOFT" && \
export PERL5LIB="$PERL5LIB:$SOFT/bioperl-live" && \
wget -q "https://github.com/ucscGenomeBrowser/kent/archive/v383_base.zip" -O "$SOFT/kent-383_base.zip" && \
unzip -q "$SOFT/kent-383_base.zip" && \
# Only keep needed kent-383_base libraries for VEP (Serge: src/hg is about 300 MB)
rm -r "$SOFT/kent-383_base/java" "$SOFT/kent-383_base/python" && \
rm -r "$SOFT/kent-383_base/src/hg" && \
# mv kent-383_base kent-383_base_bak && mkdir -p kent-383_base/src && \
# cp -R kent-383_base_bak/confs kent-383_base/ && \
# cp -R kent-383_base_bak/src/lib kent-383_base_bak/src/inc kent-383_base_bak/src/jkOwnLib kent-383_base/src/ && \
# cp kent-383_base_bak/src/*.sh kent-383_base/src/ && \
# rm -rf kent-383_base_bak
#MACHTYPE="$(uname -m)" && \
MYSQLINC="$(mysql_config --include | sed -e 's/^-I//g')" && \
MYSQLLIBS="$(mysql_config --libs)" && \
#export MACHTYPE && \
export MYSQLINC && \
export MYSQLLIBS && \
export CFLAGS="-fPIC" && \
echo "Making kent [1/2] ..." && \
cd "$KENT_SRC/lib" && \
echo 'CFLAGS="-fPIC"' > ../inc/localEnvironment.mk && \
make clean && \
make -j"$(($(nproc)+1))" && \
echo "Making kent [2/2] ..." && \
cd ../jkOwnLib && \
make clean && \
make -j"$(($(nproc)+1))" && \
ln -s $KENT_SRC/lib/x86_64/* $KENT_SRC/lib/ && \
rm "$SOFT/kent-383_base.zip"
# TODO: Add anything that will use kent!
# Bio::DB::BigFile v1.07
RUN cd "$SOFT" && \
wget -q "https://cpan.metacpan.org/authors/id/L/LD/LDS/Bio-BigFile-1.07.tar.gz" -O "$SOFT/Bio-BigFile-1.07.tar.gz" && \
tar xzf "$SOFT/Bio-BigFile-1.07.tar.gz" && \
mv "$SOFT/Bio-BigFile-1.07" "$SOFT/Bio-BigFile-1.07-src" && \
cd "$SOFT/Bio-BigFile-1.07-src" && \
# 'KENT_SRC' & 'MACHTYPE' should be defined
perl Build.PL --prefix "$SOFT/Bio-BigFile-1.07" && \
./Build && \
./Build install && \
rm -r "$SOFT/Bio-BigFile-1.07-src" && \
rm "$SOFT/Bio-BigFile-1.07.tar.gz"
ENV PERL5LIB="$PERL5LIB:$SOFT/Bio-BigFile-1.07/lib/perl/5.18.2"
## A lot of cleanup on the imported libraries, in order to reduce the docker image ##
#rm -rf Bio-HTS/.??* Bio-HTS/Changes Bio-HTS/DISCLAIMER Bio-HTS/MANIFEST* Bio-HTS/README Bio-HTS/scripts Bio-HTS/t Bio-HTS/travisci \
# bioperl-ext/.??* bioperl-ext/Bio/SeqIO bioperl-ext/Bio/Tools bioperl-ext/Makefile.PL bioperl-ext/README* bioperl-ext/t bioperl-ext/examples \
# ensembl-vep/.??* ensembl-vep/docker \
# ensembl-xs/.??* ensembl-xs/Changes ensembl-xs/INSTALL ensembl-xs/MANIFEST ensembl-xs/README ensembl-xs/t ensembl-xs/travisci \
# htslib/.??* htslib/INSTALL htslib/NEWS htslib/README* htslib/test && \
# # Install htslib binaries (need bgzip, tabix)
# WORKDIR $HTSLIB_DIR
# RUN make install && rm -f Makefile *.c cram/*.c
# VEP 96.3 (+ Wildtype.pm plugin from pVAC-Seq) (including cpanm dependancies)
RUN cd "$SOFT" && \
export PERL5LIB="$PERL5LIB:$SOFT/bioperl-live" && \
# Clone ensembl-vep git repository
git clone --branch $BRANCH --depth 1 https://github.com/Ensembl/ensembl-vep.git && \
mv "$SOFT/ensembl-vep" "$SOFT/ensembl-vep-96.3" && \
# Get ensemb cpanfile
wget -q "https://raw.githubusercontent.com/Ensembl/ensembl/$BRANCH/cpanfile" -O "$SOFT/ensembl_cpanfile" && \
echo "Installing cpanm packages [1/3]: ensembl perl dependencies ..." && \
$SOFT/cpanm/bin/cpanm --local-lib $SOFT/cpanm --installdeps --with-recommends --notest --cpanfile ensembl_cpanfile . && \
echo "Installing cpanm packages [2/3]: ensembl-vep perl dependencies ..." && \
$SOFT/cpanm/bin/cpanm --local-lib $SOFT/cpanm --installdeps --with-recommends --notest --cpanfile ensembl-vep-96.3/cpanfile . && \
echo "Installing cpanm packages [3/3] ..." && \
# $SOFT/cpanm/bin/cpanm --notest --local-lib $SOFT/cpanm Archive::Zip Module::Build Bio::Perl DBI DBD::mysql Set::IntervalTree JSON PerlIO::gzip Scalar::Util Try::Tiny && \
$SOFT/cpanm/bin/cpanm --local-lib $SOFT/cpanm --with-recommends --notest Archive::Zip && \
# Delete bioperl after the cpanm installs as it will be reinstalled by the INSTALL.pl script
rm -r "$SOFT/bioperl-live" && \
# Removing bioperl-like from PERL5LIB
PERL5LIB="$(echo "$PERL5LIB" | awk -v RS=: -v ORS=: '/\/bioperl-live/ {next} {print}')" && \
# Run INSTALL.pl and remove the ensemb-vep tests and travis
export PERL5LIB="$PERL5LIB:$SOFT/cpanm/lib/perl5:$SOFT/cpanm/lib/perl5/x86_64-linux-gnu-thread-multi" && \
cd "$SOFT/ensembl-vep-96.3" && \
# --NO_TEST
perl INSTALL.pl --NO_UPDATE --NO_HTSLIB -a ap -s homo_sapiens -y GRCh37,GRCh38 -g ProteinSeqs,Downstream,Conservation,GO && \
# pVACtools Wildtype plugin, unchanged since 02.11.2017 till 19.06.2019: https://github.com/griffithlab/pVACtools/commit/f6099b390363e3b0bd0f93be9a8380b9139009dc
wget -q "https://raw.githubusercontent.com/griffithlab/pVACtools/475f8cb91403a4819cda68a330337bbc463d2e4c/tools/pvacseq/VEP_plugins/Wildtype.pm" -O "$HOME/.vep/Plugins/Wildtype.pm"
ENV VEP="$SOFT/ensembl-vep-96.3/vep" \
VEPFILTER="$SOFT/ensembl-vep-96.3/filter_vep" \
VEPINSTALL="$SOFT/ensembl-vep-96.3/INSTALL.pl" \
CONVERTCACHE="$SOFT/ensembl-vep-96.3/convert_cache.pl --bgzip $BGZIP --tabix $TABIX" \
PERL5LIB="$PERL5LIB:$SOFT/cpanm/lib/perl5:$SOFT/cpanm/lib/perl5/x86_64-linux-gnu-thread-multi:$SOFT/ensembl-vep-96.3:$SOFT/ensembl-vep-96.3/modules" \
PATH="$SOFT/ensembl-vep-96.3:$PATH"
COPY common_funcs.sh /usr/local/bin/
COPY vep.sh /usr/local/bin/
RUN chmod +x /usr/local/bin/vep.sh
WORKDIR /outputs
ENTRYPOINT ["vep.sh"]
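The Dockerfile above adds bioperl-live to PERL5LIB with export inside each RUN and later strips it with awk, which leaves a trailing colon because ORS is printed after every record. A small shell function can do the same removal without the trailing separator. This is only a sketch: the name remove_path_entry is hypothetical, and it matches entries by fixed substring rather than by regular expression.

```shell
#!/bin/sh
# Hypothetical helper: print a colon-separated list with every entry
# that contains the given substring removed.
remove_path_entry() {
  list=$1
  pattern=$2
  # One entry per line, drop matching entries, join the rest with ':' again.
  printf '%s' "$list" | tr ':' '\n' | grep -vF -- "$pattern" | paste -sd: -
}

# Example with made-up paths.
example="/soft/cpanm/lib/perl5:/soft/bioperl-live:/soft/ensembl-io/modules"
remove_path_entry "$example" "/bioperl-live"
```

Inside a RUN instruction the result could be assigned back with PERL5LIB="$(remove_path_entry "$PERL5LIB" /bioperl-live)"; as with the awk version, the change only lasts for that RUN unless it is repeated in an ENV instruction.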
Where is kent-383_base (there was an older version in your scripts) used? Oh, now I see that $KENT_SRC is used by Bio::DB::BigFile v1.07.
I'm trying to test the installation.
perl -e "use Bio::DB::HTS::Tabix"
- ok!
$ perl -e "use Bio::DB::BigFile"
Can't load '/soft/Bio-BigFile-1.07/lib/perl/5.18.2/auto/Bio/DB/BigFile/BigFile.so' for module Bio::DB::BigFile: /soft/Bio-BigFile-1.07/lib/perl/5.18.2/auto/Bio/DB/BigFile/BigFile.so: undefined symbol: tbx_readrec at /usr/lib/perl/5.18/DynaLoader.pm line 184.
at -e line 1.
Compilation failed in require at -e line 1.
BEGIN failed--compilation aborted at -e line 1.
# perl -Imodules t/AnnotationSource_File_BigWig.t
ok 1 - use Bio::EnsEMBL::VEP::AnnotationSource::File;
ok 2 # skip Bio::DB::BigFile module not available
ok 3 # skip Bio::DB::BigFile module not available
ok 4 # skip Bio::DB::BigFile module not available
ok 5 # skip Bio::DB::BigFile module not available
ok 6 # skip Bio::DB::BigFile module not available
ok 7 # skip Bio::DB::BigFile module not available
ok 8 # skip Bio::DB::BigFile module not available
ok 9 # skip Bio::DB::BigFile module not available
ok 10 # skip Bio::DB::BigFile module not available
ok 11 # skip Bio::DB::BigFile module not available
ok 12 # skip Bio::DB::BigFile module not available
ok 13 # skip Bio::DB::BigFile module not available
ok 14 # skip Bio::DB::BigFile module not available
ok 15 # skip Bio::DB::BigFile module not available
1..15
# perl -e "print \"@INC\"" | tr " " "\n"
/soft/Bio-DB-HTS-3.01/lib/perl/5.18.2
/soft/ensembl-variation/modules
/soft/ensembl-io/modules
/soft/bioperl-ext_Bio-Ext-Align_180924/lib/perl/5.18.2
/soft/ensembl-xs-2.3.2/lib/perl/5.18.2
/soft/Bio-BigFile-1.07/lib/perl/5.18.2
/soft/cpanm/lib/perl5/x86_64-linux-gnu-thread-multi
/soft/cpanm/lib/perl5
/soft/cpanm/lib/perl5/x86_64-linux-gnu-thread-multi
/soft/ensembl-vep-96.3
/soft/ensembl-vep-96.3/modules
/etc/perl
/usr/local/lib/perl/5.18.2
/usr/local/share/perl/5.18.2
/usr/lib/perl5
/usr/share/perl5
/usr/lib/perl/5.18
/usr/share/perl/5.18
/usr/local/lib/site_perl
Dear @serge2016,
Yes, Bio::DB::BigFile needs the kent-335_base library. Regarding the issue above, it looks like Bio::DB::BigFile wasn't correctly installed on your Docker image. This is likely due to a missing or incorrectly set up ENV.
I will have a look, but it might take a bit of time.
We don't really have scripts to benchmark these speed improvements. However, the speed-ups listed here were tested a few years ago when VEP was rewritten. You can still compare the running time of your Docker image with the official VEP Docker image to check the speed-up of your Docker image.
As a side note, did you try to build your Docker image based on Ensembl VEP? E.g.:
FROM ensemblorg/ensembl-vep:latest
RUN apt-get update && apt-get -y install \
pkg-config \
software-properties-common \
ncurses-dev \
wget \
nano \
time \
tcsh \
gawk \
bzip2 \
pigz \
zip \
xz-utils \
mc \
htop \
iotop \
git-core \
subversion \
python \
python-tk \
python-dev \
python-setuptools \
openssh-client \
libyaml-dev \
zlib1g-dev \
libbz2-dev \
liblzma-dev \
libffi-dev \
libxml2-dev \
libxslt1-dev \
libpq-dev \
realpath
...
Best regards, Laurent
Dear @serge2016,
I had a closer look at your Dockerfile.
I think the Dockerfile could be simplified in several places, e.g.:
ensembl-io and ensembl-variation
cpanminus package
Bio::DB::BigFile
Furthermore, you are using different versions of some libraries:
bioperl-live version 1.6.924
Bio::DB::BigFile requires kent version 335
ubuntu:16.04 or even ubuntu:18.04 (slightly faster) as base image.
Finally, bear in mind that adding the caches (GRCh37 and GRCh38) within your Docker image is going to make it huge (>40 GB: each cache is over 10 GB and each FASTA is about 8 GB). We recommend installing the caches outside the Docker container and then mounting the directory into the container: https://www.ensembl.org/info/docs/tools/vep/script/vep_download.html#docker.
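Mounting an external cache directory could look roughly like the sketch below. The host path $HOME/vep_data and the container path /opt/vep/.vep are assumptions, so check the documentation linked above for the paths that match your image; the function name vep_docker_command is hypothetical. The script only prints the docker command so that it can be reviewed before being run.

```shell
#!/bin/sh
# Host directory holding the VEP caches and FASTA files (assumed location).
VEP_DATA_DIR="${VEP_DATA_DIR:-$HOME/vep_data}"
# Mount point inside the container (assumption, verify against the image docs).
VEP_CONTAINER_DIR="${VEP_CONTAINER_DIR:-/opt/vep/.vep}"

# Hypothetical helper: print a docker run command that bind-mounts
# the host data directory ($1) onto the container directory ($2).
vep_docker_command() {
  printf 'docker run -t -i -v %s:%s %s\n' "$1" "$2" "ensemblorg/ensembl-vep:latest"
}

vep_docker_command "$VEP_DATA_DIR" "$VEP_CONTAINER_DIR"
```

With this layout the caches live on the host, so the image stays small and the same cache directory can be shared between image versions.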
Here is an example Dockerfile with some of the changes I listed above (not fully tested):
FROM ubuntu:16.04
LABEL maintainer="serge2016"
# Author: Serge I. Mitrofanov.
# LastUpdate: 21.06.2019 22:20.
# Tool type: mutation annotator
# Contents:
# VEP 96.3 - Apache License, Version 2.0 + Wildtype plugin from pVACtools.
# ensembl-xs v2.3.2 - Apache License, Version 2.0.
# samtools - MIT/Expat License.
# picard - MIT License.
# Input:
# VCF file
# Output:
# VEP-annotated input VCF-file
ENV DEBIAN_FRONTEND="noninteractive"
RUN apt-get update && apt-get --yes --force-yes install \
build-essential \
pkg-config \
software-properties-common \
ncurses-dev \
curl \
wget \
nano \
time \
tcsh \
gawk \
bzip2 \
pigz \
zip \
unzip \
xz-utils \
mc \
htop \
iotop \
git-core \
subversion \
python \
python-tk \
python-dev \
python-setuptools \
perl \
perl-base \
openssh-client \
openssl \
libssl-dev \
libyaml-dev \
zlib1g-dev \
libbz2-dev \
liblzma-dev \
libffi-dev \
libxml2-dev \
libxslt1-dev \
libpq-dev \
realpath \
libdbd-mysql-perl \
libmysqlclient-dev \
libpng-dev \
uuid-dev \
cpanminus && \
apt-get clean && rm -rf /var/lib/apt/lists
ENV TZ="Europe/Moscow"
RUN echo $TZ > /etc/timezone \
&& dpkg-reconfigure tzdata
ENV TMPDIR="/tmp"
RUN mkdir -p "$TMPDIR"
ENV SOFT="/soft"
RUN mkdir -p "$SOFT"
ENV BRANCH="release/96"
ENV MACHTYPE="x86_64"
# memUsage (both python 2 & 3) (Olga)
# psutil >= 2.2.1 (Tested with 5.6.1 - ok; 1.2.1 - err) - additional python package required for memUsage. That's why apt install python-psutil doesn't fit on Ubuntu 14.04
RUN cd "$SOFT" \
&& git clone https://github.com/giampaolo/psutil.git \
&& cd "$SOFT/psutil" \
&& python setup.py install \
&& cd "$SOFT" \
&& rm -r "$SOFT/psutil" \
&& mkdir -p "$SOFT/memusage/bin" \
&& wget -q "https://raw.githubusercontent.com/ozolotareva/housekeeping-scr/master/memUsage.py" -O - | tr -d '\r' > "$SOFT/memusage/bin/memUsage.py" \
&& chmod +x "$SOFT/memusage/bin/memUsage.py"
ENV MEMUSAGE="$SOFT/memusage/bin/memUsage.py" \
PATH="$SOFT/memusage/bin:$PATH"
# cmake 3.14.5
RUN cd $SOFT \
&& wget -q "https://cmake.org/files/v3.14/cmake-3.14.5-Linux-x86_64.sh" -O "$SOFT/cmake-3.14.5-Linux-x86_64.sh" \
&& sh $SOFT/cmake-3.14.5-Linux-x86_64.sh --prefix="$SOFT" --include-subdir --skip-license
ENV PATH="$SOFT/cmake-3.14.5-Linux-x86_64/bin:$PATH"
# java8
RUN add-apt-repository -y ppa:openjdk-r/ppa \
&& apt-get update \
&& apt-get --yes --force-yes --no-install-recommends install \
openjdk-8-jre
ENV _JAVA_OPTIONS="-Djava.io.tmpdir=$TMPDIR"
# FastQC v0.11.8
RUN cd "$SOFT" \
&& wget -q "http://www.bioinformatics.babraham.ac.uk/projects/fastqc/fastqc_v0.11.8.zip" -O "$SOFT/fastqc_v0.11.8.zip" \
&& unzip -q "$SOFT/fastqc_v0.11.8.zip" \
&& mv "$SOFT/FastQC" "$SOFT/FastQC_v0.11.8" \
&& chmod +x "$SOFT/FastQC_v0.11.8/fastqc" \
&& rm "$SOFT/fastqc_v0.11.8.zip"
ENV FASTQC="$SOFT/FastQC_v0.11.8/fastqc"
# samtools 1.9
RUN cd "$SOFT" \
&& wget -q "https://github.com/samtools/samtools/releases/download/1.9/samtools-1.9.tar.bz2" -O "$SOFT/samtools-1.9.tar.bz2" \
&& tar xjf "$SOFT/samtools-1.9.tar.bz2" \
&& mv "$SOFT/samtools-1.9" "$SOFT/samtools-1.9-src" \
&& cd "$SOFT/samtools-1.9-src" \
&& make -j"$(($(nproc)+1))" prefix="$SOFT/samtools-1.9" install \
&& cd "$SOFT/samtools-1.9-src/htslib-1.9" \
&& make -j"$(($(nproc)+1))" prefix="$SOFT/htslib-1.9" install \
&& cd "$SOFT" \
&& rm -r "$SOFT/samtools-1.9-src" \
&& rm "$SOFT/samtools-1.9.tar.bz2"
ENV SAMTOOLS="$SOFT/samtools-1.9/bin/samtools" \
BGZIP="$SOFT/htslib-1.9/bin/bgzip" \
TABIX="$SOFT/htslib-1.9/bin/tabix" \
PATH="$SOFT/samtools-1.9/bin:$SOFT/htslib-1.9/bin:$PATH" \
LD_LIBRARY_PATH="$SOFT/htslib-1.9/lib:$LD_LIBRARY_PATH"
# picard 2.19.2
# TODO: remove '-Dpicard.useLegacyParser=false' option from all picard commands after full transition to new syntax: https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)
RUN cd "$SOFT" \
&& wget -q "https://github.com/broadinstitute/picard/releases/download/2.19.2/picard.jar" -O "$SOFT/picard-2.19.2.jar"
ENV PICARD="$SOFT/picard-2.19.2.jar"
# htslib-1.9 SOURCES + COMPILED
RUN cd "$SOFT" && \
wget -q "https://github.com/samtools/htslib/archive/1.9.zip" -O "$SOFT/htslib-1.9.zip" && \
unzip -q "$SOFT/htslib-1.9.zip" && \
mv "$SOFT/htslib-1.9" "$SOFT/htslib-1.9-src" && \
cd "$SOFT/htslib-1.9-src" && \
make -j"$(($(nproc)+1))" && \
rm "$SOFT/htslib-1.9.zip"
ENV HTSLIB_DIR="$SOFT/htslib-1.9-src"
# bioperl-live
# ensembl-vep/travisci/get_dependencies.sh: 1/4. BioPerl-live. Only keep the bioperl-live "Bio" library, that is temporary needed for faster installation of Bio-DB-HTS and cpanm packages.
WORKDIR $SOFT
RUN git clone --branch release-1-6-924 --depth 1 https://github.com/bioperl/bioperl-live.git && \
mv bioperl-live bioperl-live_bak && mkdir bioperl-live && mv bioperl-live_bak/Bio bioperl-live/ && rm -rf bioperl-live_bak
# ATTENTION!!! Here should be 'ENV PERL5LIB="$PERL5LIB:$SOFT/bioperl-live"' command, but if it is here we are not able to remove this path from it later!
# That's why using export in every RUN later...
ENV PERL5LIB $PERL5LIB:$SOFT/ensembl-vep:$SOFT/ensembl-vep/modules:$SOFT/bioperl-live
# bioperl-ext
WORKDIR $SOFT
RUN git clone "https://github.com/bioperl/bioperl-ext.git" && \
cd bioperl-ext && \
git checkout -b branch180924 73138e9f26b9cb6321288bff4fe2516e862aa975 && \
cd Bio/Ext/Align && \
# Update bioperl-ext Makefile.PL with '-fPIC'
perl -pi -e"s|(cd libs.+)CFLAGS=\\\'|\$1CFLAGS=\\\'-fPIC |" Makefile.PL && \
# Installing a folder 'lib/perl/5.18.2/auto/Bio/Ext/' with files 'Align/Align.so' & 'Align.pm' into PREFIX
perl Makefile.PL PREFIX="$SOFT/bioperl-ext_Bio-Ext-Align_180924" && \
make -j"$(($(nproc)+1))" && \
make -j"$(($(nproc)+1))" install && \
rm -r "$SOFT/bioperl-ext"
ENV PERL5LIB="$PERL5LIB:$SOFT/bioperl-ext_Bio-Ext-Align_180924/lib/perl/5.18.2"
# ensembl-xs, faster run using re-implementation in C of some of the Perl subroutines
RUN cd "$SOFT" && \
export PERL5LIB="$PERL5LIB:$SOFT/bioperl-live" && \
wget -q "https://github.com/Ensembl/ensembl-xs/archive/2.3.2.zip" -O "$SOFT/ensembl-xs-2.3.2.zip" && \
unzip -q "$SOFT/ensembl-xs-2.3.2.zip" && \
mv "$SOFT/ensembl-xs-2.3.2" "$SOFT/ensembl-xs-2.3.2-src" && \
cd "$SOFT/ensembl-xs-2.3.2-src" && \
perl Makefile.PL PREFIX="$SOFT/ensembl-xs-2.3.2" && \
make -j"$(($(nproc)+1))" && \
make -j"$(($(nproc)+1))" install && \
cd $SOFT && \
rm -r "$SOFT/ensembl-xs-2.3.2-src" && \
rm "$SOFT/ensembl-xs-2.3.2.zip"
ENV PERL5LIB="$PERL5LIB:$SOFT/ensembl-xs-2.3.2/lib/perl/5.18.2"
# Bio::DB::HTS
# ensembl-vep/travisci/get_dependencies.sh: 3/4. Bio::DB::HTS
# ensembl-vep/travisci/build_c.sh (Install/compile more libraries): 2/3. Bio::DB::HTS
WORKDIR $SOFT
RUN wget -q "https://github.com/Ensembl/Bio-DB-HTS/archive/3.01.zip" -O "$SOFT/Bio-DB-HTS-3.01.zip" && \
unzip -q "$SOFT/Bio-DB-HTS-3.01.zip" && \
mv "$SOFT/Bio-DB-HTS-3.01" "$SOFT/Bio-DB-HTS-3.01-src" && \
cd "$SOFT/Bio-DB-HTS-3.01-src" && \
# By default (without --prefix) installs '/usr/local/lib/perl/5.18.2/Bio/DB/HTS.pm'
# 'HTSLIB_DIR' env variable is used
perl Build.PL --prefix "$SOFT/Bio-DB-HTS-3.01" && \
./Build && \
#./Build test
./Build install && \
rm -r "$SOFT/Bio-DB-HTS-3.01-src" && \
rm "$SOFT/Bio-DB-HTS-3.01.zip"
# https://www.ensembl.org/info/docs/api/api_installation.html
ENV PERL5LIB="$PERL5LIB:$SOFT/Bio-DB-HTS-3.01/lib/perl/5.18.2"
# kent-335_base (jksrc)
# ensembl-vep/travisci/get_dependencies.sh: 4/4. jksrc
# ensembl-vep/travisci/build_c.sh (Install/compile more libraries): 3/3. kent src & Build
ENV KENT_SRC $SOFT/kent-335_base/src
ENV MACHTYPE x86_64
RUN cd "$SOFT" && \
wget -q "https://github.com/ucscGenomeBrowser/kent/archive/v335_base.zip" -O "$SOFT/kent-335_base.zip" && \
unzip -q "$SOFT/kent-335_base.zip" && \
# Only keep needed kent-335_base libraries for VEP (Serge: src/hg is about 300 MB)
rm -r "$SOFT/kent-335_base/java" "$SOFT/kent-335_base/python" && \
rm -r "$SOFT/kent-335_base/src/hg" && \
# mv kent-335_base kent-335_base_bak && mkdir -p kent-335_base/src && \
# cp -R kent-335_base_bak/confs kent-335_base/ && \
# cp -R kent-335_base_bak/src/lib kent-335_base_bak/src/inc kent-335_base_bak/src/jkOwnLib kent-335_base/src/ && \
# cp kent-335_base_bak/src/*.sh kent-335_base/src/ && \
# rm -rf kent-335_base_bak
#MACHTYPE="$(uname -m)" && \
MYSQLINC="$(mysql_config --include | sed -e 's/^-I//g')" && \
MYSQLLIBS="$(mysql_config --libs)" && \
#export MACHTYPE && \
export MYSQLINC && \
export MYSQLLIBS && \
export CFLAGS="-fPIC" && \
echo "Making kent [1/2] ..." && \
cd "$KENT_SRC/lib" && \
echo 'CFLAGS="-fPIC"' > ../inc/localEnvironment.mk && \
make clean && \
make -j"$(($(nproc)+1))" && \
echo "Making kent [2/2] ..." && \
cd ../jkOwnLib && \
make clean && \
make -j"$(($(nproc)+1))" && \
ln -s $KENT_SRC/lib/x86_64/* $KENT_SRC/lib/ && \
rm "$SOFT/kent-335_base.zip"
# TODO: Add anything that will use kent!
# ensembl-vep Perl API + required external Perl libraries
WORKDIR $SOFT
# Clone ensembl-vep git repository
RUN git clone --branch $BRANCH --depth 1 https://github.com/Ensembl/ensembl-vep.git && \
# Get ensemb cpanfile
wget -q "https://raw.githubusercontent.com/Ensembl/ensembl/$BRANCH/cpanfile" -O "$SOFT/ensembl_cpanfile" && \
echo "Installing cpanm packages [1/3]: ensembl perl dependencies ..." && \
cpanm --installdeps --with-recommends --notest --cpanfile ensembl_cpanfile . && \
echo "Installing cpanm packages [2/3]: ensembl-vep perl dependencies ..." && \
cpanm --installdeps --with-recommends --notest --cpanfile ensembl-vep/cpanfile . && \
echo "Installing cpanm packages [3/3] ..." && \
cpanm --with-recommends --notest Archive::Zip && \
# Delete bioperl after the cpanm installs as it will be reinstalled by the INSTALL.pl script
rm -r "$SOFT/bioperl-live" && \
# Removing bioperl-like from PERL5LIB
PERL5LIB="$(echo "$PERL5LIB" | awk -v RS=: -v ORS=: '/\/bioperl-live/ {next} {print}')" && \
# Run INSTALL.pl and remove the ensemb-vep tests and travis
cd "$SOFT/ensembl-vep" && \
perl INSTALL.pl -a ap -l -g ProteinSeqs,Downstream,Conservation,GO && \
# pVACtools Wildtype plugin, unchanged since 02.11.2017 till 19.06.2019: https://github.com/griffithlab/pVACtools/commit/f6099b390363e3b0bd0f93be9a8380b9139009dc
wget -q "https://raw.githubusercontent.com/griffithlab/pVACtools/475f8cb91403a4819cda68a330337bbc463d2e4c/tools/pvacseq/VEP_plugins/Wildtype.pm" -O "$HOME/.vep/Plugins/Wildtype.pm"
ENV VEP="$SOFT/ensembl-vep/vep" \
VEPFILTER="$SOFT/ensembl-vep/filter_vep" \
VEPINSTALL="$SOFT/ensembl-vep/INSTALL.pl" \
CONVERTCACHE="$SOFT/ensembl-vep/convert_cache.pl --bgzip $BGZIP --tabix $TABIX" \
PERL5LIB="$PERL5LIB:$SOFT/cpanm/lib/perl5:$SOFT/cpanm/lib/perl5/x86_64-linux-gnu-thread-multi" \
PATH="$SOFT/ensembl-vep:$PATH"
COPY common_funcs.sh /usr/local/bin/
COPY vep.sh /usr/local/bin/
RUN chmod +x /usr/local/bin/vep.sh
WORKDIR /outputs
ENTRYPOINT ["vep.sh"]
Best regards, Laurent
Dear @ens-lgil, thank you! But why do you use:
the --NO_BIOPERL option during installation with INSTALL.pl?
git clone --depth 1 instead of wget of a release archive, particularly in git clone --branch master --depth 1 https://github.com/Ensembl/Bio-DB-HTS.git?
About HTSLIB we have found this solution:
# htslib-1.9 SOURCES + COMPILED
# ensembl-vep/travisci/get_dependencies.sh: 2/4.
RUN cd "$SOFT" && \
mkdir -p "$SOFT/2" && \
cd "$SOFT/2" && \
wget -q "https://github.com/samtools/htslib/archive/1.9.zip" -O "$SOFT/2/htslib-1.9.zip" && \
unzip -q "$SOFT/2/htslib-1.9.zip" && \
mv "$SOFT/2/htslib-1.9" "$SOFT/htslib-1.9-src" && \
cd "$SOFT/htslib-1.9-src" && \
# Next line needed for correct compilation libhts.a for Bio-Bigfile. Code from: https://github.com/GMOD/GBrowse-Adaptors/blob/master/Bio-BigFile/README#L17
perl -pi -e 'if($_ =~ m/^CFLAGS/ && $_ !~ m/\-fPIC/i){chomp; s/#.+//; $_ .= " -fPIC -Wno-unused -Wno-unused-result\n"};' "$SOFT/htslib-1.9-src/Makefile" && \
make -j"$(($(nproc)+1))" && \
rm "$SOFT/2/htslib-1.9.zip"
ENV HTSLIB_DIR="$SOFT/htslib-1.9-src"
About kent, we have this combined solution for the installation of a newer version (I suppose that newer is often better):
# ensembl-vep/travisci/get_dependencies.sh: 4/4. jksrc
# ensembl-vep/travisci/build_c.sh (Install/compile more libraries): 3/3. kent src & Build
ENV KENT_VERSION="383"
ENV KENT_SRC="$SOFT/kent-${KENT_VERSION}_base/src" \
MACHTYPE="x86_64"
RUN cd "$SOFT" && \
wget -q "https://github.com/ucscGenomeBrowser/kent/archive/v${KENT_VERSION}_base.zip" -O "$SOFT/kent-${KENT_VERSION}_base.zip" && \
unzip -q "$SOFT/kent-${KENT_VERSION}_base.zip" && \
# Only keep needed kent-${KENT_VERSION}_base libraries for VEP (Serge: src/hg is about 300 MB)
rm -r "$SOFT/kent-${KENT_VERSION}_base/java" "$SOFT/kent-${KENT_VERSION}_base/python" && \
rm -r "$SOFT/kent-${KENT_VERSION}_base/src/hg" && \
MYSQLINC="$(mysql_config --include | sed -e 's/^-I//g')" && \
MYSQLLIBS="$(mysql_config --libs)" && \
export MYSQLINC && \
export MYSQLLIBS && \
export CFLAGS="-fPIC" && \
echo "Making kent [1/2] ..." && \
cd "$KENT_SRC/lib" && \
echo 'CFLAGS="-fPIC"' > "$KENT_SRC/inc/localEnvironment.mk" && \
make clean && \
make -j"$(($(nproc)+1))" && \
echo "Making kent [2/2] ..." && \
cd "$KENT_SRC/jkOwnLib" && \
make clean && \
make -j"$(($(nproc)+1))" && \
ln -s $KENT_SRC/lib/x86_64/* "$KENT_SRC/lib/" && \
rm "$SOFT/kent-${KENT_VERSION}_base.zip"
About BigFile, we propose this:
# VEP 96.3 (+ Wildtype.pm plugin from pVAC-Seq) (including cpanm dependencies)
RUN cd "$SOFT" && \
# Clone ensembl-vep git repository
git clone --branch $BRANCH --depth 1 https://github.com/Ensembl/ensembl-vep.git && \
mv "$SOFT/ensembl-vep" "$SOFT/ensembl-vep-96.3" && \
# Get ensembl cpanfile
wget -q "https://raw.githubusercontent.com/Ensembl/ensembl/$BRANCH/cpanfile" -O "$SOFT/ensembl_cpanfile" && \
echo "Installing cpanm packages [1/5]: ensembl perl dependencies ..." && \
$SOFT/cpanm/bin/cpanm --local-lib $SOFT/cpanm --installdeps --with-recommends --notest --cpanfile ensembl_cpanfile . && \
export PERL5LIB="$PERL5LIB:$SOFT/cpanm/lib/perl5" && \
echo "Installing cpanm packages [2/5]: Bundle::LWP as Bio::DB::BigFile dependency ..." && \
$SOFT/cpanm/bin/cpanm --local-lib $SOFT/cpanm --with-recommends --notest Bundle::LWP && \
echo "Installing cpanm packages [3/5]: Bio::DB::BigFile v1.07 ..." && \
cd "$SOFT" && \
wget -q "https://cpan.metacpan.org/authors/id/L/LD/LDS/Bio-BigFile-1.07.tar.gz" -O "$SOFT/Bio-BigFile-1.07.tar.gz" && \
tar -xzf "$SOFT/Bio-BigFile-1.07.tar.gz" && \
mv "$SOFT/Bio-BigFile-1.07" "$SOFT/Bio-BigFile-1.07-src" && \
cd "$SOFT/Bio-BigFile-1.07-src" && \
# 'KENT_SRC', 'HTSLIB_DIR' & 'MACHTYPE' should be defined
# Solution for new kent versions (newer than 335) from: https://github.com/GMOD/GBrowse-Adaptors/pull/19
sed -i "/^my \$LibFile = \"jkweb.a\";.*/a my \$HeaderFileHTS = \"tbx.h\";\nmy \$LibFileHTS = \"libhts.a\";" "$SOFT/Bio-BigFile-1.07-src/Build.PL" && \
sed -E -i "s|^(.*)\\\$jk_include,\\\$jk_lib(.*)$|\1\$jk_include,\$jk_lib,\$hts_include,\$hts_lib\2|g" "$SOFT/Bio-BigFile-1.07-src/Build.PL" && \
sed -E -i "s|^(.*)\[\\\$jk_include(.*)$|\1\[\$jk_include,\$hts_include\2|g" "$SOFT/Bio-BigFile-1.07-src/Build.PL" && \
sed -E -i "s|^(.*)\"\\\$jk_lib/\\\$LibFile\",(.*)$|\1\"\$jk_lib/\$LibFile\",\"\$hts_lib/\$LibFileHTS\",\2|" "$SOFT/Bio-BigFile-1.07-src/Build.PL" && \
sed -E -i "s|^(.*)'-lz','-lssl'(.*)$|\1'-lz','-lssl','-pthread'\2|g" "$SOFT/Bio-BigFile-1.07-src/Build.PL" && \
sed -i "/^.*if -e \"\\\$jksrc\/lib\/\\\$ENV.*/a \
\$hts_include = \"\$ENV{HTSLIB_DIR}\" if -e \"\$ENV{HTSLIB_DIR}\/htslib\/\$HeaderFileHTS\";\n \
\$hts_lib = \"\$ENV{HTSLIB_DIR}\" if -e \"\$ENV{HTSLIB_DIR}\/\$LibFileHTS\";" "$SOFT/Bio-BigFile-1.07-src/Build.PL" && \
perl Build.PL --prefix "$SOFT/Bio-BigFile-1.07" && \
./Build && \
./Build install && \
rm -r "$SOFT/Bio-BigFile-1.07-src" && \
rm "$SOFT/Bio-BigFile-1.07.tar.gz" && \
export PERL5LIB="$PERL5LIB:$SOFT/Bio-BigFile-1.07/lib/perl/5.18.2" && \
cd "$SOFT" && \
echo "Installing cpanm packages [4/5]: ensembl-vep perl dependencies ..." && \
$SOFT/cpanm/bin/cpanm --local-lib $SOFT/cpanm --installdeps --with-recommends --notest --cpanfile ensembl-vep-96.3/cpanfile . && \
echo "Installing cpanm packages [5/5] ..." && \
# $SOFT/cpanm/bin/cpanm --notest --local-lib $SOFT/cpanm Archive::Zip Module::Build Bio::Perl DBI DBD::mysql Set::IntervalTree JSON PerlIO::gzip Scalar::Util Try::Tiny && \
$SOFT/cpanm/bin/cpanm --local-lib $SOFT/cpanm --with-recommends --notest Archive::Zip && \
# Run INSTALL.pl and remove the ensembl-vep tests and travis
export PERL5LIB="$PERL5LIB:$SOFT/cpanm/lib/perl5" && \
cd "$SOFT/ensembl-vep-96.3" && \
# --NO_TEST
perl INSTALL.pl --NO_UPDATE --NO_HTSLIB --NO_BIOPERL -a ap -s homo_sapiens -y GRCh37,GRCh38 -g ProteinSeqs,Downstream,Conservation,GO && \
# pVACtools Wildtype plugin, unchanged since 02.11.2017 till 19.06.2019: https://github.com/griffithlab/pVACtools/commit/f6099b390363e3b0bd0f93be9a8380b9139009dc
wget -q "https://raw.githubusercontent.com/griffithlab/pVACtools/475f8cb91403a4819cda68a330337bbc463d2e4c/tools/pvacseq/VEP_plugins/Wildtype.pm" -O "$HOME/.vep/Plugins/Wildtype.pm"
ENV VEP="$SOFT/ensembl-vep-96.3/vep" \
VEPFILTER="$SOFT/ensembl-vep-96.3/filter_vep" \
VEPINSTALL="$SOFT/ensembl-vep-96.3/INSTALL.pl" \
CONVERTCACHE="$SOFT/ensembl-vep-96.3/convert_cache.pl --bgzip $BGZIP --tabix $TABIX" \
PERL5LIB="$PERL5LIB:$SOFT/cpanm/lib/perl5:$SOFT/Bio-BigFile-1.07/lib/perl/5.18.2:$SOFT/ensembl-vep-96.3:$SOFT/ensembl-vep-96.3/modules" \
PATH="$SOFT/ensembl-vep-96.3:$PATH"
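The ENV and export lines above build PERL5LIB as "$PERL5LIB:...". If PERL5LIB happens to be unset at that point (an assumption about the base image, not something stated in this thread), the result starts with a ':' and so contains an empty entry. A minimal sketch of an append that avoids this, using a hypothetical helper name:

```shell
# append_perl5lib DIR: add DIR to PERL5LIB without creating an empty entry.
# ${PERL5LIB:+$PERL5LIB:} expands to "<old value>:" only when PERL5LIB is set and non-empty.
append_perl5lib() {
    PERL5LIB="${PERL5LIB:+$PERL5LIB:}$1"
    export PERL5LIB
}

unset PERL5LIB
append_perl5lib "/soft/cpanm/lib/perl5"
append_perl5lib "/soft/Bio-BigFile-1.07/lib/perl/5.18.2"
echo "$PERL5LIB"
```

The same ${VAR:+word} form should also work inside a Dockerfile ENV instruction, since Docker documents it among its supported variable substitutions.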
What is the aim of installing ensembl-variation C code and ensembl-io before VEP installation? Both ensembl-variation and ensembl-io are downloaded during VEP installation...
# ensembl-variation C code: Compile Variation LD C scripts
RUN cd "$SOFT" && \
export PERL5LIB="$PERL5LIB:$SOFT/bioperl-live" && \
git clone --branch $BRANCH --depth 1 https://github.com/Ensembl/ensembl-variation.git && \
cd ensembl-variation/C_code && \
# 'HTSLIB_DIR' env variable should be defined and should point to a directory with htslib sources + compiled, where an 'htslib' subdir with *.h files is and where hts.* compiled files are.
make -j"$(($(nproc)+1))" && \
# Copy 2 binaries to ../bin (this path is hardcoded into makefile)
make install
ENV PATH="$SOFT/ensembl-variation/bin:$PATH" \
PERL5LIB="$PERL5LIB:$SOFT/ensembl-variation/modules"
# ensembl-io
RUN cd "$SOFT" && \
git clone --branch $BRANCH --depth 1 "https://github.com/Ensembl/ensembl-io.git"
ENV PERL5LIB="$PERL5LIB:$SOFT/ensembl-io/modules"
I don't see where Archive::Zip is installed?
$SOFT/cpanm/bin/cpanm --local-lib $SOFT/cpanm --with-recommends --notest Archive::Zip
Dear @ens-lgil, thank you!
use the VEP installer to install ensembl-io and ensembl-variation
Ok! We will only compile ensembl-variation's C_code and add it to PATH.
use apt-get to install the cpanminus package
Not possible in ubuntu:14.04, which we currently use in all our images.
use the cpanfile files to install Bio::DB::BigFile
We use a separate installation so that it builds with the new kent.
We recommend to install bioperl-live version 1.6.924
Why? Why not a newer one? And why not use it instead of the one installed by VEP's INSTALL.pl (i.e. run INSTALL.pl with the --NO_BIOPERL option)?
As you mentioned in the issue #513, Bio::DB::BigFile requires kent version 335
I hope that newer is better!
I don't know if this is a requirement for the others libraries/tool you want to include in your Docker image but I would recommend to use ubuntu:16.04 or even ubuntu:18.04 (slightly faster) as base image.
ubuntu:14.04 is our common version in all images for now. We are going to update it later...
Finally, bear in mind that adding the caches (GRCh37 and GRCh38) within your Docker image is going to make it huge (>40Gb: each cache is over 10Gb and each fasta is about 8Gb). We recommend to install the caches outside the Docker container and then mount the directory to the container: https://www.ensembl.org/info/docs/tools/vep/script/vep_download.html#docker.
The options -s homo_sapiens -y GRCh37,GRCh38 in the INSTALL.pl command do not download the cache or the reference. I'll check and remove them, thank you! We do not store the cache inside the Docker image!
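Keeping the cache on the host and mounting it at run time could look roughly like the sketch below. It only prints the docker command instead of running it. The image tag, the host directory, the container cache path and the helper name are all hypothetical, and it assumes the vep.sh entrypoint forwards its arguments to vep, which this thread does not show.

```shell
# Hypothetical names: adjust the image tag and paths to your setup.
IMAGE="my-vep:96.3"              # assumed tag of the image built from the Dockerfile
HOST_CACHE="$HOME/vep_data"      # cache + FASTA downloaded on the host
CONTAINER_CACHE="/root/.vep"     # assumed cache location inside the container

vep_docker_cmd() {
    # Print (do not run) a docker command that annotates the VCF named in $1.
    # The current directory is mounted on /outputs, the WORKDIR of the image.
    printf 'docker run --rm -v %s:%s -v %s:/outputs %s --cache --offline --dir_cache %s -i %s -o %s --vcf\n' \
        "$HOST_CACHE" "$CONTAINER_CACHE" "$PWD" "$IMAGE" "$CONTAINER_CACHE" "$1" "${1%.vcf}.vep.vcf"
}

vep_docker_cmd sample.vcf
```

With this layout the image stays small, and the same cache directory can be shared between containers and between VEP releases that use the same cache version.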
Here is our next version:
FROM ubuntu:14.04
LABEL maintainer="serge2016"
# Author: Serge I. Mitrofanov.
# LastUpdate: 27.06.2019 20:10.
# Tool type: mutation annotator
# Contents:
# VEP 96.3 - Apache License, Version 2.0 + Wildtype plugin from pVACtools.
# ensembl-xs v2.3.2 - Apache License, Version 2.0.
# samtools - MIT/Expat License.
# picard - MIT License.
# Input:
# VCF file
# Output:
# VEP-annotated input VCF-file
ENV DEBIAN_FRONTEND="noninteractive"
RUN apt-get update && apt-get --yes --force-yes --no-install-recommends install \
build-essential \
pkg-config \
software-properties-common \
ncurses-dev \
curl \
wget \
nano \
time \
tcsh \
gawk \
bzip2 \
pigz \
zip \
unzip \
xz-utils \
mc \
htop \
iotop \
git-core \
subversion \
python \
python-tk \
python-dev \
python-setuptools \
openssh-client \
openssl \
libssl-dev \
libyaml-dev \
zlib1g-dev \
libbz2-dev \
liblzma-dev \
libffi-dev \
libxml2-dev \
libxslt1-dev \
libpq-dev \
realpath
ENV TZ="Europe/Moscow"
RUN echo $TZ > /etc/timezone \
&& dpkg-reconfigure tzdata
ENV TMPDIR="/tmp"
RUN mkdir -p "$TMPDIR"
ENV SOFT="/soft"
RUN mkdir -p "$SOFT"
# memUsage (both python 2 & 3) (Olga)
# psutil >= 2.2.1 (Tested with 5.6.1 - ok; 1.2.1 - err) - additional python package required for memUsage. That's why apt install python-psutil doesn't fit on Ubuntu 14.04
RUN cd "$SOFT" \
&& git clone https://github.com/giampaolo/psutil.git \
&& cd "$SOFT/psutil" \
&& python setup.py install \
&& cd "$SOFT" \
&& rm -r "$SOFT/psutil" \
&& mkdir -p "$SOFT/memusage/bin" \
&& wget -q "https://raw.githubusercontent.com/ozolotareva/housekeeping-scr/master/memUsage.py" -O - | tr -d '\r' > "$SOFT/memusage/bin/memUsage.py" \
&& chmod +x "$SOFT/memusage/bin/memUsage.py"
ENV MEMUSAGE="$SOFT/memusage/bin/memUsage.py" \
PATH="$SOFT/memusage/bin:$PATH"
# cmake 3.14.5
RUN cd $SOFT \
&& wget -q "https://cmake.org/files/v3.14/cmake-3.14.5-Linux-x86_64.sh" -O "$SOFT/cmake-3.14.5-Linux-x86_64.sh" \
&& sh "$SOFT/cmake-3.14.5-Linux-x86_64.sh" --prefix="$SOFT" --include-subdir --skip-license \
&& rm "$SOFT/cmake-3.14.5-Linux-x86_64.sh"
ENV PATH="$SOFT/cmake-3.14.5-Linux-x86_64/bin:$PATH"
# java8
RUN add-apt-repository -y ppa:openjdk-r/ppa \
&& apt-get update \
&& apt-get --yes --force-yes --no-install-recommends install \
openjdk-8-jre
ENV _JAVA_OPTIONS="-Djava.io.tmpdir=$TMPDIR"
# FastQC v0.11.8
RUN cd "$SOFT" \
&& wget -q "http://www.bioinformatics.babraham.ac.uk/projects/fastqc/fastqc_v0.11.8.zip" -O "$SOFT/fastqc_v0.11.8.zip" \
&& unzip -q "$SOFT/fastqc_v0.11.8.zip" \
&& mv "$SOFT/FastQC" "$SOFT/FastQC_v0.11.8" \
&& chmod +x "$SOFT/FastQC_v0.11.8/fastqc" \
&& rm "$SOFT/fastqc_v0.11.8.zip"
ENV FASTQC="$SOFT/FastQC_v0.11.8/fastqc"
# samtools 1.9
RUN cd "$SOFT" \
&& wget -q "https://github.com/samtools/samtools/releases/download/1.9/samtools-1.9.tar.bz2" -O "$SOFT/samtools-1.9.tar.bz2" \
&& tar -xjf "$SOFT/samtools-1.9.tar.bz2" \
&& mv "$SOFT/samtools-1.9" "$SOFT/samtools-1.9-src" \
&& cd "$SOFT/samtools-1.9-src/htslib-1.9" \
&& ./configure --prefix="$SOFT/htslib-1.9" \
&& make -j"$(($(nproc)+1))" \
&& make install \
&& cd "$SOFT/samtools-1.9-src" \
&& ./configure --prefix="$SOFT/samtools-1.9" --with-htslib="$SOFT/htslib-1.9" \
&& make -j"$(($(nproc)+1))" \
&& make install \
&& cd "$SOFT" \
&& rm -r "$SOFT/samtools-1.9-src" \
&& rm "$SOFT/samtools-1.9.tar.bz2"
ENV SAMTOOLS="$SOFT/samtools-1.9/bin/samtools" \
BGZIP="$SOFT/htslib-1.9/bin/bgzip" \
TABIX="$SOFT/htslib-1.9/bin/tabix" \
PATH="$SOFT/samtools-1.9/bin:$SOFT/htslib-1.9/bin:$PATH" \
LD_LIBRARY_PATH="$SOFT/htslib-1.9/lib:$LD_LIBRARY_PATH"
# picard 2.20.2
# TODO: remove '-Dpicard.useLegacyParser=false' option from all picard commands after full transition to new syntax: https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)
RUN cd "$SOFT" \
&& wget -q "https://github.com/broadinstitute/picard/releases/download/2.20.2/picard.jar" -O "$SOFT/picard-2.20.2.jar"
ENV PICARD="$SOFT/picard-2.20.2.jar"
# perl & mysql
RUN apt-get --yes --force-yes --no-install-recommends install \
libdbd-mysql-perl \
libmysqlclient-dev \
libpng-dev \
uuid-dev
RUN apt-get clean \
&& rm -rf /var/lib/apt/lists
# cpanm 1.7044
# https://metacpan.org/pod/App::cpanminus#INSTALLATION
RUN cd "$SOFT" \
&& mkdir -p "$SOFT/cpanm/bin" \
&& curl -sSL --insecure "https://cpanmin.us/" -o "$SOFT/cpanm/bin/cpanm" \
&& chmod +x "$SOFT/cpanm/bin/cpanm"
ENV CPANM="$SOFT/cpanm/bin/cpanm" \
PATH="$SOFT/cpanm/bin:$PATH"
ENV VEP_CACHE_VERSION="96"
ENV BRANCH="release/${VEP_CACHE_VERSION}"
# bioperl-live 1-7-2
# ensembl-vep/travisci/get_dependencies.sh: 1/4. BioPerl-live.
RUN cd "$SOFT" && \
wget -q "https://github.com/bioperl/bioperl-live/archive/release-1-7-2.zip" -O "$SOFT/bioperl-live-release-1-7-2.zip" && \
unzip -q "$SOFT/bioperl-live-release-1-7-2.zip" && \
rm "$SOFT/bioperl-live-release-1-7-2.zip"
ENV PERL5LIB="$PERL5LIB:$SOFT/bioperl-live-release-1-7-2"
# htslib-1.9 SOURCES + COMPILED
# ensembl-vep/travisci/get_dependencies.sh: 2/4.
RUN cd "$SOFT" && \
mkdir -p "$SOFT/2" && \
cd "$SOFT/2" && \
wget -q "https://github.com/samtools/htslib/archive/1.9.zip" -O "$SOFT/2/htslib-1.9.zip" && \
unzip -q "$SOFT/2/htslib-1.9.zip" && \
mv "$SOFT/2/htslib-1.9" "$SOFT/htslib-1.9-src" && \
cd "$SOFT/htslib-1.9-src" && \
# Next line needed for correct compilation libhts.a for Bio-Bigfile. Code from: https://github.com/GMOD/GBrowse-Adaptors/blob/master/Bio-BigFile/README#L17
perl -pi -e 'if($_ =~ m/^CFLAGS/ && $_ !~ m/\-fPIC/i){chomp; s/#.+//; $_ .= " -fPIC -Wno-unused -Wno-unused-result\n"};' "$SOFT/htslib-1.9-src/Makefile" && \
make -j"$(($(nproc)+1))" && \
rm "$SOFT/2/htslib-1.9.zip"
ENV HTSLIB_DIR="$SOFT/htslib-1.9-src"
# Bio::DB::HTS
# ensembl-vep/travisci/get_dependencies.sh: 3/4. Bio::DB::HTS
# ensembl-vep/travisci/build_c.sh (Install/compile more libraries): 2/3. Bio::DB::HTS
RUN cd "$SOFT" && \
wget -q "https://github.com/Ensembl/Bio-DB-HTS/archive/3.01.zip" -O "$SOFT/Bio-DB-HTS-3.01.zip" && \
unzip -q "$SOFT/Bio-DB-HTS-3.01.zip" && \
mv "$SOFT/Bio-DB-HTS-3.01" "$SOFT/Bio-DB-HTS-3.01-src" && \
cd "$SOFT/Bio-DB-HTS-3.01-src" && \
# By default (without --prefix) installs '/usr/local/lib/perl/5.18.2/Bio/DB/HTS.pm'
# 'HTSLIB_DIR' env variable is used
perl Build.PL --prefix "$SOFT/Bio-DB-HTS-3.01" && \
./Build && \
#./Build test
./Build install && \
cd "$SOFT" && \
rm -r "$SOFT/Bio-DB-HTS-3.01-src" && \
rm "$SOFT/Bio-DB-HTS-3.01.zip"
# https://www.ensembl.org/info/docs/api/api_installation.html
ENV PERL5LIB="$PERL5LIB:$SOFT/Bio-DB-HTS-3.01/lib/perl/5.18.2"
# ensembl-variation C code: Compile Variation LD C scripts
RUN cd "$SOFT" && \
git clone --branch $BRANCH --depth 1 https://github.com/Ensembl/ensembl-variation.git && \
mkdir -p "$SOFT/ensembl-variation_C_code/src" && \
mv "$SOFT/ensembl-variation/C_code/"* "$SOFT/ensembl-variation_C_code/src/" && \
cd "$SOFT/ensembl-variation_C_code/src" && \
# 'HTSLIB_DIR' env variable should be defined and should point to a directory with htslib sources + compiled, where an 'htslib' subdir with *.h files is and where hts.* compiled files are.
make -j"$(($(nproc)+1))" && \
# Copy 2 binaries to ../bin (this path is hardcoded into makefile)
make install && \
rm -r "$SOFT/ensembl-variation"
ENV PATH="$SOFT/ensembl-variation_C_code/bin:$PATH"
# bioperl-ext, faster alignments for haplo (XS-based BioPerl extensions to C libraries) - used by Haplosaurus
RUN cd "$SOFT" && \
git clone "https://github.com/bioperl/bioperl-ext.git" && \
cd bioperl-ext && \
git checkout -b branch180924 73138e9f26b9cb6321288bff4fe2516e862aa975 && \
cd Bio/Ext/Align && \
# Update bioperl-ext Makefile.PL with '-fPIC'
perl -pi -e"s|(cd libs.+)CFLAGS=\\\'|\$1CFLAGS=\\\'-fPIC |" Makefile.PL && \
# Installing a folder 'lib/perl/5.18.2/auto/Bio/Ext/' with files 'Align/Align.so' & 'Align.pm' into PREFIX
perl Makefile.PL PREFIX="$SOFT/bioperl-ext_Bio-Ext-Align_180924" && \
make -j"$(($(nproc)+1))" && \
make install && \
cd "$SOFT" && \
rm -r "$SOFT/bioperl-ext"
ENV PERL5LIB="$PERL5LIB:$SOFT/bioperl-ext_Bio-Ext-Align_180924/lib/perl/5.18.2"
# ensembl-xs, faster run using re-implementation in C of some of the Perl subroutines - it contains compiled versions of certain key subroutines used in VEP
ENV ENSEMBL_XS_VERSION="2.3.2"
RUN cd "$SOFT" && \
wget -q "https://github.com/Ensembl/ensembl-xs/archive/${ENSEMBL_XS_VERSION}.zip" -O "$SOFT/ensembl-xs-${ENSEMBL_XS_VERSION}.zip" && \
unzip -q "$SOFT/ensembl-xs-${ENSEMBL_XS_VERSION}.zip" && \
mv "$SOFT/ensembl-xs-${ENSEMBL_XS_VERSION}" "$SOFT/ensembl-xs-${ENSEMBL_XS_VERSION}-src" && \
cd "$SOFT/ensembl-xs-${ENSEMBL_XS_VERSION}-src" && \
perl Makefile.PL PREFIX="$SOFT/ensembl-xs-${ENSEMBL_XS_VERSION}" && \
make -j"$(($(nproc)+1))" && \
make -j"$(($(nproc)+1))" install && \
cd $SOFT && \
rm -r "$SOFT/ensembl-xs-${ENSEMBL_XS_VERSION}-src" && \
rm "$SOFT/ensembl-xs-${ENSEMBL_XS_VERSION}.zip"
ENV PERL5LIB="$PERL5LIB:$SOFT/ensembl-xs-${ENSEMBL_XS_VERSION}/lib/perl/5.18.2"
# kent_base (jksrc) - used by Bio::DB::BigFile (bigWig parsing)
# ensembl-vep/travisci/get_dependencies.sh: 4/4. jksrc
# ensembl-vep/travisci/build_c.sh (Install/compile more libraries): 3/3. kent src & Build
ENV KENT_VERSION="383"
ENV KENT_SRC="$SOFT/kent-${KENT_VERSION}_base/src" \
MACHTYPE="x86_64"
RUN cd "$SOFT" && \
wget -q "https://github.com/ucscGenomeBrowser/kent/archive/v${KENT_VERSION}_base.zip" -O "$SOFT/kent-${KENT_VERSION}_base.zip" && \
unzip -q "$SOFT/kent-${KENT_VERSION}_base.zip" && \
# Only keep needed kent-${KENT_VERSION}_base libraries for VEP (Serge: src/hg is about 300 MB)
rm -r "$SOFT/kent-${KENT_VERSION}_base/java" "$SOFT/kent-${KENT_VERSION}_base/python" && \
rm -r "$SOFT/kent-${KENT_VERSION}_base/src/hg" && \
MYSQLINC="$(mysql_config --include | sed -e 's/^-I//g')" && \
MYSQLLIBS="$(mysql_config --libs)" && \
export MYSQLINC && \
export MYSQLLIBS && \
export CFLAGS="-fPIC" && \
echo "Making kent [1/2] ..." && \
cd "$KENT_SRC/lib" && \
echo 'CFLAGS="-fPIC"' > "$KENT_SRC/inc/localEnvironment.mk" && \
make clean && \
make -j"$(($(nproc)+1))" && \
echo "Making kent [2/2] ..." && \
cd "$KENT_SRC/jkOwnLib" && \
make clean && \
make -j"$(($(nproc)+1))" && \
# ln -s $KENT_SRC/lib/x86_64/* "$KENT_SRC/lib/" && \
rm "$SOFT/kent-${KENT_VERSION}_base.zip"
# VEP 96.3 (+ Wildtype.pm plugin from pVAC-Seq) (including cpanm dependencies)
ENV PERL5LIB="$PERL5LIB:$SOFT/cpanm/lib/perl5"
RUN cd "$SOFT" && \
# Clone ensembl-vep git repository
git clone --branch $BRANCH --depth 1 https://github.com/Ensembl/ensembl-vep.git && \
mv "$SOFT/ensembl-vep" "$SOFT/ensembl-vep-96.3" && \
echo "Installing cpanm packages [1/5]: ensembl perl dependencies ..." && \
# Get ensembl cpanfile
wget -q "https://raw.githubusercontent.com/Ensembl/ensembl/$BRANCH/cpanfile" -O "$SOFT/ensembl_cpanfile" && \
$CPANM --local-lib $SOFT/cpanm --installdeps --with-recommends --notest --cpanfile "$SOFT/ensembl_cpanfile" . && \
rm "$SOFT/ensembl_cpanfile" && \
echo "Installing cpanm packages [2/5]: Bundle::LWP as Bio::DB::BigFile dependency ..." && \
$CPANM --local-lib $SOFT/cpanm --with-recommends --notest Bundle::LWP && \
echo "Installing cpanm packages [3/5]: Bio::DB::BigFile v1.07 ..." && \
cd "$SOFT" && \
wget -q "https://cpan.metacpan.org/authors/id/L/LD/LDS/Bio-BigFile-1.07.tar.gz" -O "$SOFT/Bio-BigFile-1.07.tar.gz" && \
tar -xzf "$SOFT/Bio-BigFile-1.07.tar.gz" && \
mv "$SOFT/Bio-BigFile-1.07" "$SOFT/Bio-BigFile-1.07-src" && \
cd "$SOFT/Bio-BigFile-1.07-src" && \
# 'KENT_SRC', 'HTSLIB_DIR' & 'MACHTYPE' should be defined
# Solution for new kent versions (newer than 335) from: https://github.com/GMOD/GBrowse-Adaptors/pull/19
sed -i "/^my \$LibFile = \"jkweb.a\";.*/a my \$HeaderFileHTS = \"tbx.h\";\nmy \$LibFileHTS = \"libhts.a\";" "$SOFT/Bio-BigFile-1.07-src/Build.PL" && \
sed -E -i "s|^(.*)\\\$jk_include,\\\$jk_lib(.*)$|\1\$jk_include,\$jk_lib,\$hts_include,\$hts_lib\2|g" "$SOFT/Bio-BigFile-1.07-src/Build.PL" && \
sed -E -i "s|^(.*)\[\\\$jk_include(.*)$|\1\[\$jk_include,\$hts_include\2|g" "$SOFT/Bio-BigFile-1.07-src/Build.PL" && \
sed -E -i "s|^(.*)\"\\\$jk_lib/\\\$LibFile\",(.*)$|\1\"\$jk_lib/\$LibFile\",\"\$hts_lib/\$LibFileHTS\",\2|" "$SOFT/Bio-BigFile-1.07-src/Build.PL" && \
sed -E -i "s|^(.*)'-lz','-lssl'(.*)$|\1'-lz','-lssl','-pthread'\2|g" "$SOFT/Bio-BigFile-1.07-src/Build.PL" && \
sed -i "/^.*if -e \"\\\$jksrc\/lib\/\\\$ENV.*/a \
\$hts_include = \"\$ENV{HTSLIB_DIR}\" if -e \"\$ENV{HTSLIB_DIR}\/htslib\/\$HeaderFileHTS\";\n \
\$hts_lib = \"\$ENV{HTSLIB_DIR}\" if -e \"\$ENV{HTSLIB_DIR}\/\$LibFileHTS\";" "$SOFT/Bio-BigFile-1.07-src/Build.PL" && \
perl Build.PL --prefix "$SOFT/Bio-BigFile-1.07" && \
./Build && \
./Build install && \
rm -r "$SOFT/Bio-BigFile-1.07-src" && \
rm "$SOFT/Bio-BigFile-1.07.tar.gz" && \
export PERL5LIB="$PERL5LIB:$SOFT/Bio-BigFile-1.07/lib/perl/5.18.2" && \
cd "$SOFT" && \
echo "Installing cpanm packages [4/5]: ensembl-vep perl dependencies ..." && \
$CPANM --local-lib $SOFT/cpanm --installdeps --with-recommends --notest --cpanfile ensembl-vep-96.3/cpanfile . && \
echo "Installing cpanm packages [5/5] ..." && \
# $CPANM --notest --local-lib $SOFT/cpanm Archive::Zip Module::Build Bio::Perl DBI DBD::mysql Set::IntervalTree JSON PerlIO::gzip Scalar::Util Try::Tiny
$CPANM --local-lib $SOFT/cpanm --with-recommends --notest Archive::Zip && \
cd "$SOFT/ensembl-vep-96.3" && \
perl INSTALL.pl --NO_TEST --NO_UPDATE --NO_HTSLIB --NO_BIOPERL -a ap -g ProteinSeqs,Downstream,Conservation,GO && \
# pVACtools Wildtype plugin, unchanged since 02.11.2017 till 19.06.2019: https://github.com/griffithlab/pVACtools/commit/f6099b390363e3b0bd0f93be9a8380b9139009dc
wget -q "https://raw.githubusercontent.com/griffithlab/pVACtools/475f8cb91403a4819cda68a330337bbc463d2e4c/tools/pvacseq/VEP_plugins/Wildtype.pm" -O "$HOME/.vep/Plugins/Wildtype.pm"
ENV VEP="$SOFT/ensembl-vep-96.3/vep" \
VEPFILTER="$SOFT/ensembl-vep-96.3/filter_vep" \
VEPINSTALL="$SOFT/ensembl-vep-96.3/INSTALL.pl" \
CONVERTCACHE="$SOFT/ensembl-vep-96.3/convert_cache.pl --bgzip $BGZIP --tabix $TABIX" \
PERL5LIB="$PERL5LIB:$SOFT/Bio-BigFile-1.07/lib/perl/5.18.2:$SOFT/ensembl-vep-96.3:$SOFT/ensembl-vep-96.3/modules" \
PATH="$SOFT/ensembl-vep-96.3:$SOFT/Bio-BigFile-1.07/bin:$PATH"
# Run VEP module tests ...
RUN cd "$SOFT/ensembl-vep-96.3" && \
perl -e "use Bio::DB::HTS::Tabix" && \
perl -e "use Bio::DB::BigFile" && \
perl -Imodules t/AnnotationSource_File_BigWig.t && \
prove t/*.t
COPY common_funcs.sh /usr/local/bin/
COPY vep.sh /usr/local/bin/
RUN chmod +x /usr/local/bin/vep.sh
WORKDIR /outputs
ENTRYPOINT ["vep.sh"]
Now I have another question: we have two htslib builds, the default one from the basic samtools install and one from the custom installation needed for Bio::DB::HTS, ensembl-variation C code and Bio::DB::BigFile.
The only difference is -fPIC added to the CFLAGS variable (-Wno-unused -Wno-unused-result is only for hiding warnings).
The -fpic option already exists there (in EXTRA_CFLAGS_PIC: https://github.com/samtools/htslib/blob/develop/Makefile#L37), but it is not obvious whether it is used or not.
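One way to check whether a Makefile variable such as EXTRA_CFLAGS_PIC is actually used is to list the rules that reference it. The sketch below does this on a simplified, hypothetical Makefile fragment (it is not copied from htslib); running the same grep against the real Makefile would be needed to answer the question for htslib itself.

```shell
# Simplified, hypothetical Makefile fragment (not copied from htslib).
frag="$(mktemp)"
cat > "$frag" <<'EOF'
CFLAGS = -g -Wall -O2
EXTRA_CFLAGS_PIC = -fpic
.c.o:
	$(CC) $(CFLAGS) -c -o $@ $<
.c.pico:
	$(CC) $(CFLAGS) $(EXTRA_CFLAGS_PIC) -c -o $@ $<
EOF

# List every line that expands the variable, with its line number.
pic_users="$(grep -n '\$(EXTRA_CFLAGS_PIC)' "$frag")"
echo "$pic_users"
```

In this sample only the rule for the .pico objects expands the variable. If the real Makefile is structured the same way, objects built by the plain .c.o rule would not get -fpic from it, which would explain why CFLAGS has to be patched separately for a static library.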
Dear @serge2016,
I will try to answer the different questions:
1. This is mainly to keep the installation as close to (and as clean as) the normal VEP installation. The extra ensembl-variation C_code is only used if you want to use the LD VEP_plugins. The early bioperl-live installation is mainly for the extra "speed up" libraries we install in the Docker image.
2. Fair enough.
3. I can see why you need to install Bio::DB::BigFile manually in your case. See point 5.
4. As you might know, VEP is based on the Ensembl Perl API. The Ensembl Perl API heavily uses bioperl-live version 1.6.924 (this has been thoroughly tested on the thousands of lines of the Ensembl Perl API + VEP): https://www.ensembl.org/info/docs/api/api_installation.html#installation
We might upgrade the support to 1.7.2 in the future, but this will need quite a lot of tests from every team within Ensembl to make sure that the whole API is compatible, and to update the code where it is not.
5. I can see why in your case, and I agree that newer is often better. However, we need Bio::DB::BigFile and it requires kent version 335. We don't need kent for anything else in the VEP scope, so we don't really see the benefit of editing the latest Bio::DB::BigFile version in order to use the latest version of kent.
6. No problem, it makes sense.
7. Sorry, I overlooked the fact that you didn't use the c parameter with the option -a.
As a side answer, I personally prefer to use git clone --depth 1 instead of wget simply because it makes the code more compact (1 line/command versus 3 (wget, unzip, rm)).
Dear @serge2016,
Regarding the variable EXTRA_CFLAGS_PIC, I don't know how it's used during the compilation of htslib within the samtools package.
In VEP, we only use the variable CFLAGS.
Best regards, Laurent
Dear @ens-lgil, thank you for a detailed answer!
What is "LD VEP_plugins"?
I understand now the point with bioperl-live version 1.6.924. But why not keep the full bioperl-live package and install VEP with the --NO_BIOPERL option?
Here is my new Dockerfile (it is a bit cleaner; libdeflate is added to htslib):
FROM ubuntu:14.04
LABEL maintainer="serge2016"
# Author: Serge I. Mitrofanov.
# LastUpdate: 05.07.2019 22:10.
# Tool type: mutation annotator
# Contents:
# VEP - Apache License, Version 2.0 + Wildtype plugin from pVACtools.
# ensembl-xs - Apache License, Version 2.0.
# samtools - MIT/Expat License.
# picard - MIT License.
# Input:
# VCF file
# Output:
# VEP-annotated input VCF-file
ENV DEBIAN_FRONTEND="noninteractive"
RUN apt-get update && apt-get --yes --force-yes --no-install-recommends install \
build-essential \
pkg-config \
software-properties-common \
ncurses-dev \
curl \
wget \
nano \
time \
tcsh \
gawk \
bzip2 \
pigz \
zip \
unzip \
xz-utils \
mc \
htop \
iotop \
git-core \
subversion \
python \
python-tk \
python-dev \
python-setuptools \
openssh-client \
openssl \
libssl-dev \
libcurl4-openssl-dev \
libyaml-dev \
zlib1g-dev \
libbz2-dev \
liblzma-dev \
libffi-dev \
libxml2-dev \
libxslt1-dev \
libpq-dev \
realpath
ENV TZ="Europe/Moscow"
RUN echo $TZ > /etc/timezone \
&& dpkg-reconfigure tzdata
ENV TMPDIR="/tmp"
RUN mkdir -p "$TMPDIR"
ENV SOFT="/soft"
RUN mkdir -p "$SOFT"
# memUsage (both python 2 & 3) (Olga)
# psutil >= 2.2.1 (Tested with 5.6.1 - ok; 1.2.1 - err) - additional python package required for memUsage. That's why apt install python-psutil doesn't fit on Ubuntu 14.04
RUN cd "$SOFT" \
&& git clone https://github.com/giampaolo/psutil.git \
&& cd "$SOFT/psutil" \
&& python setup.py install \
&& cd "$SOFT" \
&& rm -r "$SOFT/psutil" \
&& mkdir -p "$SOFT/memusage/bin" \
&& wget -q "https://raw.githubusercontent.com/ozolotareva/housekeeping-scr/master/memUsage.py" -O - | tr -d '\r' > "$SOFT/memusage/bin/memUsage.py" \
&& chmod +x "$SOFT/memusage/bin/memUsage.py"
ENV MEMUSAGE="$SOFT/memusage/bin/memUsage.py" \
PATH="$SOFT/memusage/bin:$PATH"
# cmake 3.14.5
RUN cd $SOFT \
&& wget -q "https://cmake.org/files/v3.14/cmake-3.14.5-Linux-x86_64.sh" -O "$SOFT/cmake-3.14.5-Linux-x86_64.sh" \
&& sh "$SOFT/cmake-3.14.5-Linux-x86_64.sh" --prefix="$SOFT" --include-subdir --skip-license \
&& rm "$SOFT/cmake-3.14.5-Linux-x86_64.sh"
ENV PATH="$SOFT/cmake-3.14.5-Linux-x86_64/bin:$PATH"
# java8
RUN add-apt-repository -y ppa:openjdk-r/ppa \
&& apt-get update \
&& apt-get --yes --force-yes --no-install-recommends install \
openjdk-8-jre
ENV _JAVA_OPTIONS="-Djava.io.tmpdir=$TMPDIR"
# FastQC v0.11.8
RUN cd "$SOFT" \
&& wget -q "http://www.bioinformatics.babraham.ac.uk/projects/fastqc/fastqc_v0.11.8.zip" -O "$SOFT/fastqc_v0.11.8.zip" \
&& unzip -q "$SOFT/fastqc_v0.11.8.zip" \
&& mv "$SOFT/FastQC" "$SOFT/FastQC_v0.11.8" \
&& chmod +x "$SOFT/FastQC_v0.11.8/fastqc" \
&& rm "$SOFT/fastqc_v0.11.8.zip"
ENV FASTQC="$SOFT/FastQC_v0.11.8/fastqc"
# libdeflate (03.07.2019)
# Installation to custom path: https://github.com/ebiggers/libdeflate/issues/46
RUN cd "$SOFT" \
&& git clone "https://github.com/ebiggers/libdeflate.git" \
&& cd "$SOFT/libdeflate" \
&& git checkout -b br190705 c9ed42ae3b3805b6431f9f55fde38134e2f5a2a1 \
&& make -j"$(($(nproc)+1))" \
# && mkdir -p "$SOFT/libdeflate-190705/lib" "$SOFT/libdeflate-190705/include" \
&& make install PREFIX="$SOFT/libdeflate-190705" \
&& rm -r "$SOFT/libdeflate"
ENV LD_LIBRARY_PATH="$SOFT/libdeflate-190705/lib:$LD_LIBRARY_PATH" \
PATH="$SOFT/libdeflate-190705/bin:$PATH"
# when compiling something using the library you need to set CPPFLAGS=-I$prefix/include and LDFLAGS=-L$prefix/lib
# samtools 1.9 & htslib 1.9
RUN cd "$SOFT" \
&& wget -q "https://github.com/samtools/samtools/releases/download/1.9/samtools-1.9.tar.bz2" -O "$SOFT/samtools-1.9.tar.bz2" \
&& tar -xjf "$SOFT/samtools-1.9.tar.bz2" \
&& mv "$SOFT/samtools-1.9" "$SOFT/samtools-1.9-src" \
&& cd "$SOFT/samtools-1.9-src/htslib-1.9" \
&& ./configure --prefix="$SOFT/htslib-1.9" --enable-libcurl --enable-plugins --with-libdeflate CFLAGS="-fPIC -O3" CPPFLAGS="-I$SOFT/libdeflate-190705/include" LDFLAGS="-L$SOFT/libdeflate-190705/lib" \
&& make -j"$(($(nproc)+1))" \
&& make install \
&& cd "$SOFT/samtools-1.9-src" \
&& ./configure --prefix="$SOFT/samtools-1.9" --with-htslib="$SOFT/htslib-1.9" CFLAGS="-g -O3 -fPIC" \
&& make -j"$(($(nproc)+1))" \
&& make install \
&& cd "$SOFT" \
&& rm -r "$SOFT/samtools-1.9-src" \
&& rm "$SOFT/samtools-1.9.tar.bz2"
ENV SAMTOOLS="$SOFT/samtools-1.9/bin/samtools" \
BGZIP="$SOFT/htslib-1.9/bin/bgzip" \
TABIX="$SOFT/htslib-1.9/bin/tabix" \
PATH="$SOFT/samtools-1.9/bin:$SOFT/htslib-1.9/bin:$PATH" \
LD_LIBRARY_PATH="$SOFT/htslib-1.9/lib:$LD_LIBRARY_PATH" \
HTSLIB_DIR="$SOFT/htslib-1.9"
# picard 2.20.2
# TODO: remove '-Dpicard.useLegacyParser=false' option from all picard commands after full transition to new syntax: https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)
RUN cd "$SOFT" \
&& wget -q "https://github.com/broadinstitute/picard/releases/download/2.20.2/picard.jar" -O "$SOFT/picard-2.20.2.jar"
ENV PICARD="$SOFT/picard-2.20.2.jar"
# perl & mysql
RUN apt-get --yes --force-yes --no-install-recommends install \
libdbd-mysql-perl \
libmysqlclient-dev \
libpng-dev \
uuid-dev
RUN apt-get clean \
&& rm -rf /var/lib/apt/lists
# cpanm 1.7044
# https://metacpan.org/pod/App::cpanminus#INSTALLATION
RUN cd "$SOFT" \
&& mkdir -p "$SOFT/cpanm/bin" \
&& curl -sSL --insecure "https://cpanmin.us/" -o "$SOFT/cpanm/bin/cpanm" \
&& chmod +x "$SOFT/cpanm/bin/cpanm"
ENV CPANM="$SOFT/cpanm/bin/cpanm" \
PATH="$SOFT/cpanm/bin:$PATH"
ENV VEP_CACHE_VERSION="97"
ENV BRANCH="release/${VEP_CACHE_VERSION}"
# bioperl-live 1-7-2
# ensembl-vep/travisci/get_dependencies.sh: 1/4. BioPerl-live.
RUN cd "$SOFT" && \
wget -q "https://github.com/bioperl/bioperl-live/archive/release-1-7-2.zip" -O "$SOFT/bioperl-live-release-1-7-2.zip" && \
unzip -q "$SOFT/bioperl-live-release-1-7-2.zip" && \
rm "$SOFT/bioperl-live-release-1-7-2.zip"
ENV PERL5LIB="$PERL5LIB:$SOFT/bioperl-live-release-1-7-2"
# Bio::DB::HTS
# ensembl-vep/travisci/get_dependencies.sh: 3/4. Bio::DB::HTS
# ensembl-vep/travisci/build_c.sh (Install/compile more libraries): 2/3. Bio::DB::HTS
RUN cd "$SOFT" && \
wget -q "https://github.com/Ensembl/Bio-DB-HTS/archive/3.01.zip" -O "$SOFT/Bio-DB-HTS-3.01.zip" && \
unzip -q "$SOFT/Bio-DB-HTS-3.01.zip" && \
mv "$SOFT/Bio-DB-HTS-3.01" "$SOFT/Bio-DB-HTS-3.01-src" && \
cd "$SOFT/Bio-DB-HTS-3.01-src" && \
# By default (without --prefix) installs '/usr/local/lib/perl/5.18.2/Bio/DB/HTS.pm'
# 'HTSLIB_DIR' env variable is used
perl Build.PL --prefix "$SOFT/Bio-DB-HTS-3.01" && \
./Build && \
./Build install && \
cd "$SOFT" && \
rm -r "$SOFT/Bio-DB-HTS-3.01-src" && \
rm "$SOFT/Bio-DB-HTS-3.01.zip"
# https://www.ensembl.org/info/docs/api/api_installation.html
ENV PERL5LIB="$PERL5LIB:$SOFT/Bio-DB-HTS-3.01/lib/perl/5.18.2"
# ensembl-variation C code: Compile Variation LD C scripts
RUN cd "$SOFT" && \
git clone --branch $BRANCH --depth 1 https://github.com/Ensembl/ensembl-variation.git && \
mkdir -p "$SOFT/ensembl-variation_C_code/src" && \
mv "$SOFT/ensembl-variation/C_code/"* "$SOFT/ensembl-variation_C_code/src/" && \
cd "$SOFT/ensembl-variation_C_code/src" && \
# The 'HTSLIB_DIR' env variable should be defined and should point to a directory with the htslib sources and build output: an 'htslib' subdir containing the *.h files, plus the compiled hts.* files.
# Changing it to fit htslib's --prefix installation
sed -i -e "s|-I \$(HTSLIB_DIR)/htslib|-I \$(HTSLIB_DIR)/include/htslib|" Makefile && \
sed -i -e "s|\$(HTSLIB_DIR) |\$(HTSLIB_DIR)/lib |g" Makefile && \
make -j"$(($(nproc)+1))" && \
# Copy 2 binaries to ../bin (this path is hardcoded into makefile)
make install && \
cd "$SOFT" && \
rm -r "$SOFT/ensembl-variation_C_code/src" && \
rm -r "$SOFT/ensembl-variation"
ENV PATH="$SOFT/ensembl-variation_C_code/bin:$PATH"
# bioperl-ext, faster alignments for haplo (XS-based BioPerl extensions to C libraries) - used by Haplosaurus
RUN cd "$SOFT" && \
git clone "https://github.com/bioperl/bioperl-ext.git" && \
cd bioperl-ext && \
git checkout -b branch180924 73138e9f26b9cb6321288bff4fe2516e862aa975 && \
cd Bio/Ext/Align && \
# Update bioperl-ext Makefile.PL with '-fPIC'
perl -pi -e"s|(cd libs.+)CFLAGS=\\\'|\$1CFLAGS=\\\'-fPIC |" Makefile.PL && \
# Installing a folder 'lib/perl/5.18.2/auto/Bio/Ext/' with files 'Align/Align.so' & 'Align.pm' into PREFIX
perl Makefile.PL PREFIX="$SOFT/bioperl-ext_Bio-Ext-Align_180924" && \
make -j"$(($(nproc)+1))" && \
make install && \
cd "$SOFT" && \
rm -r "$SOFT/bioperl-ext"
ENV PERL5LIB="$PERL5LIB:$SOFT/bioperl-ext_Bio-Ext-Align_180924/lib/perl/5.18.2"
# ensembl-xs, faster run using re-implementation in C of some of the Perl subroutines - it contains compiled versions of certain key subroutines used in VEP
ENV ENSEMBL_XS_VERSION="2.3.2"
RUN cd "$SOFT" && \
wget -q "https://github.com/Ensembl/ensembl-xs/archive/${ENSEMBL_XS_VERSION}.zip" -O "$SOFT/ensembl-xs-${ENSEMBL_XS_VERSION}.zip" && \
unzip -q "$SOFT/ensembl-xs-${ENSEMBL_XS_VERSION}.zip" && \
mv "$SOFT/ensembl-xs-${ENSEMBL_XS_VERSION}" "$SOFT/ensembl-xs-${ENSEMBL_XS_VERSION}-src" && \
cd "$SOFT/ensembl-xs-${ENSEMBL_XS_VERSION}-src" && \
perl Makefile.PL PREFIX="$SOFT/ensembl-xs-${ENSEMBL_XS_VERSION}" && \
make -j"$(($(nproc)+1))" && \
make -j"$(($(nproc)+1))" install && \
cd "$SOFT" && \
rm -r "$SOFT/ensembl-xs-${ENSEMBL_XS_VERSION}-src" && \
rm "$SOFT/ensembl-xs-${ENSEMBL_XS_VERSION}.zip"
ENV PERL5LIB="$PERL5LIB:$SOFT/ensembl-xs-${ENSEMBL_XS_VERSION}/lib/perl/5.18.2"
# kent_base (jksrc) - used by Bio::DB::BigFile (bigWig parsing)
# ensembl-vep/travisci/get_dependencies.sh: 4/4. jksrc
# ensembl-vep/travisci/build_c.sh (Install/compile more libraries): 3/3. kent src & Build
ENV KENT_VERSION="383"
ENV KENT_SRC="$SOFT/kent-${KENT_VERSION}_base/src" \
MACHTYPE="x86_64"
RUN cd "$SOFT" && \
wget -q "https://github.com/ucscGenomeBrowser/kent/archive/v${KENT_VERSION}_base.zip" -O "$SOFT/kent-${KENT_VERSION}_base.zip" && \
unzip -q "$SOFT/kent-${KENT_VERSION}_base.zip" && \
# Only keep needed kent-${KENT_VERSION}_base libraries for VEP (Serge: src/hg is about 300 MB)
rm -r "$SOFT/kent-${KENT_VERSION}_base/java" "$SOFT/kent-${KENT_VERSION}_base/python" && \
rm -r "$SOFT/kent-${KENT_VERSION}_base/src/hg" && \
MYSQLINC="$(mysql_config --include | sed -e 's/^-I//g')" && \
MYSQLLIBS="$(mysql_config --libs)" && \
export MYSQLINC && \
export MYSQLLIBS && \
export CFLAGS="-fPIC" && \
echo "Making kent [1/2] ..." && \
cd "$KENT_SRC/lib" && \
echo 'CFLAGS="-fPIC"' > "$KENT_SRC/inc/localEnvironment.mk" && \
make clean && \
make -j"$(($(nproc)+1))" && \
echo "Making kent [2/2] ..." && \
cd "$KENT_SRC/jkOwnLib" && \
make clean && \
make -j"$(($(nproc)+1))" && \
rm "$SOFT/kent-${KENT_VERSION}_base.zip"
# VEP 97.0 (+ Wildtype.pm plugin from pVAC-Seq) (including cpanm dependencies)
ENV PERL5LIB="$PERL5LIB:$SOFT/cpanm/lib/perl5"
RUN cd "$SOFT" && \
git clone --branch $BRANCH --depth 1 https://github.com/Ensembl/ensembl-vep.git && \
mv "$SOFT/ensembl-vep" "$SOFT/ensembl-vep-97.0" && \
echo "Installing cpanm packages [1/5]: ensembl perl dependencies ..." && \
wget -q "https://raw.githubusercontent.com/Ensembl/ensembl/$BRANCH/cpanfile" -O "$SOFT/ensembl_cpanfile" && \
$CPANM --local-lib $SOFT/cpanm --installdeps --with-recommends --notest --cpanfile "$SOFT/ensembl_cpanfile" . && \
rm "$SOFT/ensembl_cpanfile" && \
echo "Installing cpanm packages [2/5]: Bundle::LWP as Bio::DB::BigFile dependency ..." && \
$CPANM --local-lib $SOFT/cpanm --with-recommends --notest Bundle::LWP && \
echo "Installing cpanm packages [3/5]: Bio::DB::BigFile v1.07 ..." && \
cd "$SOFT" && \
wget -q "https://cpan.metacpan.org/authors/id/L/LD/LDS/Bio-BigFile-1.07.tar.gz" -O "$SOFT/Bio-BigFile-1.07.tar.gz" && \
tar -xzf "$SOFT/Bio-BigFile-1.07.tar.gz" && \
mv "$SOFT/Bio-BigFile-1.07" "$SOFT/Bio-BigFile-1.07-src" && \
cd "$SOFT/Bio-BigFile-1.07-src" && \
# 'KENT_SRC', 'HTSLIB_DIR' & 'MACHTYPE' should be defined
# Solution for new kent versions (newer than 335) from: https://github.com/GMOD/GBrowse-Adaptors/pull/19
sed -i "/^my \$LibFile = \"jkweb.a\";.*/a my \$HeaderFileHTS = \"tbx.h\";\nmy \$LibFileHTS = \"libhts.a\";" "$SOFT/Bio-BigFile-1.07-src/Build.PL" && \
sed -E -i "s|^(.*)\\\$jk_include,\\\$jk_lib(.*)$|\1\$jk_include,\$jk_lib,\$hts_include,\$hts_lib\2|g" "$SOFT/Bio-BigFile-1.07-src/Build.PL" && \
sed -E -i "s|^(.*)\[\\\$jk_include(.*)$|\1\[\$jk_include,\$hts_include\2|g" "$SOFT/Bio-BigFile-1.07-src/Build.PL" && \
sed -E -i "s|^(.*)\"\\\$jk_lib/\\\$LibFile\",(.*)$|\1\"\$jk_lib/\$LibFile\",\"\$hts_lib/\$LibFileHTS\",\2|" "$SOFT/Bio-BigFile-1.07-src/Build.PL" && \
sed -E -i "s|^(.*)'-lz','-lssl'(.*)$|\1'-lz','-lssl','-pthread'\2|g" "$SOFT/Bio-BigFile-1.07-src/Build.PL" && \
sed -i "/^.*if -e \"\\\$jksrc\/lib\/\\\$ENV.*/a \
\$hts_include = \"\$ENV{HTSLIB_DIR}\/include\" if -e \"\$ENV{HTSLIB_DIR}\/include\/htslib\/\$HeaderFileHTS\";\n \
\$hts_lib = \"\$ENV{HTSLIB_DIR}\/lib\" if -e \"\$ENV{HTSLIB_DIR}\/lib\/\$LibFileHTS\";" "$SOFT/Bio-BigFile-1.07-src/Build.PL" && \
perl Build.PL --prefix "$SOFT/Bio-BigFile-1.07" && \
./Build && \
./Build install && \
rm -r "$SOFT/Bio-BigFile-1.07-src" && \
rm "$SOFT/Bio-BigFile-1.07.tar.gz" && \
export PERL5LIB="$PERL5LIB:$SOFT/Bio-BigFile-1.07/lib/perl/5.18.2" && \
cd "$SOFT" && \
echo "Installing cpanm packages [4/5]: ensembl-vep perl dependencies ..." && \
$CPANM --local-lib $SOFT/cpanm --installdeps --with-recommends --notest --cpanfile ensembl-vep-97.0/cpanfile . && \
echo "Installing cpanm packages [5/5] ..." && \
# $CPANM --notest --local-lib $SOFT/cpanm Archive::Zip Module::Build Bio::Perl DBI DBD::mysql Set::IntervalTree JSON PerlIO::gzip Scalar::Util Try::Tiny
$CPANM --local-lib $SOFT/cpanm --with-recommends --notest Archive::Zip && \
cd "$SOFT/ensembl-vep-97.0" && \
perl INSTALL.pl --NO_TEST --NO_UPDATE --NO_HTSLIB --NO_BIOPERL -a ap -g ProteinSeqs,Downstream,Conservation,GO && \
# pVACtools Wildtype plugin, unchanged since 02.11.2017 till 19.06.2019: https://github.com/griffithlab/pVACtools/commit/f6099b390363e3b0bd0f93be9a8380b9139009dc
wget -q "https://raw.githubusercontent.com/griffithlab/pVACtools/475f8cb91403a4819cda68a330337bbc463d2e4c/tools/pvacseq/VEP_plugins/Wildtype.pm" -O "$HOME/.vep/Plugins/Wildtype.pm"
ENV VEP="$SOFT/ensembl-vep-97.0/vep" \
VEPFILTER="$SOFT/ensembl-vep-97.0/filter_vep" \
VEPINSTALL="$SOFT/ensembl-vep-97.0/INSTALL.pl" \
CONVERTCACHE="$SOFT/ensembl-vep-97.0/convert_cache.pl --bgzip $BGZIP --tabix $TABIX" \
PERL5LIB="$PERL5LIB:$SOFT/Bio-BigFile-1.07/lib/perl/5.18.2:$SOFT/ensembl-vep-97.0:$SOFT/ensembl-vep-97.0/modules" \
PATH="$SOFT/ensembl-vep-97.0:$SOFT/Bio-BigFile-1.07/bin:$PATH"
# Run VEP module tests ...
RUN cd "$SOFT/ensembl-vep-97.0" && \
perl -e "use Bio::DB::HTS::Tabix" && \
perl -e "use Bio::DB::BigFile" && \
prove -v t/*.t
COPY common_funcs.sh /usr/local/bin/
COPY vep.sh /usr/local/bin/
RUN chmod +x /usr/local/bin/vep.sh
WORKDIR /outputs
ENTRYPOINT ["vep.sh"]
Dear @serge2016,
The LD VEP plugin is a VEP plugin that returns linkage disequilibrium data for the input variants. This plugin requires a script (written in C) from the ensembl-variation API repository, and that script needs to be compiled.
As the VEP installer only keeps the Perl modules, the Docker image needs to download and compile it in an extra step.
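To confirm that this extra compile step actually produced the binaries, a small shell check can help. This is only a sketch: the helper name missing_tools is hypothetical, and the binary names in the usage note are an assumption about what the ensembl-variation C_code Makefile builds, so adjust them to whatever ends up in the bin directory.

```shell
# Hypothetical helper: print the names of the given executables that are
# not found on PATH (space-separated; empty output means all were found).
missing_tools() {
    missing=""
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            missing="$missing $tool"
        fi
    done
    # Strip the leading space left by the first append.
    printf '%s\n' "${missing# }"
}
```

Inside the container, calling missing_tools ld_vcf calc_genotypes (assumed names) should print an empty line when the compiled binaries are on PATH; any name it prints still needs to be built or added to PATH.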
About the BioPerl installation, you are right: you can keep the bioperl-live installation and use the option --NO_BIOPERL with the installer.
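A minimal sketch of that setup, assuming bioperl-live was unpacked under $SOFT as in the Dockerfile above. The helper name append_perl5lib and the /soft fallback are hypothetical; the helper only builds the PERL5LIB value (without an empty leading entry when the variable starts out unset), and the installer call is shown as a comment rather than executed.

```shell
# Hypothetical helper: append a directory to a PERL5LIB-style list.
# $1 = current value (may be empty), $2 = directory to add.
append_perl5lib() {
    if [ -z "$1" ]; then
        printf '%s\n' "$2"
    else
        printf '%s\n' "$1:$2"
    fi
}

# Assumption: SOFT points at the directory holding the unpacked sources.
SOFT="${SOFT:-/soft}"
PERL5LIB="$(append_perl5lib "${PERL5LIB:-}" "$SOFT/bioperl-live-release-1-7-2")"
export PERL5LIB

# With BioPerl already on PERL5LIB, the installer can skip its own copy:
#   perl INSTALL.pl --NO_BIOPERL
```

The same pattern applies to the other pre-built libraries in the Dockerfile: put them on PERL5LIB first, then pass the matching skip option (for example --NO_HTSLIB, as used above) to the installer.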
Best regards, Laurent
I am closing this issue, but if you have any more questions please feel free to reopen it.
Best regards, Laurent
Thank you!
Hello! Could you please tell me why the installation instructions in the Dockerfile (https://github.com/Ensembl/ensembl-vep/blob/19a5e9c4c6d4d45710b771341ec228623855c0c9/docker/Dockerfile) are so different from the instructions on the page http://www.ensembl.org/info/docs/tools/vep/script/vep_download.html? Why are there such complex installation steps in the Dockerfile?