HRGV / phyloFlash

phyloFlash - A pipeline to rapidly reconstruct the SSU rRNAs and explore the phylogenetic composition of an Illumina (meta)genomic dataset.
GNU General Public License v3.0

SSU RefNR: Failed to establish connection. #177

Open nick-youngblut opened 1 year ago

nick-youngblut commented 1 year ago

As mentioned in https://github.com/HRGV/phyloFlash/issues/133, phyloFlash_makedb.pl --remote fails with the following error:

[17:36:19] Checking for required tools.
[17:36:19] Using bowtiebuild found at
           "/opt/conda/envs/phyloflash/bin/bowtie-build".
[17:36:19] Using grep found at "/usr/bin/grep".
[17:36:19] Using barrnapHGV found at
           "/opt/conda/envs/phyloflash/lib/phyloFlash/barrnap-HGV/bin/barrnap_HGV".
[17:36:19] Using bbmap found at "/opt/conda/envs/phyloflash/bin/bbmap.sh".
[17:36:19] Using bbduk found at "/opt/conda/envs/phyloflash/bin/bbduk.sh".
[17:36:19] Using vsearch found at "/opt/conda/envs/phyloflash/bin/vsearch".
[17:36:19] Using bbmask found at
           "/opt/conda/envs/phyloflash/bin/bbmask.sh".
[17:36:19] All required tools found.

This is phyloFlash_makedb.pl from phyloFlash.pl v3.4

[17:36:19] downloading latest univec from ncbi
[17:36:19]   Connecting to ftp.ncbi.nlm.nih.gov
[17:36:19]   Finding /pub/UniVec/UniVec
[17:36:20]   Found UniVec (1701925 bytes)
|---------------------------------------------------------------------------|
############################################################################

[17:36:25] downloading latest SSU RefNR from www.arb-silva.de
[17:36:25]   Connecting to ftp.arb-silva.de
[17:36:26]   Finding
           /current/Exports/*_SSURef_N?99_tax_silva_trunc.fasta.gz
[17:38:37] FATAL: Could not list files matching
           '*_SSURef_N?99_tax_silva_trunc.fasta.gz' in '/current/Exports/':

           Failed to establish connection.

           Aborting.
[17:38:37] Saving log to file phyloFlash_log_on_error
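The pattern in the FATAL message reads like a shell-style glob, so it can be checked offline against candidate file names. The sketch below copies the pattern from the log; the helper name matches_silva_glob is hypothetical, the first two file names appear later in this thread, and the third is an illustrative non-matching name. The single-character wildcard in N?99 accepts both the NR99 and Nr99 spellings.

```shell
# Check whether a file name matches the glob from the phyloFlash_makedb.pl log.
# matches_silva_glob is a hypothetical helper, not part of phyloFlash.
matches_silva_glob() {
    case "$1" in
        *_SSURef_N?99_tax_silva_trunc.fasta.gz) return 0 ;;
        *) return 1 ;;
    esac
}

# The third name is an illustrative example that lacks the N?99 part.
for f in SILVA_138.1_SSURef_NR99_tax_silva_trunc.fasta.gz \
         SILVA_128_SSURef_Nr99_tax_silva_trunc.fasta.gz \
         SILVA_138.1_SSURef_tax_silva_trunc.fasta.gz; do
    if matches_silva_glob "$f"; then
        echo "match:    $f"
    else
        echo "no match: $f"
    fi
done
```

A match here only says the name is what the script was looking for; it says nothing about why the FTP listing itself failed.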

My conda env:

# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       2_gnu    conda-forge
_r-mutex                  1.0.1               anacondar_1    conda-forge
_sysroot_linux-64_curr_repodata_hack 3                   h5bd9786_13    conda-forge
alsa-lib                  1.2.3.2              h166bdaf_0    conda-forge
bbmap                     39.01                h5c4e2a8_0    bioconda
bcftools                  1.8                  h4da6232_3    bioconda
bedtools                  2.30.0               h468198e_3    bioconda
binutils_impl_linux-64    2.39                 he00db2b_1    conda-forge
binutils_linux-64         2.39                h5fc0e48_12    conda-forge
biopython                 1.76             py27h516909a_0    conda-forge
blas                      1.1                    openblas    conda-forge
bowtie                    1.2.3            py27h2bce143_2    bioconda
bwidget                   1.9.14               ha770c72_1    conda-forge
bzip2                     1.0.8                h7f98852_4    conda-forge
c-ares                    1.18.1               h7f98852_0    conda-forge
ca-certificates           2022.12.7            ha878542_0    conda-forge
cairo                     1.16.0            h9f066cc_1006    conda-forge
certifi                   2019.11.28       py27h8c360ce_1    conda-forge
curl                      7.76.1               h979ede3_1    conda-forge
emirge                    0.61.1           py27h30f897e_7    bioconda
expat                     2.5.0                hcb278e6_1    conda-forge
fontconfig                2.14.2               h14ed4e7_0    conda-forge
freetype                  2.12.1               hca18f0e_1    conda-forge
fribidi                   1.0.10               h36c2ea0_0    conda-forge
gcc_impl_linux-64         9.5.0               h99780fb_19    conda-forge
gcc_linux-64              9.5.0               h4258300_12    conda-forge
gettext                   0.21.1               h27087fc_0    conda-forge
gfortran_impl_linux-64    9.5.0               hf1096a2_19    conda-forge
gfortran_linux-64         9.5.0               hdb51d14_12    conda-forge
giflib                    5.2.1                h0b41bf4_3    conda-forge
graphite2                 1.3.13            h58526e2_1001    conda-forge
gsl                       2.6                  he838d99_2    conda-forge
gxx_impl_linux-64         9.5.0               h99780fb_19    conda-forge
gxx_linux-64              9.5.0               h43f449f_12    conda-forge
harfbuzz                  2.7.2                ha5b49bf_1    conda-forge
htslib                    1.7                           0    bioconda
icu                       67.1                 he1b5a44_0    conda-forge
jpeg                      9e                   h0b41bf4_3    conda-forge
kernel-headers_linux-64   3.10.0              h4a8ded7_13    conda-forge
krb5                      1.17.2               h926e7f8_0    conda-forge
lcms2                     2.14                 h6ed2654_0    conda-forge
ld_impl_linux-64          2.39                 hcc3a1bd_1    conda-forge
lerc                      4.0.0                h27087fc_0    conda-forge
libblas                   3.9.0           1_h86c2bf4_netlib    conda-forge
libcblas                  3.9.0           5_h92ddd45_netlib    conda-forge
libcurl                   7.76.1               hc4aaa36_1    conda-forge
libdeflate                1.14                 h166bdaf_0    conda-forge
libedit                   3.1.20191231         he28a2e2_2    conda-forge
libev                     4.33                 h516909a_1    conda-forge
libexpat                  2.5.0                hcb278e6_1    conda-forge
libffi                    3.2.1             he1b5a44_1007    conda-forge
libgcc                    7.2.0                h69d50b8_2    conda-forge
libgcc-devel_linux-64     9.5.0               h0a57e50_19    conda-forge
libgcc-ng                 12.2.0              h65d4601_19    conda-forge
libgfortran               3.0.0                         1    conda-forge
libgfortran-ng            12.2.0              h69a702a_19    conda-forge
libgfortran5              12.2.0              h337968e_19    conda-forge
libglib                   2.66.3               hbe7bbb4_0    conda-forge
libgomp                   12.2.0              h65d4601_19    conda-forge
libiconv                  1.17                 h166bdaf_0    conda-forge
liblapack                 3.9.0           5_h92ddd45_netlib    conda-forge
libnghttp2                1.51.0               hdcd2b5c_0    conda-forge
libnsl                    2.0.0                h7f98852_0    conda-forge
libpng                    1.6.39               h753d276_0    conda-forge
libsanitizer              9.5.0               h2f262e1_19    conda-forge
libsqlite                 3.40.0               h753d276_0    conda-forge
libssh2                   1.10.0               haa6b8db_3    conda-forge
libstdcxx-devel_linux-64  9.5.0               h0a57e50_19    conda-forge
libstdcxx-ng              12.2.0              h46fd767_19    conda-forge
libtiff                   4.4.0                h82bc61c_5    conda-forge
libuuid                   2.38.1               h0b41bf4_0    conda-forge
libwebp-base              1.3.0                h0b41bf4_0    conda-forge
libxcb                    1.13              h7f98852_1004    conda-forge
libxml2                   2.9.10               h68273f3_2    conda-forge
libzlib                   1.2.13               h166bdaf_4    conda-forge
llvm-openmp               8.0.1                hc9558a2_0    conda-forge
mafft                     7.520                hec16e2b_0    bioconda
make                      4.3                  hd18ef5c_1    conda-forge
ncurses                   6.3                  h27087fc_1    conda-forge
numpy                     1.16.5           py27h95a1406_0    conda-forge
openblas                  0.3.3                ha44fe06_1    conda-forge
openjdk                   11.0.8               hacce0ff_0    conda-forge
openmp                    8.0.1                         0    conda-forge
openssl                   1.1.1t               h0b41bf4_0    conda-forge
pango                     1.42.4               h69149e4_5    conda-forge
pbzip2                    1.1.13                        0    conda-forge
pcre                      8.45                 h9c3ff4c_0    conda-forge
pcre2                     10.36                h032f7d1_1    conda-forge
perl                      5.32.1          2_h7f98852_perl5    conda-forge
phyloflash                3.4                  hdfd78af_1    bioconda
pigz                      2.6                  h27826a3_0    conda-forge
pip                       20.1.1             pyh9f0ad1d_0    conda-forge
pixman                    0.40.0               h36c2ea0_0    conda-forge
pthread-stubs             0.4               h36c2ea0_1001    conda-forge
pysam                     0.14.1           py27hae42fb6_1    bioconda
python                    2.7.15          h5a48372_1011_cpython    conda-forge
python_abi                2.7                    1_cp27mu    conda-forge
r-backports               1.4.1             r40hcfec24a_0    conda-forge
r-base                    4.0.3                ha43b4e8_3    conda-forge
r-brio                    1.1.3             r40hcfec24a_0    conda-forge
r-callr                   3.7.2             r40hc72bb7e_0    conda-forge
r-cli                     3.4.1             r40h7525677_0    conda-forge
r-colorspace              2.0_3             r40h06615bd_0    conda-forge
r-crayon                  1.5.1             r40hc72bb7e_0    conda-forge
r-desc                    1.4.2             r40hc72bb7e_0    conda-forge
r-diffobj                 0.3.5             r40hcfec24a_0    conda-forge
r-digest                  0.6.29            r40h03ef668_0    conda-forge
r-ellipsis                0.3.2             r40hcfec24a_0    conda-forge
r-evaluate                0.16              r40hc72bb7e_0    conda-forge
r-fansi                   1.0.3             r40h06615bd_0    conda-forge
r-farver                  2.1.1             r40h7525677_0    conda-forge
r-fs                      1.5.2             r40h7525677_1    conda-forge
r-getopt                  1.20.3            r40ha770c72_2    conda-forge
r-ggdendro                0.1.23            r40hc72bb7e_0    conda-forge
r-ggplot2                 3.3.6             r40hc72bb7e_0    conda-forge
r-glue                    1.6.2             r40h06615bd_0    conda-forge
r-gtable                  0.3.1             r40hc72bb7e_0    conda-forge
r-isoband                 0.2.5             r40h03ef668_0    conda-forge
r-jsonlite                1.8.0             r40h06615bd_0    conda-forge
r-labeling                0.4.2             r40hc72bb7e_1    conda-forge
r-lattice                 0.20_45           r40hcfec24a_0    conda-forge
r-lifecycle               1.0.2             r40hc72bb7e_0    conda-forge
r-magrittr                2.0.3             r40h06615bd_0    conda-forge
r-mass                    7.3_58.1          r40h06615bd_0    conda-forge
r-matrix                  1.4_1             r40h0154571_0    conda-forge
r-mgcv                    1.8_40            r40h0154571_0    conda-forge
r-munsell                 0.5.0           r40hc72bb7e_1004    conda-forge
r-nlme                    3.1_159           r40h8da6f51_0    conda-forge
r-optparse                1.7.3             r40hc72bb7e_0    conda-forge
r-pillar                  1.8.1             r40hc72bb7e_0    conda-forge
r-pkgconfig               2.0.3             r40hc72bb7e_1    conda-forge
r-pkgload                 1.3.0             r40hc72bb7e_0    conda-forge
r-plyr                    1.8.7             r40h7525677_0    conda-forge
r-praise                  1.0.0           r40hc72bb7e_1005    conda-forge
r-processx                3.7.0             r40h06615bd_0    conda-forge
r-ps                      1.7.1             r40h06615bd_0    conda-forge
r-r6                      2.5.1             r40hc72bb7e_0    conda-forge
r-rcolorbrewer            1.1_3             r40h785f33e_0    conda-forge
r-rcpp                    1.0.9             r40h7525677_1    conda-forge
r-rematch2                2.1.2             r40hc72bb7e_1    conda-forge
r-reshape2                1.4.4             r40h03ef668_1    conda-forge
r-rlang                   1.0.6             r40h7525677_0    conda-forge
r-rprojroot               2.0.3             r40hc72bb7e_0    conda-forge
r-scales                  1.2.1             r40hc72bb7e_0    conda-forge
r-stringi                 1.5.3             r40hca8494e_0    conda-forge
r-stringr                 1.4.1             r40hc72bb7e_0    conda-forge
r-testthat                3.1.4             r40h7525677_0    conda-forge
r-tibble                  3.1.8             r40h06615bd_0    conda-forge
r-utf8                    1.2.2             r40hcfec24a_0    conda-forge
r-vctrs                   0.4.1             r40h7525677_0    conda-forge
r-viridislite             0.4.1             r40hc72bb7e_0    conda-forge
r-waldo                   0.4.0             r40hc72bb7e_0    conda-forge
r-withr                   2.5.0             r40hc72bb7e_0    conda-forge
readline                  8.2                  h8228510_1    conda-forge
samtools                  1.6                  hb116620_7    bioconda
scipy                     1.2.0           py27_blas_openblashb06ca3d_200    conda-forge
sed                       4.8                  he412f7d_0    conda-forge
setuptools                44.0.0                   py27_0    conda-forge
sortmerna                 2.1b                 he860b03_4    bioconda
spades                    3.15.5               h95f258a_1    bioconda
sqlite                    3.40.0               h4ff8645_0    conda-forge
sysroot_linux-64          2.17                h4a8ded7_13    conda-forge
tbb                       2020.2               h4bd325d_4    conda-forge
tk                        8.6.12               h27826a3_0    conda-forge
tktable                   2.10                 hb7b940f_3    conda-forge
vsearch                   2.22.1               hf1761c0_0    bioconda
wheel                     0.37.1             pyhd8ed1ab_0    conda-forge
xorg-fixesproto           5.0               h7f98852_1002    conda-forge
xorg-inputproto           2.3.2             h7f98852_1002    conda-forge
xorg-kbproto              1.0.7             h7f98852_1002    conda-forge
xorg-libice               1.0.10               h7f98852_0    conda-forge
xorg-libsm                1.2.3             hd9c2040_1000    conda-forge
xorg-libx11               1.8.4                h0b41bf4_0    conda-forge
xorg-libxau               1.0.9                h7f98852_0    conda-forge
xorg-libxdmcp             1.1.3                h7f98852_0    conda-forge
xorg-libxext              1.3.4                h0b41bf4_2    conda-forge
xorg-libxfixes            5.0.3             h7f98852_1004    conda-forge
xorg-libxi                1.7.10               h7f98852_0    conda-forge
xorg-libxrender           0.9.10            h7f98852_1003    conda-forge
xorg-libxt                1.2.1                h7f98852_2    conda-forge
xorg-libxtst              1.2.3             h7f98852_1002    conda-forge
xorg-recordproto          1.14.2            h7f98852_1002    conda-forge
xorg-renderproto          0.11.1            h7f98852_1002    conda-forge
xorg-xextproto            7.3.0             h0b41bf4_1003    conda-forge
xorg-xproto               7.0.31            h7f98852_1007    conda-forge
xz                        5.2.6                h166bdaf_0    conda-forge
zlib                      1.2.13               h166bdaf_4    conda-forge
zstd                      1.5.2                h3eb15da_6    conda-forge

My OS: Ubuntu 20.04.5

nick-youngblut commented 1 year ago

While there are some instructions at https://hrgv.github.io/phyloFlash/install.html for manual setup of the database, the instructions are not detailed enough to fully set up the database.

osvatic commented 1 year ago

I just spent some time today to figure out the issue.

You can complete the installation of the database by using wget to download the proper files and passing them to the command in the link.

#wget silva DB, location found using phyloflash error log
wget https://ftp.arb-silva.de/current/Exports/SILVA_138.1_SSURef_NR99_tax_silva_trunc.fasta.gz

#wget UniVec DB, location found using phyloflash error log
wget https://ftp.ncbi.nlm.nih.gov/pub/UniVec/UniVec

#use downloaded files as inputs for db. 
phyloFlash_makedb.pl --CPUs 16 --silva_file SILVA_138.1_SSURef_NR99_tax_silva_trunc.fasta.gz --univec_file UniVec

The exact path that you need to use for the -dbhome option will be printed out at the end of the command.
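The three commands above can be collected into one script. This is only a sketch: the URLs and the phyloFlash_makedb.pl options are copied from this comment, while the run and build_db helpers and the DRY_RUN and CPUS variables are names made up for illustration. By default it only prints the commands; set DRY_RUN=0 to actually execute them.

```shell
#!/bin/sh
# Sketch of the manual workaround: download SILVA and UniVec over HTTPS,
# then build the phyloFlash database from the local files.
# run, build_db, DRY_RUN and CPUS are illustrative names, not part of phyloFlash.
set -eu

SILVA_FILE="SILVA_138.1_SSURef_NR99_tax_silva_trunc.fasta.gz"
SILVA_URL="https://ftp.arb-silva.de/current/Exports/${SILVA_FILE}"
UNIVEC_URL="https://ftp.ncbi.nlm.nih.gov/pub/UniVec/UniVec"
CPUS="${CPUS:-16}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = run them

# Print a command, and execute it unless DRY_RUN is 1.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" != "1" ]; then
        "$@"
    fi
}

build_db() {
    run wget "$SILVA_URL"
    run wget "$UNIVEC_URL"
    run phyloFlash_makedb.pl --CPUs "$CPUS" \
        --silva_file "$SILVA_FILE" --univec_file UniVec
}

build_db
```

The SILVA file name is pinned to release 138.1, so it will probably need editing once current/Exports holds a newer release.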

Unaimend commented 1 year ago

I get the same error on my side

nick-youngblut commented 1 year ago

> adding them into the command in the link

@osvatic There are many commands at https://hrgv.github.io/phyloFlash/install.html. I'm assuming you mean:

phyloFlash_makedb.pl --univec_file /path/to/Univec --silva_file /path/to/SILVA_128_SSURef_Nr99_tax_silva_trunc.fasta.gz

osvatic commented 1 year ago

@nick-youngblut Yes, that command is the final step and it gets around the issue, but first you need to run the wget commands (shown in my previous comment and repeated below). They download the current versions of the SILVA and UniVec databases, which you then pass to the phyloFlash_makedb.pl command.

#wget silva DB, location found using phyloflash error log
wget https://ftp.arb-silva.de/current/Exports/SILVA_138.1_SSURef_NR99_tax_silva_trunc.fasta.gz

#wget UniVec DB, location found using phyloflash error log
wget https://ftp.ncbi.nlm.nih.gov/pub/UniVec/UniVec

kbseah commented 1 year ago

Thanks for taking the time to help each other out. I've updated the docs based on the ambiguities that have been pointed out here.

Preformatted SILVA databases are now available for download from Zenodo. This will be displayed as the preferred setup method once the pull request is accepted.

Background to some of the design decisions: When phyloFlash was originally developed, SILVA was released under a license that did not allow commercial use beyond a test period, which is why the database setup script was designed to explicitly display the license conditions and get interactive user confirmation before continuing. SILVA 138 onwards is released under CC-BY 4.0, which is more permissive, but we didn't change the database setup because it still worked and active development was essentially complete.