abaizan / kodoja

Kodoja: identifying viruses from plant RNA sequencing data
MIT License

TravisCI: Python 2.7, NCBI download failing #36

Closed by peterjc 5 years ago

peterjc commented 5 years ago

This is not a network issue - Python 3 is working fine, but a download on Python 2.7 is failing:

(Updated: I initially thought the failure was in the call to ncbi-genome-download.)

https://travis-ci.org/abaizan/kodoja/jobs/572680972 using ncbi-genome-download version 0.2.6

==============================================================
Testing kodoja_build.py
==============================================================
INFO: Checking record u'GCF_000884835.1'
INFO: Using cached summary.
INFO: Checking record u'GCF_000884835.1'
INFO: Using cached summary.
INFO: Checking record u'GCF_000888855.1'
INFO: Using cached summary.
INFO: Checking record u'GCF_000888855.1'
INFO: Using cached summary.
INFO: Checking record u'GCF_000861345.1'
INFO: Using cached summary.
INFO: Checking record u'GCF_000861345.1'
Running kodoja_build.py with three viruses as input
Will retry downloading 'https://ftp.ncbi.nih.gov/genomes/refseq/viral/assembly_summary.txt' attempt 2
Will retry downloading 'https://ftp.ncbi.nih.gov/genomes/refseq/viral/assembly_summary.txt' attempt 3
Will retry downloading 'https://ftp.ncbi.nih.gov/genomes/refseq/viral/assembly_summary.txt' attempt 4
Will retry downloading 'https://ftp.ncbi.nih.gov/genomes/refseq/viral/assembly_summary.txt' attempt 5
Failed to download 'https://ftp.ncbi.nih.gov/genomes/refseq/viral/assembly_summary.txt'
The command "test/test_script.sh" exited with 1.

Likewise https://travis-ci.org/abaizan/kodoja/jobs/572680974 using ncbi-genome-download version 0.2.10

Using Python 2.7 on macOS, the call worked:

$ ncbi-genome-download --version
0.2.10
$ ncbi-genome-download --verbose -o . -F fasta -t 12227 viral
INFO: Checking record u'GCF_000861345.1'

peterjc commented 5 years ago

Working build, latest versions:

$ pip install $PIP
...
Successfully installed appdirs-1.4.3 biopython-1.73 chardet-3.0.4 idna-2.8 ncbi-genome-download-0.2.9 pandas-0.24.0 python-dateutil-2.7.5 pytz-2018.9 requests-2.21.0 six-1.12.0 urllib3-1.24.1

Working build, pinned versions:

Successfully installed asn1crypto-0.24.0 biopython-1.67 cffi-1.11.5 chardet-3.0.4 cryptography-2.5 enum34-1.1.6 idna-2.8 ipaddress-1.0.22 ncbi-genome-download-0.2.6 pandas-0.14.0 pyOpenSSL-19.0.0 pycparser-2.19 python-dateutil-2.7.5 pytz-2018.9 requests-2.21.0 six-1.12.0 urllib3-1.24.1

Failing build, pinned versions:

Successfully installed asn1crypto-0.24.0 biopython-1.67 cffi-1.12.3 chardet-3.0.4 cryptography-2.7 enum34-1.1.6 idna-2.8 ipaddress-1.0.22 ncbi-genome-download-0.2.6 pandas-0.14.0 pyOpenSSL-19.0.0 pycparser-2.19 python-dateutil-2.8.0 pytz-2019.2 requests-2.22.0 six-1.12.0 urllib3-1.25.3

Failing build, latest versions:

Successfully installed appdirs-1.4.3 biopython-1.74 chardet-3.0.4 idna-2.8 ncbi-genome-download-0.2.10 pandas-0.24.2 python-dateutil-2.8.0 pytz-2019.2 requests-2.22.0 six-1.12.0 urllib3-1.25.3

The cause could be the upgrade requests-2.21.0 --> requests-2.22.0 and/or urllib3-1.24.1 --> urllib3-1.25.3
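To make the comparison less error-prone than reading the lists by eye, here is a minimal sketch that diffs the four "Successfully installed" lists quoted above and keeps only the packages that changed in both the latest and the pinned builds. The function names (parse_installed, changed) are made up for illustration and are not part of kodoja.

```python
# Package lists copied from the four Travis builds quoted in this issue.
WORKING_LATEST = "appdirs-1.4.3 biopython-1.73 chardet-3.0.4 idna-2.8 ncbi-genome-download-0.2.9 pandas-0.24.0 python-dateutil-2.7.5 pytz-2018.9 requests-2.21.0 six-1.12.0 urllib3-1.24.1"
FAILING_LATEST = "appdirs-1.4.3 biopython-1.74 chardet-3.0.4 idna-2.8 ncbi-genome-download-0.2.10 pandas-0.24.2 python-dateutil-2.8.0 pytz-2019.2 requests-2.22.0 six-1.12.0 urllib3-1.25.3"
WORKING_PINNED = "asn1crypto-0.24.0 biopython-1.67 cffi-1.11.5 chardet-3.0.4 cryptography-2.5 enum34-1.1.6 idna-2.8 ipaddress-1.0.22 ncbi-genome-download-0.2.6 pandas-0.14.0 pyOpenSSL-19.0.0 pycparser-2.19 python-dateutil-2.7.5 pytz-2018.9 requests-2.21.0 six-1.12.0 urllib3-1.24.1"
FAILING_PINNED = "asn1crypto-0.24.0 biopython-1.67 cffi-1.12.3 chardet-3.0.4 cryptography-2.7 enum34-1.1.6 idna-2.8 ipaddress-1.0.22 ncbi-genome-download-0.2.6 pandas-0.14.0 pyOpenSSL-19.0.0 pycparser-2.19 python-dateutil-2.8.0 pytz-2019.2 requests-2.22.0 six-1.12.0 urllib3-1.25.3"


def parse_installed(line):
    """Map package name to version for a space-separated name-version list."""
    packages = {}
    for item in line.split():
        name, version = item.rsplit("-", 1)  # the version follows the last hyphen
        packages[name] = version
    return packages


def changed(working, failing):
    """Return {name: (old_version, new_version)} for packages that differ."""
    old, new = parse_installed(working), parse_installed(failing)
    names = sorted(set(old) | set(new))
    return {n: (old.get(n), new.get(n)) for n in names if old.get(n) != new.get(n)}


latest = changed(WORKING_LATEST, FAILING_LATEST)
pinned = changed(WORKING_PINNED, FAILING_PINNED)

# Packages that changed in both comparisons are the common suspects.
suspects = sorted(set(latest) & set(pinned))
print(suspects)  # ['python-dateutil', 'pytz', 'requests', 'urllib3']
```

Of the four packages that changed in both cases, only requests and urllib3 are involved in network access, which is why they are the obvious candidates.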

peterjc commented 5 years ago

According to https://2.python-requests.org/en/master/community/updates/#release-history the only change from requests-2.21.0 --> requests-2.22.0 was:

Requests now supports urllib3 v1.25.2. (note: 1.25.0 and 1.25.1 are incompatible)

According to https://github.com/urllib3/urllib3/blob/master/CHANGES.rst there were a lot of changes from urllib3-1.24.1 --> urllib3-1.25.3 including stricter handling of security certificates (relevant as this is an HTTPS download).
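Since certificate handling is the suspect, one quick check is to print what the interpreter's own ssl module reports in each environment (Travis versus a local machine). This is only a diagnostic sketch using the standard library, written for Python 3; describe_tls_setup is a hypothetical helper name, and it says nothing about urllib3's own certificate logic.

```python
import ssl
import sys


def describe_tls_setup():
    """Collect the TLS details most relevant to certificate verification."""
    paths = ssl.get_default_verify_paths()
    context = ssl.create_default_context()
    return {
        "python": sys.version.split()[0],
        "openssl": ssl.OPENSSL_VERSION,
        "cafile": paths.cafile,    # CA bundle file, or None if not found
        "capath": paths.capath,    # CA directory, or None if not found
        "verify_mode": context.verify_mode.name,
        "check_hostname": context.check_hostname,
    }


for key, value in describe_tls_setup().items():
    print("%s: %s" % (key, value))
```

Comparing this output between a working and a failing environment would show whether they use different OpenSSL builds or CA bundle locations.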

peterjc commented 5 years ago

By adding --debug to the ncbi-genome-download call, I realised the failure is here:

https://github.com/abaizan/kodoja/blob/kodoja-v0.0.9/diagnosticTool_scripts/database_modules.py#L20

i.e. the Python standard library's urlretrieve (urllib.urlretrieve under Python 2)

peterjc commented 5 years ago

Pinning requests and urllib3 back to the working versions with pip install pandas==0.14 biopython==1.67 ncbi-genome-download==0.2.6 requests==2.21.0 urllib3==1.24.1 did not solve this:

https://travis-ci.org/abaizan/kodoja/jobs/572771280

peterjc commented 5 years ago

Useful logging:

Running kodoja_build.py with three viruses as input
Will retry downloading 'https://ftp.ncbi.nih.gov/genomes/refseq/viral/assembly_summary.txt' attempt 2
Will retry downloading 'https://ftp.ncbi.nih.gov/genomes/refseq/viral/assembly_summary.txt' attempt 3
Will retry downloading 'https://ftp.ncbi.nih.gov/genomes/refseq/viral/assembly_summary.txt' attempt 4
Will retry downloading 'https://ftp.ncbi.nih.gov/genomes/refseq/viral/assembly_summary.txt' attempt 5
Download failed: [Errno socket error] [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:727)
Kodoja failed unexpectedly with the following:
Traceback (most recent call last):
  File "diagnosticTool_scripts/kodoja_build.py", line 168, in <module>
    main()
  File "diagnosticTool_scripts/kodoja_build.py", line 102, in main
    os.path.join(args.output_dir, 'viral_assembly_summary.txt'))
  File "/home/travis/build/abaizan/kodoja/diagnosticTool_scripts/database_modules.py", line 40, in download_with_retries
    raise err
IOError: [Errno socket error] [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:727)
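
For readers following the traceback, the retry behaviour seen in the log can be sketched roughly as below. This is a reconstruction from the log messages only, written for Python 3, and is not the actual code in database_modules.py; the fetch and delay parameters are assumptions added so the loop can be exercised without network access.

```python
import sys
import time
from urllib.request import urlretrieve


def download_with_retries(url, destination, retries=5, fetch=urlretrieve, delay=0.0):
    """Call fetch(url, destination) up to `retries` times, re-raising the last error."""
    for attempt in range(1, retries + 1):
        try:
            return fetch(url, destination)
        except OSError as err:  # URLError and SSL errors are OSError subclasses
            if attempt == retries:
                # Report the underlying reason before giving up.
                sys.stderr.write("Download failed: %s\n" % err)
                raise
            sys.stderr.write("Will retry downloading %r attempt %i\n" % (url, attempt + 1))
            time.sleep(delay)


# Example call (needs network access, so it is left commented out):
# download_with_retries(
#     "https://ftp.ncbi.nih.gov/genomes/refseq/viral/assembly_summary.txt",
#     "viral_assembly_summary.txt")
```

Note that retrying cannot help with CERTIFICATE_VERIFY_FAILED: the same certificate check fails on every attempt, which matches the five identical attempts in the log.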