WGS-TB / MentaLiST

The MLST pipeline developed by the PathOGiST research group
MIT License

New Enterobase URL #31

Open dfornika opened 6 years ago

dfornika commented 6 years ago

Enterobase now provides the following URL for downloading schemes:

http://enterobase.warwick.ac.uk/schemes/

crarlus commented 6 years ago

My solution is the following Python script, adapted from the Enterobase API How-To (https://bitbucket.org/enterobase/enterobase-web/wiki/api_download_schemes):

import base64
import json
import logging
import os
import sys
import urllib.request
from urllib.error import HTTPError

# You must have a valid API token, read here from the environment
API_TOKEN = os.getenv('ENTEROBASE_API_TOKEN', None)
SERVER_ADDRESS = 'http://enterobase.warwick.ac.uk'
DATABASE = 'senterica'
SCHEME = 'cgMLST_v2'
LIMIT = 10000
TARGET_DIR = 'alleles_cgMLST_v2'

def __create_request(request_str):
    # HTTP Basic auth: the token is the user name, the password is empty
    request = urllib.request.Request(request_str)
    base64string = base64.b64encode(('%s:' % API_TOKEN).encode('utf-8')).decode('ascii')
    request.add_header("Authorization", "Basic %s" % base64string)
    return request

if not API_TOKEN:
    sys.exit('Set the ENTEROBASE_API_TOKEN environment variable first')

# Create the output directory if it does not exist yet
if not os.path.exists(TARGET_DIR):
    os.mkdir(TARGET_DIR)
# Ask for the list of loci in the scheme (at most LIMIT entries)
address = SERVER_ADDRESS + '/api/v2.0/%s/%s/loci?limit=%d&scheme=%s' \
    % (DATABASE, SCHEME, LIMIT, SCHEME)
print("Fetching scheme loci list from " + address)
try:
    response = urllib.request.urlopen(__create_request(address))
    data = json.load(response)
    print("Download of json complete")

    # Download one gzipped FASTA file of alleles per locus
    for record in data['loci']:
        record_locus = record['locus']
        record_link = record['download_alleles_link']
        print("Downloading alleles for locus " + record_locus)
        response = urllib.request.urlopen(__create_request(record_link))
        with open(os.path.join(TARGET_DIR, '%s.fasta.gz' % record_locus), 'wb') as out_ass:
            out_ass.write(response.read())
except HTTPError as Response_error:
    logging.error('%d %s. <%s>\n Reason: %s' % (Response_error.code,
                                               Response_error.msg,
                                               Response_error.geturl(),
                                               Response_error.read().decode('utf-8', 'replace')))

It requires a valid Enterobase API token, which the script reads from the ENTEROBASE_API_TOKEN environment variable.
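
After the script has run, a quick sanity check is to count the sequences in each downloaded file. The following is only a minimal sketch: it assumes the files in TARGET_DIR really are gzip-compressed FASTA, as the '.fasta.gz' names in the script suggest, and count_alleles is a name made up for this illustration (it is not part of Enterobase or MentaLiST).

```python
import gzip
import os


def count_alleles(target_dir):
    """Return {locus: number of FASTA records} for every *.fasta.gz file in target_dir."""
    counts = {}
    for name in sorted(os.listdir(target_dir)):
        if not name.endswith('.fasta.gz'):
            continue  # ignore anything that is not a downloaded allele file
        locus = name[:-len('.fasta.gz')]
        # Assumption: each FASTA record starts with a '>' header line
        with gzip.open(os.path.join(target_dir, name), 'rt') as handle:
            counts[locus] = sum(1 for line in handle if line.startswith('>'))
    return counts


if __name__ == '__main__':
    # Same directory name as TARGET_DIR in the script above
    target = 'alleles_cgMLST_v2'
    if os.path.isdir(target):
        for locus, n in count_alleles(target).items():
            print('%s\t%d' % (locus, n))
```

A locus with a count of zero, or a file that gzip cannot open, would point to a failed or truncated download.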

dfornika commented 6 years ago

Hi @crarlus, thanks for this. We should be able to use it as inspiration for improving the MentaLiST Enterobase download methods. We currently don't support downloads that require an API token, but we may be able to add that if there are protected datasets we want access to.
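
One way to cover both cases would be to attach the Authorization header only when a token is available. The sketch below illustrates that idea in Python under stated assumptions: build_request is a hypothetical helper, not MentaLiST's actual download code, and the token handling (token as user name, empty password) simply mirrors the script earlier in this thread.

```python
import base64
import os
import urllib.request


def build_request(url, token=None):
    """Build a request, adding HTTP Basic auth only when a token is given."""
    request = urllib.request.Request(url)
    if token:
        # Token as user name, empty password, as in the script above
        credentials = base64.b64encode(('%s:' % token).encode('utf-8')).decode('ascii')
        request.add_header('Authorization', 'Basic %s' % credentials)
    return request


# With ENTEROBASE_API_TOKEN unset, the request carries no credentials;
# with it set, the same call sends the token.
request = build_request('http://enterobase.warwick.ac.uk/schemes/',
                        os.getenv('ENTEROBASE_API_TOKEN'))
```

The request object is only constructed here; nothing is sent until it is passed to urllib.request.urlopen.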