b-twice / arcgis-rest-toolbox

Set of tools for managing ArcGIS REST Services

pullReplica_vanilla.py #4

Open trillevine opened 9 years ago

trillevine commented 9 years ago

Hi Brian,

I cleaned up and modified this code a bit. A few things to note:

  1. As far as I can tell, in order to back up all layers in a service, you have to specify each layer id in the "layers" part of the REPLICA dictionary.
  2. Added functionality to delete yesterday's backup, create a folder with a name of my choosing for today's backup, and use today's folder as the destination. I found that when running the script from an IDE, a relative path for yesterday's folder worked, but it failed when running it from the command line -- that's why a hardcoded path is used for finding yesterday's folder. Maybe there's a cleaner way to do this than what I did, but it works fine.
  3. I changed the logic for naming the geodatabase -- I commented out the get_fs_name def since that just takes the name from the first layer and uses it for the geodb, which I didn't want -- but it's easy enough to swap that out and use what you originally wrote if desired.
  4. To get this to work in 2.6, I had to change the first line in the "if format:" clause of the pull_to_local def to: output = open(str(name) + '.' + str(file_format), 'wb') (the auto-numbered '{}' field used in the original line needs Python 2.7 or later).
  5. Added functionality to unzip the downloaded gdb and name it according to my desired naming convention.
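
One way around the relative-path problem in item 2 is to build both folder paths from an absolute base directory, so the result does not depend on where the script is launched from. This is a minimal sketch, not the script's actual approach: backup_folders and script_dir are hypothetical helpers, and it assumes that yesterday's folder uses the same 'todaysFolder_' prefix as today's.

```python
import os
from datetime import date, timedelta


def script_dir(script_path):
    """Return the absolute directory that contains script_path."""
    return os.path.dirname(os.path.abspath(script_path))


def backup_folders(base_dir, today=None, prefix='todaysFolder_'):
    """Return (today_folder, yesterday_folder) as paths under base_dir.

    Assumption: both folders are named prefix + 'YYYY.MM.DD'.
    """
    today = today or date.today()
    yesterday = today - timedelta(days=1)
    fmt = '%Y.%m.%d'
    return (os.path.join(base_dir, prefix + today.strftime(fmt)),
            os.path.join(base_dir, prefix + yesterday.strftime(fmt)))
```

Inside the script, calling backup_folders(script_dir(__file__)) would give paths next to the script itself; passing a fixed folder such as the backup destination works the same way.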

Here you go:

import json, urllib, urllib2, urlparse
import os, shutil
import time, datetime
from datetime import date, timedelta
import zipfile

today = date.today()
todayString = today.strftime("%Y.%m.%d")
serviceTodayString = 'todaysFolder_'+ todayString
yesterday = date.today() - timedelta(1)
yesterdayString = yesterday.strftime('%Y.%m.%d')
serviceYesterdayString = r'G:\path\yesterdayFolder'+ yesterdayString

# ======================== #
#   REST functions         #
# ======================== #
def check_service(serviceUrl):
    # a missing url is accepted; otherwise the url must end in 'FeatureServer'
    if serviceUrl is None:
        return True
    return os.path.split(serviceUrl)[-1] == 'FeatureServer'

def get_response(url, query='', get_json=True):
    encodedUrl = urllib.urlencode(query)
    request = urllib2.Request(url, encodedUrl)
    if get_json:
        return json.loads(urllib2.urlopen(request).read())
    return urllib2.urlopen(request).read()

def add_path(url, path):
    return urlparse.urljoin(url + "/", path)

def login (username, password):
    CREDENTIALS['username'] = username
    CREDENTIALS['password'] = password
    response = get_response(TOKEN_URL, CREDENTIALS)
    if 'error' in response:
        print response['error']
        exit()
    else:
        return response['token']

## get feature service name -- not necessary, only for file naming;
## using another naming convention, see pull_replica def
##def get_fs_name(fs_url, token):
##    fsName = get_response(fs_url,
##        {'f':'json', 'token':token})['layers'][0]['name']
##    print fsName
##    return fsName

# ======================== #
#        OS functions      #
# ======================== #

def pull_to_local(url, name, destination, file_format = ''):
    if destination:
        os.chdir(destination)
        cwd = os.getcwd()
        os.chdir(cwd + '\\' + serviceTodayString)
    if file_format:
#        output = open(str(name) + '.{}'.format(file_format), 'wb')
        output = open(str(name) + '.' + str(file_format), 'wb')
        output.write(url)
        output.close()
        zipped = zipfile.ZipFile(output.name)
        zipped.extractall()
        zipped.close()
##      get list of subdirectories in current directory
        subdirectories = [x[0] for x in os.walk(os.getcwd())]
##      extract geodb basename from geodb path (subdirectory index 1) & rename it for today
        os.rename(os.path.basename(subdirectories[1]), serviceTodayString + '.gdb')

    else:
        output = open(str(name), 'wb')
        output.write(url)
        output.close()

# ======================== #
#          Queries         #
# ======================== #

CREDENTIALS = {
    'username': '',
    'password': '',
    'expiration': '60',
    'client': 'referer',
    'referer': 'www.arcgis.com',
    'f': 'json'
}

TOKEN_URL = "https://www.arcgis.com/sharing/rest/generateToken"

REPLICA = {
    "geometry": '',
    "geometryType": "esriGeometryEnvelope",
    "inSR": '',
    "layerQueries": '',
    "layers": "0, 1, 2, 3, 4, 5, 6, 7",
    "replicaName": "read_only_rep",
    "returnAttachments": 'true',
    "returnAttachmentsDataByUrl": 'true',
    "transportType": "esriTransportTypeEmbedded",
    "async": 'false',
    "syncModel": "none",
    "dataFormat": "filegdb",
    "token": '',
    "replicaOptions": '',
    "f": "json"
}

def pull_replica(fs_url, query, token, destination):
    query['token'] = token
    replica_url = add_path(fs_url, "createReplica")
    zip_url = get_response(replica_url, query)['responseUrl']
    zip_file = get_response(zip_url, get_json=False)
    file_name = serviceTodayString
#    file_name = time.strftime("%Y_%m_%d_") + get_fs_name(fs_url, token)
    pull_to_local(zip_file, file_name, destination, 'zip')

if __name__ == "__main__":
    TOKEN = login("userHere", "passwordHere")
    FS_URL = "serviceUrlHere"
    # remove folder from yesterday, if it exists
    if os.path.isdir(serviceYesterdayString):
        shutil.rmtree(serviceYesterdayString)
    # make directory for geodb inside the destination folder,
    # which is where pull_to_local looks for it
    DEST = r"G:\pathHere"
    os.mkdir(os.path.join(DEST, serviceTodayString))
    pull_replica(FS_URL, REPLICA, TOKEN, DEST)
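
The unzip-and-rename step in pull_to_local picks the geodatabase by its position in the os.walk listing. A sketch of an alternative is to read the folder name from the archive itself. It rests on an unverified assumption: that the downloaded zip contains exactly one top-level folder ending in '.gdb'. extract_and_rename_gdb is a hypothetical helper, written with try/finally so it also runs on Python 2.6.

```python
import os
import zipfile


def extract_and_rename_gdb(zip_path, dest_dir, new_name):
    """Extract zip_path into dest_dir and rename its .gdb folder.

    Assumption: the archive holds exactly one top-level '*.gdb' folder.
    Returns the path of the renamed geodatabase.
    """
    zipped = zipfile.ZipFile(zip_path)
    try:
        names = zipped.namelist()
        zipped.extractall(dest_dir)
    finally:
        zipped.close()

    # Archive member names use '/' as the separator.
    top_level = set(name.split('/')[0] for name in names)
    gdb_dirs = [d for d in top_level if d.lower().endswith('.gdb')]
    if len(gdb_dirs) != 1:
        raise ValueError('expected one .gdb folder, found %d' % len(gdb_dirs))

    target = os.path.join(dest_dir, new_name + '.gdb')
    os.rename(os.path.join(dest_dir, gdb_dirs[0]), target)
    return target
```

Because the paths are passed in explicitly, this version does not need the os.chdir calls.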

b-twice commented 9 years ago

Hey Trill, thanks for the input and for sending over the functionality you are building in. It's great that you are modifying the script and that it's working out for your needs.

Regarding your first point, I did add some functionality to update the query based on whether the url points to a layer or to the feature service. So if you pass in the feature service url, rather than the url of a layer in it, it will push out all layers.

For example, below is a snippet from the "pull_replica" function that pulls all the layers into one gdb if the url ends in "FeatureServer". If it ends in "FeatureServer/0", then it will pull only layer 0.

        query['layers'] = [layer['id'] for layer in layers]
        self.replicate(query)
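
The url check described above could look roughly like the following. This is only a sketch under stated assumptions, not the toolbox's actual code: layers_to_replicate and layers_param are hypothetical names, and service_layers stands for the 'layers' list from the feature service JSON, each entry carrying an 'id'.

```python
def layers_to_replicate(url, service_layers):
    """Return the layer ids to replicate for a service or layer url."""
    parts = url.rstrip('/').split('/')
    if parts[-1] == 'FeatureServer':
        # Feature service url: replicate every layer it lists.
        return [layer['id'] for layer in service_layers]
    if len(parts) > 1 and parts[-2] == 'FeatureServer' and parts[-1].isdigit():
        # Layer url such as .../FeatureServer/0: replicate only that layer.
        return [int(parts[-1])]
    raise ValueError('not a feature service or layer url: %s' % url)


def layers_param(ids):
    """Format ids like the "layers" value in the REPLICA dictionary."""
    return ', '.join(str(i) for i in ids)
```

With a service that lists layers 0 and 1, the service url yields both ids, while the same url with "/0" appended yields only layer 0.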