b-twice / arcgis-rest-toolbox

Set of tools for managing ArcGIS REST Services

JSON Error #8

Open trillevine opened 9 years ago

trillevine commented 9 years ago

Hi Brian,

I've been using a variation of restservices.py for a while now without problems. I split the service replication and attachment downloading into two separate scripts (long story), but recently I've experienced problems with the service replication script. I just checked your script again and tried using it to replicate a service, but I'm having the same issues I had with mine. Something isn't working correctly with the JSON response, and I'm not sure what the problem is. I built a try/except block into the get_response function to catch JSON response problems, and I'm getting an endless loop of exceptions. Have you experienced anything like this recently with this script? Please let me know if you have a minute -- I've pasted my variation in below. Thanks!

Code:

import json, sys, urllib, urllib2, urlparse  # sys is needed for sys.argv in get_response
import os, shutil
import time, datetime
from datetime import date, timedelta
import csv
import logging

# #############################################
# ---- values to be changed accordingly ----- #
# #############################################

today = date.today()
todayString = today.strftime("%Y.%m.%d")
# #############################################
# ---- Change service name_ as necessary ---- #
# #############################################
serviceTodayString = 'Forstmobil_'+ todayString
# #############################################
# ----- Change destination as necessary ----- #
# #############################################
## change to directory where service_date folder will be created
os.chdir(r'G:\Dvkoord\GIS\TEMP\Tle\Scripts')
if os.path.exists(serviceTodayString):
        shutil.rmtree(serviceTodayString)
## make folder for today's download
os.mkdir(serviceTodayString)
yesterday = date.today() - timedelta(1)
yesterdayString = yesterday.strftime('%Y.%m.%d')
# #############################################
# ---- Change absolute path as necessary ---- #
# #############################################
serviceYesterdayString = r'G:\Dvkoord\GIS\TEMP\Tle\Scripts\Forstmobil_'+ yesterdayString
# logging
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)
## create a file handler
handler = logging.FileHandler('service_forstmobil.log')
handler.setLevel(logging.INFO)
## create a logging format
formatter = logging.Formatter('%(asctime)s - %(levelname)s - %(message)s')
handler.setFormatter(formatter)
## add the handlers to the logger
logger.addHandler(handler)

### REST FUNCTIONS

def check_service(service_url):
    url_parts = {"fs_url": None, "layer_url":None, "layer_id":None}
    # Check for None first: os.path.split(None) would raise before the test ran.
    if service_url is None:
        return True
    components = os.path.split(service_url)
    if (components[1].isdigit() and os.path.split(components[0])[1] ==
    "FeatureServer"):
        url_parts["fs_url"] = components[0]
        url_parts["layer_url"] = service_url
        url_parts["layer_id"] = str(components[1])
        return url_parts
    elif components[1] == "FeatureServer":
        url_parts["fs_url"] = service_url
        return url_parts
    else:
        return False

def get_service_name(service_url):
    components = os.path.split(service_url)
    if components[1] == "FeatureServer":
        return os.path.split(components[0])[1]
    else:
        return get_service_name(components[0])

def get_response(url, query='', get_json=True):
    opener = urllib2.build_opener(
         urllib2.HTTPHandler(),
         urllib2.HTTPSHandler(),
         urllib2.ProxyHandler(
                {'https': 'http://' + sys.argv[1] + ':' + sys.argv[2] + '@<ip:port>',
                 'http': 'http://' + sys.argv[1] + ':' + sys.argv[2] + '@<ip:port>'}
    ))
    urllib2.install_opener(opener)
    encoded = urllib.urlencode(query)
    request = urllib2.Request(url, encoded)
    if get_json:
        try:
            json_response = json.loads(urllib2.urlopen(request).read())
            return json_response
        except:
            print 'exception'
            logger.info('JSON Error')
            RUN = App(INPUT_URL, TOKEN, DEST)
            RUN.pull_replica(REPLICA)
    return urllib2.urlopen(request).read()

def add_path(url, *args):
    for arg in args:
        url = urlparse.urljoin(url + "/", str(arg))
    return url

def login (username, password):
    CREDENTIALS['username'] = username
    CREDENTIALS['password'] = password
    response = get_response(TOKEN_URL, CREDENTIALS)
    return response['token']

def get_service_info(input_url, token):
    return get_response(input_url,
        {'f':'json', 'token':token})

### QUERIES

CREDENTIALS = {
    'username': '',
    'password': '',
    'expiration': '300',
    'client': 'referer',
    'referer': 'www.arcgis.com',
    'f': 'json'
}

TOKEN_URL = "https://www.arcgis.com/sharing/rest/generateToken"

ATTACHMENTS = {
    'where': '1=1',
    'token': '',
    'f': 'json',
    'returnGeometry':'false'
}

REPLICA = {
    "geometry": '',
    "geometryType": "esriGeometryEnvelope",
    "inSR": '',
    "layerQueries": '',
    "layers": '0',
    "replicaName": "read_only_rep",
    "returnAttachments": 'true',
    "returnAttachmentsDataByUrl": 'false',
    "transportType": "esriTransportTypeEmbedded",
    "async": 'false',
    "syncModel": "none",
    "dataFormat": "filegdb",
    "token": '',
    "replicaOptions": '',
    "f": "json"
}

UPDATES = {
    "f": "json",
    "features": '',
    "rollbackOnFailure":True
}

class App(object):
    ''' Class with methods to perform tasks with ESRI's REST API '''
    def __init__ (self, input_url, token, destination):
        self.input_url = input_url
        self.token = token
        self.destination = destination
        self.layer_url = None
        self.layer_id = None
        self.fs_url = self.check_input_url()

    def check_input_url(self):
        url_parts = check_service(self.input_url)
        self.layer_url = url_parts["layer_url"]
        self.layer_id = url_parts["layer_id"]
        if not self.layer_url:
            self.layer_url = add_path(url_parts["fs_url"], "0")
        return url_parts["fs_url"]

    def get_root_name(self):
        return time.strftime("%Y_%m_%d_") + get_service_name(self.fs_url)

    def replicate(self, query):
        replica_url = add_path(self.fs_url, 'createReplica')
        zip_url = get_response(replica_url, query)['responseUrl']
        zip_file = get_response(zip_url, get_json=False)
        pull_to_local(zip_file, self.get_root_name(), self.destination, 'zip')

    def pull_replica(self, query):
        query['token'] = self.token
        layers = get_service_info(self.fs_url, self.token)['layers']
        if self.layer_id:
            query['layers'] = self.layer_id
            self.replicate(query)
        else:
            query['layers'] = [layer['id'] for layer in layers]
            self.replicate(query)

if __name__ == "__main__":

    ## Required
    TOKEN = login("mylogin", "mypassword")
    INPUT_URL = "http://services1.arcgis.com/0cr41EdkajvOA232/ArcGIS/rest/services/Forstmobil/FeatureServer"

    ## Required for Pull Attachments and Pull Replica
    DEST = r"G:\Dvkoord\GIS\TEMP\Tle\Scripts" + "\\" + serviceTodayString

    ## Required for Update Service
    UPDATE_TABLE = "<table to update service>"

    ## Optional field to label folders by attributes for Pull Attachments
    FIELD = ""

    ## To return attachments in the geodatabase for replicate uncomment the line as follows:
    ## REPLICA["returnAttachments"] = 'true'

    RUN = App(INPUT_URL, TOKEN, DEST)
    RUN.pull_replica(REPLICA)

b-twice commented 9 years ago

Hey Trill,

The only issue I ever noticed was a month or so ago when I was getting hit with server errors. There was no indication that ArcGIS Online was experiencing any issues, but I was unable to replicate anything, even when going through my service directory and doing it directly in ArcGIS Online. I gave it another go the next day and everything went as normal.

I didn't see any error messages in the script you posted. Can you provide some of the messages you got when running your version and my version?

trillevine commented 9 years ago

Hi Brian,

Thanks for getting back to me. Here's the traceback when I comment out my try-catch block and use your original code in get_response:

Message File Name   Line    Position    
Traceback               
    <module>    G:\Dvkoord\GIS\TEMP\Tle\Scripts\Neu\arcgis-rest-toolbox-master\restservices.py  204     
    pull_replica    G:\Dvkoord\GIS\TEMP\Tle\Scripts\Neu\arcgis-rest-toolbox-master\restservices.py  194     
    replicate   G:\Dvkoord\GIS\TEMP\Tle\Scripts\Neu\arcgis-rest-toolbox-master\restservices.py  181     
    get_response    G:\Dvkoord\GIS\TEMP\Tle\Scripts\Neu\arcgis-rest-toolbox-master\restservices.py  86      
    urlopen C:\Python26\ArcGIS10.0\lib\urllib2.py   126     
    open    C:\Python26\ArcGIS10.0\lib\urllib2.py   397     
    http_response   C:\Python26\ArcGIS10.0\lib\urllib2.py   510     
    error   C:\Python26\ArcGIS10.0\lib\urllib2.py   435     
    _call_chain C:\Python26\ArcGIS10.0\lib\urllib2.py   369     
    http_error_default  C:\Python26\ArcGIS10.0\lib\urllib2.py   518     
HTTPError: HTTP Error 500: Internal Server Error                

The reason I believe this is a JSON error can be found in this thread, which I started on Stack Overflow:

http://stackoverflow.com/questions/30391958/rerun-a-python-script-automatically-following-internal-server-error-500

I just tried manually creating a replica via the web interface, and that's not working either. If the script is working for you, then my guess is that there's something up with the server that our stuff is hosted on.
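
The traceback above shows urlopen raising HTTPError inside get_response, before json.loads ever runs, so the failure is an HTTP 500 rather than a malformed JSON body. A bare except that restarts the whole run would hit the same error again and recurse. The following is only a sketch of one alternative, which treats the two failure modes separately and gives up after a fixed number of attempts. The helper name fetch_json and its fetch, retries, delay and sleep parameters are hypothetical and not part of restservices.py.

```python
import json
import time

try:
    from urllib2 import HTTPError        # Python 2, as used in this thread
except ImportError:
    from urllib.error import HTTPError   # Python 3 equivalent


def fetch_json(fetch, retries=3, delay=30, sleep=time.sleep):
    """Call fetch() and parse its body as JSON, retrying only on HTTP 5xx.

    fetch is any zero-argument callable returning the raw response body,
    e.g. lambda: urllib2.urlopen(request).read(). Hypothetical helper.
    """
    for attempt in range(1, retries + 1):
        try:
            body = fetch()
        except HTTPError as err:
            # Server-side failure: retry, but only a bounded number of times.
            if err.code < 500 or attempt == retries:
                raise
            sleep(delay)
            continue
        # A body that is not valid JSON raises ValueError here; that is a
        # different problem from an HTTP error, so it is not retried.
        return json.loads(body)
```

Because the last attempt re-raises the HTTPError, the caller sees the real cause instead of an endless stream of 'exception' messages.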

Thanks again...

Trill

b-twice commented 9 years ago

It looks like an error on ArcGIS Online's end. I just ran the script a few times and experienced no issues. I would recommend doing some basic investigation such as:

Let me know what you find out.

trillevine commented 9 years ago

Hi Brian,

So I've been going back and forth with Esri tech support on this, and it turns out that there's a bug with replicating larger geodatabases with attachments. As a workaround, they suggested either setting returnAttachments in REPLICA to false (which isn't really an option for obvious reasons) or setting the async property to true. The issue I'm now having stems from that new async setting: responseUrl is no longer returned, so I get a KeyError when running the script. When I go through the REST web interface, set async to true and output to JSON, it returns a parameter called statusUrl, which looks like this:

{
  "statusUrl" : "http://services1.arcgis.com/0cr41EdkajvOA232/ArcGIS/rest/services/Forstmobil/FeatureServer/jobs/cbd843bf-a5da-40bb-9e28-cc5e9a14c92e"
}

According to the docs re: asynchronous operations (http://resources.arcgis.com/en/help/arcgis-rest-api/index.html#//02r3000000rt000000), once the replication is complete, it will give me a parameter called resultUrl, which I'm assuming I place in the definition of zip_url, like so:

zip_url = get_response(replica_url, query)['resultUrl']

When I do that, however, I still get a key error, so something's not working correctly. Based on the docs, I assume I have to build in some functionality to check the status of the replication process via statusUrl and then grab the resultUrl parameter when it's done, but I'm not sure how to do that. There's some more information here (http://resources.arcgis.com/en/help/arcgis-rest-api/index.html#/Create_Replica/02r3000000rp000000/), but I can't really make sense of it. Do you have any ideas? Any feedback is greatly appreciated as always. Thanks!

Trill

b-twice commented 9 years ago

Hey Trill,

I'm slammed right now with work and personal life, but I will try to take a look later this week, on Thursday or Friday, and see what we can do to modify the toolbox. Essentially, we need to find the link to the zip file that is provided via the status URL and then copy that down. The toolbox will need a checkbox for whether to run async or not, plus some alterations to route that into the tool methods.

-Brian

trillevine commented 9 years ago

Hi Brian,

Sure, whenever you get to it. I'm actually just working with the script (restservices.py), maybe that will be simpler to modify than the toolbox. Thanks for getting back to me.

Trill

b-twice commented 9 years ago

Hey Trill,

I've got a working implementation for when the query sets "async" to true. See below:

def replicate(self, query):
    replica_url = add_path(self.fs_url, 'createReplica')
    # REPLICA stores "async" as the string 'true'/'false', and any non-empty
    # string is truthy, so compare the text instead of testing it directly.
    if str(REPLICA["async"]).lower() == 'true':
        # Async jobs return a statusUrl instead of a responseUrl.
        status_url = get_response(replica_url, query)["statusUrl"]
        query = {'f':'json', 'token':self.token}
        complete = get_response(status_url, query)['status']
        # Poll every 10 seconds; note this never exits if the job fails.
        while complete != 'Completed':
            time.sleep(10)
            complete = get_response(status_url, query)['status']
        zip_url = get_response(status_url, query)['resultUrl']
    else:
        zip_url = get_response(replica_url, query)['responseUrl']
    zip_file = get_response(zip_url, get_json=False)
    pull_to_local(zip_file, self.get_root_name(), self.destination, 'zip')

This is a workable solution but by no means comprehensive (if the job fails, it gets stuck in an infinite loop). If you look at the Asynchronous operations section of the REST API docs (http://resources.arcgis.com/en/help/arcgis-rest-api/index.html#/Asynchronous_operations/02r3000000rt000000/), there is some logic that can be implemented to handle the various status responses. When I get a chance to look through them I will flesh this out and post it on GitHub. Hope this works for you in the meantime.
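
One way to handle the failure case is to check each status response for terminal states other than 'Completed'. A minimal sketch follows, assuming a hypothetical helper named result_url_or_none. Only 'Completed' and resultUrl are confirmed in this thread; the failure status names are assumptions and should be checked against the Asynchronous operations documentation.

```python
# Status values assumed to mean the job will never produce a result.
# Only "Completed" is confirmed in this thread; verify the others.
FAILED_STATES = ("Failed", "CompletedWithErrors")


class ReplicaFailed(Exception):
    """Raised when the job reports a terminal failure status."""


def result_url_or_none(status_response):
    """Return resultUrl once the job is done, None while it is still running.

    status_response is the parsed JSON dict returned by the statusUrl endpoint.
    """
    status = status_response.get("status")
    if status == "Completed":
        return status_response["resultUrl"]
    if status in FAILED_STATES:
        raise ReplicaFailed("replica job ended with status %r" % status)
    return None  # any other status is treated as still in progress
```

Inside the while loop, the loop would exit as soon as this returns a URL, and a failed job would surface as an exception instead of spinning forever.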

-Brian

trillevine commented 9 years ago

Hi Brian,

Awesome, thanks for getting to it so fast. I'll check it out tomorrow and let you know how it works...I'll also try to beef it up a bit and let you know what I can add.

Trill

trillevine commented 9 years ago

Hi Brian,

This works for me, thanks again. I'll spend some time pimping this out and forward my changes.

Trill

trillevine commented 9 years ago

Hi Brian,

I added some extra if/else clauses in here, just in case it's a really large service and it takes a while for the job to reach Completed status:

def replicate(self, query):
        replica_url = add_path(self.fs_url, 'createReplica')
        if REPLICA["async"]:
            status_url = get_response(replica_url, query)["statusUrl"]
            query = {'f':'json', 'token':self.token}
            status = get_response(status_url, query)['status']
            if status != 'Completed':
                time.sleep(120)
                status = get_response(status_url, query)['status']
                if status != 'Completed':
                    time.sleep(120)
                    zip_url = get_response(status_url, query)['resultUrl']
                else:
                    zip_url = get_response(status_url, query)['resultUrl']
            else:
                zip_url = get_response(status_url, query)['resultUrl']
        else:
            zip_url = get_response(replica_url, query)['responseUrl']
        zip_file = get_response(zip_url, get_json=False)
        pull_to_local(zip_file, self.get_root_name(), self.destination, 'zip')

I'm not sure how to prevent the infinite loop part, though.

b-twice commented 9 years ago

I'll take a look sometime this week. The while loop will work; it just needs to be driven by a change in state (i.e. the response is no longer pending), plus some logic to handle the various status messages (complete, failed and so forth). Pinging for a response every few minutes is not too taxing for anyone, and we could add a trigger to time out after some amount of time, although you have to consider that some of these services could be serious in size with attachments.

trillevine commented 9 years ago

As I understand it, dealing with changes in state requires working with the threading module. I played around with it, but couldn't get it working....but yeah, having the script respond to a change in state from processing to completed would be ideal.
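
Reacting to the status change does not strictly need threads. A plain blocking loop that re-checks the status and gives up after a deadline covers both the waiting and the timeout discussed above. The following is a minimal sketch under that assumption; poll_until, PollTimeout and the check callable are hypothetical names, not part of restservices.py.

```python
import time


class PollTimeout(Exception):
    """Raised when the deadline passes before check() returns a value."""


def poll_until(check, timeout=3600, interval=120,
               sleep=time.sleep, clock=time.time):
    """Call check() every `interval` seconds until it returns non-None.

    check: zero-argument callable; returns None while the job is still
    running and any other value (e.g. the result URL) when it is done.
    sleep and clock are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout
    while True:
        result = check()
        if result is not None:
            return result
        if clock() >= deadline:
            raise PollTimeout("no result after %s seconds" % timeout)
        sleep(interval)
```

In replicate, check could be something like `lambda: get_response(status_url, query).get('resultUrl')`, assuming resultUrl only appears in the status response once the job has completed. The timeout should be sized generously for large services with attachments.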