cityofaustin / knackpy

A Python client for interacting with Knack applications
https://cityofaustin.github.io/knackpy/docs/user-guide/

Size constraints on knackpy.get() requests? #100

Closed brandontoups closed 2 years ago

brandontoups commented 2 years ago

Howdy

I'm seeing errors from the .get functionality in both the old and new versions of knackpy. Both of the fetches below occasionally succeed, but they fail roughly 99% of the time.

We can't tell if it's an issue with pagination, improper responses from Knack, or something else.

I've confirmed using Knack's native API that the knackpy GETs are attempting a fetch of 143 pages and 3561 total records for this fetch, with an estimated (total) size of 21MB. For an API fetch this doesn't seem substantially large enough to warrant the errors we're seeing.
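As a rough sanity check on those numbers (a sketch only, not something knackpy runs): 143 pages for 3561 records is what a page size of 25 rows would give, which is assumed here to be the page size of the native API call. At 1000 rows per page, the value knackpy is described as using later in this thread, the same records fit in 4 pages. The 21MB figure is taken as 21 * 10**6 bytes, and records are assumed to be similar in size.

```python
import math

TOTAL_RECORDS = 3561       # reported by the native API call
TOTAL_BYTES = 21 * 10**6   # "21MB" estimate, assumed to mean 10**6-byte megabytes


def page_count(total_records: int, rows_per_page: int) -> int:
    """Number of pages needed to fetch total_records at a given page size."""
    return math.ceil(total_records / rows_per_page)


def approx_page_bytes(total_bytes: int, total_records: int, rows_per_page: int) -> float:
    """Average payload of one full page, assuming records are similar in size."""
    return total_bytes / total_records * rows_per_page


pages_at_25 = page_count(TOTAL_RECORDS, 25)      # matches the reported 143 pages
pages_at_1000 = page_count(TOTAL_RECORDS, 1000)  # 4 pages at 1000 rows per page
mb_per_full_page = approx_page_bytes(TOTAL_BYTES, TOTAL_RECORDS, 1000) / 10**6

print(pages_at_25, pages_at_1000, round(mb_per_full_page, 1))
```

Under these assumptions a full 1000-row page is a little under 6 MB, so each individual request is modest in size.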

For repro, I had trouble tracking down an exact setup that triggers this every time. It isn't tied to a specific subset of records, since it happens intermittently. The number of records returned seemed like the most likely culprit, but we've had success with fetches larger than 4k records plenty of times before. This feels somewhat new, and I don't think response errors should be bubbling up to the native Python libraries.

Sorry if this is actually an issue upstream of knackpy.

System

Issue/Repro for knackpy==1.0.20

The following code

# !pip install knackpy==1.0.20
import knackpy

filters = {
    'match': 'and',
    'rules': [
        {
            'field':'field_75',
            'operator':'is after',
            'value':'10/14/2021'
        },
        {
            'field':'field_75',
            'operator':'is before',
            'value':'12/14/2021'
        }
    ]
}

app = knackpy.App(app_id='' ,api_key='')
kn = app.get('object_7',filters=filters)
kn

produced the following error

---------------------------------------------------------------------------
JSONDecodeError                           Traceback (most recent call last)
/var/folders/pr/wdvw7x0s31d94g3qhzd732jh0000gn/T/ipykernel_56667/467188259.py in <module>
     19 
     20 app = knackpy.App(app_id='' ,api_key='')
---> 21 kn = app.get('object_7',filters=filters)
     22 
     23 # kn = Knack(

/usr/local/lib/python3.9/site-packages/knackpy/app.py in get(self, identifier, refresh, record_limit, filters, generate)
    231 
    232         if not self.data.get(container_key) or refresh:
--> 233             self.data[container_key] = api.get(
    234                 app_id=self.app_id,
    235                 api_key=self.api_key,

/usr/local/lib/python3.9/site-packages/knackpy/api.py in get(app_id, api_key, slug, obj, scene, view, record_limit, filters, max_attempts, timeout)
    233         MAX_ROWS_PER_PAGE if record_limit >= MAX_ROWS_PER_PAGE else record_limit
    234     )
--> 235     return _get_paginated_records(
    236         app_id=app_id,
    237         api_key=api_key,

/usr/local/lib/python3.9/site-packages/knackpy/api.py in _get_paginated_records(app_id, url, max_attempts, record_limit, rows_per_page, api_key, timeout, filters)
    178         )
    179 
--> 180         fetched_records = res.json()["records"]
    181         if len(fetched_records) == 0:
    182             """Failsafe to handle edge case in which Knack returns fewer records than expected from 

/usr/local/lib/python3.9/site-packages/requests/models.py in json(self, **kwargs)
    908                     # used.
    909                     pass
--> 910         return complexjson.loads(self.text, **kwargs)
    911 
    912     @property

/usr/local/Cellar/python@3.9/3.9.8/Frameworks/Python.framework/Versions/3.9/lib/python3.9/json/__init__.py in loads(s, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    344             parse_int is None and parse_float is None and
    345             parse_constant is None and object_pairs_hook is None and not kw):
--> 346         return _default_decoder.decode(s)
    347     if cls is None:
    348         cls = JSONDecoder

/usr/local/Cellar/python@3.9/3.9.8/Frameworks/Python.framework/Versions/3.9/lib/python3.9/json/decoder.py in decode(self, s, _w)
    335 
    336         """
--> 337         obj, end = self.raw_decode(s, idx=_w(s, 0).end())
    338         end = _w(s, end).end()
    339         if end != len(s):

/usr/local/Cellar/python@3.9/3.9.8/Frameworks/Python.framework/Versions/3.9/lib/python3.9/json/decoder.py in raw_decode(self, s, idx)
    351         """
    352         try:
--> 353             obj, end = self.scan_once(s, idx)
    354         except StopIteration as err:
    355             raise JSONDecodeError("Expecting value", s, err.value) from None

JSONDecodeError: Unterminated string starting at: line 1 column 625726 (char 625725)

With the same code I've also gotten different errors akin to

  File "/usr/local/lib/python3.9/site-packages/knackpy/api.py", line 235, in get
    return _get_paginated_records(
  File "/usr/local/lib/python3.9/site-packages/knackpy/api.py", line 180, in _get_paginated_records
    fetched_records = res.json()["records"]        
  File "/usr/local/lib/python3.9/site-packages/requests/models.py", line 910, in json
    return complexjson.loads(self.text, **kwargs)
  File "/usr/local/Cellar/python@3.9/3.9.8/Frameworks/Python.framework/Versions/3.9/lib/python3.9/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/usr/local/Cellar/python@3.9/3.9.8/Frameworks/Python.framework/Versions/3.9/lib/python3.9/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/local/Cellar/python@3.9/3.9.8/Frameworks/Python.framework/Versions/3.9/lib/python3.9/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Expecting ':' delimiter: line 1 column 858400 (char 858399)
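
Both decode errors above are what a response body cut off partway through would produce. A minimal sketch of one way to guard against that, assuming the truncation is intermittent: parse the body, and fetch again if parsing fails. The names fetch_json_with_retry and fetch_text are hypothetical and not part of knackpy; fetch_text stands in for whatever returns the raw response text.

```python
import json
import time
from typing import Callable


def fetch_json_with_retry(
    fetch_text: Callable[[], str],
    max_attempts: int = 3,
    delay_seconds: float = 1.0,
) -> dict:
    """Call fetch_text() until its result parses as JSON, or attempts run out.

    fetch_text is a hypothetical zero-argument callable that returns the raw
    response body as a string.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        body = fetch_text()
        try:
            return json.loads(body)
        except json.JSONDecodeError as err:
            # A truncated body fails here, e.g. with "Unterminated string".
            last_error = err
            if attempt < max_attempts:
                time.sleep(delay_seconds)
    raise RuntimeError(
        f"response was not valid JSON after {max_attempts} attempts"
    ) from last_error
```

Raising a single error that names the problem, with the original JSONDecodeError attached as its cause, keeps the low-level decoder traceback from being the first thing a caller sees.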

Issue/Repro for knackpy==0.1.1

Code:

# !pip install knackpy==0.1.1
from knackpy import Knack

filters = {
    'match': 'and',
    'rules': [
        {
            'field':'field_75',
            'operator':'is after',
            'value':'10/14/2021'
        },
        {
            'field':'field_75',
            'operator':'is before',
            'value':'12/14/2021'
        }
    ]
}

kn = Knack(
    obj='object_7',
    app_id='',
    api_key='',
    timeout = 360,
    filters=filters
)
kn 

Error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/usr/local/lib/python3.9/site-packages/urllib3/response.py in _update_chunk_length(self)
    696         try:
--> 697             self.chunk_left = int(line, 16)
    698         except ValueError:

ValueError: invalid literal for int() with base 16: b''

During handling of the above exception, another exception occurred:

InvalidChunkLength                        Traceback (most recent call last)
/usr/local/lib/python3.9/site-packages/urllib3/response.py in _error_catcher(self)
    437             try:
--> 438                 yield
    439 

/usr/local/lib/python3.9/site-packages/urllib3/response.py in read_chunked(self, amt, decode_content)
    763             while True:
--> 764                 self._update_chunk_length()
    765                 if self.chunk_left == 0:

/usr/local/lib/python3.9/site-packages/urllib3/response.py in _update_chunk_length(self)
    700             self.close()
--> 701             raise InvalidChunkLength(self, line)
    702 

InvalidChunkLength: InvalidChunkLength(got length b'', 0 bytes read)

During handling of the above exception, another exception occurred:

ProtocolError                             Traceback (most recent call last)
/usr/local/lib/python3.9/site-packages/requests/models.py in generate()
    757                 try:
--> 758                     for chunk in self.raw.stream(chunk_size, decode_content=True):
    759                         yield chunk

/usr/local/lib/python3.9/site-packages/urllib3/response.py in stream(self, amt, decode_content)
    571         if self.chunked and self.supports_chunked_reads():
--> 572             for line in self.read_chunked(amt, decode_content=decode_content):
    573                 yield line

/usr/local/lib/python3.9/site-packages/urllib3/response.py in read_chunked(self, amt, decode_content)
    792             if self._original_response:
--> 793                 self._original_response.close()
    794 

/usr/local/Cellar/python@3.9/3.9.8/Frameworks/Python.framework/Versions/3.9/lib/python3.9/contextlib.py in __exit__(self, typ, value, traceback)
    136             try:
--> 137                 self.gen.throw(typ, value, traceback)
    138             except StopIteration as exc:

/usr/local/lib/python3.9/site-packages/urllib3/response.py in _error_catcher(self)
    454                 # This includes IncompleteRead.
--> 455                 raise ProtocolError("Connection broken: %r" % e, e)
    456 

ProtocolError: ("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read))

During handling of the above exception, another exception occurred:

ChunkedEncodingError                      Traceback (most recent call last)
/var/folders/pr/wdvw7x0s31d94g3qhzd732jh0000gn/T/ipykernel_57438/297416796.py in <module>
     18 }
     19 
---> 20 kn = Knack(
     21     obj='object_7',
     22     app_id='',

/usr/local/lib/python3.9/site-packages/knackpy/knackpy.py in __init__(self, api_key, app_id, filters, include_ids, id_key, max_attempts, obj, page_limit, raw_connections, rows_per_page, ref_obj, scene, timeout, tzinfo, view)
    133 
    134         self.endpoint = self._get_endpoint()
--> 135         self.data_raw = self._get_data(self.endpoint, "records", self.filters)
    136 
    137         if not self.data_raw:

/usr/local/lib/python3.9/site-packages/knackpy/knackpy.py in _get_data(self, endpoint, record_type, filters)
    205 
    206                 try:
--> 207                     req = requests.get(
    208                         endpoint, headers=headers, params=params, timeout=self.timeout
    209                     )

/usr/local/lib/python3.9/site-packages/requests/api.py in get(url, params, **kwargs)
     73     """
     74 
---> 75     return request('get', url, params=params, **kwargs)
     76 
     77 

/usr/local/lib/python3.9/site-packages/requests/api.py in request(method, url, **kwargs)
     59     # cases, and look like a memory leak in others.
     60     with sessions.Session() as session:
---> 61         return session.request(method=method, url=url, **kwargs)
     62 
     63 

/usr/local/lib/python3.9/site-packages/requests/sessions.py in request(self, method, url, params, data, headers, cookies, files, auth, timeout, allow_redirects, proxies, hooks, stream, verify, cert, json)
    540         }
    541         send_kwargs.update(settings)
--> 542         resp = self.send(prep, **send_kwargs)
    543 
    544         return resp

/usr/local/lib/python3.9/site-packages/requests/sessions.py in send(self, request, **kwargs)
    695 
    696         if not stream:
--> 697             r.content
    698 
    699         return r

/usr/local/lib/python3.9/site-packages/requests/models.py in content(self)
    834                 self._content = None
    835             else:
--> 836                 self._content = b''.join(self.iter_content(CONTENT_CHUNK_SIZE)) or b''
    837 
    838         self._content_consumed = True

/usr/local/lib/python3.9/site-packages/requests/models.py in generate()
    759                         yield chunk
    760                 except ProtocolError as e:
--> 761                     raise ChunkedEncodingError(e)
    762                 except DecodeError as e:
    763                     raise ContentDecodingError(e)

ChunkedEncodingError: ("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read))

verifiedathletics commented 2 years ago

I am getting the same error all of a sudden. It seems the res.text is getting cut off and not pulling the full string.

I am guessing this is due to some sort of upstream change with knack

brandontoups commented 2 years ago

I did confirm that calling the native API produces this error too, so it does look like an upstream error. Sorry for raising the issue here. We'll reach out to Knack.

import requests
import json

page = 1  # page number to request; not defined in the original snippet, 1 is an assumed starting value
url = "https://api.knack.com/v1/objects/object_7/records?filters=[{\"field\":\"field_75\", \"operator\":\"is after\",\"value\":\"10/14/2021\"},{\"field\":\"field_75\",\"operator\":\"is before\",\"value\":\"12/14/2021\"}]&rows_per_page=1000&page=" + str(page)

payload = ""
headers = {
  'X-Knack-Application-Id': 'REDACTED',
  'X-Knack-REST-API-Key': 'REDACTED',
  'Content-Type': 'application/json',
  'Cookie': 'WITHDRAWN'
}

response = requests.request("GET", url, headers=headers, data=payload)

print(response.text)

with an expected total_records of ~3500. I'm using a rows_per_page of 1000, which matches what knackpy uses and matches the documentation here: [image]

Sorry for bringing this up in your issues and not with Knack directly

johnclary commented 2 years ago

@brandontoups sorry for the slow reply. glad to hear this isn't a Knackpy issue! closing.

johnclary commented 2 years ago

@brandontoups if you do hear from Knack i'd love to understand what's happening here. we have occasional ETLs that pull tens of thousands of records without issue. we only test knackpy against an enterprise plan but i would be surprised if that was related.

verifiedathletics commented 2 years ago

I'm not sure what the issue with the api was but they resolved it. I went into the code to reduce the number of rows per pull while it was a problem but I returned it to 1000 and it is working
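
The workaround described above, fewer rows per pull, can be sketched as a plain pagination loop. This is an illustration under assumptions, not knackpy's code: get_all_records and fetch_page are hypothetical names, fetch_page(page, rows_per_page) stands in for one API request and is assumed to return the decoded response as a dict with a "records" list, and 250 is an arbitrary smaller page size.

```python
from typing import Callable, Dict, List


def get_all_records(
    fetch_page: Callable[[int, int], Dict],
    rows_per_page: int = 250,
) -> List[dict]:
    """Collect records page by page using a smaller page size.

    fetch_page(page, rows_per_page) is a hypothetical callable returning the
    decoded response for one page as a dict with a "records" list.
    """
    records: List[dict] = []
    page = 1
    while True:
        batch = fetch_page(page, rows_per_page)["records"]
        records.extend(batch)
        if len(batch) < rows_per_page:
            # A short (or empty) page means there is nothing left to fetch.
            break
        page += 1
    return records
```

Smaller pages mean more requests but a smaller body per request. The 0.1.1 traceback earlier in this thread lists rows_per_page among the Knack constructor arguments, so the older client may accept a page size directly; in 1.0.20 the page size appears in api.py as MAX_ROWS_PER_PAGE.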

johnclary commented 2 years ago

👍 thanks!

johnclary commented 2 years ago

@verifiedathletics @brandontoups if y'all haven't already, please ⭐ our repo if it's gettin the job done for you. it helps us build internal support for the project.

brandontoups commented 2 years ago

@johnclary I asked about it, but it seemed to start working about a week later so I never followed up. Sorry for not following up here.