PagerDuty / pdpyras

Low-level PagerDuty REST/Events API client for Python

HTTP 400 when hitting "log_entries" endpoint #37

Closed: gtangthousandeyes closed this issue 3 years ago

gtangthousandeyes commented 3 years ago

I just noticed this error when trying to use your library to retrieve my organization's log entries:

>>> data_list = session.list_all('log_entries')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/gtang/.pyenv/versions/etl_env/lib/python3.6/site-packages/pdpyras.py", line 1074, in list_all
    return list(self.iter_all(path, **kw))
  File "/Users/gtang/.pyenv/versions/etl_env/lib/python3.6/site-packages/pdpyras.py", line 999, in iter_all
    r.status_code, path), response=r)
pdpyras.PDClientError: Encountered HTTP error status (400) response while iterating through index endpoint log_entries.

I'm not sure whether the error is coming from PagerDuty's API or from your wrapper, but I figured I'd post this issue here.

Deconstrained commented 3 years ago

Hi! Terribly sorry for the late response.

A status 400 would be returned from an index endpoint if the limit parameter exceeds 10,000. That is a possible cause in this case. As for the client, it's doing what it was designed to do, although this is not a good design and needs improvement. If this is what happened when list_all was invoked, it would mean that at least a hundred GET requests to the API went to waste, because the last one in the series failed 😱
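For illustration, here is a minimal sketch of the kind of raw page request that trips the cap once pagination reaches that deep into an index (the token is a placeholder, and it assumes an account with more than 10,000 log entries):

from pdpyras import APISession

session = APISession("YOUR_API_TOKEN")  # hypothetical token
# A page whose pagination parameters reach past the 10,000-record cap;
# on an account with more log entries than that, the API responds with
# HTTP 400 instead of returning further results.
r = session.get('/log_entries', params={'offset': 10000, 'limit': 100})
print(r.status_code)  # 400 once the cap is exceeded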

A strategy I intend to implement for avoiding this undesirable failure scenario is to impose a hard iteration limit on the *_all methods and warn when that limit is reached, along with a configurable soft maximum (to avoid tying up the client / the API for too long, and/or saturating memory with results).

This issue most often happens with historical records like log entries and incidents (which can easily accumulate more than 10,000 records in a typical account over time). To work around it and still retrieve records over all time, we at PagerDuty recommend applying a time range filter (the since and until parameters) to fetch one time period's worth of records at a time, then repeating the process with a different time range until the full history of your account is covered.
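Here's a minimal sketch of that windowed approach, assuming a REST API token in YOUR_API_TOKEN and an arbitrary one-week window size (both hypothetical; pick whatever window keeps each call comfortably under the cap for your account's volume):

from datetime import datetime, timedelta
from pdpyras import APISession

session = APISession("YOUR_API_TOKEN")  # hypothetical token

window = timedelta(days=7)      # arbitrary window size
since = datetime(2020, 1, 1)    # start of the range to cover
end = datetime(2021, 1, 1)      # end of the range to cover

entries = []
while since < end:
    until = min(since + window, end)
    # Each call only paginates through one window's worth of records,
    # so it stays below the 10,000-record limit.
    entries.extend(session.list_all(
        'log_entries',
        params={'since': since.isoformat(), 'until': until.isoformat()},
    ))
    since = until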

Addendum: if dealing with really big sets of historical data, it's recommended to download them once and work with a separate local copy. Historical data like Incident Log Entries do not change; they are only appended to. It is therefore far less efficient and more time-consuming to download the data all over again through the API than to keep a local copy to iterate through when building reports. I recommend exporting to JSON files, each containing a list of 1,000 or so records, so that individual files don't get huge but there aren't so many of them that listing the directory takes a long time.
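As a rough sketch of that export approach (the file naming scheme, the fixed time window, and the 1,000-record batch size are arbitrary choices for illustration):

import json
from pdpyras import APISession

session = APISession("YOUR_API_TOKEN")  # hypothetical token

batch, batch_size, file_index = [], 1000, 0

def flush(records, index):
    # Write one batch of records to its own JSON file.
    with open('log_entries_%04d.json' % index, 'w') as f:
        json.dump(records, f)

for entry in session.iter_all(
        'log_entries',
        params={'since': '2020-01-01T00:00:00Z',
                'until': '2020-02-01T00:00:00Z'}):
    batch.append(entry)
    if len(batch) >= batch_size:
        flush(batch, file_index)
        batch, file_index = [], file_index + 1

if batch:
    flush(batch, file_index)  # write the final, partially filled batch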

Deconstrained commented 3 years ago

Resolved in version 4.1.2: iteration through big indexes like incidents and log_entries will no longer fail once it reaches the maximum; instead, it will return whatever it has retrieved so far and print a warning message with an explanation to the log.