Open mikewyer opened 3 years ago
Start is deprecated as a parameter - see here - https://developer.atlassian.com/cloud/confluence/change-notice-moderize-search-rest-apis/
It's been replaced by a cursor parameter which will define the 'next' page of results after the ones you have received.
The cursor param is included in a URL which is returned within the JSON of any paginated request, and can be supplied back to the API to pick up where you left off.
See my monkeypatched method below, which will add a 'cursor' value to every dict returned. The cursor can then be supplied back to get_all_pages_by_label to pick up the following page.
I am probably going to further monkeypatch to loop iteratively to get all values within the function, but others may not like that.
Note - the code here is awkward and very much slapped together. It can be done more elegantly.
https://github.com/atlassian-api/atlassian-python-api/issues/796 - This is a similar issue solved somewhat more elegantly.
def get_all_pages_by_label(self, label, start=0, limit=50, cursor=0):
"""
Get all page by label
:param label:
:param start: OPTIONAL: The start point of the collection to return. Default: None (0).
:param limit: OPTIONAL: The limit of the number of pages to return, this may be restricted by
fixed system limits. Default: 50
:return:
"""
url = "rest/api/content/search"
params = {}
if label:
params["cql"] = 'type={type} AND label="{label}"'.format(type="page", label=label)
if start:
params["start"] = start
if limit:
params["limit"] = limit
if cursor:
params["cursor"] = cursor
try:
print()
response = self.get(url, params=params)
except HTTPError as e:
if e.response.status_code == 400:
raise ApiValueError("The CQL is invalid or missing", reason=e)
raise
text = response['_links']['next']
cursor_loc= text.find('cursor') + 7
cursor_string = text[cursor_loc:text[cursor_loc:].find('&')+cursor_loc]
results_response = response.get("results")
for item in results_response: item['next_cursor']= cursor_string
return response.get("results")
In the BB modules I've introduced a function:
https://github.com/atlassian-api/atlassian-python-api/blob/6cd6dd047b09fad1f43f4f2d42a485c96b7d1e84/atlassian/bitbucket/cloud/base.py#L35-L37
https://github.com/atlassian-api/atlassian-python-api/blob/6cd6dd047b09fad1f43f4f2d42a485c96b7d1e84/atlassian/bitbucket/server/base.py#L20
This function is used in the paged APIs and returns a generator to iterate throug the result.
Maybe it's possible to merge this functions and add it to rest_clipent.py
. The problem is to implement a generic way to get the next url for every atlassion product.
Going to spend a bit more time on it today and see if there's a generic solution for all stuff confluence side and compare to what you've done for BB.
Edit: Looks like the response is not the same for BB as it is for confluence - there's no pages param here. Going to experiment and see where cursor applies and see if we can make it more generic, or whether it is going to require edits to every method
For now - a cleaner monkey patch.
def get_all_pages_by_label(self, label, start=0, limit=50):
"""
Get all page by label
:param label:
:param start: OPTIONAL: The start point of the collection to return. Default: None (0).
:param limit: OPTIONAL: The limit of the number of pages to return, this may be restricted by
fixed system limits. Default: 50
:return:
"""
url = "rest/api/content/search"
params = {}
if label:
params["cql"] = 'type={type} AND label="{label}"'.format(type="page", label=label)
if start:
params["start"] = start
if limit:
params["limit"] = limit
try:
response = self.get(url, params=params)
except HTTPError as e:
if e.response.status_code == 400:
raise ApiValueError("The CQL is invalid or missing", reason=e)
raise
while response.get('_links').get('next'):
response_results = response.get('results')
params['cursor'] = response.get('_links').get('next')
response = self.get(response.get('_links').get('next'), params=params)
response_results = response_results + response.get('results')
if response_results:
results = response_results
else:
results = response.get('results')
return results
While a bit hacky, my temporary solution to this is to just set limit really, really high. In my case, I'm using labels to summarize pages with a label in a "page properties report." Confluence only supports 3,000 pages in that report, so I just set that as the limit:
pages_with_label = confluence.get_all_pages_by_label(label, limit=3000)
Note: I only tested this up to 75 pages, so maybe there's a limit for a single call, but I didn't see such a limit documented anywhere.
There definitely is a limit - I forget what it is but it is higher than 75 but lower than 3000 :)
How is this not fixed? The start parameter has been deprecated for 3 years!
To reproduce: Attach a label to several pages. In our case, 54 pages.
Whatever the setting for the
start
parameter, the function always returns the firstlimit
number of results.Our workaround is to keep increasing the
limit
iflen(batch) == limit
until all the results fit in a single batch. This won't work for labels on hundreds of pages.