Automatic pagination that continues until exhaustion

joshuaclayton commented 2 months ago

Problem to solve

Chained requests must each be declared explicitly, which makes paginating through an unknown number of pages untenable.

POST https://URL
Content-Type: application/json
[BasicAuth]
user: pass
{
  "payload": "body"
}
HTTP/2 200
[Captures]
body: body
next_page: header "Link" regex "<([^>]+)>; rel=\"next\""

POST {{next_page}}
[BasicAuth]
user: pass
{
  "payload": "body"
}
HTTP/2 200
[Captures]
body: body
next_page: header "Link" regex "<([^>]+)>; rel=\"next\""

# keep repeating somehow - programmatically generate hurl files? Continue with copy/paste?

Proposal

The simplest example would be to wholesale swap URLs based on presence of a capture without any additional modification. This would allow for simple asserts / body capture decoupled from raw values and instead based on structure (e.g. presence of a field in a JSON response). Asserting against raw values likely wouldn't make sense for anything dynamic given generic pagination.

In that case, an additional section might work:

[PaginatesVia]
url: header "Link" regex "<([^>]+)>; rel=\"next\""

Other approaches might include more specific data capture (e.g. parsing page=5 from the Link header for the correct page, or querying the JSON response if that's where pagination info sits).
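To make the "more specific data capture" idea concrete, here is a minimal Python sketch (not Hurl syntax) of what such a capture would have to compute from a Link header. The function names, the example.org URLs and the `page` parameter name are assumptions made for illustration only.

```python
import re
from urllib.parse import urlparse, parse_qs

# Same pattern as the capture used in this issue.
NEXT_LINK = re.compile(r'<([^>]+)>; rel="next"')

def next_url(link_header):
    """Return the URL tagged rel="next", or None when there is no next page."""
    if not link_header:
        return None
    match = NEXT_LINK.search(link_header)
    return match.group(1) if match else None

def next_page_number(link_header, param="page"):
    """Return the page number found in the rel="next" URL, or None."""
    url = next_url(link_header)
    if url is None:
        return None
    values = parse_qs(urlparse(url).query).get(param)
    return int(values[0]) if values else None

# Hypothetical header value, for illustration only.
header = (
    '<https://example.org/items?page=4>; rel="prev", '
    '<https://example.org/items?page=5>; rel="next"'
)
print(next_url(header))
print(next_page_number(header))
```

Returning None instead of raising when no rel="next" entry exists is the behavior a pagination capture needs on the last page.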

Additional context and resources

Specific use case: data extraction (rather than response assertion) against paginated resources of unknown size.

I'd looked to see if there was any functionality around looping within the grammar and didn't find anything, and while I understand it may be possible to use JSON output + shell + jq or similar to initiate chaining, in an ideal world there'd be a mechanism for this within the grammar itself.

fabricereix commented 2 months ago

Thanks @joshuaclayton for your issue. Automatic pagination is a really interesting and challenging use case. It would be nice if it could fit into a more general looping mechanism that is not specific to pagination. We already have skip; we might also add a repeat with a specific repetition count or a termination condition (similar to retry).

We need plenty of examples to see how it could work.

jcamiel commented 2 months ago

With --skip and --repeat, one can imagine such a file:

POST {{url}}
[Options]
repeat: -1 # infinite loop
skip: {{url}} isNull
{
  "payload": "body"
}
HTTP/2 200
[Captures]
body: body
url: header "Link" regex "<([^>]+)>; rel=\"next\""

We initialize the variable url with an initial value, run the request if the variable is not null, update url from the response, and repeat. What is missing is handling of a failing capture: when the capture for url fails, Hurl treats it as an error, whereas here we want the run to continue. In that case we could imagine giving the capture a default value to use when it fails: url: header "Link" regex "<([^>]+)>; rel=\"next\"" default null

POST {{url}}
[Options]
repeat: -1 # infinite loop
skip: {{url}} isNull
{
  "payload": "body"
}
HTTP/2 200
[Captures]
body: body
url: header "Link" regex "<([^>]+)>; rel=\"next\"" default null

In summary, we could use repeat and skip without too many syntax changes:

accept a predicate in skip
find a way to make a "fallible" capture, for instance with a default value
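The intended control flow can be sketched in Python to check that the two changes are enough to terminate. This is only a model of the idea, not Hurl behavior: `fetch`, `PAGES` and the example.org URLs are hypothetical. The loop also stops as soon as url is null, whereas how skip would interact with an infinite repeat is not settled in this discussion.

```python
import re

NEXT_LINK = re.compile(r'<([^>]+)>; rel="next"')

def paginate(first_url, fetch, max_requests=1000):
    """Collect bodies by following rel="next" links until none is returned.

    `fetch` is a hypothetical callable: url -> (body, link_header_or_None).
    """
    bodies = []
    url = first_url
    for _ in range(max_requests):   # stands in for repeat (bounded here for safety)
        if url is None:             # stands in for the skip predicate
            break
        body, link = fetch(url)
        bodies.append(body)
        match = NEXT_LINK.search(link or "")
        # Fallible capture: fall back to None instead of failing the run.
        url = match.group(1) if match else None
    return bodies

# In-memory stand-in for a paginated API (assumption, for illustration).
PAGES = {
    "https://example.org/items?page=1": ("page 1", '<https://example.org/items?page=2>; rel="next"'),
    "https://example.org/items?page=2": ("page 2", '<https://example.org/items?page=3>; rel="next"'),
    "https://example.org/items?page=3": ("page 3", None),
}

def fake_fetch(url):
    return PAGES[url]

print(paginate("https://example.org/items?page=1", fake_fetch))
```

Without the fallback on the capture, the third page would raise an error instead of ending the loop, which is exactly the gap described above.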

jcamiel commented 2 months ago

Another, better, syntax for default could be else:

POST {{url}}
[Options]
repeat: -1 # infinite loop
skip: {{url}} isNull
{
  "payload": "body"
}
HTTP/2 200
[Captures]
body: body
url: header "Link" regex "<([^>]+)>; rel=\"next\"" else null

lepapareil commented 2 months ago

One possible solution is to use the repeat feature, which has been developed by @jcamiel and will be available in the next release.

For example, to retrieve the list of tags of a repo with the GitLab API, all we have to do is create pagination.hurl:

Make a first request to get the total number of pages:

GET {{gitlab_api_url}}/projects/{{gitlab_project_id}}/repository/tags?private_token={{gitlab_token}}&per_page={{per_page}}&page=1
Content-Type: application/json

HTTP 200

[Captures]
total_pages: header "X-Total-Pages" toInt

Then iterate with repeat, capturing the next page from each response:

GET {{gitlab_api_url}}/projects/{{gitlab_project_id}}/repository/tags?private_token={{gitlab_token}}&sort=desc&order_by=version&per_page={{per_page}}&page={{next_page}}
Content-Type: application/json
[Options]
repeat: {{total_pages}}

HTTP 200

[Captures]
next_page: header "X-Next-Page"

Then simply run hurl and set the initial variables:

$ hurl \
    --variable gitlab_api_url=https://gitlab.com/api/v4 \
    --variable gitlab_project_id=1 \
    --variable gitlab_token=***** \
    --variable per_page=1 \
    --variable next_page=1 \
    pagination.hurl
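For readers who want to trace the flow of pagination.hurl, here is a rough Python model of it: one request to read the page count, then a fixed number of iterations that each capture the next page. `fetch_page`, `TAGS` and the empty X-Next-Page value on the last page are assumptions of this sketch, not documented behavior.

```python
def fetch_all_pages(fetch_page):
    """Model of pagination.hurl.

    `fetch_page` is a hypothetical callable: page number -> (items, headers dict).
    """
    # First request: only used to capture the total number of pages.
    _, headers = fetch_page(1)
    total_pages = int(headers["X-Total-Pages"])

    items = []
    next_page = 1                      # same role as --variable next_page=1
    for _ in range(total_pages):       # same role as repeat: {{total_pages}}
        page_items, headers = fetch_page(int(next_page))
        items.extend(page_items)
        next_page = headers["X-Next-Page"]   # captured for the next iteration
    return items

# In-memory stand-in for the tags endpoint with per_page=1 (assumption).
TAGS = ["v3", "v2", "v1"]

def fake_fetch_page(page):
    next_page = str(page + 1) if page < len(TAGS) else ""
    headers = {"X-Total-Pages": str(len(TAGS)), "X-Next-Page": next_page}
    return [TAGS[page - 1]], headers

print(fetch_all_pages(fake_fetch_page))
```

Because the iteration count is fixed up front, this approach needs no termination condition, at the cost of one extra request for page 1 and a dependency on the API exposing a total page count.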
