Orange-OpenSource / hurl

Hurl, run and test HTTP requests with plain text.
https://hurl.dev
Apache License 2.0
12.65k stars 476 forks

Automatic pagination that continues until exhaustion #2908

Open joshuaclayton opened 2 months ago

joshuaclayton commented 2 months ago

Problem to solve

Chained requests must each be declared explicitly, which makes paginating through an unknown number of pages untenable.

POST https://URL
Content-Type: application/json
[BasicAuth]
user: pass
{
  "payload": "body"
}
HTTP/2 200
[Captures]
body: body
next_page: header "Link" regex "<([^>]+)>; rel=\"next\""

POST {{next_page}}
[BasicAuth]
user: pass
{
  "payload": "body"
}
HTTP/2 200
[Captures]
body: body
next_page: header "Link" regex "<([^>]+)>; rel=\"next\""

# keep repeating somehow - programmatically generate hurl files? Continue with copy/paste?

Proposal

The simplest example would be to wholesale swap URLs based on presence of a capture without any additional modification. This would allow for simple asserts / body capture decoupled from raw values and instead based on structure (e.g. presence of a field in a JSON response). Asserting against raw values likely wouldn't make sense for anything dynamic given generic pagination.

In that case, an additional section might work:

[PaginatesVia]
url: header "Link" regex "<([^>]+)>; rel=\"next\""

Other approaches might include more specific data capture (e.g. parsing page=5 from the Link header for the correct page, or querying the JSON response if that's where pagination info sits).

Additional context and resources

Specific use case: data extraction (rather than response assertion) against paginated resources of unknown size.

I'd looked to see if there was any functionality around looping within the grammar and didn't find anything, and while I understand it may be possible to use JSON output + shell + jq or similar to initiate chaining, in an ideal world there'd be a mechanism for this within the grammar itself.
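For comparison, the external-script workaround can be sketched in a few lines of Python. This is only an illustration of the loop that would ideally live in the grammar, not anything Hurl provides: `paginate`, `next_page_url`, the `fetch` callable and the example.org URLs are all hypothetical names, and the transport is replaced by an in-memory stand-in so the sketch runs offline. The regex is the same one used in the captures above.

```python
import re
from typing import Callable, Iterator, Mapping, Optional, Tuple

# Same pattern as the Hurl capture: the URL inside <...> tagged rel="next".
NEXT_LINK = re.compile(r'<([^>]+)>; rel="next"')

# Hypothetical transport signature: takes a URL, returns (headers, body).
Fetch = Callable[[str], Tuple[Mapping[str, str], str]]


def next_page_url(headers: Mapping[str, str]) -> Optional[str]:
    """Return the rel="next" URL from the Link header, or None if absent."""
    # Assumes the header key is spelled exactly "Link" (plain dicts are case-sensitive).
    match = NEXT_LINK.search(headers.get("Link", ""))
    return match.group(1) if match else None


def paginate(start_url: str, fetch: Fetch, max_pages: int = 1000) -> Iterator[str]:
    """Yield each page body, following rel="next" until it disappears."""
    url: Optional[str] = start_url
    for _ in range(max_pages):  # safety cap in case the server never stops linking
        if url is None:
            return
        headers, body = fetch(url)
        yield body
        url = next_page_url(headers)


# In-memory stand-in for real HTTP calls (assumption, for illustration only).
FAKE_PAGES = {
    "https://example.org/items?page=1": (
        {"Link": '<https://example.org/items?page=2>; rel="next"'},
        "page 1",
    ),
    "https://example.org/items?page=2": (
        {"Link": '<https://example.org/items?page=1>; rel="prev"'},
        "page 2",
    ),
}


def fake_fetch(url: str) -> Tuple[Mapping[str, str], str]:
    return FAKE_PAGES[url]


bodies = list(paginate("https://example.org/items?page=1", fake_fetch))
```

The loop stops as soon as a response carries no rel="next" link, which is the "continue until exhaustion" behaviour this issue asks for.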

fabricereix commented 2 months ago

Thanks @joshuaclayton for your issue. Automatic pagination is a really interesting and challenging use case. It would be nice if it could fit into a more general looping mechanism that is not specific to pagination. We already have skip; we might also add a repeat with a specific repetition count or a termination condition (similar to retry).

We need plenty of examples to see how it could work.

jcamiel commented 2 months ago

With --skip and --repeat, one can imagine such a file:

POST {{url}}
[Options]
repeat: -1 # infinite loop
skip: {{url}} isNull
{
  "payload": "body"
}
HTTP/2 200
[Captures]
body: body
url: header "Link" regex "<([^>]+)>; rel=\"next\""

We initialize the variable url with a starting value, run the request if the variable is not null, update url from the capture, and repeat. What is missing is the case where the capture for url fails: Hurl treats this as an error, whereas we want the run to continue. We could imagine giving the capture a default value when it fails, for instance with a trailing default null:

POST {{url}}
[Options]
repeat: -1 # infinite loop
skip: {{url}} isNull
{
  "payload": "body"
}
HTTP/2 200
[Captures]
body: body
url: header "Link" regex "<([^>]+)>; rel=\"next\"" default null

In summary, we could use repeat and skip without too many syntax changes.

jcamiel commented 2 months ago

Another, better, syntax for default could be else:

POST {{url}}
[Options]
repeat: -1 # infinite loop
skip: {{url}} isNull
{
  "payload": "body"
}
HTTP/2 200
[Captures]
body: body
url: header "Link" regex "<([^>]+)>; rel=\"next\"" else null

lepapareil commented 2 months ago

One possible solution is to use the repeat feature, which has been developed by @jcamiel and will be available in the next release.

For example, to retrieve the list of tags from a repository with the GitLab API, all we have to do is create pagination.hurl:

GET {{gitlab_api_url}}/projects/{{gitlab_project_id}}/repository/tags?private_token={{gitlab_token}}&per_page={{per_page}}&page=1
Content-Type: application/json

HTTP 200

[Captures]
total_pages: header "X-Total-Pages" toInt

GET {{gitlab_api_url}}/projects/{{gitlab_project_id}}/repository/tags?private_token={{gitlab_token}}&sort=desc&order_by=version&per_page={{per_page}}&page={{next_page}}
Content-Type: application/json
[Options]
repeat: {{total_pages}}

HTTP 200

[Captures]
next_page: header "X-Next-Page"

$ hurl \
    --variable gitlab_api_url=https://gitlab.com/api/v4 \
    --variable gitlab_project_id=1 \
    --variable gitlab_token=***** \
    --variable per_page=1 \
    --variable next_page=1 \
    pagination.hurl
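
The control flow of this file can be modelled in Python to make the count-driven loop explicit. This is a rough sketch under stated assumptions, not a description of how Hurl executes the file: `fetch_all_pages`, `fetch_page` and the fake tag data are hypothetical, and the stand-in server is assumed to send an empty X-Next-Page on the last page.

```python
from typing import Callable, List, Mapping, Tuple

# Hypothetical transport signature: page number in, (headers, items) out.
FetchPage = Callable[[int], Tuple[Mapping[str, str], List[str]]]


def fetch_all_pages(fetch_page: FetchPage) -> List[str]:
    """Mirror pagination.hurl: learn the page count, then repeat that many times."""
    # First entry: request page 1 only to capture total_pages.
    headers, _ = fetch_page(1)
    total_pages = int(headers["X-Total-Pages"])

    items: List[str] = []
    next_page = "1"  # same initial value as --variable next_page=1
    # Second entry: "repeat: {{total_pages}}", re-capturing next_page each time.
    for _ in range(total_pages):
        headers, page_items = fetch_page(int(next_page))
        items.extend(page_items)
        next_page = headers.get("X-Next-Page", "")
    return items


# In-memory stand-in for the tags endpoint with per_page=1 (illustration only).
TAGS = ["v3.0.0", "v2.0.0", "v1.0.0"]


def fake_fetch_page(page: int) -> Tuple[Mapping[str, str], List[str]]:
    headers = {
        "X-Total-Pages": str(len(TAGS)),
        # Assumption: the header is empty once there is no further page.
        "X-Next-Page": str(page + 1) if page < len(TAGS) else "",
    }
    return headers, [TAGS[page - 1]]


all_tags = fetch_all_pages(fake_fetch_page)
```

Unlike the Link-header approach discussed earlier, this variant needs the server to announce the total page count up front, and page 1 is requested twice.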