danielgtaylor / restish

Restish is a CLI for interacting with REST-ish HTTP APIs with some nice features built-in
https://rest.sh/
MIT License
930 stars, 76 forks

add support for pagination via URL params such as `page` + `count` #267

Open 0xdevalias opened 2 days ago

0xdevalias commented 2 days ago

It would be cool if restish supported automatic pagination strategies for URL params such as page + count (or rows, etc.) alongside its existing hypermedia pagination features.

I've been looking for a good tool that can handle this, similar to how the GitHub CLI implements pagination for its gh api command, but more generically applicable to any API.

I'm not sure what the most common names for these params are, but I think figuring that out and using them as the defaults would probably be ideal, and then allowing each 'type' of parameter to be optionally configured for flexibility.

For example, let's say the defaults are page and count, and we call restish with a URL like this, which returns page 1 with up to 100 transactions on it:

restish get 'https://api.example.com/api/v1/txs?foopage=1&barcount=100'

The URL doesn't include any params that match the known pagination param name defaults, so no pagination would occur.
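
The matching rule described here could look roughly like the sketch below. It only illustrates the proposed behaviour and is not something restish does today; `url_has_param` is a hypothetical helper name, and the check assumes parameter names are compared exactly (so `foopage` does not count as `page`).

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) if the URL's query string
# contains a parameter with exactly this name.
url_has_param() {
  url=$1
  name=$2

  case "$url" in
    *\?*) query=${url#*\?} ;;  # everything after the first '?'
    *)    return 1 ;;          # no query string at all
  esac
  query=${query%%#*}           # ignore any #fragment

  # Wrap the query in '&' so every parameter is delimited on both sides.
  case "&$query&" in
    *"&$name="*|*"&$name&"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

With the example URL above, `url_has_param "$url" page` and `url_has_param "$url" count` both fail, so no pagination would be attempted, while `url_has_param "$url" foopage` succeeds.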

However, if we provided overrides to tell it what the param names are:

restish get \
  --paginate-param-page='foopage' \
  --paginate-param-count='barcount' \
  'https://api.example.com/api/v1/txs?foopage=1&barcount=100'

It could then automatically handle pagination; though since there is nothing that tells it when it should stop, some kind of default 'stop case' should be provided.
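
A rough sketch of the page-advancing step, assuming the page parameter is a plain integer and that parameter names contain no regex metacharacters. `set_query_param` is a hypothetical helper, not part of restish. It replaces the parameter if the URL already has it and appends it otherwise, so it would also work when the param is missing from the URL.

```shell
#!/bin/sh
# Hypothetical helper: print the URL with query parameter NAME set to VALUE,
# replacing an existing value or appending the parameter if it is absent.
# Assumes NAME has no regex metacharacters and the URL has no #fragment.
set_query_param() {
  url=$1
  name=$2
  value=$3

  case "$url" in
    *\?*) base=${url%%\?*}; query=${url#*\?} ;;  # split at the first '?'
    *)    base=$url; query= ;;                    # no query string yet
  esac

  # Drop any existing copy of the parameter, then tidy a leading '&'.
  query=$(printf '%s\n' "$query" | sed -E "s/(^|&)$name=[^&]*//g; s/^&//")

  # Append the new value, adding '&' only if other parameters remain.
  printf '%s?%s%s=%s\n' "$base" "${query:+$query&}" "$name" "$value"
}
```

For example, `set_query_param 'https://api.example.com/api/v1/txs?foopage=1&barcount=100' foopage 2` prints `https://api.example.com/api/v1/txs?barcount=100&foopage=2`. The rewritten parameter moves to the end of the query string, which should not matter to most APIs.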

We could extend this further with a default key in the JSON response body for the total count of records (e.g. totalCount), and another for the 'container' array that holds them (e.g. records). All of these could be overridden as well.

Given those defaults, and this example API returning txNums and txArray instead of the default key names, we could call it something like this:

restish get \
  --paginate-param-page='foopage' \
  --paginate-param-count='barcount' \
  --paginate-result-total-count='txNums' \
  --paginate-result-array='txArray' \
  'https://api.example.com/api/v1/txs?foopage=1&barcount=100'

It would know to look at txNums and txArray due to the overrides, and it would know to stop fetching once it had retrieved the 'total count' of records.

There could also be another 'stop strategy' for APIs that don't include the total count, which could be as simple as stopping if the 'container array' (e.g. records) was empty.
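
The two stop strategies could be combined along these lines. This is only a sketch of the idea; `should_stop` is a hypothetical helper, and it assumes the caller has already extracted the counts from the response (for instance with jq).

```shell
#!/bin/sh
# Hypothetical helper: decide whether pagination should stop.
#   $1 = records fetched so far (including the current page)
#   $2 = number of records on the current page
#   $3 = total count reported by the API, or empty if it reports none
# Exit status 0 means "stop", 1 means "fetch the next page".
should_stop() {
  fetched=$1
  page_len=$2
  total=$3

  # Empty container array: nothing left to fetch.
  [ "$page_len" -eq 0 ] && return 0

  # No usable total (empty or not a non-negative integer): keep going and
  # rely on the empty-page check above.
  case "$total" in
    ''|*[!0-9]*) return 1 ;;
  esac

  # Total count known: stop once that many records have been fetched.
  [ "$fetched" -ge "$total" ]
}
```

With a page size of 100 and a reported total of 108, `should_stop 100 100 108` says to continue and `should_stop 108 8 108` says to stop. Without a total, `should_stop 108 8 ''` continues and the following empty page (`should_stop 108 0 ''`) ends the loop.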

It could even be simplified further, so that if the --paginate-param-* args are defined, there would be no need to include them in the URL itself:

restish get \
  --paginate-param-page='foopage' \
  --paginate-param-count='barcount' \
  --paginate-result-total-count='txNums' \
  --paginate-result-array='txArray' \
  'https://api.example.com/api/v1/txs'

0xdevalias commented 2 days ago

If anyone is interested, I ended up implementing a prototype of this as a wrapper script:

Source:

```shell
#!/usr/bin/env zsh

# Default values for pagination parameters
DEFAULT_PAGE_PARAM="page"
DEFAULT_COUNT_PARAM="count"
DEFAULT_PAGE_SIZE=100
DEFAULT_TOTAL_COUNT_KEY=".data.total"
DEFAULT_ARRAY_KEY=".data.items"
SLURP=false
HTTP_CLIENT="curl" # Default HTTP client
DEBUG=false        # Debug mode off by default

# Function to prefix keys with a dot if they don't already start with one
function prefix_with_dot() {
  local key="$1"
  if [[ "$key" != .* && -n "$key" ]]; then
    echo ".$key"
  else
    echo "$key"
  fi
}

# Function to clean and construct the URL
function clean_and_add_params() {
  local full_url="$1"
  local page_param="$2"
  local count_param="$3"
  local page_value="$4"
  local count_value="$5"

  # Separate base URL and query parameters
  local base_url="${full_url%%\?*}"    # Extract everything before the '?'
  local query_params="${full_url#*\?}" # Extract everything after the '?'

  # If there's no '?' in the URL, query_params is the same as full_url, so reset
  [[ "$full_url" == "$base_url" ]] && query_params=""

  # Remove existing pagination params from query_params
  query_params=$(echo "$query_params" | sed -E "s/(^|&)$page_param=[^&]*//g" | sed -E "s/(^|&)$count_param=[^&]*//g")

  # Append new pagination parameters
  query_params="${query_params}&${page_param}=${page_value}&${count_param}=${count_value}"

  # Clean up query_params to remove any leading/trailing '&'
  query_params=$(echo "$query_params" | sed -E 's/^&//; s/&$//')

  # Reconstruct the full URL
  if [[ -z "$query_params" ]]; then
    echo "$base_url"
  else
    echo "$base_url?$query_params"
  fi
}

function print_help() {
  cat <<EOF
Usage: paginate-fetch [options] <url>

Options:
  --page-param=<name>   Name of the "page" parameter (default: '${DEFAULT_PAGE_PARAM}')
  --count-param=<name>  Name of the "count" parameter (default: '${DEFAULT_COUNT_PARAM}')
  --total-key=<key>     Key for total count in the response JSON (supports nested keys with jq dot syntax; default: '${DEFAULT_TOTAL_COUNT_KEY}')
  --array-key=<key>     Key for the records array in the response JSON (supports nested keys with jq dot syntax; default: '${DEFAULT_ARRAY_KEY}')
  --slurp               Combine all pages into a single JSON array
  --client=<client>     HTTP client to use (curl or restish; default: '${HTTP_CLIENT}')
  --debug               Show raw server responses
  --help, -h            Display this help message

Examples:
  paginate-fetch \\
    --page-param='foopage' \\
    --count-param='barcount' \\
    --total-key='data.totalCount' \\
    --array-key='data.records' \\
    'https://api.example.com/api/foo'
EOF
}

# Parse arguments
while [[ "$#" -gt 0 ]]; do
  case "$1" in
    --page-param=*) PAGE_PARAM="${1#*=}" ;;
    --count-param=*) COUNT_PARAM="${1#*=}" ;;
    --total-key=*) TOTAL_COUNT_KEY="${1#*=}" ;;
    --array-key=*) ARRAY_KEY="${1#*=}" ;;
    --slurp) SLURP=true ;;
    --client=*) HTTP_CLIENT="${1#*=}" ;;
    --debug) DEBUG=true ;;
    --help|-h) print_help; exit 0 ;;
    *) URL="$1" ;;
  esac
  shift
done

# Set defaults if not provided
PAGE_PARAM="${PAGE_PARAM:-$DEFAULT_PAGE_PARAM}"
COUNT_PARAM="${COUNT_PARAM:-$DEFAULT_COUNT_PARAM}"
PAGE_SIZE="${PAGE_SIZE:-$DEFAULT_PAGE_SIZE}"
TOTAL_COUNT_KEY=$(prefix_with_dot "${TOTAL_COUNT_KEY:-$DEFAULT_TOTAL_COUNT_KEY}")
ARRAY_KEY=$(prefix_with_dot "${ARRAY_KEY:-$DEFAULT_ARRAY_KEY}")

if [[ -z "$URL" ]]; then
  echo "Error: URL is required." >&2
  print_help >&2
  exit 1
fi

# Variables for pagination
current_page=1
total_count=-1
fetched_records=0
merged_output="[]" # Start with an empty JSON array
response_combined=()

# Function to make an HTTP request using the selected client
function fetch_page() {
  local url="$1"
  case "$HTTP_CLIENT" in
    curl) curl -s "$url" ;;
    restish) restish get "$url" 2>/dev/null ;;
    *)
      echo "Error: Unsupported HTTP client '$HTTP_CLIENT'." >&2
      exit 1
      ;;
  esac
}

# Function to parse JSON using jq
# (printf rather than echo: zsh's echo expands escapes such as \n, which
# would corrupt JSON strings)
function parse_json() {
  local json="$1"
  local jq_filter="$2"
  printf '%s\n' "$json" | jq -c "$jq_filter"
}

# Loop through pages
while true; do
  # Build URL with cleaned pagination params
  paginated_url=$(clean_and_add_params "$URL" "$PAGE_PARAM" "$COUNT_PARAM" "$current_page" "$PAGE_SIZE")

  # Fetch the current page
  response=$(fetch_page "$paginated_url")
  if [[ -z "$response" ]]; then
    echo "Error: No response from server." >&2
    break
  fi

  # Show raw response if debugging
  if [[ "$DEBUG" == true ]]; then
    echo "DEBUG: Raw response from ${paginated_url}:" >&2
    printf '%s\n' "$response" >&2
  fi

  # Extract the total count and records array using jq filters
  total_count=$(parse_json "$response" "$TOTAL_COUNT_KEY" 2>/dev/null)
  records=$(parse_json "$response" "$ARRAY_KEY" 2>/dev/null)

  # Check for empty array or invalid response
  if [[ -z "$records" || "$records" == "null" || "$records" == "[]" ]]; then
    echo "Pagination ended: Empty response array." >&2
    break
  fi

  # Keep pages separate when slurping; otherwise merge into one flat array
  if [[ "$SLURP" == true ]]; then
    response_combined+=("$records")
  else
    merged_output=$(printf '%s\n%s\n' "$merged_output" "$records" | jq -s 'add')
  fi

  # Update fetched records count
  fetched_records=$((fetched_records + $(printf '%s\n' "$records" | jq length)))

  # Check stop condition based on total count. Only applies when the total is
  # a non-negative integer; a missing key yields "null", which would otherwise
  # evaluate to 0 and stop after the first page.
  if [[ "$total_count" =~ ^[0-9]+$ && "$fetched_records" -ge "$total_count" ]]; then
    echo "Pagination ended: Reached total count ($total_count)." >&2
    break
  fi

  # Increment the page
  current_page=$((current_page + 1))
done

# Output results
if [[ "$SLURP" == true ]]; then
  printf '[%s]\n' "$(IFS=,; printf '%s' "${response_combined[*]}")" | jq .
else
  printf '%s\n' "$merged_output" | jq .
fi
```

Example usage:

⇒ paginate-fetch --client='restish' --count-param='rows' --array-key='.txArray' --total-key='.txNums' 'https://api.example.com/api/v1/txs?page=1&rows=100' > out.json

Pagination ended: Reached total count (108).

⇒ jq 'length' out.json
108