Open delenamalan opened 2 years ago
@delenamalan you should be able to work around this issue by moving the spi/v3/ part of the url_base to the streams' path.
@girarda I've just run into a very similar issue, for which your proposed solution is unfortunately not working. I'm using the low-code SDK (airbyte-cdk==0.53.0).
Here's the relevant part of my manifest.yaml:
cursor_paginator:
  type: DefaultPaginator
  pagination_strategy:
    type: CursorPagination
    cursor_value: "{{ response._links.next }}"
  page_token_option:
    type: "RequestPath"
base_retriever:
  type: SimpleRetriever
  paginator:
    $ref: "#/definitions/cursor_paginator"
  record_selector:
    $ref: "#/definitions/selector"
requester:
  type: HttpRequester
  url_base: https://{{ config['domain_name'] }}
  path: "{{ parameters.path }}"
  http_method: GET
  authenticator:
    type: BasicHttpAuthenticator
    username: "{{ config['email'] }}"
    password: "{{ config['api_token'] }}"
  request_body_json: {}
  request_headers: {}
base_stream:
  type: DeclarativeStream
  schema_loader:
    $ref: "#/definitions/schema_loader"
search_stream:
  $ref: "#/definitions/base_stream"
  retriever:
    $ref: "#/definitions/base_retriever"
    requester:
      $ref: "#/definitions/requester"
      request_parameters:
        cql: space={{ config['space_id'] }} and type=page
        expand: '["history.lastUpdated"]'
  primary_key: "id"
  $parameters:
    name: "search"
    path: /wiki/rest/api/content/search
In concrete terms, this means that the initial request is sent to:
https://domain.service.net/wiki/rest/api/content/search?cql=space%3DFOO+and+type%3Dpage&expand=history.lastUpdated&after=1699493398
Calling the self._next_page_token method in the CDK's simple_retriever.py outputs the following next page token:
{'next_page_token': '/rest/api/content/search?next=true&cursor=BAR&expand=history.lastUpdated&limit=25&start=25&after=1699493398&cql=space%3DFOO+and+type%3Dpage'}
However, this then leads to the next request:
https://domain.service.net/rest/api/content/search?next=true&cursor=BAR&expand=history.lastUpdated&limit=25&start=25&after=1699493398&cql=space%3DFOO+and+type%3Dpage&expand=history.lastUpdated
This is missing the leading /wiki in the path, which results in a 404. I've tried shifting the /wiki/rest/api/content between the url_base and the parameters.path but can't get anything to work. Do you have any advice here?
Thanks in advance :slightly_smiling_face:
Having dug a bit deeper, it looks like the call below is the issue (see source):
url = urljoin(self.get_url_base(), path)
I added a breakpoint to poke around and found this to be the case:
self.get_url_base()
-> 'https://domain.service.net/wiki/'
path
-> rest/api/content/search
self.get_url_base()
-> 'https://domain.service.net/wiki/'
path
-> /rest/api/content/search?next=true&cursor=FOO&expand=history.lastUpdated&limit=25&start=25&after=1699493398&cql=space%3DBAR+and+type%3Dpage
In other words, the first path has no leading / whereas the path returned by the API does. Looks like this is an implementation detail on my end after all :slightly_smiling_face:
Feels a bit janky but the following works:
cursor_value: "{{ response._links.next[1:] }}"
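The behaviour described above comes from how Python's standard `urljoin` resolves paths, so it can be reproduced outside the CDK. A minimal sketch, assuming the placeholder host from the debugging session and a shortened, illustrative query string:

```python
from urllib.parse import urljoin

# Placeholder values modelled on the debugging output (query string shortened).
url_base = "https://domain.service.net/wiki/"
first_path = "rest/api/content/search"
next_path = "/rest/api/content/search?next=true&cursor=FOO"

# A path without a leading "/" is appended to the base path.
first_url = urljoin(url_base, first_path)

# A path with a leading "/" is resolved from the host root, so "/wiki" is dropped.
broken_url = urljoin(url_base, next_path)

# Dropping the first character (what `[1:]` does in the Jinja expression) keeps "/wiki".
fixed_url = urljoin(url_base, next_path[1:])

print(first_url)
print(broken_url)
print(fixed_url)
```

Note that the `[1:]` slice assumes the API always returns a link with a leading `/`; if it ever returned a link without one, the slice would cut off the first character of the path.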
@edfincham thanks for providing details on your issue. we'll prioritize fixing this over the next couple of weeks
Environment
Current Behavior
When a requester's url_base has an additional path, e.g. https://www.example.com/some/path instead of just https://www.example.com, then the CursorPagination pagination strategy breaks. It removes the additional path part of the base URL when requesting subsequent pages.
For example, when I update the Workable connector's base URL to "https://{{ config['account_subdomain'] }}.workable.com/spi/v3" and the jobs stream's path to "/jobs", then:
1. The first request goes to https://test-432879.workable.com/spi/v3/jobs?created_after=20221001T115616Z&limit=1.
2. The API returns "paging":{"next":"https://test-432879.workable.com/spi/v3/jobs?created_after=20221001T115616Z&limit=1&since_id=2a1018"} in the response.
3. The next request goes to https://test-432879.workable.com/jobs?created_after=20221001T115616Z&limit=1&since_id=2a1018&created_after=20221001T115616Z&limit=1 (spi/v3 base path missing).
Expected Behavior
Airbyte should use the full URL returned by the API to request the next page.
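To make the expected behaviour concrete, here is a rough sketch of how a next-page link could be resolved so that a full URL is used as-is and a path-only link keeps the base path. `resolve_next_page_url` is a hypothetical helper written for illustration; it is not part of airbyte-cdk and is not how the CDK currently builds the request.

```python
from urllib.parse import urljoin, urlsplit


def resolve_next_page_url(url_base: str, next_link: str) -> str:
    """Hypothetical helper (not part of airbyte-cdk) for resolving a next-page link."""
    parts = urlsplit(next_link)
    if parts.scheme and parts.netloc:
        # The API returned a full URL: use it unchanged.
        return next_link
    # Otherwise treat the link as relative to the whole base URL, path included.
    base = url_base if url_base.endswith("/") else url_base + "/"
    return urljoin(base, next_link.lstrip("/"))


# Full URL, as in the Workable example: returned unchanged.
workable_next = resolve_next_page_url(
    "https://test-432879.workable.com/spi/v3",
    "https://test-432879.workable.com/spi/v3/jobs?limit=1&since_id=2a1018",
)

# Path-only link: the "/some/path" part of the base URL is preserved.
relative_next = resolve_next_page_url(
    "https://www.example.com/some/path",
    "/items?page=2",
)

print(workable_next)
print(relative_next)
```

Treating a leading `/` as relative to the base path is an assumption that suits APIs whose links omit a context prefix; an API that returns genuinely host-rooted paths would need the plain `urljoin` behaviour instead.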
Logs
Steps to Reproduce
workable.yaml
with the following: