airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
https://airbyte.com

[No code connector builder] cannot use next_page_token in request JSON body interpolation #40697

Open iliyasned opened 4 months ago

iliyasned commented 4 months ago

Topic

Possible bug with next_page_token interpolation (Airbyte Cloud)

Relevant information

Issue description

When I try to use the next_page_token variable in a freeform JSON request body, it is not interpolated and, as a result, is not set in the outgoing request.

Here is the request body:

{
  "query": {
    "operator": "AND",
    "value": [
      {
        "field": "created_at",
        "operator": ">",
        "value": "1306054154"
      }
    ]
  },
  "pagination": {
    "per_page": 20,
    "starting_after": "{{next_page_token}}" # I also tried {{ next_page_token['next_page_token'] }} and {{ next_page_token['next_page_token']['starting_after'] }}
  }
}

Here is the paginator:

type: DefaultPaginator
pagination_strategy:
  type: CursorPagination
  cursor_value: '{{ response.get("pages", {}).get("next", {}).get("starting_after", {}) }}'
  stop_condition: >-
    {{ not response.get("pages", {}).get("next", {}).get("starting_after", {})
    }}

Here is the response:

{
  "status": 200,
  "body": {
    "type": "ticket.list",
    "pages": {
      "type": "pages",
      "next": {
        "page": 2,
        "starting_after": "WzE3MTk5MTY5NzYwMDAsNTg1LDJd"
      },
      "page": 1,
      "per_page": 20,
      "total_pages": 11
    },
    "total_count": 219,
    "tickets": [... ]
  },
  "headers": {
    "Date": "Wed, 03 Jul 2024 10:07:30 GMT",
    "Content-Type": "application/json; charset=utf-8",
    "Transfer-Encoding": "chunked",
    "Connection": "keep-alive",
    "Status": "200 OK",
    "X-RateLimit-Limit": "1667",
    "X-RateLimit-Reset": "1720001250",
    "Vary": "Accept,Accept-Encoding",
    "X-RateLimit-Remaining": "1666",
    "X-Intercom-Version": "7c0b4cd723debb71533563e6ee48379600600ab2",
    "Content-Encoding": "gzip",
    "X-Request-Id": "001ms34bp7aj8elhac20",
    "ETag": "W/\"2486fdfbb7066d3ddaa912b2664da0a0\"",
    "X-Frame-Options": "SAMEORIGIN",
    "Cache-Control": "max-age=0, private, must-revalidate",
    "Strict-Transport-Security": "max-age=31556952; includeSubDomains; preload",
    "X-XSS-Protection": "1; mode=block",
    "X-Request-Queueing": "0",
    "Intercom-Version": "2.11",
    "X-Runtime": "2.640149",
    "X-Content-Type-Options": "nosniff",
    "Server": "nginx",
    "x-ami-version": "ami-03ba2b5f972368d27"
  }
}
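To make the paginator expressions above concrete, here is a minimal sketch that evaluates the same chain of `.get()` calls in plain Python against the response body shown. It is not CDK code: plain Python stands in for the template engine, and the shape of the last page (no `next` object) is an assumption rather than something shown in this report.

```python
# Toy evaluation of the paginator expressions; plain Python stands in
# for the template engine. This is a sketch, not Airbyte CDK code.

def cursor_value(response: dict):
    # Same chain of .get() calls as the cursor_value expression.
    return response.get("pages", {}).get("next", {}).get("starting_after", {})


def stop_condition(response: dict) -> bool:
    # Same logic as the stop_condition expression: stop when no cursor is found.
    return not cursor_value(response)


# Body of the first-page response from the report (tickets omitted).
first_page = {
    "type": "ticket.list",
    "pages": {
        "type": "pages",
        "next": {"page": 2, "starting_after": "WzE3MTk5MTY5NzYwMDAsNTg1LDJd"},
        "page": 1,
        "per_page": 20,
        "total_pages": 11,
    },
    "total_count": 219,
}

# Hypothetical last page: ASSUMED to carry no "next" object at all.
last_page = {
    "type": "ticket.list",
    "pages": {"type": "pages", "page": 11, "per_page": 20, "total_pages": 11},
}

print(cursor_value(first_page))    # cursor for the second page
print(stop_condition(first_page))  # False: keep paginating
print(stop_condition(last_page))   # True: the empty-dict default is falsy
```

So the cursor extraction itself appears to work on this response; the open question in this issue is where the extracted value ends up in the next request.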

Here is the request for the second page (which I would expect to have the starting_after WITHIN pagination, not outside of it as shown):

{
  "url": "https://api.intercom.io/tickets/search",
  "body": {
    "query": {
      "operator": "AND",
      "value": [
        {
          "field": "created_at",
          "operator": ">",
          "value": 1306054154
        }
      ]
    },
    "pagination": {
      "per_page": 20
    },
    "starting_after": "WzE3MTk5MTY5NzYwMDAsNTg1LDJd"
  },
  "headers": {
    "User-Agent": "python-requests/2.32.3",
    "Accept-Encoding": "gzip, deflate",
    "Accept": "*/*",
    "Connection": "keep-alive",
    "Content-Type": "application/json",
    "Intercom-Version": "2.11",
    "Authorization": "Bearer ****",
    "Content-Length": "186"
  },
  "http_method": "POST"
}
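One possible reading of the request above, sketched as a toy model: the full YAML below also defines a `page_token_option` with `inject_into: body_json` and `field_name: starting_after`, which would account for the top-level `starting_after`, while the templated value inside `pagination` rendered to nothing and was dropped. The helper names and the "drop empty values" rule are assumptions made to reproduce the observed body; this is not the CDK's actual implementation.

```python
# Toy model of the observed second-page body. Helper names and the
# "drop empty values" rule are ASSUMPTIONS, not Airbyte CDK code.
import copy


def drop_empty(value):
    """Recursively drop dict entries whose value is None or an empty string."""
    if isinstance(value, dict):
        cleaned = {key: drop_empty(item) for key, item in value.items()}
        return {key: item for key, item in cleaned.items() if item not in (None, "")}
    if isinstance(value, list):
        return [drop_empty(item) for item in value]
    return value


def build_second_page_body(body_template, rendered_token, field_name, cursor):
    body = copy.deepcopy(body_template)
    # What "{{next_page_token}}" is assumed to have rendered to.
    body["pagination"]["starting_after"] = rendered_token
    body = drop_empty(body)
    # page_token_option-style injection: cursor goes in as a top-level field.
    body[field_name] = cursor
    return body


body_template = {
    "query": {
        "operator": "AND",
        "value": [{"field": "created_at", "operator": ">", "value": 1306054154}],
    },
    "pagination": {"per_page": 20, "starting_after": "{{next_page_token}}"},
}
cursor = "WzE3MTk5MTY5NzYwMDAsNTg1LDJd"

second_page_body = build_second_page_body(
    body_template, rendered_token="", field_name="starting_after", cursor=cursor
)
print(second_page_body)
```

Under these assumptions the model reproduces the body in the report: `pagination` keeps only `per_page`, and `starting_after` sits next to `query` instead of inside `pagination`.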
Full YAML `version: 2.0.0 type: DeclarativeSource check: type: CheckStream stream_names: - intercom tickets definitions: streams: intercom tickets: type: DeclarativeStream next_page_token: '{{ $.response.pages.next.starting_after}}' name: intercom tickets retriever: type: SimpleRetriever requester: $ref: '#/definitions/base_requester' path: /tickets/search http_method: POST request_headers: Content-Type: application/json Intercom-Version: '2.11' request_body_json: query: operator: AND value: - field: created_at operator: '>' value: '1306054154' pagination: per_page: 20 starting_after: '{{next_page_token}}' record_selector: type: RecordSelector extractor: type: DpathExtractor field_path: - tickets paginator: type: DefaultPaginator page_token_option: type: RequestOption inject_into: body_json field_name: starting_after pagination_strategy: type: CursorPagination cursor_value: >- {{ response.get("pages", {}).get("next", {}).get("starting_after", {}) }} stop_condition: >- {{ not response.get("pages", {}).get("next", {}).get("starting_after", {}) }} schema_loader: type: InlineSchemaLoader schema: $ref: '#/schemas/intercom tickets' base_requester: type: HttpRequester url_base: https://api.intercom.io/ authenticator: type: BearerAuthenticator api_token: '{{ config["api_key"] }}' streams: - $ref: '#/definitions/streams/intercom tickets' spec: type: Spec connection_specification: type: object $schema: http://json-schema.org/draft-07/schema# required: - api_key properties: api_key: type: string order: 0 title: API Key airbyte_secret: true additionalProperties: true metadata: autoImportSchema: intercom tickets: true schemas: intercom tickets: type: object $schema: http://json-schema.org/schema# additionalProperties: true properties: type: type: - string - 'null' admin_assignee_id: type: - string - 'null' category: type: - string - 'null' contacts: type: - object - 'null' properties: type: type: - string - 'null' contacts: type: - array - 'null' items: type: - object - 'null' 
properties: type: type: - string - 'null' external_id: type: - string - 'null' id: type: - string - 'null' created_at: type: - number - 'null' id: type: - string - 'null' is_shared: type: - boolean - 'null' linked_objects: type: - object - 'null' properties: type: type: - string - 'null' data: type: - array - 'null' items: type: - object - 'null' properties: type: type: - string - 'null' id: type: - string - 'null' has_more: type: - boolean - 'null' total_count: type: - number - 'null' open: type: - boolean - 'null' team_assignee_id: type: - string - 'null' ticket_attributes: type: - object - 'null' properties: Order Identifier: type: - number - 'null' Partner / Sirdab: type: - string - 'null' Sirdab SKU: type: - number - 'null' Type of Order: type: - string - 'null' _default_description_: type: - string - 'null' _default_title_: type: - string - 'null' ticket_id: type: - string - 'null' ticket_parts: type: - object - 'null' properties: type: type: - string - 'null' ticket_parts: type: - array - 'null' items: type: - object - 'null' properties: type: type: - string - 'null' assigned_to: type: - object - 'null' properties: type: type: - string - 'null' id: type: - string - 'null' attachments: type: - array - 'null' items: type: - object - 'null' properties: type: type: - string - 'null' content_type: type: - string - 'null' filesize: type: - number - 'null' name: type: - string - 'null' url: type: - string - 'null' author: type: - object - 'null' properties: type: type: - string - 'null' email: type: - string - 'null' id: type: - string - 'null' name: type: - string - 'null' body: type: - string - 'null' created_at: type: - number - 'null' external_id: type: - string - 'null' id: type: - string - 'null' part_type: type: - string - 'null' previous_ticket_state: type: - string - 'null' redacted: type: - boolean - 'null' ticket_state: type: - string - 'null' updated_at: type: - number - 'null' total_count: type: - number - 'null' ticket_state: type: - string - 'null' 
ticket_state_external_label: type: - string - 'null' ticket_state_internal_label: type: - string - 'null' ticket_type: type: - object - 'null' properties: type: type: - string - 'null' archived: type: - boolean - 'null' category: type: - string - 'null' created_at: type: - number - 'null' description: type: - string - 'null' icon: type: - string - 'null' id: type: - string - 'null' is_internal: type: - boolean - 'null' name: type: - string - 'null' ticket_type_attributes: type: - object - 'null' properties: type: type: - string - 'null' data: type: - array - 'null' items: type: - object - 'null' properties: type: type: - string - 'null' archived: type: - boolean - 'null' created_at: type: - number - 'null' data_type: type: - string - 'null' default: type: - boolean - 'null' description: type: - string - 'null' id: type: - string - 'null' input_options: type: - object - 'null' properties: list_options: type: - array - 'null' items: type: - object - 'null' properties: archived: type: - boolean - 'null' id: type: - string - 'null' label: type: - string - 'null' multiline: type: - boolean - 'null' name: type: - string - 'null' order: type: - number - 'null' required_to_create: type: - boolean - 'null' required_to_create_for_contacts: type: - boolean - 'null' ticket_type_id: type: - number - 'null' updated_at: type: - number - 'null' visible_on_create: type: - boolean - 'null' visible_to_contacts: type: - boolean - 'null' workspace_id: type: - string - 'null' updated_at: type: - number - 'null' workspace_id: type: - string - 'null' updated_at: type: - number - 'null' `
natikgadzhi commented 4 months ago

A few things going on here! First off, thank you A LOT for such a detailed report. I'm supporting our Connector Builder team, I'll take a look later today + tomorrow.

Out of curiosity, I see you're building against Intercom. Is there any reason why you can't use our source-intercom connector? Any missing streams / columns there? I would love to help and add the missing pieces in Intercom proper, as we're migrating it to low-code.

And to the point — I'll try to get our sandbox credentials and run the queries you're trying to run and see if I can reproduce the problem.

iliyasned commented 4 months ago

Hi Natik, thanks for looking into this. We're using the custom builder because Airbyte doesn't yet have ticket information streaming from Intercom out-of-the-box, so we were trying to use their REST API to get it working but we ran into this weirdly unsolvable 'next_page_token' issue.

sherifnada commented 4 months ago

Looked into this and I think I found the root cause:

What makes me confident that the issue is somewhere around the line of code I shared above is that when I add incremental sync to the stream (so that the CDK doesn't create a ResumableFullRefreshCursor for this stream), everything works exactly as expected. Here is the updated YAML that makes things work:

Basically all we added was this block to the stream definition:

incremental_sync:
  type: DatetimeBasedCursor
  cursor_field: created_at
  cursor_datetime_formats:
    - '%s'
  datetime_format: '%s'
  start_datetime:
    type: MinMaxDatetime
    datetime: '{{ config["start_date"] }}'
    datetime_format: '%Y-%m-%dT%H:%M:%SZ'

We should probably keep the issue open until the CDK fix is merged. But I think we are unblocked for now.

full yaml version: 2.0.0 type: DeclarativeSource check: type: CheckStream stream_names: - tickets definitions: streams: tickets: type: DeclarativeStream name: tickets retriever: type: SimpleRetriever requester: $ref: '#/definitions/base_requester' path: /tickets/search? http_method: POST request_headers: Content-Type: application/json Intercom-Version: '2.11' request_body_json: sort: field: created_at order: ascending query: value: - field: created_at value: >- {{ stream_slice.get('start_date', timestamp(config['start_date'])) }} operator: '>' operator: AND pagination: per_page: 50 starting_after: '{{ next_page_token[''next_page_token''] }}' record_selector: type: RecordSelector extractor: type: DpathExtractor field_path: - tickets paginator: type: DefaultPaginator pagination_strategy: type: CursorPagination cursor_value: >- {{ response.get("pages", {}).get("next", {}).get("starting_after", {}) }} stop_condition: >- {{ not response.get("pages", {}).get("next", {}).get("starting_after", {}) }} incremental_sync: type: DatetimeBasedCursor cursor_field: created_at cursor_datetime_formats: - '%s' datetime_format: '%s' start_datetime: type: MinMaxDatetime datetime: '{{ config["start_date"] }}' datetime_format: '%Y-%m-%dT%H:%M:%SZ' schema_loader: type: InlineSchemaLoader schema: $ref: '#/schemas/tickets' base_requester: type: HttpRequester url_base: https://api.intercom.io/ authenticator: type: BearerAuthenticator api_token: '{{ config["api_key"] }}' streams: - $ref: '#/definitions/streams/tickets' spec: type: Spec connection_specification: type: object $schema: http://json-schema.org/draft-07/schema# required: [] properties: {} additionalProperties: true metadata: autoImportSchema: tickets: false yamlComponents: global: - authenticator schemas: tickets: type: object $schema: http://json-schema.org/schema# additionalProperties: true properties: type: type: - string - 'null' admin_assignee_id: type: - string - 'null' category: type: - string - 'null' contacts: type: - 
object - 'null' properties: type: type: - string - 'null' contacts: type: - array - 'null' items: type: - object - 'null' properties: type: type: - string - 'null' external_id: type: - string - 'null' id: type: - string - 'null' created_at: type: - number - 'null' id: type: - string - 'null' is_shared: type: - boolean - 'null' linked_objects: type: - object - 'null' properties: type: type: - string - 'null' data: type: - array - 'null' has_more: type: - boolean - 'null' total_count: type: - number - 'null' open: type: - boolean - 'null' team_assignee_id: type: - string - 'null' ticket_attributes: type: - object - 'null' properties: Order Identifier: type: - number - 'null' Type of Order: type: - string - 'null' _default_description_: type: - string - 'null' _default_title_: type: - string - 'null' ticket_id: type: - string - 'null' ticket_parts: type: - object - 'null' properties: type: type: - string - 'null' ticket_parts: type: - array - 'null' items: type: - object - 'null' properties: type: type: - string - 'null' assigned_to: type: - object - 'null' properties: type: type: - string - 'null' id: type: - string - 'null' attachments: type: - array - 'null' author: type: - object - 'null' properties: type: type: - string - 'null' email: type: - string - 'null' id: type: - string - 'null' name: type: - string - 'null' body: type: - string - 'null' created_at: type: - number - 'null' id: type: - string - 'null' part_type: type: - string - 'null' previous_ticket_state: type: - string - 'null' redacted: type: - boolean - 'null' ticket_state: type: - string - 'null' updated_at: type: - number - 'null' total_count: type: - number - 'null' ticket_state: type: - string - 'null' ticket_state_external_label: type: - string - 'null' ticket_state_internal_label: type: - string - 'null' ticket_type: type: - object - 'null' properties: type: type: - string - 'null' archived: type: - boolean - 'null' category: type: - string - 'null' created_at: type: - number - 'null' 
description: type: - string - 'null' icon: type: - string - 'null' id: type: - string - 'null' is_internal: type: - boolean - 'null' name: type: - string - 'null' ticket_type_attributes: type: - object - 'null' properties: type: type: - string - 'null' data: type: - array - 'null' items: type: - object - 'null' properties: type: type: - string - 'null' archived: type: - boolean - 'null' created_at: type: - number - 'null' data_type: type: - string - 'null' default: type: - boolean - 'null' description: type: - string - 'null' id: type: - string - 'null' input_options: type: - object - 'null' properties: list_options: type: - array - 'null' items: type: - object - 'null' properties: archived: type: - boolean - 'null' id: type: - string - 'null' label: type: - string - 'null' multiline: type: - boolean - 'null' name: type: - string - 'null' order: type: - number - 'null' required_to_create: type: - boolean - 'null' required_to_create_for_contacts: type: - boolean - 'null' ticket_type_id: type: - number - 'null' updated_at: type: - number - 'null' visible_on_create: type: - boolean - 'null' visible_to_contacts: type: - boolean - 'null' workspace_id: type: - string - 'null' updated_at: type: - number - 'null' workspace_id: type: - string - 'null' updated_at: type: - number - 'null'
sherifnada commented 4 months ago

I also made a probably-not-entirely-correct PR adding this stream to the existing Intercom connector

natikgadzhi commented 4 months ago

@brianjlai, any chance you can pick this up? /cc @girarda

brianjlai commented 3 months ago

@natikgadzhi probably not, i'm mostly focused on the automatic RFR stuff for the time being

ivanlm commented 2 weeks ago

I've had the same issue and found that the "next_page_token" attribute is available in the "stream_partition" object. I'm not sure whether this is a workaround or the new intended method; referencing the next_page_token object directly did not work for me either when updating a no-code custom connector with a Text (Free Form) request body.

natikgadzhi commented 1 week ago

/cc @lmossman I think this is an interesting one. Am I right to assume that a solution is to pass next page token to interpolation context dicts in the CDK side?

lmossman commented 5 days ago

@natikgadzhi based on the comment above it sounds like this may already be doable with {{ stream_partition['next_page_token'] }} in the request body

ivanlm commented 3 days ago

I've had the same issue and found the "next_page_token" attribute is available in the "stream_partition" object.

An update to the comment above: this workaround does not work when a "Parent Stream" or "Parameterized Requests" is being used. In this case the contents of "stream_partition" do not contain the "next_page_token". This became a blocker for one particular use case we have.

lmossman commented 2 days ago

@natikgadzhi @maxi297 could you look into why the next_page_token is not available to request_body_json in the HttpRequester in the case the user described above?