Widen / tap-rest-api-msdk

`tap-rest-api-msdk` is a Singer tap for generic REST APIs, built with the Meltano SDK for Singer Taps.
Apache License 2.0

debug tips? #60

Open techieshark opened 4 days ago

techieshark commented 4 days ago

Hi, thanks for this plugin!

Is there a config option or any way to get more detailed logging?

I'm seeing the following error when calling a REST API and loading the results into a CSV file:

ERROR    | tap-rest-api-msdk.tap-rest-api-msdk | An unhandled error occurred while syncing ...
...
[info     ] raise FatalAPIError(msg)   cmd_type=elb consumer=False job_name=dev:tap-rest-api-msdk-to-target-csv name=tap-rest-api-msdk producer=True run_id=560a535a-4e6b-4ed3-a5d3-c3f2bfb63dd3 stdio=stderr string_id=tap-rest-api-msdk

[info     ] singer_sdk.exceptions.FatalAPIError: 400 Client Error: Bad Request for path: /api/v1/accounts/xyz-redacted-123/transactions cmd_type=elb consumer=False job_name=dev:tap-rest-api-msdk-to-target-csv name=tap-rest-api-msdk producer=True run_id=560a535a-4e6b-4ed3-a5d3-c3f2bfb63dd3 stdio=stderr string_id=tap-rest-api-msdk

It would be useful to see the full details of the request in order to understand why the response is a 400: not just the path, but the full URL including any query parameters.
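The `FatalAPIError` text above appears to include only the path component of the request URL, so the query string (where pagination cursors usually live) never shows up. The minimal standard-library sketch below illustrates that difference by building a URL from a base, a path, and query parameters, then formatting an error message with and without the query string. `build_url` and `format_error` are hypothetical helpers, not part of the tap or the SDK, and the host, account ID, and parameter values are placeholders borrowed from the redacted examples in this thread.

```python
from __future__ import annotations

from urllib.parse import urlencode, urlsplit


def build_url(base: str, path: str, params: dict[str, str]) -> str:
    """Hypothetical helper: join a base URL, a path, and URL-encoded query params."""
    url = f"{base.rstrip('/')}/{path.lstrip('/')}"
    query = urlencode(params)
    return f"{url}?{query}" if query else url


def format_error(status: int, reason: str, url: str, include_query: bool) -> str:
    """Hypothetical helper: build a 'for path: ...' style error message."""
    parts = urlsplit(url)
    target = parts.path  # path only, e.g. /api/v1/accounts/.../transactions
    if include_query and parts.query:
        target = f"{target}?{parts.query}"  # keep the query string as well
    kind = "Client" if 400 <= status < 500 else "Server"
    return f"{status} {kind} Error: {reason} for path: {target}"


url = build_url(
    "https://hostname/api/v1",
    "accounts/xyz-redacted-123/transactions",
    {"page[after]": "some-base64-encoded-id==", "page[size]": "20"},
)
path_only = format_error(400, "Bad Request", url, include_query=False)
with_query = format_error(400, "Bad Request", url, include_query=True)
print(path_only)
print(with_query)
```

The first message has the same shape as the one in the log; only the second reveals which page was requested. Query strings can also carry credentials, so a full-URL message may need redacting before it is shared.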

I've tested the path above outside of Meltano (e.g. with curl or Postman) and it works, so I suspect the next-page URL is somehow not being sent correctly. It's a 400 error, not a 401, and according to the API docs (snippet below) authorization does not appear to be the problem:

[image: screenshot of the API documentation]

I've also tried configuring Meltano for more verbose logging, but that doesn't seem to expose the information I'm after.
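One generic way to surface full request URLs from Python code that uses `requests` is to enable DEBUG logging on the `urllib3` logger, which records each request line including its query string, and optionally `http.client`'s wire-level debug output. The sketch below assumes you are reproducing the failing request in a standalone Python script; getting the same output from the tap while Meltano runs it would require this setup to execute inside the tap's own process, which the sketch does not cover. `enable_http_debug_logging` is a hypothetical helper name.

```python
import http.client
import logging


def enable_http_debug_logging() -> None:
    """Hypothetical helper: turn on verbose HTTP logging for requests/urllib3 code."""
    # Send DEBUG-level records from every logger to stderr.
    logging.basicConfig(level=logging.DEBUG)
    # urllib3 (used under the hood by requests) logs one DEBUG line per request,
    # roughly like: https://hostname:443 "GET /path?query HTTP/1.1" 400 None
    logging.getLogger("urllib3").setLevel(logging.DEBUG)
    # A non-zero debuglevel makes http.client print raw request/response lines and
    # headers to stdout. This includes the Authorization header, so redact before sharing.
    http.client.HTTPConnection.debuglevel = 1


if __name__ == "__main__":
    enable_http_debug_logging()
    # Then issue the request you want to inspect, for example with requests:
    # requests.get("https://hostname/api/v1/accounts/<account-id>/transactions", headers=...)
```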


Additional context

The response from the API looks like this:

{
    "data": [...],
    "links": {
        "prev": null,
        "next": "https://hostname/api/v1/accounts/xyz/transactions?page%5Bafter%5D=some-base64-encoded-id%3D%3D&page%5Bsize%5D=20"
    }
}
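The `next` value here is a complete URL rather than a bare cursor token. A quick way to see what it carries is to split it with Python's standard `urllib.parse` module. The sketch below decodes the redacted link from the response above, so the cursor value is only a placeholder.

```python
from urllib.parse import parse_qs, urlsplit

next_link = (
    "https://hostname/api/v1/accounts/xyz/transactions"
    "?page%5Bafter%5D=some-base64-encoded-id%3D%3D&page%5Bsize%5D=20"
)

parts = urlsplit(next_link)
# parse_qs percent-decodes keys and values: %5B/%5D become [ and ], %3D becomes =.
params = {key: values[0] for key, values in parse_qs(parts.query).items()}

print(parts.path)  # /api/v1/accounts/xyz/transactions
print(params)      # {'page[after]': 'some-base64-encoded-id==', 'page[size]': '20'}
```

The decoded `page[after]` and `page[size]` parameters are what the API expects on the follow-up request. If they are dropped, double-encoded, or sent in some other shape, a 400 is a plausible outcome.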

In my .env I have:

TAP_REST_API_MSDK_NEXT_PAGE_TOKEN_PATH='$.links.next'
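With `$.links.next` as the token path, the value extracted from each response is that entire URL. The sketch below resolves a simple dotted path of that form with a hypothetical `resolve_simple_jsonpath` helper (the tap presumably uses a full JSONPath library; this helper only handles `$.a.b`-style paths). It then contrasts following the link as-is with a hypothetical paginator that treats the URL as an opaque token and sends it as a `page` query parameter. That second form only illustrates how a full URL used as a token could yield a malformed request; it is not a description of what the tap actually does.

```python
import json
from typing import Any
from urllib.parse import urlencode

# Redacted response body from this thread, with an empty data array.
response_body = json.loads(
    '{"data": [], "links": {"prev": null, "next": '
    '"https://hostname/api/v1/accounts/xyz/transactions'
    '?page%5Bafter%5D=some-base64-encoded-id%3D%3D&page%5Bsize%5D=20"}}'
)


def resolve_simple_jsonpath(document: Any, path: str) -> Any:
    """Hypothetical helper: resolve '$.a.b'-style paths only (no wildcards/filters)."""
    if not path.startswith("$."):
        raise ValueError(f"unsupported path: {path}")
    value = document
    for key in path[2:].split("."):
        value = value[key]
    return value


next_link = resolve_simple_jsonpath(response_body, "$.links.next")

# Following the link as-is keeps the API's own query string intact.
followed_url = next_link

# Hypothetical anti-pattern (not necessarily what the tap does): sending the URL as
# a "page" parameter on the original endpoint nests one encoded URL inside another.
base = "https://hostname/api/v1/accounts/xyz/transactions"
token_as_param_url = f"{base}?{urlencode({'page': next_link})}"

print(followed_url)
print(token_as_param_url)
```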

and my config looks like this:

cat meltano.yml
version: 1
default_environment: dev
project_id: e65b13c4-000a-4fd5-97ef-764a89675ff7
environments:
- name: dev
- name: staging
- name: prod
plugins:
  extractors:
  - name: tap-rest-api-msdk
    variant: widen
    pip_url: tap-rest-api-msdk
    config:
      auth_method: bearer_token
      flattening_enabled: true
      flattening_max_depth: 10
      api_url: https://hostname/api/v1/
      records_path: $.data[*]
      path: /
      streams:
      - name: txns
        path: accounts/some-id-redacted/transactions
    select:
    - '*.*'
  loaders:
  - name: target-csv
    variant: meltanolabs
    pip_url: git+https://github.com/MeltanoLabs/target-csv.git

techieshark commented 4 days ago

Update: perhaps my pagination settings above were simply incorrect. After trying a couple of things, this config seems to work:

version: 1
default_environment: dev
project_id: e65b13c4-000a-4fd5-97ef-764a89675ff7
environments:
- name: dev
- name: staging
- name: prod
plugins:
  extractors:
  - name: tap-rest-api-msdk
    variant: widen
    pip_url: tap-rest-api-msdk
    config:
      auth_method: bearer_token
      flattening_enabled: true
      flattening_max_depth: 10
      api_url: https://hostname
      records_path: $.data[*]
      pagination_request_style: jsonpath_paginator
      pagination_response_style: hateoas_body
      path: /api/v1
      streams:
      - name: txns
        path: /api/v1/accounts/some-id-redacted/transactions
    select:
    - '*.*'
  loaders:
  - name: target-csv
    variant: meltanolabs
    pip_url: git+https://github.com/MeltanoLabs/target-csv.git
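The working config above sets `pagination_response_style: hateoas_body`, which, going by the name, selects HATEOAS-style pagination: each response body links to the next page, and the client keeps following that link until it is null. The loop below is a minimal, network-free sketch of that general pattern, not the tap's implementation. `fake_get`, the `PAGES` dictionary, the `cursor-1` cursor, and the records are hypothetical stand-ins for real HTTP calls and data.

```python
from __future__ import annotations

from typing import Any, Iterator, Optional

FIRST_URL = "https://hostname/api/v1/accounts/xyz/transactions"
SECOND_URL = FIRST_URL + "?page%5Bafter%5D=cursor-1&page%5Bsize%5D=2"

# Hypothetical in-memory "API": each page links to the next; the last links to None.
PAGES: dict[str, dict[str, Any]] = {
    FIRST_URL: {"data": [{"id": 1}, {"id": 2}], "links": {"prev": None, "next": SECOND_URL}},
    SECOND_URL: {"data": [{"id": 3}], "links": {"prev": FIRST_URL, "next": None}},
}


def fake_get(url: str) -> dict[str, Any]:
    """Stand-in for an HTTP GET that returns the parsed JSON body."""
    return PAGES[url]


def iter_records(start_url: str) -> Iterator[dict[str, Any]]:
    """Follow links.next until it is null, yielding every record in data."""
    url: Optional[str] = start_url
    seen: set[str] = set()
    while url is not None:
        if url in seen:  # guard against an API that links back to a visited page
            raise RuntimeError(f"pagination loop detected at {url}")
        seen.add(url)
        print("GET", url)  # log the full URL, query string included
        body = fake_get(url)
        yield from body["data"]
        url = body["links"]["next"]


records = list(iter_records(FIRST_URL))
```

Printing the full URL at the top of each iteration is the kind of per-request log line the original question asks for; the `seen` set is an extra safeguard, not something the API docs are known to require.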

The additions were:

      pagination_request_style: jsonpath_paginator
      pagination_response_style: hateoas_body

and I moved the /api/v1 prefix from api_url into the paths (probably not the ideal way, though, since the prefix now appears in both the top-level path and the stream path).
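One way to check that a prefix change like this still targets the same endpoint is to compare the first-request URL each config produces. The sketch below assumes the URL is built by plainly concatenating `api_url` and the stream's `path` (which appears to be the Meltano Singer SDK's default behavior, but is worth verifying for your version) and that the stream-level `path` takes precedence over the top-level one. `stream_url` is a hypothetical helper.

```python
def stream_url(api_url: str, stream_path: str) -> str:
    """Hypothetical helper: assumed URL construction by plain concatenation."""
    return api_url + stream_path


# First config: prefix kept in api_url, stream path relative to it.
first = stream_url("https://hostname/api/v1/", "accounts/some-id-redacted/transactions")
# Working config: bare host in api_url, prefix repeated in the stream path.
second = stream_url("https://hostname", "/api/v1/accounts/some-id-redacted/transactions")

print(first)
print(second)
print(first == second)
```

Under these assumptions both configs point the first request at the same URL. Whether `api_url` also affects how follow-up page links are resolved depends on the paginator, so that part would need to be checked in the tap itself.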

I'm mostly adding this context in case it helps anyone else, but I'm still curious about the best way to get more visibility into the network requests, if anyone has tips. Thanks!