dlt-hub / dlt

data load tool (dlt) is an open source Python library that makes data loading easy 🛠️
https://dlthub.com/docs
Apache License 2.0
2.73k stars 181 forks source link

FEAT: parameterised headers rest_api_source #2084

Open ArneDePeuter opened 1 week ago

ArneDePeuter commented 1 week ago

Description

This pull request addresses the feature request discussed in issue #2071. It introduces dynamic parameter resolution for headers, allowing both keys and values to be resolved dynamically.

Key Features

Usage

The feature is demonstrated with the following example:

{
    "name": "chicken",
    "endpoint": {
        "path": "chicken",
        "headers": {"{token}": "{token}", "num": "2"},
        "params": {
            "token": {
                "type": "resolve",
                "field": "token",
                "resource": "authenticate",
            },
        },
    },
}

Tests

Comprehensive tests have been added to validate the implementation and ensure its correctness.

netlify[bot] commented 1 week ago

Deploy Preview for dlt-hub-docs canceled.

Name Link
Latest commit 5396496b134c34b4fb2bd4bc48ae2b2b7feb1377
Latest deploy log https://app.netlify.com/sites/dlt-hub-docs/deploys/67478a34a66b61000868821c
ArneDePeuter commented 6 days ago

Excuse me for the failed checks, I'll have a look at it!

ArneDePeuter commented 6 days ago

All checks should pass now, also added an extra test!

burnash commented 3 days ago

Hey @ArneDePeuter thanks for the PR and for adding tests! I'll review it soon.

ArneDePeuter commented 2 days ago

@burnash I removed the recursive parameter resolution for headers, as this was not useful.

ArneDePeuter commented 2 days ago

I now have following question. In: dlt/sources/rest_api/__init__.py:366 would it be useful that resolved params gets extended with the incremental params? This way incrementals can be in the url as in the headers.

Implemented as this for example:

# dlt/sources/rest_api/__init__.py:366 
if incremental_object:
  params = _set_incremental_params(
      params,
      incremental_object,
      incremental_param,
      incremental_cursor_transform,
  )
  resolved_params.extend(params)

for item in items:
  formatted_path, formatted_headers, parent_record = process_parent_data_item(
      path, item, resolved_params, include_from_parent, headers
  )

Secondly, I find it confusing that parameter definitions are intertwined with query params. Is this something that has been brought up before?

burnash commented 21 hours ago

@burnash I removed the recursive parameter resolution for headers, as this was not useful.

Thanks for the update, @ArneDePeuter

I now have following question. In: dlt/sources/rest_api/init.py:366 would it be useful that resolved params gets extended with the incremental params? This way incrementals can be in the url as in the headers.

I don't see a need to implement incremental in headers. I haven't seen a use case for incremental parametes in headers yet. If such a case exists, I would suggest implementing a dlt source using RESTClient rather than the rest_api source.

Secondly, I find it confusing that parameter definitions are intertwined with query params. Is this something that has been brought up before?

Good point. I have already received this feedback from someone. The way the rest_api source defines parameters is loosely based on how parameters are defined in the OpenAPI specification. I don't find the current implementation too confusing, as the end user has control over how path parameters are named and resolved. They are also excluded from the query parameters. However, I'm open to suggestions on how to improve this.

If we extend params to include header params (as in this PR), it might become confusing. I was thinking about different ways to address this. I was considering different ways to address this. One option is to take inspiration from the suggested configuration for query string parameters in https://github.com/dlt-hub/dlt/issues/1978 and define the location of the parameter in a location key. For example, from your scenario:

"endpoint": {
    "path": "issues/comments",
    "headers": {"token": "{token}", "static_param":  region}
    "params": {
        "token": {
            "type": "resolve",
            "resource": "authenticate",
            "field": "token",
            "location": "header"
        }
    },
},

What do you think? Would you be willing to update this PR to handle the location key?

ArneDePeuter commented 20 hours ago

Thanks for your response!

Good suggestion for the location field, ill have a look at it in the upcoming days.

ArneDePeuter commented 18 hours ago

I added the location field. The current implementation isn't very clean and could use further refinement.

That said, I believe this feature does not enhance clarity and might be better reverted. I preferred the previous approach, where the resolve parameter was not tied to a specific location and applied universally to matching names.

If we decide to proceed with this change, I suggest making location a list instead of a singular value. This would help avoid code duplication if someone needs to reuse a resolvedParam across multiple locations.