dlt-hub / verified-sources

Contribute to dlt verified sources 🔥
https://dlthub.com/docs/walkthroughs/add-a-verified-source
Apache License 2.0

rest_api: Clarify how we specify incremental loading #516

Open willi-mueller opened 4 days ago

willi-mueller commented 4 days ago

Source name

rest_api

Describe the data you'd like to see

Currently, the declarative rest_api source offers two APIs to specify incremental data loads:

  1. declaring a query parameter of type incremental (at the query parameter level of the config dictionary)
  2. declaring an incremental load at the resource level

Example for method 1:

"resources": [
    {
        "name": "posts",
        "endpoint": {
            "params": {
                "limit": 100,
                "since": {
                    "type": "incremental",
                    "cursor_path": "updated_at",
                    "initial_value": "2024-01-01",
                    "end_value": "2024-01-31",
                    "transform": callback,
                },
            },
        },
    },
],

Example for method 2:

"resources": [
    {
        "name": "posts",
        "endpoint": {
            "incremental": {
                "start_param": "since",
                "end_param": "until",
                "cursor_path": "updated_at",
                "initial_value": "2024-01-01",
                "end_value": "2024-01-31",
                "transform": callback,
            },
        },
    },
],
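To make the overlap between the two methods concrete, here is a minimal sketch of how a method 1 endpoint could be rewritten into the method 2 shape. The helper `lift_incremental_param` is hypothetical and written only for this issue; it is not part of the rest_api source, and it assumes the key names shown in the two examples above.

```python
from copy import deepcopy
from typing import Any, Dict


def lift_incremental_param(endpoint: Dict[str, Any]) -> Dict[str, Any]:
    """Rewrite a method 1 endpoint config into the method 2 shape.

    Hypothetical helper for illustration only; not part of the rest_api source.
    """
    result = deepcopy(endpoint)  # leave the caller's config untouched
    params = result.get("params", {})
    for name, value in list(params.items()):
        if isinstance(value, dict) and value.get("type") == "incremental":
            # The parameter name becomes start_param; the "type" marker is dropped.
            incremental = {k: v for k, v in value.items() if k != "type"}
            result["incremental"] = {"start_param": name, **incremental}
            del params[name]
            break  # only the first incremental parameter is lifted
    return result


method_1_endpoint = {
    "params": {
        "limit": 100,
        "since": {
            "type": "incremental",
            "cursor_path": "updated_at",
            "initial_value": "2024-01-01",
            "end_value": "2024-01-31",
        },
    },
}

method_2_endpoint = lift_incremental_param(method_1_endpoint)
```

The result carries the same information as the method 2 example except for end_param, which method 1 has no way to express.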

This poses some challenges:

  1. Method 1 (query parameter level) has fewer features than method 2 (resource level) because it does not support end_param. The reason is that the incremental definition is nested as a child of the start parameter (since in the example), which leaves no natural place to declare an end parameter.
  2. The code implementing the two methods is redundant.
  3. Users might be confused by two APIs for the same thing, one of which is slightly less powerful.
  4. The rest_api source creates a dlt.sources.Incremental. However, the current integration with that incremental class might not be ideal, because the rest_api source itself holds and applies the transform function, which allows value transformations such as epoch to datetime.
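As an illustration of point 4, here is a rough sketch of a transform callback and of a source applying it to the cursor value before building the request. Both `epoch_to_iso_date` and `build_start_param` are hypothetical names invented for this sketch; how the rest_api source actually calls the callback is an assumption here, not a description of its code.

```python
from datetime import datetime, timezone
from typing import Any, Callable, Dict, Optional


def epoch_to_iso_date(epoch_seconds: Any) -> str:
    """Example transform: Unix epoch seconds to a UTC date string."""
    moment = datetime.fromtimestamp(int(epoch_seconds), tz=timezone.utc)
    return moment.strftime("%Y-%m-%d")


def build_start_param(
    start_param: str,
    cursor_value: Any,
    transform: Optional[Callable[[Any], Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical stand-in for the step where the source applies the transform.

    The callback is held and called here, outside the incremental object,
    which is the situation point 4 describes.
    """
    value = transform(cursor_value) if transform else cursor_value
    return {start_param: value}


query_params = build_start_param("since", 1704067200, epoch_to_iso_date)
```

In this sketch the incremental object would only ever see the raw epoch value, while the request carries the transformed date string.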

Proposal

Are you a dlt user?

Yes, I'm already a dlt user.

Are you ready to contribute this extension?

Yes, I'm ready.