dlt-hub / verified-sources

Contribute to dlt verified sources 🔥
https://dlthub.com/docs/walkthroughs/add-a-verified-source
Apache License 2.0

Extend generic API source to allow for incremental path parameters #562

Open maxestorr opened 3 weeks ago

maxestorr commented 3 weeks ago

Source name

rest_api

Describe the data you'd like to see

I'm using the generic API source (rest_api) to define my data pipeline declaratively, ingesting data from the eBird historical observations endpoint.

As the endpoint's documentation shows, this data can be loaded incrementally, but not through a conventional query parameter such as ?data_from=2024-08-01. Instead, the date is supplied as path parameters: each day has its own endpoint path, such as https://servername.com/2/data/obs/{{region-code}}/{{year}}/{{month}}/{{day}}.
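To make the difference concrete, here is a minimal sketch contrasting the two URL styles. It assumes the placeholder host servername.com from the example above, a hypothetical region code US-NY, and unpadded month and day segments; the real eBird host and its exact path layout may differ.

```python
from datetime import date
from urllib.parse import urlencode

# Placeholder base URL from the issue text; not the real eBird host.
BASE = "https://servername.com/2/data/obs"


def query_param_url(start: date) -> str:
    """Query-parameter style: a single request asks for everything since `start`."""
    return f"{BASE}?{urlencode({'data_from': start.isoformat()})}"


def path_param_url(region_code: str, day: date) -> str:
    """Path-parameter style: the date is part of the path, so one request per day.

    Month and day are left unpadded here; that is an assumption.
    """
    return f"{BASE}/{region_code}/{day.year}/{day.month}/{day.day}"


print(query_param_url(date(2024, 8, 1)))
print(path_param_url("US-NY", date(2024, 8, 1)))
```

With query parameters, the cursor value just gets substituted into one request. With path parameters, an incremental load has to generate a separate request for every day since the last run.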

Currently, the generic API source supports incremental loading through query parameters, using a config like this:

{
    "path": "posts",
    "data_selector": "results",  # Optional JSONPath to select the list of posts
    "params": {
        "created_since": {
            "type": "incremental",
            "cursor_path": "created_at", # The JSONPath to the field we want to track in each post
            "initial_value": "2024-01-25",
        },
    },
}

However, as far as I can tell, there is no equivalent config that works for path parameters.
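A minimal sketch of the behaviour being requested, written in plain Python rather than dlt. The "path_incremental" key is hypothetical and is not accepted by rest_api today; a plain dict stands in for dlt's pipeline state, and the path template, region code, and unpadded date segments are illustrative assumptions.

```python
from datetime import date, timedelta

# HYPOTHETICAL config: rest_api does not support "path_incremental".
# It mirrors the query-parameter example above, but for path placeholders.
CONFIG = {
    "path": "2/data/obs/{region_code}/{year}/{month}/{day}",
    "path_incremental": {
        "type": "incremental",
        "initial_value": "2024-08-01",
    },
}


def pending_paths(config: dict, state: dict, region_code: str, today: date) -> list[str]:
    """Return one path per day not yet requested, then advance the cursor.

    `state` is a plain dict standing in for pipeline state; the last day
    requested is stored there as an ISO date string under "last_value".
    """
    if "last_value" in state:
        # Later runs: resume from the day after the last one requested.
        day = date.fromisoformat(state["last_value"]) + timedelta(days=1)
    else:
        # First run: start from the configured initial value.
        day = date.fromisoformat(config["path_incremental"]["initial_value"])

    paths = []
    while day <= today:
        paths.append(
            config["path"].format(
                region_code=region_code, year=day.year, month=day.month, day=day.day
            )
        )
        day += timedelta(days=1)

    if paths:
        # Record the last day requested; a real pipeline should only commit
        # this once the load has succeeded.
        state["last_value"] = today.isoformat()
    return paths


state = {}
print(pending_paths(CONFIG, state, "US-NY", date(2024, 8, 2)))
print(pending_paths(CONFIG, state, "US-NY", date(2024, 8, 3)))
```

On the first call, the sketch expands every day from the initial value up to `today`. Later calls only request days after the stored cursor, which is roughly what `"type": "incremental"` already does for query parameters.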

Are you a dlt user?

Yes, I'm already a dlt user.

Are you ready to contribute this extension?

No.

dlt destination

DuckDB

Additional information

No response

maxestorr commented 3 weeks ago

I'm about to head out, but I can share the code I'm working with when I get back. For wider context: I'm trying to ingest data from this API using dlt's declarative generic API source config, with Airflow for orchestration. I ran into a number of issues (this being one of them) that have prevented me from getting it working.